    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                          |___/ 

Articles: 32

Last Updated: 2024-06-25 23:36:13 (+00:00)

Improving Execution Concurrency in Partial-Order Plans via Block-Substitution

Partial-order plans in AI planning facilitate execution flexibility and several other tasks, such as plan reuse, modification, and decomposition, due to their less constrained nature. A Partial-Order Plan (POP) allows two actions to remain unordered with respect to each other, thus providing the flexibility of executing actions in different sequences. This flexibility can be further extended by enabling parallel execution of actions in a POP to reduce its overall execution time. While extensive studies exist on improving the flexibility of a POP by optimizing its action orderings through plan deordering and reordering, there has been limited focus on the flexibility of executing actions concurrently in a plan. Execution concurrency in a POP can be achieved by incorporating action non-concurrency constraints, specifying which actions cannot be executed in parallel. This work formalizes the conditions for non-concurrency constraints to transform a POP into a parallel plan. We also introduce an algorithm to enhance the plan's concurrency by optimizing resource utilization through substitutions of its subplans with respect to the corresponding planning task. Our algorithm employs block deordering, which eliminates orderings in a POP by encapsulating coherent actions in blocks, and then exploits blocks as candidate subplans for substitutions. Experiments over the benchmark problems from the International Planning Competitions (IPC) exhibit significant improvement in plan concurrency, specifically, improvement in 25% of the plans and an overall increase of 2.1% in concurrency.
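
As a concrete illustration of the scheduling problem behind these constraints, the following sketch (not the paper's algorithm; the set-based plan encoding is an assumption) greedily packs a POP's actions into parallel steps while respecting both the orderings and the non-concurrency constraints:

def parallel_steps(actions, orderings, non_concurrent):
    """actions: iterable of action names.
    orderings: set of (a, b) pairs meaning a must finish before b starts.
    non_concurrent: set of frozenset({a, b}) pairs that cannot run in parallel."""
    remaining, done, steps = set(actions), set(), []
    while remaining:
        # Actions whose ordering predecessors have all finished.
        ready = sorted(a for a in remaining
                       if all(p in done for (p, b) in orderings if b == a))
        step = []
        for a in ready:
            # Keep a in this step only if it is concurrency-compatible
            # with every action already placed in the step.
            if all(frozenset((a, b)) not in non_concurrent for b in step):
                step.append(a)
        steps.append(step)
        done |= set(step)
        remaining -= set(step)
    return steps

# Toy POP: a and b must both precede c; a and b may not run in parallel.
print(parallel_steps({"a", "b", "c"},
                     {("a", "c"), ("b", "c")},
                     {frozenset(("a", "b"))}))
# -> [['a'], ['b'], ['c']]: the non-concurrency constraint forces three
#    steps where an unconstrained parallel plan would need only two.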

Updated: 2024-06-25 23:36:13

Categories: cs.AI

Download: http://arxiv.org/abs/2406.18615v1

Gradient Coding with Iterative Block Leverage Score Sampling

We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods, known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, i.e., stragglers. We replicate the data across the distributed network, to attain the approximation guarantees through the induced sampling distribution. The significance and main contribution of this work, is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. The transition to uniform sampling is done without applying a random projection, as in the case of the subsampled randomized Hadamard transform. Furthermore, by incorporating this technique to coded computing, our scheme is an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.
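
For intuition, a minimal NumPy sketch of plain leverage score sampling for an $\ell_2$-subspace embedding (exact scores via the SVD; the distributed replication, coding, and iterative aspects of the paper are omitted):

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1000, 10, 200                 # data rows, columns, sampled rows
A = rng.standard_normal((n, d))

U, _, _ = np.linalg.svd(A, full_matrices=False)
scores = np.sum(U**2, axis=1)           # leverage of row i = ||U[i]||^2
probs = scores / scores.sum()           # leverage-score sampling distribution

idx = rng.choice(n, size=m, p=probs)    # sample rows with replacement
SA = A[idx] / np.sqrt(m * probs[idx])[:, None]   # rescale for unbiasedness

# Sanity check: the sketch roughly preserves the spectrum, i.e. ||SAx|| ~ ||Ax||.
print(np.linalg.svd(A, compute_uv=False)[:3].round(2))
print(np.linalg.svd(SA, compute_uv=False)[:3].round(2))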

Updated: 2024-06-25 23:15:13

Categories: cs.IT,cs.DC,cs.IR,cs.LG,cs.NA,math.IT,math.NA,65B99, 65F10, 65F20, 65F45, 65F55, 68W20, 68W25, 94A20, 68P30, 68P20,G.1.2; G.1.3; G.1.6; G.3; E.4

Download: http://arxiv.org/abs/2308.03096v2

Inherent Challenges of Post-Hoc Membership Inference for Large Language Models

Large Language Models (LLMs) are often trained on vast amounts of undisclosed data, motivating the development of post-hoc Membership Inference Attacks (MIAs) to gain insight into their training data composition. However, in this paper, we identify inherent challenges in post-hoc MIA evaluation due to potential distribution shifts between collected member and non-member datasets. Using a simple bag-of-words classifier, we demonstrate that datasets used in recent post-hoc MIAs suffer from significant distribution shifts, in some cases achieving near-perfect distinction between members and non-members. This implies that previously reported high MIA performance may be largely attributable to these shifts rather than model memorization. We confirm that randomized, controlled setups eliminate such shifts and thus enable the development and fair evaluation of new MIAs. However, we note that such randomized setups are rarely available for the latest LLMs, so post-hoc data collection is still required to infer membership for real-world LLMs. As a potential solution, we propose a Regression Discontinuity Design (RDD) approach for post-hoc data collection, which substantially mitigates distribution shifts. Evaluating various MIA methods on this RDD setup yields performance barely above random guessing, in stark contrast to previously reported results. Overall, our findings highlight the challenges in accurately measuring LLM memorization and the need for careful experimental design in (post-hoc) membership inference tasks.
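
The bag-of-words diagnostic is simple to reproduce in spirit; below is a hedged sketch with stand-in corpora (the paper applies this idea to the actual member/non-member splits of recent MIA benchmarks):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in corpora; a real test would use the benchmark's member and
# non-member text sets.
members = ["the quick brown fox story number %d" % i for i in range(100)]
non_members = ["breaking news article published today %d" % i for i in range(100)]
texts = members + non_members
labels = [1] * len(members) + [0] * len(non_members)

X = CountVectorizer().fit_transform(texts)
auc = cross_val_score(LogisticRegression(max_iter=1000), X, labels,
                      cv=5, scoring="roc_auc").mean()
# AUC near 0.5 -> no detectable shift; near 1.0 -> members and
# non-members are separable without ever querying the target model.
print(f"bag-of-words AUC: {auc:.2f}")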

Updated: 2024-06-25 23:12:07

Categories: cs.CL,cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.17975v1

LABOR-LLM: Language-Based Occupational Representations with Large Language Models

Many empirical studies of labor market questions rely on estimating relatively simple predictive models using small, carefully constructed longitudinal survey datasets based on hand-engineered features. Large Language Models (LLMs), trained on massive datasets, encode vast quantities of world knowledge and can be used for the next job prediction problem. However, while an off-the-shelf LLM produces plausible career trajectories when prompted, the probability with which an LLM predicts a particular job transition conditional on career history will not, in general, align with the true conditional probability in a given population. Recently, Vafa et al. (2024) introduced CAREER, a transformer-based "foundation model" that predicts transitions between jobs, trained using a large, unrepresentative resume dataset; they further demonstrated how transfer learning techniques can be used to leverage the foundation model to build better predictive models of both transitions and wages that reflect conditional transition probabilities found in nationally representative survey datasets. This paper considers an alternative where the fine-tuning of the CAREER foundation model is replaced by fine-tuning LLMs. For the task of next job prediction, we demonstrate that models trained with our approach outperform several alternatives in terms of predictive performance on the survey data, including traditional econometric models, CAREER, and LLMs with in-context learning, even though the LLM can in principle predict job titles that are not allowed in the survey data. Further, we show that our fine-tuned LLM-based models' predictions are more representative of the career trajectories of various workforce subpopulations than off-the-shelf LLM models and CAREER. We conduct experiments and analyses that highlight the sources of the gains in the performance of our models for representative predictions.

Updated: 2024-06-25 23:07:18

Categories: cs.LG,cs.CL,econ.EM

Download: http://arxiv.org/abs/2406.17972v1

Neural Optimization with Adaptive Heuristics for Intelligent Marketing System

Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimization that considers both to-business (2B) and to-consumer (2C) products, as well as both owned and paid channels. We describe key modules of the NOAH framework, including prediction, optimization, and adaptive heuristics, providing examples for bidding and content optimization. We then detail the successful application of NOAH to LinkedIn's email marketing system, showcasing significant wins over the legacy ranking system. Additionally, we share details and insights that are broadly useful, particularly on: (i) addressing delayed feedback with lifetime value, (ii) performing large-scale linear programming with randomization, (iii) improving retrieval with audience expansion, (iv) reducing signal dilution in targeting tests, and (v) handling zero-inflated heavy-tail metrics in statistical testing.

Updated: 2024-06-25 22:52:43

Categories: stat.ME,cs.AI,cs.IR,cs.LG,math.OC,G.3; G.1.6; I.2

Download: http://arxiv.org/abs/2405.10490v3

Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective

To better interpret the intrinsic mechanism of large language models (LLMs), recent studies focus on the monosemanticity of their basic units. A monosemantic neuron is dedicated to a single and specific concept, which forms a one-to-one correlation between neurons and concepts. Despite extensive research in monosemanticity probing, it remains unclear whether monosemanticity is beneficial or harmful to model capacity. To explore this question, we revisit monosemanticity from the feature decorrelation perspective and advocate for its encouragement. We experimentally observe that the conclusion of Wang et al. (2024), which suggests that decreasing monosemanticity enhances model performance, does not hold when the model changes. Instead, we demonstrate that monosemanticity consistently exhibits a positive correlation with model capacity in the preference alignment process. Consequently, we apply feature correlation as a proxy for monosemanticity and incorporate a feature decorrelation regularizer into the dynamic preference optimization process. The experiments show that our method not only enhances representation diversity and activation sparsity but also improves preference alignment performance.
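
A hedged sketch of a feature decorrelation regularizer of the kind described, in NumPy for readability (in training it would be a differentiable term added to the preference optimization loss):

import numpy as np

def decorrelation_penalty(features):
    """features: (batch, dim) activations from the model."""
    z = features - features.mean(axis=0, keepdims=True)
    z = z / (z.std(axis=0, keepdims=True) + 1e-8)
    corr = (z.T @ z) / len(z)                 # (dim, dim) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    return np.sum(off_diag**2)                # zero iff features are decorrelated

# total_loss = preference_loss + lam * decorrelation_penalty(features)
batch = np.random.default_rng(0).standard_normal((256, 64))
print(decorrelation_penalty(batch))           # small but nonzero for random data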

Updated: 2024-06-25 22:51:08

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.17969v1

Efficient Document Ranking with Learnable Late Interactions

Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer based on query and document token embeddings. However, these lightweight scorers are often hand-crafted, and there is no understanding of their approximation power; further, such scorers require access to individual document token embeddings, which imposes an increased latency and storage burden. In this paper, we propose novel learnable late-interaction models (LITE) that resolve these issues. Theoretically, we prove that LITE is a universal approximator of continuous scoring functions, even for relatively small embedding dimension. Empirically, LITE outperforms previous late-interaction models such as ColBERT on both in-domain and zero-shot re-ranking tasks. For instance, experiments on MS MARCO passage re-ranking show that LITE not only yields a model with better generalization, but also lowers latency and requires 0.25x storage compared to ColBERT.
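
For contrast with LITE's learned scorer, below is the hand-crafted MaxSim late-interaction rule popularized by ColBERT, computed over dual-encoder token embeddings; LITE replaces this fixed rule with a small learned network over the same inputs:

import numpy as np

def maxsim_score(Q, D):
    """Q: (q_tokens, dim), D: (d_tokens, dim), rows l2-normalized."""
    sims = Q @ D.T                  # all query-token / doc-token similarities
    return sims.max(axis=1).sum()   # best doc token per query token, summed

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8));  Q /= np.linalg.norm(Q, axis=1, keepdims=True)
D = rng.standard_normal((20, 8)); D /= np.linalg.norm(D, axis=1, keepdims=True)
print(maxsim_score(Q, D))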

Updated: 2024-06-25 22:50:48

Categories: cs.IR,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.17968v1

Towards Synthesizing Twelve-Lead Electrocardiograms from Two Asynchronous Leads

The electrocardiogram (ECG) records electrical signals in a non-invasive way to observe the condition of the heart, typically looking at the heart from 12 different directions. Several types of cardiac disease are diagnosed using 12-lead ECGs. Recently, various wearable devices have enabled immediate access to the ECG without the use of unwieldy equipment. However, they only provide ECGs with a couple of leads. This results in an inaccurate diagnosis of cardiac disease due to the lack of required leads. We propose a deep generative model for ECG synthesis from two asynchronous leads to ten leads. It first represents a heart condition referring to two leads, and then generates ten leads based on the represented heart condition. Both the rhythm and amplitude of the generated leads resemble those of the original ones, while the technique removes noise and the baseline wander appearing in the original leads. As a data augmentation method, our model improves the classification performance of models compared with models using ECGs with only one or two leads.

Updated: 2024-06-25 22:46:15

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2103.00006v4

Empowering Interdisciplinary Insights with Dynamic Graph Embedding Trajectories

We developed DyGETViz, a novel framework for effectively visualizing dynamic graphs (DGs) that are ubiquitous across diverse real-world systems. This framework leverages recent advancements in discrete-time dynamic graph (DTDG) models to adeptly handle the temporal dynamics inherent in dynamic graphs. DyGETViz effectively captures both micro- and macro-level structural shifts within these graphs, offering a robust method for representing complex and massive dynamic graphs. The application of DyGETViz extends to a diverse array of domains, including ethology, epidemiology, finance, genetics, linguistics, communication studies, social studies, and international relations. Through its implementation, DyGETViz has revealed or confirmed various critical insights. These include the diversity of content sharing patterns and the degree of specialization within online communities, the chronological evolution of lexicons across decades, and the distinct trajectories exhibited by aging-related and non-related genes. Importantly, DyGETViz enhances the accessibility of scientific findings to non-domain experts by simplifying the complexities of dynamic graphs. Our framework is released as an open-source Python package for use across diverse disciplines. Our work not only addresses the ongoing challenges in visualizing and analyzing DTDG models but also establishes a foundational framework for future investigations into dynamic graph representation and analysis across various disciplines.

Updated: 2024-06-25 22:44:53

Categories: cs.LG,cs.HC,cs.SI

Download: http://arxiv.org/abs/2406.17963v1

Listening to the Noise: Blind Denoising with Gibbs Diffusion

In recent years, denoising problems have become intertwined with the development of deep generative models. In particular, diffusion models are trained like denoisers, and the distribution they model coincides with denoising priors in the Bayesian picture. However, denoising through diffusion-based posterior sampling requires the noise level and covariance to be known, preventing blind denoising. We overcome this limitation by introducing Gibbs Diffusion (GDiff), a general methodology addressing posterior sampling of both the signal and the noise parameters. Assuming arbitrary parametric Gaussian noise, we develop a Gibbs algorithm that alternates sampling steps from a conditional diffusion model trained to map the signal prior to the family of noise distributions, and a Monte Carlo sampler to infer the noise parameters. Our theoretical analysis highlights potential pitfalls, guides diagnostic usage, and quantifies errors in the Gibbs stationary distribution caused by the diffusion model. We showcase our method for 1) blind denoising of natural images involving colored noises with unknown amplitude and spectral index, and 2) a cosmology problem, namely the analysis of cosmic microwave background data, where Bayesian inference of "noise" parameters means constraining models of the evolution of the Universe.
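
The Gibbs alternation is the core structure; in this runnable toy a conjugate Gaussian posterior stands in for the trained conditional diffusion model, so both conditional steps are exact (an illustration of the loop, not GDiff itself):

import numpy as np

rng = np.random.default_rng(0)
n = 500
x_true = rng.standard_normal(n)          # signal with N(0, 1) prior
sigma_true = 0.7
y = x_true + sigma_true * rng.standard_normal(n)

sigma2 = 4.0                             # deliberately bad initial noise guess
for _ in range(200):
    # Step 1: sample x | y, sigma2 (Gaussian posterior under the N(0,1)
    # prior; in GDiff this is the conditional diffusion model).
    var = 1.0 / (1.0 + 1.0 / sigma2)
    mean = var * y / sigma2
    x = mean + np.sqrt(var) * rng.standard_normal(n)
    # Step 2: sample sigma2 | y, x (inverse-gamma conjugate update,
    # standing in for the Monte Carlo noise-parameter sampler).
    resid = y - x
    sigma2 = (resid @ resid) / rng.chisquare(n)

print(f"estimated sigma: {np.sqrt(sigma2):.2f} (true {sigma_true})")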

Updated: 2024-06-25 22:43:54

Categories: stat.ML,astro-ph.CO,cs.CV,cs.LG,eess.SP

Download: http://arxiv.org/abs/2402.19455v2

NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance in tasks involving tabular data, especially those requiring symbolic reasoning, faces challenges due to the structural variance and inconsistency in table cell values often found in web tables. In this paper, we introduce NormTab, a novel framework aimed at enhancing the symbolic reasoning performance of LLMs by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step using LLMs to support symbolic reasoning on tabular data. Our experimental evaluation, conducted on challenging web table datasets such as WikiTableQuestion and TabFact, demonstrates that leveraging NormTab significantly improves symbolic reasoning performance, showcasing the importance and effectiveness of web table normalization for enhancing LLM-based symbolic reasoning tasks.

Updated: 2024-06-25 22:40:03

Categories: cs.CL,cs.AI,cs.DB,cs.IR

Download: http://arxiv.org/abs/2406.17961v1

LEDITS++: Limitless Image Editing using Text-to-Image Models

Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility. They either require time-consuming finetuning, deviate unnecessarily strongly from the input image, and/or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. LEDITS++'s novel inversion approach requires no tuning nor optimization and produces high-fidelity results with a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods.

Updated: 2024-06-25 22:33:57

Categories: cs.CV,cs.AI,cs.HC,cs.LG

Download: http://arxiv.org/abs/2311.16711v2

MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. For the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by proposing a Meta-Ability Guided Interactive Chain-of-distillation (MAGIC) method. Specifically, a Meta-Ability Knowledge Distillation (MAKD) framework is proposed for decoupling and refining the necessary meta-abilities of VLN agents. A Meta-Knowledge Randomization Weighting (MKRW) and a Meta-Knowledge Transferable Determination (MKTD) module are incorporated to dynamically adjust aggregation weights at the meta-ability and sample levels, respectively. Moving beyond traditional one-step unidirectional distillation, an Interactive Chain-of-Distillation (ICoD) learning strategy is proposed to allow students to give feedback to teachers, forming a new multi-step teacher-student co-evolution pipeline. Remarkably, on the R2R test unseen public leaderboard, our smallest model, MAGIC-S, with only 5% (11M) of the teacher's size, outperforms all previous methods under the same training data. Additionally, our largest model, MAGIC-L, surpasses the previous state-of-the-art by 5.84% in SPL and 3.18% in SR. Furthermore, a new dataset was collected and annotated from our living environments, where MAGIC-S demonstrated superior performance and real-time efficiency. Our code is publicly available on https://github.com/CrystalSixone/VLN-MAGIC.

Updated: 2024-06-25 22:33:41

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.17960v1

ODIN: A Single Model for 2D and 3D Segmentation

State-of-the-art models on contemporary 3D segmentation benchmarks like ScanNet consume and label dataset-provided 3D point clouds, obtained through post-processing of sensed multiview RGB-D images. They are typically trained in-domain, forgo large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead. The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures. In this paper, we challenge this view and propose ODIN (Omni-Dimensional INstance segmentation), a model that can segment and label both 2D RGB images and 3D point clouds, using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. Our model differentiates 2D and 3D feature operations through the positional encodings of the tokens involved, which capture pixel coordinates for 2D patch tokens and 3D coordinates for 3D feature tokens. ODIN achieves state-of-the-art performance on ScanNet200, Matterport3D and AI2THOR 3D instance segmentation benchmarks, and competitive performance on ScanNet, S3DIS and COCO. It outperforms all previous works by a wide margin when the sensed 3D point cloud is used in place of the point cloud sampled from the 3D mesh. When used as the 3D perception engine in an instructable embodied agent architecture, it sets a new state-of-the-art on the TEACh action-from-dialogue benchmark. Our code and checkpoints can be found at the project website (https://odin-seg.github.io).

Updated: 2024-06-25 22:21:17

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2401.02416v3

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token. We examine these challenges in an encoder-decoder transformer model and find that certain cross-attention heads in such models implicitly learn the text and speech alignment when trained for predicting speech tokens for a given text. To make the alignment more robust, we propose techniques utilizing CTC loss and attention priors that encourage monotonic cross-attention over the text tokens. Our guided attention training technique does not introduce any new learnable parameters and significantly improves robustness of LLM-based TTS models.
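
A hedged sketch of one common form of attention prior consistent with this description: bias cross-attention toward a near-diagonal, monotonic text-speech alignment (the Gaussian band and its width are assumptions; the paper's exact prior and its CTC term are not reproduced here):

import numpy as np

def diagonal_prior(T_speech, S_text, sigma=0.2):
    t = np.arange(T_speech)[:, None] / max(T_speech - 1, 1)
    s = np.arange(S_text)[None, :] / max(S_text - 1, 1)
    return np.exp(-((t - s) ** 2) / (2 * sigma**2))   # (T, S) prior weights

def apply_prior(attn_logits, prior, eps=1e-8):
    # Multiply attention probabilities by the prior, then renormalize,
    # pushing probability mass toward the monotonic diagonal.
    attn = np.exp(attn_logits - attn_logits.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    attn = attn * prior + eps
    return attn / attn.sum(axis=-1, keepdims=True)

logits = np.random.default_rng(0).standard_normal((40, 12))
print(apply_prior(logits, diagonal_prior(40, 12)).shape)   # (40, 12)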

Updated: 2024-06-25 22:18:52

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.17957v1

A Simple Finite-Time Analysis of TD Learning with Linear Function Approximation

We study the finite-time convergence of TD learning with linear function approximation under Markovian sampling. Existing proofs for this setting either assume a projection step in the algorithm to simplify the analysis, or require a fairly intricate argument to ensure stability of the iterates. We ask: is it possible to retain the simplicity of a projection-based analysis without actually performing a projection step in the algorithm? Our main contribution is to show this is possible via a novel two-step argument. In the first step, we use induction to prove that under a standard choice of a constant step-size $\alpha$, the iterates generated by TD learning remain uniformly bounded in expectation. In the second step, we establish a recursion that mimics the steady-state dynamics of TD learning up to a bounded perturbation on the order of $O(\alpha^2)$ that captures the effect of Markovian sampling. Combining these pieces leads to an overall approach that considerably simplifies existing proofs. We conjecture that our inductive proof technique will find applications in the analyses of more complex stochastic approximation algorithms, and conclude by providing some examples of such applications.
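
For reference, the iteration being analyzed is TD(0) with linear function approximation and a constant step size, run on a Markovian stream with no projection step; the features, reward, and dynamics in this toy are illustrative assumptions:

import numpy as np

# TD(0) update: w <- w + alpha * (r + gamma * phi(s')^T w - phi(s)^T w) * phi(s)
rng = np.random.default_rng(0)
d, gamma, alpha = 5, 0.9, 0.05
w = np.zeros(d)

def phi(state):                      # toy feature map (assumption)
    return np.cos(np.arange(1, d + 1) * state)

s = 0.0
for _ in range(5000):                # Markovian sampling: s' depends on s
    s_next = 0.8 * s + 0.1 * rng.standard_normal()
    r = np.sin(s)                    # toy reward (assumption)
    td_error = r + gamma * phi(s_next) @ w - phi(s) @ w
    w += alpha * td_error * phi(s)
    s = s_next

print(w)   # the iterates stay bounded, as the paper's induction establishes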

Updated: 2024-06-25 22:18:09

Categories: cs.LG,cs.SY,eess.SY,math.OC

Download: http://arxiv.org/abs/2403.02476v2

Why Line Search when you can Plane Search? SO-Friendly Neural Networks allow Per-Iteration Optimization of Learning and Momentum Rates for Every Layer

We introduce the class of SO-friendly neural networks, which include several models used in practice including networks with 2 layers of hidden weights where the number of inputs is larger than the number of outputs. SO-friendly networks have the property that performing a precise line search to set the step size on each iteration has the same asymptotic cost during full-batch training as using a fixed learning rate. Further, for the same cost a plane search can be used to set both the learning and momentum rate on each step. Even further, SO-friendly networks also allow us to use subspace optimization to set a learning rate and momentum rate for each layer on each iteration. We explore augmenting gradient descent as well as quasi-Newton methods and Adam with line optimization and subspace optimization, and our experiments indicate that this gives fast and reliable ways to train these networks that are insensitive to hyper-parameters.
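
To make the plane-search idea concrete, here is a toy that, on each iteration, picks the learning and momentum rates by searching the 2D subspace spanned by the gradient and momentum directions (a coarse grid stands in for the precise low-cost solve that SO-friendly structure permits):

import numpy as np

def plane_search_step(f, w, grad, mom, alphas, betas):
    # Search the plane {w - a*grad + b*mom} for the lowest loss.
    best, best_ab = np.inf, (0.0, 0.0)
    for a in alphas:
        for b in betas:
            val = f(w - a * grad + b * mom)
            if val < best:
                best, best_ab = val, (a, b)
    a, b = best_ab
    step = -a * grad + b * mom
    return w + step, step

H = np.diag([1.0, 10.0])                 # toy ill-conditioned quadratic
f = lambda w: 0.5 * w @ H @ w
w, mom = np.array([3.0, 2.0]), np.zeros(2)
grid = np.linspace(0.0, 1.0, 21)
for _ in range(20):
    grad = H @ w
    w, mom = plane_search_step(f, w, grad, mom, grid, grid)
print(f(w))                              # near 0, with no tuned step size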

Updated: 2024-06-25 22:06:40

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2406.17954v1

LINSCAN -- A Linearity Based Clustering Algorithm

DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approximating their local neighborhoods and leveraging a distance function derived from the Kullback-Leibler divergence, LINSCAN can detect and distinguish lineated clusters that are spatially close but have orthogonal covariances. We demonstrate how LINSCAN can be applied to seismic data to identify active faults, including intersecting faults, and determine their orientation. Finally, we discuss the properties a generalization of DBSCAN and OPTICS must have in order to retain the stability benefits of these algorithms.
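
The neighborhood-comparison idea can be sketched directly: fit a Gaussian to each point's local neighborhood and compare neighborhoods with a symmetrized Kullback-Leibler divergence, so nearby but orthogonally oriented lineations look far apart (LINSCAN's exact distance may differ in normalization):

import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    # KL(N(mu0,S0) || N(mu1,S1)) for multivariate Gaussians.
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)

# Two neighborhoods at the same location but with orthogonal covariances:
S_horiz = np.diag([1.0, 0.01])
S_vert = np.diag([0.01, 1.0])
mu = np.zeros(2)
print(sym_kl(mu, S_horiz, mu, S_vert))   # large, despite identical means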

Updated: 2024-06-25 21:58:37

Categories: cs.LG,cs.CG

Download: http://arxiv.org/abs/2406.17952v1

Navigating High-Degree Heterogeneity: Federated Learning in Aerial and Space Networks

Federated learning offers a compelling solution to the challenges of networking and data privacy within aerial and space networks (ASNs) by utilizing vast private edge data and computing capabilities accessible through drones, balloons, and satellites. While current research has focused on optimizing the learning process, computing efficiency, and minimizing communication overhead, the issues of heterogeneity and class imbalance remain a significant barrier to rapid model convergence. In our study, we explore the influence of heterogeneity on class imbalance, which diminishes performance in ASN-based federated learning (FL). We illustrate the correlation between heterogeneity and class imbalance within grouped data and show how constraints such as battery life exacerbate the class imbalance challenge. Our findings indicate that ASN-based FL faces heightened class imbalance issues even with similar levels of heterogeneity compared to other scenarios. Finally, we analyze the impact of varying degrees of heterogeneity on FL training and evaluate the efficacy of current state-of-the-art algorithms under these conditions. Our results reveal that the heterogeneity challenge is more pronounced in ASN-based federated learning and that prevailing algorithms often fail to effectively address high levels of heterogeneity.

Updated: 2024-06-25 21:57:26

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2406.17951v1

The Overcooked Generalisation Challenge

We introduce the Overcooked Generalisation Challenge (OGC) - the first benchmark to study agents' zero-shot cooperation abilities when faced with novel partners and levels in the Overcooked-AI environment. This perspective starkly contrasts with a large body of previous work that has trained and evaluated cooperating agents only on the same level, failing to capture the generalisation abilities required for real-world human-AI cooperation. Our challenge interfaces with state-of-the-art dual curriculum design (DCD) methods to generate auto-curricula for training general agents in Overcooked. It is the first cooperative multi-agent environment specially designed for DCD methods and, consequently, the first benchmarked with state-of-the-art methods. It is fully GPU-accelerated, built on the DCD benchmark suite minimax, and freely available under an open-source license: https://git.hcics.simtech.uni-stuttgart.de/public-projects/OGC. We show that current DCD algorithms struggle to produce useful policies in this novel challenge, even if combined with recent network architectures that were designed for scalability and generalisability. The OGC pushes the boundaries of real-world human-AI cooperation by enabling the research community to study the impact of generalisation on cooperating agents.

Updated: 2024-06-25 21:51:43

Categories: cs.LG,cs.AI,cs.MA

Download: http://arxiv.org/abs/2406.17949v1

Hot-Distance: Combining One-Hot and Signed Distance Embeddings for Segmentation

Machine learning models are only as good as the data to which they are fit. As such, it is always preferable to use as much data as possible in training models. What data can be used for fitting a model depends largely on the formulation of the task. We introduce Hot-Distance, a novel segmentation target that combines the strength of signed boundary distance prediction with the flexibility of one-hot encoding, to increase the amount of usable training data for segmentation of subcellular structures in focused ion beam scanning electron microscopy (FIB-SEM).
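
A hedged reading of the target construction: pair each class's one-hot mask with a signed distance to its boundary (positive inside, negative outside), here via SciPy's Euclidean distance transform:

import numpy as np
from scipy.ndimage import distance_transform_edt

def hot_distance_target(mask):
    """mask: (H, W) boolean foreground mask for one class."""
    inside = distance_transform_edt(mask)          # distance to background
    outside = distance_transform_edt(~mask)        # distance to foreground
    signed = inside - outside                      # >0 inside, <0 outside
    return np.stack([mask.astype(float), signed])  # (2, H, W) training target

mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
print(hot_distance_target(mask).shape)             # (2, 8, 8)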

Updated: 2024-06-25 20:56:41

Categories: cs.CV,cs.LG,eess.IV,q-bio.QM,I.4.6

Download: http://arxiv.org/abs/2406.17936v1

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

Reducing the inference latency of large language models (LLMs) is crucial, and speculative decoding (SD) stands out as one of the most effective techniques. Rather than letting the LLM generate all tokens directly, speculative decoding employs effective proxies to predict potential outputs, which are then verified by the LLM without compromising the generation quality. Yet, deploying SD in real online LLM serving systems (with continuous batching) does not always yield improvement -- under higher request rates or low speculation accuracy, it paradoxically increases latency. Furthermore, no single speculation length works best for all workloads under different system loads. Based on these observations, we develop a dynamic framework SmartSpec. SmartSpec dynamically determines the best speculation length for each request (from 0, i.e., no speculation, to many tokens) -- and hence the associated speculative execution costs -- based on a new metric called goodput, which characterizes the current observed load of the entire system and the speculation accuracy. We show that SmartSpec consistently reduces average request latency by up to 3.2x compared to non-speculative decoding baselines across different sizes of target models, draft models, request rates, and datasets. Moreover, SmartSpec can be applied to different styles of speculative decoding, including traditional, model-based approaches as well as model-free methods like prompt lookup and tree-style decoding.
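
A hedged sketch of the goodput idea: score each candidate speculation length k by expected accepted tokens per unit of batch time and take the argmax, with k = 0 disabling speculation (the geometric acceptance model and latency constants below are illustrative assumptions, not SmartSpec's exact model):

def goodput(k, accept_rate, t_draft, t_verify):
    # Expected tokens accepted from k drafted tokens (geometric model),
    # plus the one token the verifier always produces.
    expected = sum(accept_rate**i for i in range(1, k + 1)) + 1
    return expected / (k * t_draft + t_verify)

def best_speculation_length(accept_rate, t_draft, t_verify, k_max=8):
    return max(range(0, k_max + 1),
               key=lambda k: goodput(k, accept_rate, t_draft, t_verify))

# High speculation accuracy favors long speculation; low accuracy favors none.
print(best_speculation_length(0.9, t_draft=1.0, t_verify=5.0))  # large k
print(best_speculation_length(0.2, t_draft=1.0, t_verify=5.0))  # k = 0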

Updated: 2024-06-25 20:53:16

Categories: cs.AI,cs.PF

Download: http://arxiv.org/abs/2406.14066v2

CAT: Interpretable Concept-based Taylor Additive Models

As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to train and scale. Additionally, in real-world datasets with many features, the interpretability of feature-based explanations diminishes for humans. To tackle these issues, recent research has shifted towards concept-based interpretable methods. These approaches try to integrate concept learning as an intermediate step before making predictions, explaining the predictions in terms of human-understandable concepts. However, these methods require domain experts to extensively label concepts with relevant names and their ground-truth values. In response, we propose CAT, a novel interpretable Concept-bAsed Taylor additive model to simplify this process. CAT does not require domain experts to annotate concepts and their ground-truth values. Instead, it only requires users to simply categorize input features into broad groups, which can be easily accomplished through a quick metadata review. Specifically, CAT first embeds each group of input features into a one-dimensional high-level concept representation, and then feeds the concept representations into a new white-box Taylor Neural Network (TaylorNet). The TaylorNet aims to learn the non-linear relationship between the inputs and outputs using polynomials. Evaluation results across multiple benchmarks demonstrate that CAT can outperform or compete with the baselines while reducing the need for extensive model parameters. Importantly, it can explain model predictions through high-level concepts that humans can understand.
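
To make the Taylor idea concrete, here is a minimal second-order predictor over scalar concept representations; the degree, dimensions, and upstream concept embedding are placeholders:

import numpy as np

rng = np.random.default_rng(0)
k = 4                                   # number of concepts
b = rng.standard_normal()               # bias (0th-order term)
w = rng.standard_normal(k)              # linear concept effects
Q = rng.standard_normal((k, k))         # pairwise concept interactions

def taylor_predict(c):
    """c: (k,) one-dimensional concept representations, one per group."""
    return b + w @ c + c @ Q @ c

c = rng.standard_normal(k)
print(taylor_predict(c))
# Interpretation: w[j] is concept j's first-order effect; Q[i, j]
# weights the interaction between concepts i and j.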

Updated: 2024-06-25 20:43:15

Categories: cs.LG

Download: http://arxiv.org/abs/2406.17931v1

Learning Low-dimensional Latent Dynamics from High-dimensional Observations: Non-asymptotics and Lower Bounds

In this paper, we focus on learning a linear time-invariant (LTI) model with low-dimensional latent variables but high-dimensional observations. We provide an algorithm that recovers the high-dimensional features, i.e., the column space of the observer, embeds the data into low dimensions and learns the low-dimensional model parameters. Our algorithm enjoys a sample complexity guarantee of order $\tilde{\mathcal{O}}(n/\epsilon^2)$, where $n$ is the observation dimension. We further establish a fundamental lower bound indicating this complexity bound is optimal up to logarithmic factors and dimension-independent constants. We show that this inevitable linear factor of $n$ is due to the learning error of the observer's column space in the presence of high-dimensional noises. Extending our results, we consider a meta-learning problem inspired by various real-world applications, where the observer column space can be collectively learned from datasets of multiple LTI systems. An end-to-end algorithm is then proposed, facilitating learning LTI systems from a meta-dataset which breaks the sample complexity lower bound in certain scenarios.
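
The first stage described, recovering the observer's column space, can be sketched with an SVD of the empirical covariance of observations; this toy uses i.i.d. latents rather than true LTI dynamics:

import numpy as np

rng = np.random.default_rng(0)
n_obs, latent_dim, T = 50, 3, 2000
C = rng.standard_normal((n_obs, latent_dim))     # unknown observer matrix
z = rng.standard_normal((T, latent_dim))         # latent states (toy i.i.d.)
y = z @ C.T + 0.1 * rng.standard_normal((T, n_obs))   # high-dim observations

U, _, _ = np.linalg.svd(y.T @ y / T)
C_hat = U[:, :latent_dim]                        # estimated column space
z_hat = y @ C_hat                                # low-dimensional embedding

# Subspace alignment check: the distance between projectors is small.
P, P_hat = C @ np.linalg.pinv(C), C_hat @ C_hat.T
print(np.linalg.norm(P - P_hat))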

Updated: 2024-06-25 20:28:29

Categories: eess.SY,cs.IT,cs.LG,cs.SY,math.IT

Download: http://arxiv.org/abs/2405.06089v3

Online Calibrated and Conformal Prediction Improves Bayesian Optimization

Accurate uncertainty estimates are important in sequential model-based decision-making tasks such as Bayesian optimization. However, these estimates can be imperfect if the data violates assumptions made by the model (e.g., Gaussianity). This paper studies which uncertainties are needed in model-based decision-making and in Bayesian optimization, and argues that uncertainties can benefit from calibration -- i.e., an 80% predictive interval should contain the true outcome 80% of the time. Maintaining calibration, however, can be challenging when the data is non-stationary and depends on our actions. We propose using simple algorithms based on online learning to provably maintain calibration on non-i.i.d. data, and we show how to integrate these algorithms in Bayesian optimization with minimal overhead. Empirically, we find that calibrated Bayesian optimization converges to better optima in fewer steps, and we demonstrate improved performance on standard benchmark functions and hyperparameter optimization tasks.
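
A toy of the calibration notion used here: track the empirical coverage of the model's 80% intervals online and multiplicatively widen or shrink them toward target coverage (this mirrors the flavor of online recalibration, not the paper's exact algorithm):

import numpy as np

rng = np.random.default_rng(0)
target, lr, scale = 0.8, 0.05, 1.0
cov_hist = []
for t in range(2000):
    mu, sigma = 0.0, 0.5                 # model's (miscalibrated) predictive
    y = rng.standard_normal()            # true outcomes are wider than modeled
    half_width = 1.2816 * sigma * scale  # 80% interval under a Gaussian model
    covered = abs(y - mu) <= half_width
    cov_hist.append(covered)
    # Online update: widen when under-covering, shrink when over-covering.
    scale *= np.exp(lr * (target - covered))
print(np.mean(cov_hist[-500:]))          # ~0.8 after adaptation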

Updated: 2024-06-25 20:14:27

Categories: cs.LG,stat.ML,I.2; I.5

Download: http://arxiv.org/abs/2112.04620v5

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks

Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks, as the curse of dimensionality (CoD) cannot be evaded when trying to approximate even a single ReLU neuron (Bach, 2017). In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms (e.g., the path norm, the Barron norm) from the perspective of sample complexity and generalization properties. First, we show that the path norm (as well as the Barron norm) is able to obtain width-independent sample complexity bounds, which allows for uniform convergence guarantees. Based on this result, we derive an improved metric entropy result of $O(\epsilon^{-\frac{2d}{d+2}})$ for $\epsilon$-covering ($d$ is the input dimension and the constant depends at most linearly on $d$) via the convex hull technique, which demonstrates a separation from kernel methods, which require $\Omega(\epsilon^{-d})$, for learning the target function in a Barron space. Second, this metric entropy result allows for building a sharper generalization bound under a general moment hypothesis setting, achieving the rate $O(n^{-\frac{d+2}{2d+2}})$. Our analysis is novel in that it offers a sharper and refined estimation for metric entropy with a linear dimension dependence and unbounded sampling in the estimation of the sample error and the output error.

Updated: 2024-06-25 20:08:29

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2404.18769v2

GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval

In our recent research, we have developed a framework called GraphSnapShot, which has proven to be a useful tool for graph learning acceleration. GraphSnapShot is a framework for fast cache, storage, retrieval and computation for graph learning. It can quickly store and update the local topology of a graph structure and allows us to track patterns in the structure of graph networks, just like taking snapshots of the graphs. In experiments, GraphSnapShot shows its efficiency: it can achieve up to 30% training acceleration and 73% memory reduction for lossless graph ML training compared to current baselines such as dgl. This technique is particularly useful for large dynamic graph learning tasks such as social media analysis and recommendation systems that process complex relationships between entities.
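
A hedged sketch of the caching idea: keep an incrementally updated snapshot of each node's local topology so samplers read from the cache instead of re-scanning the full graph (names and structure are illustrative, not GraphSnapShot's actual API):

from collections import defaultdict

class LocalTopologyCache:
    def __init__(self):
        self.adj = defaultdict(set)      # cached local neighborhoods

    def add_edge(self, u, v):            # incremental snapshot update
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def snapshot(self, node):            # O(1) retrieval for graph samplers
        return frozenset(self.adj[node])

cache = LocalTopologyCache()
cache.add_edge("a", "b"); cache.add_edge("a", "c")
print(cache.snapshot("a"))               # frozenset({'b', 'c'})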

Updated: 2024-06-25 20:00:32

Categories: cs.LG,cs.DC,cs.SI

Download: http://arxiv.org/abs/2406.17918v1

Camera Model Identification Using Audio and Visual Content from Videos

The identification of device brands and models plays a pivotal role in the realm of multimedia forensic applications. This paper presents a framework capable of identifying devices using audio, visual content, or a fusion of them. The fusion of visual and audio content occurs at a late stage, by applying two fundamental fusion rules: the product and the sum. The device identification problem is tackled as a classification one by leveraging Convolutional Neural Networks. Experimental evaluation illustrates that the proposed framework exhibits promising classification performance when independently using audio or visual content. Furthermore, although the fusion results do not consistently surpass both individual modalities, they demonstrate promising potential for enhancing classification performance. Future research could refine the fusion process to improve classification performance in both modalities consistently. Finally, a statistical significance test is performed for a more in-depth study of the classification results.
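
The two fusion rules named here are straightforward to state over per-class probabilities from the audio and visual branches:

import numpy as np

def fuse(p_audio, p_visual, rule="product"):
    fused = p_audio * p_visual if rule == "product" else p_audio + p_visual
    return fused / fused.sum()            # renormalize over classes

p_a = np.array([0.7, 0.2, 0.1])           # audio branch posteriors (toy)
p_v = np.array([0.5, 0.4, 0.1])           # visual branch posteriors (toy)
print(fuse(p_a, p_v, "product"))          # sharper consensus
print(fuse(p_a, p_v, "sum"))              # softer averaging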

Updated: 2024-06-25 19:56:21

Categories: cs.LG

Download: http://arxiv.org/abs/2406.17916v1

Semi-supervised classification of dental conditions in panoramic radiographs using large language model and instance segmentation: A real-world dataset evaluation

Dental panoramic radiographs offer vast diagnostic opportunities, but training supervised deep learning networks for automatic analysis of those radiology images is hampered by a shortage of labeled data. Here, a different perspective on this problem is introduced. A semi-supervised learning framework is proposed to classify thirteen dental conditions on panoramic radiographs, with a particular emphasis on teeth. Large language models were explored to annotate the most common dental conditions based on dental reports. Additionally, a masked autoencoder was employed to pre-train the classification neural network, and a Vision Transformer was used to leverage the unlabeled data. The analyses were validated using two of the most extensive datasets in the literature, comprising 8,795 panoramic radiographs and 8,029 paired reports and images. Encouragingly, the results consistently met or surpassed the baseline metrics for the Matthews correlation coefficient. A comparison of the proposed solution with human practitioners, supported by statistical analysis, highlighted its effectiveness and performance limitations; based on the degree of agreement among specialists, the solution demonstrated an accuracy level comparable to that of a junior specialist.

Updated: 2024-06-25 19:56:12

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.17915v1

Transforming Software Development: Evaluating the Efficiency and Challenges of GitHub Copilot in Real-World Projects

Generative AI technologies promise to transform the product development lifecycle. This study evaluates the efficiency gains, areas for improvement, and emerging challenges of using GitHub Copilot, an AI-powered coding assistant. We identified 15 software development tasks and assessed Copilot's benefits through real-world projects on large proprietary code bases. Our findings indicate significant reductions in developer toil, with up to 50% time saved in code documentation and autocompletion, and 30-40% in repetitive coding tasks, unit test generation, debugging, and pair programming. However, Copilot struggles with complex tasks, large functions, multiple files, and proprietary contexts, particularly with C/C++ code. We project a 33-36% time reduction for coding-related tasks in a cloud-first software development lifecycle. This study aims to quantify productivity improvements, identify underperforming scenarios, examine practical benefits and challenges, investigate performance variations across programming languages, and discuss emerging issues related to code quality, security, and developer experience.

Updated: 2024-06-25 19:51:21

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2406.17910v1

Unbiasing on the Fly: Explanation-Guided Human Oversight of Machine Learning System Decisions

The widespread adoption of ML systems across critical domains like hiring, finance, and healthcare raises growing concerns about their potential for discriminatory decision-making based on protected attributes. While efforts to ensure fairness during development are crucial, they leave deployed ML systems vulnerable to potentially exhibiting discrimination during their operations. To address this gap, we propose a novel framework for on-the-fly tracking and correction of discrimination in deployed ML systems. Leveraging counterfactual explanations, the framework continuously monitors the predictions made by an ML system and flags discriminatory outcomes. When flagged, post-hoc explanations related to the original prediction and the counterfactual alternatives are presented to a human reviewer for real-time intervention. This human-in-the-loop approach empowers reviewers to accept or override the ML system decision, enabling fair and responsible ML operation under dynamic settings. While further work is needed for validation and refinement, this framework offers a promising avenue for mitigating discrimination and building trust in ML systems deployed in a wide range of domains.
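
A minimal sketch of the monitor-flag-review loop described above, assuming `make_counterfactual`, `explain`, and `reviewer` stand for whichever counterfactual generator, post-hoc explainer, and human interface are plugged in (these names are ours, not the paper's):

def monitor_decision(model, x, make_counterfactual, explain, reviewer):
    """One pass of explanation-guided oversight for a single input x."""
    pred = model.predict(x)
    # Counterfactual that differs from x only in protected attributes (assumed)
    x_cf = make_counterfactual(model, x)
    if model.predict(x_cf) != pred:
        # The decision hinges on protected attributes: flag it and hand the
        # reviewer both post-hoc explanations for a real-time accept/override.
        return reviewer(x, pred, explain(model, x), explain(model, x_cf))
    return pred  # unflagged decisions pass through unchanged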

Updated: 2024-06-25 19:40:55

标题: 即时去偏见化:基于解释的人类监督机器学习系统决策

摘要: 跨领域广泛采用机器学习系统,如招聘、金融和医疗保健等关键领域,引发了人们对其基于受保护属性可能导致歧视性决策的担忧。尽管在开发过程中确保公平性至关重要,但部署的机器学习系统仍然容易在操作过程中表现出歧视性。为了解决这一问题,我们提出了一个新颖的框架,用于对部署的机器学习系统进行歧视性的实时跟踪和校正。利用反事实解释,该框架持续监测机器学习系统所做的预测,并标记出具有歧视性结果的情况。一旦被标记,与原始预测和反事实替代方案相关的事后解释将呈现给人类审阅者进行实时干预。这种人机协作方法赋予审阅者接受或覆盖机器学习系统决策的权力,实现在动态环境下公平和负责任的机器学习操作。虽然需要进一步的验证和完善工作,但该框架为减轻歧视并在各种领域部署的机器学习系统中建立信任提供了一个有希望的途径。

更新时间: 2024-06-25 19:40:55

领域: cs.AI

下载: http://arxiv.org/abs/2406.17906v1

Application of Liquid Rank Reputation System for Twitter Trend Analysis on Bitcoin

Analyzing social media trends can create a win-win situation for both creators and consumers. Creators can receive fair compensation, while consumers gain access to engaging, relevant, and personalized content. This paper proposes a new model for analyzing Bitcoin trends on Twitter by incorporating a 'liquid democracy' approach based on user reputation. This system aims to identify the most impactful trends and their influence on Bitcoin prices and trading volume. It uses a Twitter sentiment analysis model based on a reputation rating system to determine the impact on Bitcoin price change and traded volume. In addition, the reputation model considers the users' higher-order friends on the social network (the initial Twitter input channels in our case study) to improve the accuracy and diversity of the reputation results. We analyze Bitcoin-related news on Twitter to understand how trends and user sentiment, measured through our Liquid Rank Reputation System, affect Bitcoin price fluctuations and trading activity within the studied time frame. This reputation model can also be used as an additional layer in other trend and sentiment analysis models. The paper proposes the implementation, challenges, and future scope of the liquid rank reputation model.

Updated: 2024-06-25 19:35:25

标题: Liquid Rank声誉系统在比特币Twitter趋势分析中的应用

摘要: 分析社交媒体趋势可以为创作者和消费者创造双赢局面。创作者可以获得公平的补偿,而消费者可以获得引人入胜、相关和个性化的内容。本文提出了一种新模型,通过引入基于用户声誉的“液态民主”方法来分析Twitter上的比特币趋势。该系统旨在识别对比特币价格和交易量产生最大影响的趋势及其影响。它利用基于声誉评级系统的Twitter情感分析模型来确定对比特币价格变化和交易量的影响。此外,声誉模型考虑社交网络上的用户的高阶朋友(在我们的案例研究中是初始Twitter输入渠道),以提高声誉结果的准确性和多样性。我们分析了Twitter上与比特币相关的新闻,以了解通过我们的Liquid Rank Reputation System测量的趋势和用户情感如何影响研究时间范围内的比特币价格波动和交易活动。这种声誉模型还可以作为其他趋势和情感分析模型的附加层次使用。本文提出了液体排名声誉模型的实施、挑战和未来范围。

更新时间: 2024-06-25 19:35:25

领域: cs.SI,cs.AI

下载: http://arxiv.org/abs/2406.17904v1

Scientific Machine Learning Based Reduced-Order Models for Plasma Turbulence Simulations

This paper focuses on the construction of non-intrusive Scientific Machine Learning (SciML) Reduced-Order Models (ROMs) for plasma turbulence simulations. In particular, we propose using Operator Inference (OpInf) to build low-cost physics-based ROMs from data for such simulations. As a representative example, we focus on the Hasegawa-Wakatani (HW) equations used for modeling two-dimensional electrostatic drift-wave turbulence. For a comprehensive perspective on the potential of OpInf to construct accurate ROMs, we consider three setups for the HW equations by varying a key model parameter, namely the adiabaticity coefficient. These setups lead to the formation of complex and nonlinear dynamics, which makes the construction of accurate ROMs of any kind challenging. We generate the training datasets by performing direct numerical simulations of the HW equations and recording the computed state data and outputs over a time horizon of 100 time units in the turbulent phase. We then use these datasets to construct OpInf ROMs for predictions over 400 additional time units. Our results show that the OpInf ROMs capture the important features of the turbulent dynamics and generalize beyond the training time horizon while reducing the computational effort of the high-fidelity simulation by up to five orders of magnitude. In the broader context of fusion research, this shows that non-intrusive SciML ROMs have the potential to drastically accelerate numerical studies, which can ultimately enable tasks such as the design of optimized fusion devices.
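
For concreteness, here is a minimal OpInf-style sketch, assuming snapshots and their time derivatives are available and fitting a linear-quadratic reduced model dq/dt ≈ A q + H (q ⊗ q) + c by least squares; regularization and forcing terms, which a practical Hasegawa-Wakatani setup would likely require, are omitted, and all names are illustrative:

import numpy as np

def opinf_fit(X, Xdot, r):
    """X: (n, k) state snapshots; Xdot: (n, k) time derivatives; r: ROM size."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    V = U[:, :r]                                   # POD basis, (n, r)
    Q, Qdot = V.T @ X, V.T @ Xdot                  # reduced states/derivatives, (r, k)
    Q2 = np.einsum('ik,jk->ijk', Q, Q).reshape(r * r, -1)  # quadratic terms per snapshot
    D = np.vstack([Q, Q2, np.ones((1, Q.shape[1]))]).T     # data matrix, (k, r + r^2 + 1)
    O, *_ = np.linalg.lstsq(D, Qdot.T, rcond=None)         # stacked operators [A, H, c]
    A, H, c = O[:r].T, O[r:r + r**2].T, O[-1]
    return V, A, H, c            # reduced dynamics: dq/dt = A @ q + H @ np.kron(q, q) + c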

Updated: 2024-06-25 19:30:28

标题: 基于科学机器学习的等离子体湍流模拟简化模型

摘要: 本文关注非侵入式科学机器学习(SciML)简化模型(ROMs)在等离子体湍流模拟中的构建。具体来说,我们提出使用操作推断(OpInf)从数据构建基于物理的低成本ROMs用于这种模拟。作为代表性例子,我们关注用于建模二维静电漂移波湍流的Hasegawa-Wakatani(HW)方程。为了全面了解OpInf构建准确ROMs的潜力,我们考虑通过改变关键模型参数,即绝热系数,对HW方程进行三种设置。这些设置导致复杂和非线性动态的形成,这使得任何类型的准确ROMs的构建具有挑战性。我们通过对HW方程进行直接数值模拟并在湍流阶段的时间范围内记录计算状态数据和输出来生成训练数据集,持续时间为100个时间单位。然后我们使用这些数据集构建OpInf ROMs进行预测,持续时间为额外的400个时间单位。我们的结果表明,OpInf ROMs捕捉了湍流动态的重要特征,并且在超出训练时间范围的情况下具有泛化能力,同时将高保真度模拟的计算工作量减少了高达五个数量级。在聚变研究的更广泛背景下,这表明非侵入式SciML ROMs具有大大加速数值研究的潜力,最终可以实现优化聚变装置的设计等任务。

更新时间: 2024-06-25 19:30:28

领域: physics.comp-ph,cs.CE,cs.LG,physics.plasm-ph

下载: http://arxiv.org/abs/2401.05972v2

Domain Adaptation of Echocardiography Segmentation Via Reinforcement Learning

Performance of deep learning segmentation models is significantly challenged in its transferability across different medical imaging domains, particularly when aiming to adapt these models to a target domain with insufficient annotated data for effective fine-tuning. While existing domain adaptation (DA) methods propose strategies to alleviate this problem, these methods do not explicitly incorporate human-verified segmentation priors, compromising the potential of a model to produce anatomically plausible segmentations. We introduce RL4Seg, an innovative reinforcement learning framework that reduces the need to otherwise incorporate large expertly annotated datasets in the target domain, and eliminates the need for lengthy manual human review. Using a target dataset of 10,000 unannotated 2D echocardiographic images, RL4Seg not only outperforms existing state-of-the-art DA methods in accuracy but also achieves 99% anatomical validity on a subset of 220 expert-validated subjects from the target domain. Furthermore, our framework's reward network offers uncertainty estimates comparable with dedicated state-of-the-art uncertainty methods, demonstrating the utility and effectiveness of RL4Seg in overcoming domain adaptation challenges in medical image segmentation.

Updated: 2024-06-25 19:26:39

标题: 通过强化学习实现超声心动图分割的领域自适应

摘要: 深度学习分割模型在不同医学影像领域之间的可转移性表现受到明显挑战,特别是在试图将这些模型调整到目标领域时,目标领域缺乏足够注释数据以进行有效微调。虽然现有的领域适应(DA)方法提出了缓解这一问题的策略,但这些方法并没有明确地将人工验证的分割先验纳入其中,从而影响了模型产生解剖合理分割的潜力。我们引入了RL4Seg,这是一个创新的强化学习框架,它减少了在目标领域中需要大量专家注释数据的需求,并消除了冗长的人工审查的需要。使用一个包含10,000张未注释的2D超声心动图像的目标数据集,RL4Seg不仅在准确性上优于现有的最先进的DA方法,还在目标领域中的220个专家验证的受试者子集上实现了99%的解剖学有效性。此外,我们的框架的奖励网络提供了与专门的最先进的不确定性方法相当的不确定性估计,展示了RL4Seg在克服医学图像分割中的领域适应挑战中的实用性和有效性。

更新时间: 2024-06-25 19:26:39

领域: eess.IV,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.17902v1

Entity Augmentation for Efficient Classification of Vertically Partitioned Data with Limited Overlap

Vertical Federated Learning (VFL) is a machine learning paradigm for learning from vertically partitioned data (i.e. features for each input are distributed across multiple "guest" clients and an aggregating "host" server owns labels) without communicating raw data. Traditionally, VFL involves an "entity resolution" phase where the host identifies and serializes the unique entities known to all guests. This is followed by private set intersection to find common entities, and an "entity alignment" step to ensure all guests are always processing the same entity's data. However, using only data of entities from the intersection means guests discard potentially useful data. Besides, the effect on privacy is dubious and these operations are computationally expensive. We propose a novel approach that eliminates the need for set intersection and entity alignment in categorical tasks. Our Entity Augmentation technique generates meaningful labels for activations sent to the host, regardless of their originating entity, enabling efficient VFL without explicit entity alignment. With limited overlap between training data, this approach performs substantially better (e.g. with 5% overlap, 48.1% vs 69.48% test accuracy on CIFAR-10). In fact, thanks to the regularizing effect, our model performs marginally better even with 100% overlap.

Updated: 2024-06-25 19:20:10

标题: 实体增强以提高具有有限重叠的垂直分割数据的有效分类

摘要: 纵向联邦学习(VFL)是一种用于从纵向分区数据中学习的机器学习范例(即每个输入的特征分布在多个“客户”客户端上,一个聚合“主机”服务器拥有标签),而无需通信原始数据。传统上,VFL 包括一个“实体解决”阶段,其中主机识别和序列化所有客户端已知的唯一实体。然后进行私有集交集以找到共同实体,并进行“实体对齐”步骤以确保所有客户端始终处理相同实体的数据。然而,仅使用交集中的实体数据意味着客户端丢弃了可能有用的数据。此外,对隐私的影响存疑,并且这些操作在计算上是昂贵的。我们提出了一种新颖的方法,消除了分类任务中对集合交集和实体对齐的需求。我们的实体增强技术为发送到主机的激活生成有意义的标签,无论其来自哪个实体,从而实现了高效的 VFL,而无需显式实体对齐。在训练数据之间重叠有限时,这种方法表现得明显更好(例如,在 5% 重叠的情况下,CIFAR-10 上的测试准确率为 48.1% 对 69.48%)。实际上,由于正则化效果,即使有 100% 的重叠,我们的模型也表现得稍微更好。

更新时间: 2024-06-25 19:20:10

领域: cs.LG,cs.CV,cs.DC

下载: http://arxiv.org/abs/2406.17899v1

Human-centered In-building Embodied Delivery Benchmark

Recently, the concept of embodied intelligence has been widely accepted and popularized, leading people to naturally consider the potential for commercialization in this field. In this work, we propose a specific commercial scenario simulation, human-centered in-building embodied delivery. Furthermore, for this scenario, we have developed a brand-new virtual environment system from scratch, constructing a multi-level connected building space modeled after a polar research station. This environment also includes autonomous human characters and robots with grasping and mobility capabilities, as well as a large number of interactive items. Based on this environment, we have built a delivery dataset containing 13k language instructions to guide robots in providing services. We simulate human behavior through human characters and sample their various needs in daily life. Finally, we proposed a method centered around a large multimodal model to serve as the baseline system for this dataset. Compared to past embodied data work, our work focuses on a virtual environment centered around human-robot interaction for commercial scenarios. We believe this will bring new perspectives and exploration angles to the embodied community.

Updated: 2024-06-25 19:19:10

标题: 以人为中心的楼内实体交付基准

摘要: 最近,人们普遍接受并推广了具有体现智能的概念,这自然引发人们考虑该领域的商业化潜力。在这项工作中,我们提出了一个具体的商业场景模拟,即以人类为中心的楼内体现交付。此外,为了这个场景,我们从零开始开发了一个全新的虚拟环境系统,构建了一个模仿极地研究站的多层连通建筑空间。这个环境还包括具有抓取和移动能力的自主人类角色和机器人,以及大量的互动物品。基于这个环境,我们建立了一个包含13k语言指令的交付数据集,用于指导机器人提供服务。我们通过人类角色模拟人类行为,并抽样他们在日常生活中的各种需求。最后,我们提出了一种以大型多模态模型为核心的方法,作为这个数据集的基线系统。与过去的体现数据工作相比,我们的工作侧重于以人类-机器人互动为中心的商业场景的虚拟环境。我们相信这将为体现社区带来新的视角和探索角度。

更新时间: 2024-06-25 19:19:10

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.17898v1

Efficient and Effective Implicit Dynamic Graph Neural Network

Implicit graph neural networks have gained popularity in recent years as they capture long-range dependencies while improving predictive performance in static graphs. Although the tension between performance degradation due to oversmoothing of learned embeddings and the need for long-range dependencies is even more pronounced in dynamic graphs, where features are aggregated across both neighborhoods and time, no prior work has proposed an implicit graph neural model in a dynamic setting. In this paper, we present Implicit Dynamic Graph Neural Network (IDGNN), a novel implicit neural network for dynamic graphs and the first of its kind. A key characteristic of IDGNN is that it is demonstrably well-posed, i.e., it is theoretically guaranteed to have a fixed-point representation. We then demonstrate that the standard iterative algorithm often used to train implicit models is computationally expensive in our dynamic setting as it involves computing gradients, which themselves have to be estimated in an iterative manner. To overcome this, we pose an equivalent bilevel optimization problem and propose an efficient single-loop training algorithm that avoids iterative computation by maintaining moving averages of key components of the gradients. We conduct extensive experiments on real-world datasets on both classification and regression tasks to demonstrate the superiority of our approach over the state-of-the-art baselines. We also demonstrate that our bi-level optimization framework maintains the performance of the expensive iterative algorithm while obtaining up to \textbf{1600x} speed-up.
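
The single-loop bilevel training is beyond a short snippet, but the fixed-point representation at the core of implicit models can be sketched as below; the spectral rescaling is a crude stand-in assumption for the paper's actual well-posedness condition:

import numpy as np

def implicit_gnn_embed(A, X, W, U, tol=1e-6, max_iter=200):
    """A: (n, n) normalized adjacency; X: (n, d) features; W: (h, h); U: (d, h)."""
    W = W / (np.linalg.norm(W, 2) * np.linalg.norm(A, 2) + 1e-9)  # force a contraction
    Z = np.zeros((A.shape[0], W.shape[0]))
    for _ in range(max_iter):
        Z_next = np.tanh(A @ Z @ W.T + X @ U)   # Z solves Z = phi(A Z W^T + X U)
        if np.linalg.norm(Z_next - Z) < tol:
            break
        Z = Z_next
    return Z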

Updated: 2024-06-25 19:07:21

标题: 高效且有效的隐式动态图神经网络

摘要: 最近,隐式图神经网络因能在静态图中捕捉长程依赖关系并提高预测性能而广受欢迎。尽管由于特征同时跨邻域和时间聚合,动态图中的性能下降与长程依赖问题更为显著,但先前的研究尚未在动态环境中提出隐式图神经模型。在本文中,我们提出了一种新颖的隐式动态图神经网络(IDGNN),这是同类模型中的首个。IDGNN的一个关键特征是它可以证明是适定的(well-posed),即理论上保证具有不动点表示。然后我们证明,通常用于训练隐式模型的标准迭代算法在我们的动态环境中计算代价昂贵,因为涉及计算梯度,这些梯度本身必须以迭代方式估计。为了克服这一问题,我们提出了一个等效的双层优化问题,并提出了一个高效的单循环训练算法,通过保持梯度的关键组件的移动平均值来避免迭代计算。我们在真实数据集上进行了大量实验,包括分类和回归任务,以展示我们的方法优于最先进的基线方法。我们还证明,我们的双层优化框架在获得高达1600倍加速的同时保持了昂贵迭代算法的性能。

更新时间: 2024-06-25 19:07:21

领域: cs.LG

下载: http://arxiv.org/abs/2406.17894v1

Visualize and Paint GAN Activations

We investigate how generated structures of GANs correlate with their activations in hidden layers, with the purpose of better understanding the inner workings of those models and being able to paint structures with unconditionally trained GANs. This gives us more control over the generated images, allowing to generate them from a semantic segmentation map while not requiring such a segmentation in the training data. To this end we introduce the concept of tileable features, allowing us to identify activations that work well for painting.

Updated: 2024-06-25 19:05:11

标题: 可视化和绘制GAN激活

摘要: 我们研究了生成对抗网络(GANs)的生成结构与它们在隐藏层中的激活之间的相关性,目的是更好地理解这些模型的内部工作原理,并能够使用无条件训练的GANs绘制结构。这使我们能够更好地控制生成的图像,可以从语义分割地图中生成它们,而不需要在训练数据中具有这样的分割。为此,我们引入了可平铺特征的概念,使我们能够识别适用于绘画的激活。

更新时间: 2024-06-25 19:05:11

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.15636v2

Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis

Objective: Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR). How to enhance the adherence of patients to maximize the benefit of allergen immunotherapy (AIT) plays a crucial role in the management of AIT. This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients and related local symptom scores over three years of SCIT. Methods: The research develops and analyzes two models, a sequential latent-variable model (SLVM), Sequential Latent Actor-Critic (SLAC), and Long Short-Term Memory (LSTM), evaluating them based on their scoring and adherence prediction capabilities. Results: Excluding the biased samples at the first time step, the predictive adherence accuracy of the SLAC models ranges from 60\% to 72\%, and for LSTM models, from 66\% to 84\%, varying according to the time steps. The range of Root Mean Square Error (RMSE) for SLAC models is between 0.93 and 2.22, while for LSTM models it is between 1.09 and 1.77. Notably, these RMSEs are significantly lower than the random prediction error of 4.55. Conclusion: We creatively apply sequential models in the long-term management of SCIT with promising accuracy in the prediction of SCIT non-adherence in AR patients. While LSTM outperforms SLAC in adherence prediction, SLAC excels in score prediction for patients undergoing SCIT for AR. The state-action-based SLAC adds flexibility, presenting a novel and effective approach for managing long-term AIT.

Updated: 2024-06-25 19:01:58

标题: 连续模型用于预测过敏性鼻炎患者皮下免疫疗法的依从性

摘要: 目标:皮下免疫治疗(SCIT)是过敏性鼻炎(AR)的持久性因果治疗。如何增强患者的依从性以最大化变应原免疫治疗(AIT)的效益在AIT的管理中起着至关重要的作用。本研究旨在利用新颖的机器学习模型精确预测AR患者在三年SCIT中的非依从性风险和相关局部症状评分。 方法:研究开发并分析两个模型,顺序潜变量模型(SLVM)和长短期记忆(LSTM),基于评分和依从性预测能力对它们进行评估。 结果:在第一个时间步中排除偏倚样本后,SLAC模型的预测依从性准确率在60%至72%之间,而LSTM模型的准确率在66%至84%之间,根据时间步骤的不同而变化。SLAC模型的均方根误差(RMSE)范围为0.93至2.22,而LSTM模型的范围为1.09至1.77。值得注意的是,这些RMSE明显低于随机预测误差4.55。 结论:我们在SCIT的长期管理中创造性地应用顺序模型,对AR患者的SCIT非依从性进行准确预测,表现出有希望的准确性。虽然LSTM在依从性预测方面优于SLAC,但SLAC在接受AR SCIT的患者的评分预测方面表现优异。基于状态-动作的SLAC增加了灵活性,呈现出一种新颖且有效的管理长期AIT的方法。

更新时间: 2024-06-25 19:01:58

领域: cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2401.11447v3

SigKAN: Signature-Weighted Kolmogorov-Arnold Networks for Time Series

We propose a novel approach that enhances multivariate function approximation using learnable path signatures and Kolmogorov-Arnold networks (KANs). We enhance the learning capabilities of these networks by weighting the values obtained by KANs using learnable path signatures, which capture important geometric features of paths. This combination allows for a more comprehensive and flexible representation of sequential and temporal data. We demonstrate through studies that our SigKANs with learnable path signatures perform better than conventional methods across a range of function approximation challenges. By leveraging path signatures in neural networks, this method offers intriguing opportunities to enhance performance in time series analysis and time series forecasting, among other fields.
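
As a sketch of the geometric features involved, the following computes a depth-2 truncated signature of a piecewise-linear path; in a SigKAN-style layer such features would then presumably pass through a learnable map whose output weights the KAN values (our reading of the weighting, not the paper's exact wiring):

import numpy as np

def signature_depth2(path):
    """path: (T, d) array. Returns levels 1 and 2 of the truncated signature."""
    dx = np.diff(path, axis=0)            # segment increments, (T-1, d)
    level1 = dx.sum(axis=0)               # total displacement, (d,)
    before = np.cumsum(dx, axis=0) - dx   # displacement accumulated before each segment
    # Iterated integrals for a piecewise-linear path (0.5 term = within-segment part)
    level2 = before.T @ dx + 0.5 * np.einsum('ti,tj->ij', dx, dx)
    return np.concatenate([level1, level2.ravel()])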

Updated: 2024-06-25 18:58:39

标题: SigKAN:用于时间序列的签名加权 Kolmogorov-Arnold 网络

摘要: 我们提出了一种新颖的方法,通过可学习的路径签名和科尔莫戈洛夫-阿诺德网络(KANs)来增强多元函数逼近。我们通过使用可学习的路径签名对KANs获得的数值进行加权来增强这些网络的学习能力,这些路径签名捕捉了路径的重要几何特征。这种组合允许更全面和灵活地表示顺序和时间数据。通过研究表明,我们的具有可学习路径签名的SigKANs在一系列函数逼近挑战中表现优于传统方法。通过在神经网络中利用路径签名,这种方法为增强时间序列分析和时间序列预测等领域的性能提供了有趣的机会。

更新时间: 2024-06-25 18:58:39

领域: cs.LG

下载: http://arxiv.org/abs/2406.17890v1

CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design

CTBench is introduced as a benchmark to assess language models (LMs) in aiding clinical study design. Given study-specific metadata, CTBench evaluates AI models' ability to determine the baseline features of a clinical trial (CT), which include demographic and relevant features collected at the trial's start from all participants. These baseline features, typically presented in CT publications (often as Table 1), are crucial for characterizing study cohorts and validating results. Baseline features, including confounders and covariates, are also necessary for accurate treatment effect estimation in studies involving observational data. CTBench consists of two datasets: "CT-Repo," containing baseline features from 1,690 clinical trials sourced from clinicaltrials.gov, and "CT-Pub," a subset of 100 trials with more comprehensive baseline features gathered from relevant publications. Two LM-based evaluation methods are developed to compare the actual baseline feature lists against LM-generated responses. "ListMatch-LM" and "ListMatch-BERT" use GPT-4o and BERT scores (at various thresholds), respectively, for evaluation. To establish baseline results, advanced prompt engineering techniques using LLaMa3-70B-Instruct and GPT-4o in zero-shot and three-shot learning settings are applied to generate potential baseline features. The performance of GPT-4o as an evaluator is validated through human-in-the-loop evaluations on the CT-Pub dataset, where clinical experts confirm matches between actual and LM-generated features. The results highlight a promising direction with significant potential for improvement, positioning CTBench as a useful tool for advancing research on AI in CT design and potentially enhancing the efficacy and robustness of CTs.

Updated: 2024-06-25 18:52:48

标题: CTBench:用于评估临床试验设计中语言模型能力的综合基准

摘要: CTBench被引入作为一个基准,用于评估语言模型(LMs)在辅助临床研究设计方面的表现。考虑到研究特定的元数据,CTBench评估人工智能模型确定临床试验(CT)基线特征的能力,这些特征包括在试验开始时从所有参与者收集的人口统计学和相关特征。这些基线特征通常在CT出版物中呈现(通常作为表1),对于描述研究群体和验证结果至关重要。基线特征,包括混杂因素和协变量,在涉及观察数据的研究中准确估计治疗效果也是必要的。CTBench由两个数据集组成:“CT-Repo”,包含来自clinicaltrials.gov的1,690个临床试验的基线特征,以及“CT-Pub”,一个从相关出版物中收集的具有更全面基线特征的100个试验的子集。开发了两种基于LM的评估方法,用于比较实际基线特征列表与LM生成的响应。“ListMatch-LM”和“ListMatch-BERT”分别使用GPT-4o和BERT分数(在各种阈值下)进行评估。为了建立基线结果,使用LLaMa3-70B-Instruct和GPT-4o进行高级提示工程技术,在零-shot和三-shot学习设置中生成潜在的基线特征。通过人机协同评估验证了GPT-4o作为评估者的性能,在CT-Pub数据集上,临床专家确认实际特征与LM生成特征之间的匹配。结果突显了一个具有显著改进潜力的有希望的方向,将CTBench定位为推动AI在CT设计研究中的有用工具,并有可能提高CT的效力和健壮性。

更新时间: 2024-06-25 18:52:48

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.17888v1

Federated Dynamical Low-Rank Training with Global Loss Convergence Guarantees

In this work, we propose a federated dynamical low-rank training (FeDLRT) scheme to reduce client compute and communication costs - two significant performance bottlenecks in horizontal federated learning. Our method builds upon dynamical low-rank splitting schemes for manifold-constrained optimization to create a global low-rank basis of network weights, which enables client training on a small coefficient matrix. A consistent global low-rank basis allows us to incorporate a variance correction scheme and prove global loss descent and convergence to a stationary point. Dynamic augmentation and truncation of the low-rank bases automatically optimizes computing and communication resource utilization. We demonstrate the efficiency of FeDLRT in an array of computer vision benchmarks and show a reduction of client compute and communication costs by up to an order of magnitude with minimal impacts on global accuracy.
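
The core mechanism, clients updating only a small coefficient matrix inside a shared factorization W ≈ U S Vᵀ, can be sketched as follows; basis augmentation/truncation and the variance correction are omitted, and the update uses only the chain-rule identity dL/dS = Uᵀ(dL/dW)V for fixed bases:

import numpy as np

def client_update(U, S, V, grad_W, lr=0.1, local_steps=5):
    """U: (m, r) and V: (n, r) frozen global bases; S: (r, r) local coefficients.
    grad_W(W) returns dL/dW for the full weight (formed here only for clarity)."""
    for _ in range(local_steps):
        G = grad_W(U @ S @ V.T)        # (m, n) full-weight gradient
        S = S - lr * (U.T @ G @ V)     # project onto the low-rank coefficient space
    return S                           # only this small matrix is communicated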

Updated: 2024-06-25 18:51:08

标题: 具有全局损失收敛保证的联邦式动态低秩训练

摘要: 在这项工作中,我们提出了一种联邦动态低秩训练(FeDLRT)方案,以减少客户端计算和通信成本 - 这是水平联邦学习中两个重要的性能瓶颈。我们的方法基于用于流形约束优化的动态低秩分裂方案,以创建网络权重的全局低秩基础,从而使客户端在一个小系数矩阵上进行训练。一致的全局低秩基础使我们能够结合方差校正方案,并证明全局损失下降和收敛到稳定点。低秩基础的动态增强和截断自动优化计算和通信资源利用。我们在一系列计算机视觉基准测试中展示了FeDLRT的效率,并显示客户端计算和通信成本可降低一个数量级,对全局准确性影响很小。

更新时间: 2024-06-25 18:51:08

领域: cs.LG,cs.AI,math.OC

下载: http://arxiv.org/abs/2406.17887v1

Enabling Regional Explainability by Automatic and Model-agnostic Rule Extraction

In Explainable AI, rule extraction translates model knowledge into logical rules, such as IF-THEN statements, crucial for understanding patterns learned by black-box models. This could significantly aid in fields like disease diagnosis, disease progression estimation, or drug discovery. However, such application domains often contain imbalanced data, with the class of interest underrepresented. Existing methods inevitably compromise the performance of rules for the minor class to maximise the overall performance. As the first attempt in this field, we propose a model-agnostic approach for extracting rules from specific subgroups of data, featuring automatic rule generation for numerical features. This method enhances the regional explainability of machine learning models and offers wider applicability compared to existing methods. We additionally introduce a new method for selecting features to compose rules, reducing computational costs in high-dimensional spaces. Experiments across various datasets and models demonstrate the effectiveness of our methods.
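
As a hedged stand-in for what regional rule extraction produces, one can fit a shallow surrogate tree to a black-box model's predictions on the subgroup of interest and read its paths as IF-THEN rules; the paper's own rule-generation and feature-selection procedures differ in detail:

from sklearn.tree import DecisionTreeClassifier, export_text

def regional_rules(black_box, X_region, feature_names, max_depth=3):
    y_hat = black_box.predict(X_region)    # model outputs, not ground-truth labels
    surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(X_region, y_hat)
    return export_text(surrogate, feature_names=list(feature_names))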

Updated: 2024-06-25 18:47:50

标题: 通过自动和与模型无关的规则提取实现区域性可解释性

摘要: 在可解释的人工智能中,规则提取将模型知识转化为逻辑规则,例如IF-THEN语句,这对于理解黑匣子模型学习到的模式至关重要。这可以在疾病诊断、疾病进展估计或药物发现等领域显著帮助。然而,这些应用领域通常包含不平衡数据,感兴趣的类别数量不足。现有方法不可避免地会牺牲次要类别的规则性能,以最大化整体性能。作为该领域的首次尝试,我们提出了一种从特定数据子集中提取规则的模型无关方法,具有针对数值特征的自动规则生成。这种方法增强了机器学习模型的区域可解释性,并相对于现有方法具有更广泛的适用性。我们另外引入一种新的特征选择方法来组成规则,减少了在高维空间中的计算成本。在各种数据集和模型上的实验证明了我们方法的有效性。

更新时间: 2024-06-25 18:47:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17885v1

Cyber Security Operations Educational Gamification Application Listing

This listing contains a total of 80 gamification applications (GAs) used in cyber security operations (CSO) undergraduate education, drawn from 74 publications published between 2007 and June 2022. The listing outlines each GA identified and provides a short overview of it. This listing serves both as a comprehensive repository of existing GAs in cybersecurity undergraduate education and as a starting point for adding new CSO GAs to the list. Contact the first author to add a CSO GA to the next version of the list.

Updated: 2024-06-25 18:42:52

标题: 网络安全运营教育游戏化应用清单

摘要: 这份清单包含了在网络安全运营本科教育中使用的80个游戏化应用程序(GA),来自74篇发表于2007年至2022年6月之间的出版物。该清单概述了每个确定的GA,并提供了简要概述。这份清单既是现有网络安全本科教育中游戏化应用程序的全面存储库,也是将新的网络安全运营游戏化应用程序添加到清单的起点。请联系第一作者,以将CSO GA添加到下一个版本的清单中。

更新时间: 2024-06-25 18:42:52

领域: cs.CR,cs.GT,cs.HC,H.0; J.0; K.3

下载: http://arxiv.org/abs/2406.17882v1

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks. See https://aka.ms/speechx for demo samples.

Updated: 2024-06-25 18:38:28

标题: SpeechX:神经编解码器语言模型作为多功能语音变换器

摘要: 最近基于音频文本提示的生成式语音模型取得了显著的进展,使得高质量的零样本文本转语音等创新成为可能。然而,现有模型在处理涉及转换输入语音和处理在恶劣声学条件下捕获的音频的各种音频文本语音生成任务时仍面临限制。本文介绍了SpeechX,一种多功能语音生成模型,能够进行零样本TTS和各种语音转换任务,处理干净和嘈杂信号。SpeechX将神经编解码器语言建模与多任务学习相结合,使用任务相关提示,实现统一和可扩展的建模,为在语音增强和转换任务中利用文本输入提供一致的方法。实验结果显示,SpeechX在各种任务中表现出良好的效果,包括零样本TTS、噪声抑制、目标说话者提取、语音去除以及带有或不带背景噪声的语音编辑,其性能可与专门模型相媲美甚至更优。请访问https://aka.ms/speechx查看演示样本。

更新时间: 2024-06-25 18:38:28

领域: eess.AS,cs.CL,cs.LG,cs.SD

下载: http://arxiv.org/abs/2308.06873v2

Confabulation: The Surprising Value of Large Language Model Hallucinations

This paper presents a systematic defense of large language model (LLM) hallucinations or 'confabulations' as a potential resource instead of a categorically negative pitfall. The standard view is that confabulations are inherently problematic and AI research should eliminate this flaw. In this paper, we argue and empirically demonstrate that measurable semantic characteristics of LLM confabulations mirror a human propensity to utilize increased narrativity as a cognitive resource for sense-making and communication. In other words, it has potential value. Specifically, we analyze popular hallucination benchmarks and reveal that hallucinated outputs display increased levels of narrativity and semantic coherence relative to veridical outputs. This finding reveals a tension in our usually dismissive understandings of confabulation. It suggests, counter-intuitively, that the tendency for LLMs to confabulate may be intimately associated with a positive capacity for coherent narrative-text generation.

Updated: 2024-06-25 18:37:19

标题: 混淆:大型语言模型幻觉的惊人价值

摘要: 本文系统地为大型语言模型(LLM)的幻觉或“虚构”提出了一种防御性观点,认为其可能是一种资源而非绝对的负面陷阱。传统观点认为虚构在本质上存在问题,人工智能研究应该消除这一缺陷。在本文中,我们论证并实证地证明了LLM虚构的可测语义特征反映了人类倾向于利用增加的叙事性作为认知资源进行意义构建和沟通。换句话说,它具有潜在价值。具体而言,我们分析了流行的幻觉基准,并揭示了幻觉输出相对于真实输出显示出更高水平的叙事性和语义连贯性。这一发现揭示了我们通常对虚构的鄙视理解中存在的一种紧张关系。它反直觉地表明,LLM倾向于虚构可能与生成连贯叙述文本的积极能力密切相关。

更新时间: 2024-06-25 18:37:19

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.04175v2

ET tu, CLIP? Addressing Common Object Errors for Unseen Environments

We introduce a simple method that employs pre-trained CLIP encoders to enhance model generalization in the ALFRED task. In contrast to previous literature where CLIP replaces the visual encoder, we suggest using CLIP as an additional module through an auxiliary object detection objective. We validate our method on the recently proposed Episodic Transformer architecture and demonstrate that incorporating CLIP improves task performance on the unseen validation set. Additionally, our analysis results support that CLIP especially helps with leveraging object descriptions, detecting small objects, and interpreting rare words.

Updated: 2024-06-25 18:35:13

标题: ET tu, CLIP? 解决未知环境中的常见对象错误

摘要: 我们介绍了一种简单的方法,利用预训练的CLIP编码器来增强ALFRED任务中模型的泛化能力。与先前文献中将CLIP替换视觉编码器的方法相比,我们建议将CLIP作为附加模块,通过辅助目标检测目标来使用。我们在最近提出的Episodic Transformer架构上验证了我们的方法,并证明了整合CLIP可以提高未见验证集上的任务性能。此外,我们的分析结果支持CLIP特别有助于利用对象描述、检测小对象和解释罕见词语。

更新时间: 2024-06-25 18:35:13

领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.RO

下载: http://arxiv.org/abs/2406.17876v1

Generating Music with Structure Using Self-Similarity as Attention

Despite the innovations in deep learning and generative AI, creating long term structure as well as the layers of repeated structure common in musical works remains an open challenge in music generation. We propose an attention layer that uses a novel approach applying user-supplied self-similarity matrices to previous time steps, and demonstrate it in our Similarity Incentivized Neural Generator (SING) system, a deep learning autonomous music generation system with two layers. The first is a vanilla Long Short Term Memory layer, and the second is the proposed attention layer. During generation, this attention mechanism imposes a suggested structure from a template piece on the generated music. We train SING on the MAESTRO dataset using a novel variable batching method, and compare its performance to the same model without the attention mechanism. The addition of our proposed attention mechanism significantly improves the network's ability to replicate specific structures, and it performs better on an unseen test set than a model without the attention mechanism.
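
Schematically, the template-imposing attention can be read as a softmax over past steps scored by the user-supplied self-similarity matrix; the sketch below is our simplification, not SING's exact layer:

import numpy as np

def template_attention(h, t, S):
    """h: (T, d) past hidden states; t >= 1: current step; S: (T, T) template
    self-similarity matrix. Returns a context vector emphasizing the past steps
    that the template says step t should resemble."""
    w = S[t, :t]                  # template similarity of step t to steps < t
    w = np.exp(w - w.max())
    return (w / w.sum()) @ h[:t]  # (d,) convex combination of past states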

Updated: 2024-06-25 18:26:07

标题: 使用自相似性作为关注点生成具有结构的音乐

摘要: 尽管深度学习和生成AI方面的创新不断发展,但在音乐生成中创建长期结构以及音乐作品中常见的重复结构层仍然是一个挑战。我们提出了一种注意力层,采用一种新颖的方法,将用户提供的自相似矩阵应用到先前的时间步,并在我们的相似性激励神经生成器(SING)系统中进行演示,这是一个具有两层的深度学习自主音乐生成系统。第一层是一个普通的长短期记忆层,第二层是提出的注意力层。在生成过程中,这种注意力机制将一个模板作品中的建议结构施加到生成的音乐上。我们使用一种新颖的可变批处理方法在MAESTRO数据集上训练SING,并将其性能与没有注意力机制的相同模型进行比较。我们提出的注意力机制的添加显著提高了网络复制特定结构的能力,并且在未见测试集上的表现优于没有注意力机制的模型。

更新时间: 2024-06-25 18:26:07

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2406.15647v2

Contextual Dynamic Pricing with Strategic Buyers

Personalized pricing, which involves tailoring prices based on individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, but a manipulated feature according to buyers' strategic behavior. In addition, the seller does not observe the buyers' valuation of the product, but only a binary response indicating whether a sale happens or not. Recognizing these challenges, we propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue. We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret with $T$ the total time horizon, indicating that these policies are not better than a random pricing policy. We then establish that our proposed policy achieves a sublinear regret upper bound of $O(\sqrt{T})$. Importantly, our policy is not a mere amalgamation of existing dynamic pricing policies and strategic behavior handling algorithms. Our policy can also accommodate the scenario when the marginal cost of manipulation is unknown in advance. To account for it, we simultaneously estimate the valuation parameter and the cost parameter in the online pricing policy, which is shown to also achieve an $O(\sqrt{T})$ regret bound. Extensive experiments support our theoretical developments and demonstrate the superior performance of our policy compared to other pricing policies that are unaware of the strategic behaviors.
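
In schematic notation (ours, not necessarily the paper's): a buyer with true feature $x_t$ facing posted pricing $p_t(\cdot)$ reports $x_t' \in \arg\min_z \{ p_t(z) + c\,\|z - x_t\| \}$, so the seller observes only the manipulated feature $x_t'$ and the binary indicator $\mathbf{1}\{v_t \ge p_t(x_t')\}$ of whether a sale occurred, with $c$ the (possibly unknown) marginal manipulation cost.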

Updated: 2024-06-25 18:25:54

标题: 含战略买家的情境动态定价

摘要: 个性化定价是企业常用的一种消费者特定定价政策的实施方式,它基于个体特征来调整价格。在这个过程中,买家也可以通过策略性地操纵其特征数据来获得更低的价格,但会产生一定的操纵成本。这种策略行为可能会阻碍企业最大化利润。本文研究了具有策略性买家的情境动态定价问题。卖方看不到买家的真实特征,而是根据买家的策略行为看到被操纵的特征。此外,卖方看不到买家对产品的估值,而只能看到一个二元响应,表明是否发生销售。鉴于这些挑战,我们提出了一个将买家的策略行为纳入在线学习中以最大化卖方累积收入的策略性动态定价政策。我们首先证明,现有忽略买家策略行为的非策略性定价政策导致线性 $\Omega(T)$ 遗憾,其中 $T$ 为总时间跨度,表明这些政策不比随机定价政策好。然后我们建立了我们提出的政策实现了 $O(\sqrt{T})$ 的次线性遗憾上限。重要的是,我们的政策不仅仅是现有动态定价政策和处理策略行为的算法的结合。我们的政策还可以适应边际操纵成本事先未知的情况。为了解决这个问题,我们同时估计在线定价政策中的估值参数和成本参数,这也被证明实现了 $O(\sqrt{T})$ 的遗憾上限。大量实验证实了我们的理论发展,并展示了我们的政策相对于其他不了解策略行为的定价政策的卓越性能。

更新时间: 2024-06-25 18:25:54

领域: stat.ML,cs.AI,cs.GT,cs.LG

下载: http://arxiv.org/abs/2307.04055v2

Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors

Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in real-world scenarios. In recent years, many OOD detectors have been developed, and the benchmarking has even been standardized, i.e., OpenOOD. The number of post-hoc detectors is growing fast; they offer an option to protect a pre-trained classifier against natural distribution shifts and claim to be ready for real-world scenarios. However, their efficacy in handling adversarial examples has been neglected in the majority of studies. This paper investigates the adversarial robustness of 16 post-hoc detectors under several evasion attacks and discusses a roadmap towards adversarial defense in OOD detectors.

Updated: 2024-06-25 18:21:17

标题: 解读事后OOD检测器对抗性鲁棒性的定义

摘要: 在现实世界场景中安全部署深度学习模型,检测到分布外(OOD)输入至关重要。近年来,许多OOD检测器已经开发,并且甚至进行了标准化基准测试,即OpenOOD。事后检测器的数量正在迅速增长,并显示一种选择,可以保护预训练的分类器免受自然分布转移的影响,声称已准备好应对现实世界场景。然而,在大多数研究中,它们对处理敌对示例的有效性已被忽视。本文研究了16个事后检测器在多种规避攻击中的对抗鲁棒性,并讨论了在OOD检测器中走向对抗性防御的路线图。

更新时间: 2024-06-25 18:21:17

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2406.15104v2

Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback

Current representations used in reasoning steps of large language models can mostly be categorized into two main types: (1) natural language, which is difficult to verify; and (2) non-natural language, usually programming code, which is difficult for people who are unfamiliar with coding to read. In this paper, we propose to use a semi-structured form to represent reasoning steps of large language models. Specifically, we use relation tuples, which are not only human-readable but also machine-friendly and easier to verify than natural language. We implement a framework that includes three main components: (1) introducing relation tuples into the reasoning steps of large language models; (2) implementing an automatic verification process of reasoning steps with a local code interpreter based on relation tuples; and (3) integrating a simple and effective dynamic feedback mechanism, which we found helpful for self-improvement of large language models. The experimental results on various arithmetic datasets demonstrate the effectiveness of our method in improving the arithmetic reasoning ability of large language models. The source code is available at https://github.com/gpgg/art.
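
The verification step is straightforward to illustrate; the tuple schema below is invented for the example (the paper defines its own), and `eval` is used purely for brevity:

def verify_tuples(tuples):
    """tuples: (lhs, rel, rhs) triples; '=' binds a value, '==' asserts a claim."""
    env = {}
    for lhs, rel, rhs in tuples:
        if rel == "=":                                # e.g. ("x", "=", "3 + 4")
            env[lhs] = eval(rhs, {}, dict(env))
        elif rel == "==" and eval(lhs, {}, dict(env)) != eval(rhs, {}, dict(env)):
            return False, (lhs, rel, rhs)             # feed back the failing tuple
    return True, None

ok, failing = verify_tuples([("x", "=", "3 + 4"), ("x", "==", "7")])  # -> (True, None)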

Updated: 2024-06-25 18:21:00

标题: 通过关系元组、验证和动态反馈提高大型语言模型的算术推理能力

摘要: 当前在大型语言模型的推理步骤中使用的表示主要可分为两种类型:(1)自然语言,难以验证;和(2)非自然语言,通常是编程代码,对于不熟悉编码的人来说很难阅读。在本文中,我们提出使用半结构化形式来表示大型语言模型的推理步骤。具体而言,我们使用关系元组,这不仅易于人类阅读,而且对机器友好,比自然语言更容易验证。我们实现了一个包括三个主要组件的框架:(1)将关系元组引入大型语言模型的推理步骤;(2)基于关系元组实现推理步骤的自动验证过程,使用本地代码解释器;和(3)整合一个简单有效的动态反馈机制,我们发现有助于大型语言模型的自我改进。在各种算术数据集上的实验结果表明,我们的方法在提高大型语言模型的算术推理能力方面是有效的。源代码可在https://github.com/gpgg/art获得。

更新时间: 2024-06-25 18:21:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17873v1

Deep Learning for Multi-Label Learning: A Comprehensive Survey

Multi-label learning is a rapidly growing research area that aims to predict multiple labels from a single input data point. In the era of big data, tasks involving multi-label classification (MLC) or ranking present significant and intricate challenges, capturing considerable attention in diverse domains. Inherent difficulties in MLC include dealing with high-dimensional data, addressing label correlations, and handling partial labels, for which conventional methods prove ineffective. Recent years have witnessed a notable increase in adopting deep learning (DL) techniques to address these challenges more effectively in MLC. Notably, there is a burgeoning effort to harness the robust learning capabilities of DL for improved modelling of label dependencies and other challenges in MLC. However, it is noteworthy that comprehensive studies specifically dedicated to DL for multi-label learning are limited. Thus, this survey aims to thoroughly review recent progress in DL for multi-label learning, along with a summary of open research problems in MLC. The review consolidates existing research efforts in DL for MLC, including deep neural networks, transformers, autoencoders, and convolutional and recurrent architectures. Finally, the study presents a comparative analysis of the existing methods to provide insightful observations and stimulate future research directions in this domain.

Updated: 2024-06-25 18:20:40

标题: 深度学习在多标签学习中的应用:综合调查

摘要: 多标签学习是一个迅速发展的研究领域,旨在从单个输入数据点中预测多个标签。在大数据时代,涉及多标签分类(MLC)或排名的任务面临重大而复杂的挑战,在不同领域引起了相当大的关注。MLC中固有的困难包括处理高维数据、解决标签相关性和处理部分标签,传统方法对此效果不佳。近年来,越来越多地采用深度学习(DL)技术更有效地应对MLC中的这些挑战。值得注意的是,人们正在努力利用DL的强大学习能力,以改善对标签依赖性等MLC中的其他挑战的建模。然而,值得注意的是,专门致力于DL用于多标签学习的全面研究有限。因此,本调查旨在全面审查最近在DL用于多标签学习方面取得的进展,以及对MLC中的开放研究问题的总结。该调查整合了DL用于MLC的现有研究工作,包括深度神经网络、Transformer、自编码器以及卷积和循环架构。最后,该研究提供了对现有方法的比较分析,以提供深刻的观察和激励未来在该领域的研究方向。

更新时间: 2024-06-25 18:20:40

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2401.16549v3

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

Many statistical analyses, in both observational data and randomized control trials, ask: how does the outcome of interest vary with combinations of observable covariates? How do various drug combinations affect health outcomes, or how does technology adoption depend on incentives and demographics? Our goal is to partition this factorial space into "pools" of covariate combinations where the outcome differs across the pools (but not within a pool). Existing approaches (i) search for a single "optimal" partition under assumptions about the association between covariates or (ii) sample from the entire set of possible partitions. Both these approaches ignore the reality that, especially with correlation structure in covariates, many ways to partition the covariate space may be statistically indistinguishable, despite very different implications for policy or science. We develop an alternative perspective, called Rashomon Partition Sets (RPSs). Each item in the RPS partitions the space of covariates using a tree-like geometry. RPSs incorporate all partitions that have posterior values near the maximum a posteriori partition, even if they offer substantively different explanations, and do so using a prior that makes no assumptions about associations between covariates. This prior is the $\ell_0$ prior, which we show is minimax optimal. Given the RPS we calculate the posterior of any measurable function of the feature effects vector on outcomes, conditional on being in the RPS. We also characterize approximation error relative to the entire posterior and provide bounds on the size of the RPS. Simulations demonstrate this framework allows for robust conclusions relative to conventional regularization techniques. We apply our method to three empirical settings: price effects on charitable giving, chromosomal structure (telomere length), and the introduction of microfinance.

Updated: 2024-06-25 18:17:43

标题: 使用拉肖门分区方法稳健地估计因子数据中的异质性

摘要: 在观察数据和随机对照试验中,许多统计分析都会问:感兴趣的结果如何随可观察协变量的组合变化而变化?各种药物组合如何影响健康结果,或者技术采纳如何取决于激励和人口统计学?我们的目标是将这个因子空间划分为“池”,其中结果在池之间有所不同(但在池内相同)。现有方法(i)在关于协变量之间关联的假设下寻找单一“最优”划分或(ii)从可能的划分全集中抽样。这两种方法都忽略了这样一个现实,即尤其是在协变量的相关结构中,许多划分协变量空间的方式在统计上可能是无法区分的,尽管对政策或科学有非常不同的影响。我们开发了一种另类视角,称为Rashomon划分集(RPSs)。RPSs中的每个项目使用类似树状的几何形状来划分协变量空间。RPSs包含所有具有接近最大后验划分的后验值的划分,即使它们提供实质上不同的解释,并且使用一种不对协变量之间关联做出假设的先验。这种先验是$\ell_0$先验,我们展示其是最小最大优化的。给定RPS,我们计算在RPS中的情况下,功能效应向量对结果的任何可测函数的后验。我们还表征了相对于整个后验的近似误差,并提供了关于RPS大小的边界。模拟表明,这种框架允许相对于传统的正则化技术得出健壮的结论。我们将我们的方法应用于三个实证设置:慈善捐赠的价格影响,染色体结构(端粒长度)和微型金融的引入。

更新时间: 2024-06-25 18:17:43

领域: stat.ME,cs.LG,econ.EM,stat.CO,stat.ML

下载: http://arxiv.org/abs/2404.02141v2

Asynchronous Authentication

A myriad of authentication mechanisms embody a continuous evolution from verbal passwords in ancient times to contemporary multi-factor authentication. Nevertheless, digital asset heists and numerous identity theft cases illustrate the urgent need to revisit the fundamentals of user authentication. We abstract away credential details and formalize the general, common case of asynchronous authentication, with unbounded message propagation time. Our model, which might be of independent interest, allows for eventual message delivery, while bounding execution time to maintain cryptographic guarantees. Given credentials' fault probabilities (e.g., loss or leak), we seek mechanisms with the highest success probability. We show that every mechanism is dominated by some Boolean mechanism -- defined by a monotonic Boolean function on presented credentials. We present an algorithm for finding approximately optimal mechanisms. Previous work analyzed Boolean mechanisms specifically, but used brute force, which quickly becomes prohibitively complex. We leverage the problem structure to reduce complexity by orders of magnitude. The algorithm is readily applicable to practical settings. For example, we revisit the common approach in cryptocurrency wallets that use a handful of high-quality credentials. We show that adding low-quality credentials improves security by orders of magnitude.
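
To make the setting concrete, the brute-force computation that the paper's algorithm avoids looks roughly like this: enumerate independent credential fault outcomes (here simplified to disjoint ok/lost/leaked states, an assumption) and check that the monotone Boolean mechanism admits the user but not the attacker:

from itertools import product

def success_probability(f, p_loss, p_leak):
    """f: monotone Boolean function over credential availability vectors."""
    total = 0.0
    for faults in product(("ok", "lost", "leaked"), repeat=len(p_loss)):
        p = 1.0
        for i, s in enumerate(faults):
            p *= {"ok": (1 - p_loss[i]) * (1 - p_leak[i]),
                  "lost": p_loss[i],
                  "leaked": (1 - p_loss[i]) * p_leak[i]}[s]
        user_ok = f([s != "lost" for s in faults])        # user presents unlost credentials
        attacker_ok = f([s == "leaked" for s in faults])  # attacker presents leaked ones
        total += p * (user_ok and not attacker_ok)
    return total

two_of_three = lambda avail: sum(avail) >= 2              # a 2-of-3 threshold mechanism
print(success_probability(two_of_three, [0.1] * 3, [0.05] * 3))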

Updated: 2024-06-25 18:14:44

标题: 异步身份验证

摘要: 认证机制的多样性体现了从古代口头密码到当代多因素认证的持续演变。然而,数字资产盗窃和众多身份盗窃案例表明有必要重新审视用户认证的基本原则。我们将凭据细节抽象化,并形式化异步认证的一般常见情况,其中消息传播时间不受限制。我们的模型可能具有独立的兴趣,允许最终消息传递,同时将执行时间限制在维护加密保证的范围内。鉴于凭据的故障概率(例如丢失或泄露),我们寻求具有最高成功概率的机制。我们展示了每种机制都被一些布尔机制所支配--由呈现的凭据上的单调布尔函数定义。我们提出了一种寻找近似最优机制的算法。先前的研究专门分析了布尔机制,但使用蛮力很快变得复杂难以承受。我们利用问题结构将复杂度减少了几个数量级。该算法可轻松应用于实际设置。例如,我们重新审视使用少量高质量凭据的加密货币钱包的常见方法。我们表明,添加低质量凭据可以将安全性提高几个数量级。

更新时间: 2024-06-25 18:14:44

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2312.13967v2

AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation. We identify 314 unique risk categories organized into a four-tiered taxonomy. At the highest level, this taxonomy encompasses System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks. The taxonomy establishes connections between various descriptions and approaches to risk, highlighting the overlaps and discrepancies between public and private sector conceptions of risk. By providing this unified framework, we aim to advance AI safety through information sharing across sectors and the promotion of best practices in risk mitigation for generative AI models and systems.

Updated: 2024-06-25 18:13:05

标题: AI风险分类解析(AIR 2024):从政府法规到企业政策

摘要: 我们提出了一个综合的人工智能风险分类法,源自于欧盟、美国和中国的八项政府政策以及全球16家公司的政策,这是朝着建立生成式人工智能安全评估统一语言迈出的重要一步。我们确定了314个独特的风险类别,组织成一个四层级的分类法。在最高级别,这个分类法涵盖了系统和运营风险、内容安全风险、社会风险以及法律和权利风险。该分类法建立了风险描述和方法之间的联系,突出了公共和私营部门对风险概念之间的重叠和差异。通过提供这一统一框架,我们旨在通过跨部门信息共享和在生成式人工智能模型和系统风险缓解的最佳实践促进AI安全的进步。

更新时间: 2024-06-25 18:13:05

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2406.17864v1

Expected Grad-CAM: Towards gradient faithfulness

Although input-gradient techniques have evolved to mitigate and tackle the challenges associated with gradients, modern gradient-weighted CAM approaches still rely on vanilla gradients, which are inherently susceptible to saturation phenomena. Although recent enhancements have incorporated counterfactual gradient strategies as a mitigating measure, these local explanation techniques still exhibit a lack of sensitivity to their baseline parameter. Our work proposes a gradient-weighted CAM augmentation that tackles both the saturation and sensitivity problems by reshaping the gradient computation, incorporating two well-established, theoretically grounded approaches: Expected Gradients and kernel smoothing. By revisiting the original formulation as the smoothed expectation of the perturbed integrated gradients, one can concurrently construct more faithful, localized and robust explanations that minimize infidelity. Through fine modulation of the perturbation distribution it is possible to regulate the complexity characteristic of the explanation, selectively discriminating stable features. Our technique, Expected Grad-CAM, unlike recent works, exclusively optimizes the gradient computation, purposefully designed as an enhanced substitute for the foundational Grad-CAM algorithm and any method built therefrom. Quantitative and qualitative evaluations have been conducted to assess the effectiveness of our method.
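
A rough PyTorch sketch of the idea, replacing Grad-CAM's vanilla gradients with an Expected-Gradients-style average over baselines and interpolation points, is given below; the `features`/`head` split of the model is an assumed interface, and the exact estimator in the paper differs:

import torch

def expected_grad_cam(model, x, baselines, target, steps=8):
    cams = []
    for b in baselines:                              # baselines shaped like x
        for a in torch.rand(steps):                  # random interpolation points
            xi = (b + a * (x - b)).requires_grad_(True)
            fmap = model.features(xi)                # (1, K, H, W) activations
            score = model.head(fmap)[0, target]
            g, = torch.autograd.grad(score, fmap)
            cams.append((g.mean(dim=(2, 3), keepdim=True) * fmap).sum(1))
    return torch.relu(torch.stack(cams).mean(0))     # smoothed (1, H, W) CAM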

Updated: 2024-06-25 18:10:15

标题: 预期的Grad-CAM:朝着梯度忠实性前进

摘要: 尽管输入梯度技术已经发展出来以减轻和应对梯度相关的挑战,但现代梯度加权CAM方法仍然依赖于普通梯度,这种方法本质上容易受到饱和现象的影响。尽管最近的增强措施已经纳入反事实梯度策略作为缓解措施,但这些局部解释技术仍然缺乏对基线参数的敏感性。我们的工作提出了一种梯度加权CAM增强方法,通过重新塑造梯度计算,结合两种成熟且可证明的方法:期望梯度和核平滑。通过重新审视原始公式,将扰动积分梯度的平滑期望,可以同时构建更加忠实、局部化和稳健的解释,从而最大程度地减少不忠实性。通过对扰动分布进行精细调节,可以调节解释的复杂特征,有选择地区分稳定特征。我们的技术Expected Grad-CAM与最近的工作不同,它专门优化梯度计算,旨在成为Grad-CAM算法及其衍生方法的增强替代品。我们已经进行了定量和定性评估,以评估我们的方法的有效性。

更新时间: 2024-06-25 18:10:15

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.01274v2

What type of inference is planning?

Multiple types of inference are available for probabilistic graphical models, e.g., marginal, maximum-a-posteriori, and even marginal maximum-a-posteriori. Which one do researchers mean when they talk about "planning as inference"? There is no consistency in the literature, different types are used, and their ability to do planning is further entangled with specific approximations or additional constraints. In this work we use the variational framework to show that all commonly used types of inference correspond to different weightings of the entropy terms in the variational problem, and that planning corresponds _exactly_ to a _different_ set of weights. This means that all the tricks of variational inference are readily applicable to planning. We develop an analogue of loopy belief propagation that allows us to perform approximate planning in factored state Markov decisions processes without incurring intractability due to the exponentially large state space. The variational perspective shows that the previous types of inference for planning are only adequate in environments with low stochasticity, and allows us to characterize each type by its own merits, disentangling the type of inference from the additional approximations that its practical use requires. We validate these results empirically on synthetic MDPs and tasks posed in the International Planning Competition.
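
One way to make this concrete (schematic notation, not the paper's exact decomposition): writing the variational objective over a trajectory model as $\mathcal{J}_\beta(q) = \mathbb{E}_q[\log p(o, s, a)] + \sum_t \beta_t \mathcal{H}[q_t]$, marginal inference corresponds to all entropy weights $\beta_t = 1$, MAP inference to $\beta_t = 0$, marginal MAP to a 0/1 mixture, and, per the paper's claim, planning to yet another assignment of the $\beta_t$.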

Updated: 2024-06-25 18:05:31

标题: 规划是哪种类型的推理?

摘要: 概率图模型有多种类型的推理方法,例如边缘推理、最大后验概率推理,甚至是边缘最大后验概率推理。当研究人员谈论“规划作为推理”时,他们指的是哪种推理方法?文献中没有一致性,使用了不同类型的推理方法,它们的规划能力进一步受到特定近似或额外约束的影响。在这项研究中,我们使用变分框架展示,所有常用的推理方法对应于变分问题中熵项的不同加权,并且规划恰好对应于一组不同的权重。这意味着所有变分推理的技巧都可以轻松应用于规划中。我们开发了一种类似于循环信念传播的方法,允许我们在因子状态马尔可夫决策过程中进行近似规划,而不会因为指数级的状态空间而变得难以处理。变分视角显示,先前用于规划的推理方法仅适用于低随机性环境,并允许我们通过其各自的优点来表征每种类型,将推理类型与其实际应用所需的额外近似分离。我们在合成MDP和国际规划竞赛中提出的任务上经验验证了这些结果。

更新时间: 2024-06-25 18:05:31

领域: cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.17863v1

Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding

Large Language Models (LLMs) have demonstrated a powerful ability for text generation. However, achieving optimal results with a given prompt or instruction can be challenging, especially for billion-sized models. Additionally, undesired behaviors such as toxicity or hallucinations can manifest. While much larger models (e.g., ChatGPT) may demonstrate strength in mitigating these issues, there is still no guarantee of complete prevention. In this work, we propose formalizing text generation as a future-constrained generation problem to minimize undesirable behaviors and enforce faithfulness to instructions. The estimation of future constraint satisfaction, accomplished using LLMs, guides the text generation process. Our extensive experiments demonstrate the effectiveness of the proposed approach across three distinct text generation tasks: keyword-constrained generation (Lin et al., 2020), toxicity reduction (Gehman et al., 2020), and factual correctness in question-answering (Gao et al., 2023).
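
Schematically, the decoding rule amounts to reranking candidate continuations by model likelihood plus a weighted estimate of future constraint satisfaction; `lm_logprob` and `estimate_satisfaction` below are placeholders for the LLM calls, not the paper's API:

def constrained_step(prefix, candidates, lm_logprob, estimate_satisfaction, lam=1.0):
    """Pick the next token greedily under a future-constraint-aware score."""
    def score(tok):
        seq = prefix + [tok]
        # estimate_satisfaction(seq) in [0, 1]: how likely the constraint (keyword
        # coverage, non-toxicity, factuality, ...) is to be met continuing from seq
        return lm_logprob(seq) + lam * estimate_satisfaction(seq)
    return max(candidates, key=score)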

Updated: 2024-06-25 18:01:24

标题: 解锁预测性文本生成:大语言模型解码的受限方法

摘要: 大型语言模型(LLMs)已经展示了强大的文本生成能力。然而,尤其对于亿级大小的模型,要实现给定提示或指令的最佳结果可能具有挑战性。此外,不良行为,如毒性或幻觉,也可能表现出来。虽然更大的模型(例如ChatGPT)可能表现出在减轻这些问题方面的实力,但仍然不能完全保证预防。在这项工作中,我们提出将文本生成形式化为未来约束生成问题,以最小化不良行为并强制执行对指令的忠实。使用LLMs完成未来约束满足的估计指导文本生成过程。我们的广泛实验表明所提出的方法在三个不同的文本生成任务上的有效性:关键词约束生成(Lin等,2020)、毒性减少(Gehman等,2020)和问答中的事实正确性(Gao等,2023)。

更新时间: 2024-06-25 18:01:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2312.06149v3

Probing many-body Bell correlation depth with superconducting qubits

Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrating example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound with up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques. Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices, but also a valuable guide towards exploiting multipartite Bell correlation in a wide spectrum of practical applications.

Updated: 2024-06-25 18:00:00

标题: 用超导量子比特探测多体贝尔相关深度

摘要: 量子非局域性描述了一种比纠缠更强的量子相关性形式。它驳斥了爱因斯坦对局域实在性的信仰,是量子力学中最独特和神秘的特征之一。在各种实际应用中,从密码学和认证随机数生成到自测试和机器学习,量子非局域性是实现量子优势的关键资源。然而,在量子多体系统中检测非局域性尤其具有挑战性。本文报道了利用完全可编程超导量子处理器在多达24个量子比特中证实了真正的多体Bell相关性的实验。特别地,我们将能量作为Bell相关性的证据,并通过变分的方式跨越一系列阈值逐渐降低多体系统的能量,在实验数据中可以证实越来越多的Bell相关性深度。作为说明性示例,我们通过变分方式准备了一个73个量子比特的二维蜂窝模型的低能态,并通过测量超过对应的经典上限达到了48个标准偏差的能量来证实其Bell相关性。此外,我们通过奇偶振荡和多量子相干技术高效测量能量,变分准备一系列低能态,并证实它们的真正多体Bell相关性,最多可达24个量子比特。我们的结果建立了一个可行的方法,用于准备和证实多体Bell相关性,这不仅为量子设备提供了超越纠缠的更精细的基准,还为在各种实际应用中利用多体Bell相关性提供了有价值的指导。

更新时间: 2024-06-25 18:00:00

领域: quant-ph,cs.AI

下载: http://arxiv.org/abs/2406.17841v1

EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data

Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces. While these methods can perform well in their training environments, they lack the flexibility to transfer to new tasks. Instead, RL agents that can act over useful, temporally extended skills rather than low-level actions can learn new tasks more easily. Prior work in skill-based RL either requires expert supervision to define useful skills, which is hard to scale, or learns a skill-space from offline data with heuristics that limit the adaptability of the skills, making them difficult to transfer during downstream RL. Our approach, EXTRACT, instead utilizes pre-trained vision language models to extract a discrete set of semantically meaningful skills from offline data, each of which is parameterized by continuous arguments, without human supervision. This skill parameterization allows robots to learn new tasks by only needing to learn when to select a specific skill and how to modify its arguments for the specific task. We demonstrate through experiments in sparse-reward, image-based, robot manipulation environments that EXTRACT can more quickly learn new tasks than prior works, with major gains in sample efficiency and performance over prior skill-based RL. Website at https://www.jessezhang.net/projects/extract/.

Updated: 2024-06-25 17:50:03

标题: 提取:通过从离线数据中提取可转移机器人技能来实现高效的策略学习.

摘要: 大多数强化学习(RL)方法侧重于在低级动作空间上学习最优策略。虽然这些方法在训练环境中表现良好,但缺乏灵活性,无法转移到新任务上。相反,能够利用有用的、时间延长的技能而不是低级动作来行动的RL代理可以更容易地学习新任务。以前的技能驱动RL要么需要专家监督来定义有用的技能,这很难扩展,要么通过启发式从离线数据中学习技能空间,这些启发限制了技能的适应性,在下游的RL过程中很难转移。我们的方法EXTRACT利用预训练的视觉语言模型从离线数据中提取一组离散的语义有意义的技能,每个技能由连续参数化,无需人类监督。这种技能参数化使机器人只需要学习何时选择特定技能以及如何修改其参数以适应特定任务,就能学习新任务。我们通过在稀疏奖励、基于图像的机器人操作环境中的实验表明,EXTRACT比以前的工作更快地学习新任务,在样本效率和性能方面获得了重大提升。网站 https://www.jessezhang.net/projects/extract/。

更新时间: 2024-06-25 17:50:03

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.17768v1

Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective

Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data while not forgetting past learned classes. The common evaluation protocol for CIL algorithms is to measure the average test accuracy across all classes learned so far -- however, we argue that solely focusing on maximizing the test accuracy may not necessarily lead to developing a CIL algorithm that also continually learns and updates the representations, which may be transferred to the downstream tasks. To that end, we experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning and propose new analysis methods. Our experiments show that most state-of-the-art algorithms prioritize high stability and do not significantly change the learned representation, and sometimes even learn a representation of lower quality than a naive baseline. However, we observe that these algorithms can still achieve high test accuracy because they enable a model to learn a classifier that closely resembles an estimated linear classifier trained for linear probing. Furthermore, the base model learned in the first task, which involves single-task learning, exhibits varying levels of representation quality across different algorithms, and this variance impacts the final performance of CIL algorithms. Therefore, we suggest that the representation-level evaluation should be considered as an additional recipe for more diverse evaluation for CIL algorithms.
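
A minimal version of the kind of representation-level protocol this suggests: freeze the backbone after each task, fit a linear probe on its features, and report probe accuracy alongside the usual CIL metrics; names here are placeholders:

from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(encoder, X_train, y_train, X_test, y_test):
    Z_tr, Z_te = encoder(X_train), encoder(X_test)   # frozen (n, d) features
    probe = LogisticRegression(max_iter=1000).fit(Z_tr, y_train)
    return probe.score(Z_te, y_test)                 # representation quality proxy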

Updated: 2024-06-25 17:49:35

标题: 朝向多样化评估类增量学习:一个表示学习的视角

摘要: 类增量学习(CIL)算法旨在从逐步到达的数据中持续学习新的对象类别,同时不遗忘已学习的类别。CIL算法的常见评估协议是衡量迄今为止所有已学习类别的平均测试准确率——然而,我们认为仅仅专注于最大化测试准确率未必能够开发出同时持续学习和更新表示的CIL算法,而这些表示可能会被迁移到下游任务。为此,我们使用表示学习中的多种评估协议对由CIL算法训练的神经网络模型进行实验分析,并提出新的分析方法。我们的实验表明,大多数最先进的算法优先考虑高稳定性,并没有显著改变所学习的表示,有时甚至学习到比朴素基线质量更低的表示。然而,我们观察到这些算法仍然可以实现高测试准确率,因为它们使模型能够学习一个与为线性探测训练的估计线性分类器非常相似的分类器。此外,在第一个任务中学习的基础模型(涉及单任务学习)在不同算法之间表现出不同水平的表示质量,这种差异影响了CIL算法的最终性能。因此,我们建议将表示层面的评估作为对CIL算法进行更多样化评估的额外方法。

更新时间: 2024-06-25 17:49:35

领域: cs.LG

下载: http://arxiv.org/abs/2206.08101v3

BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning

Large language models (LLMs) possess extensive parametric knowledge, but this knowledge is difficult to update with new information because retraining is very expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution for updating the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods, inspired by in-context learning (ICL), have shown great promise and allow LLMs to be treated as black boxes. In the past, KE was primarily employed in English contexts, whereas the potential for cross-lingual KE in current English-centric LLMs has not been fully explored. To foster more research in this direction, we introduce the BMIKE-53 benchmark for evaluating cross-lingual KE on 53 diverse languages across three KE task types. We also propose a gradient-free KE method called Multilingual In-context Knowledge Editing (MIKE) and evaluate it on BMIKE-53. Our evaluation focuses on cross-lingual knowledge transfer in terms of reliability, generality, locality, and portability, offering valuable insights and a framework for future research in cross-lingual KE. Our code and data are publicly accessible via the anonymous repository at https://anonymous.4open.science/r/MIKE.
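
In-context KE of the kind described here treats the LLM as a black box and injects the new fact purely through the prompt. Below is a minimal sketch of such a prompt builder; the template is illustrative and not MIKE's actual format:

```python
# Hypothetical prompt construction for in-context knowledge editing.
def build_ke_prompt(new_fact: str, demos: list[tuple[str, str]], query: str) -> str:
    lines = [f"New fact: {new_fact}", ""]
    for q, a in demos:  # few-shot demonstrations, possibly in other languages
        lines += [f"Q: {q}", f"A: {a}", ""]
    lines += [f"Q: {query}", "A:"]
    return "\n".join(lines)

prompt = build_ke_prompt(
    new_fact="The capital of X is Y.",
    demos=[("What is the capital of X?", "Y.")],
    query="¿Cuál es la capital de X?",  # cross-lingual query probes portability
)
```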

Updated: 2024-06-25 17:48:56

标题: BMIKE-53:利用上下文学习调查跨语言知识编辑

摘要: 大型语言模型(LLMs)具有广泛的参数知识,但这种知识很难通过新信息进行更新,因为重新训练非常昂贵,对于闭源模型来说是不可行的。知识编辑(KE)已经成为一种可行的解决方案,可以更新LLMs的知识而不影响其整体性能。受上下文学习(ICL)启发,即时KE方法表现出很大的潜力,可以让LLMs被视为黑匣子。过去,KE主要用于英语环境,而当前以英语为中心的LLMs中跨语言KE的潜力尚未得到充分探索。为了促进更多朝这个方向的研究,我们介绍了用于评估53种不同语言上的跨语言KE的BMIKE-53基准,并提出了一种无梯度KE方法,称为多语言上下文知识编辑(MIKE),并在BMIKE-53上进行评估。我们的评估重点放在跨语言知识转移方面的可靠性、普遍性、局部性和可移植性,为未来跨语言KE研究提供了宝贵的见解和框架。我们的代码和数据可以通过匿名存储库https://anonymous.4open.science/r/MIKE 公开访问。

更新时间: 2024-06-25 17:48:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17764v1

DiffusionPDE: Generative PDE-Solving Under Partial Observation

We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which is a common assumption for real-world measurements. In this work, we propose DiffusionPDE that can simultaneously fill in the missing information and solve a PDE by modeling the joint distribution of the solution and coefficient spaces. We show that the learned generative priors lead to a versatile framework for accurately solving a wide range of PDEs under partial observation, significantly outperforming the state-of-the-art methods for both forward and inverse directions.
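
The abstract does not spell out the sampling procedure, but a common way to condition a diffusion model on sparse observations is gradient guidance toward the observed entries at each denoising step. A generic, DPS-style skeleton under that assumption (not necessarily DiffusionPDE's exact update):

```python
import torch

def guided_sample(denoiser, sigmas, obs, mask, guidance_scale=1.0):
    """Observation-guided diffusion sampling skeleton. `denoiser(x, sigma)`
    predicts the clean joint field (coefficients + solution) and `mask` marks
    the observed entries; `sigmas` is a decreasing noise schedule."""
    x = torch.randn_like(obs) * sigmas[0]
    for i in range(len(sigmas) - 1):
        x = x.detach().requires_grad_(True)
        x0_hat = denoiser(x, sigmas[i])               # predicted clean sample
        loss = ((mask * (x0_hat - obs)) ** 2).sum()   # sparse-observation misfit
        grad = torch.autograd.grad(loss, x)[0]
        d = (x - x0_hat) / sigmas[i]                  # probability-flow direction
        x = x + (sigmas[i + 1] - sigmas[i]) * d       # Euler step to next noise level
        x = x - guidance_scale * grad                 # nudge toward observations
    return x.detach()
```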

Updated: 2024-06-25 17:48:24

标题: DiffusionPDE:部分观测下的生成式PDE求解

摘要: 我们引入了一个使用生成扩散模型求解偏微分方程(PDE)的通用框架。特别地,我们关注不具备应用经典求解器所需的完整场景知识的情形。大多数现有的正向或反向PDE方法在数据观测或基础系数不完整时表现不佳,而观测不完整正是真实世界测量中的常见情况。在这项工作中,我们提出了DiffusionPDE,它通过建模解空间和系数空间的联合分布,可以在求解PDE的同时填补缺失信息。我们展示了学习到的生成先验构成了一个多功能框架,能够在部分观测下准确求解各种PDE,在正向和反向两个方向上均显著优于最先进的方法。

更新时间: 2024-06-25 17:48:24

领域: cs.LG,cs.AI,cs.CV,cs.NA,math.NA

下载: http://arxiv.org/abs/2406.17763v1

Solving Hard Mizar Problems with Instantiation and Strategy Invention

In this work, we prove over 3000 previously ATP-unproved Mizar/MPTP problems by using several ATP and AI methods, raising the number of ATP-solved Mizar problems from 75\% to above 80\%. First, we experiment with the cvc5 SMT solver, which uses several instantiation-based heuristics that differ from the superposition-based systems previously applied to Mizar, and add many new solutions. Then we use automated strategy invention to develop cvc5 strategies that largely improve cvc5's performance on the hard problems. In particular, the best invented strategy solves over 14\% more problems than the best previously available cvc5 strategy. We also show that different clausification methods have a high impact on such instantiation-based methods, again producing many new solutions. In total, the methods solve 3021 (21.3\%) of the 14163 previously unsolved hard Mizar problems. This is a new milestone over the Mizar large-theory benchmark and a large strengthening of the hammer methods for Mizar.

Updated: 2024-06-25 17:47:13

标题: 用实例化和策略发明解决困难的Mizar问题

摘要: 在这项工作中,我们使用几种ATP和AI方法证明了3000多个以前未被ATP证明的Mizar/MPTP问题,将ATP解决的Mizar问题比例从75%提高到80%以上。首先,我们尝试使用cvc5 SMT求解器,该求解器使用了几种基于实例化的启发式方法,不同于此前应用于Mizar的基于叠加(superposition)的系统,并由此添加了许多新解。然后,我们使用自动化策略发明来开发cvc5策略,大大改进了cvc5在困难问题上的性能。特别是,最佳发明策略比先前可用的最佳cvc5策略多解决了14%以上的问题。我们还展示了不同的子句化方法对这类基于实例化的方法有很大影响,再次产生许多新解。总体而言,这些方法解决了14163个以前未解决的困难Mizar问题中的3021个(21.3%)。这是Mizar大型理论基准上的一个新里程碑,也是对Mizar的hammer方法的大幅增强。

更新时间: 2024-06-25 17:47:13

领域: cs.AI,cs.LG,cs.LO,cs.SC

下载: http://arxiv.org/abs/2406.17762v1

Human-Object Interaction from Human-Level Instructions

Intelligent agents need to autonomously navigate and interact within contextual environments to perform a wide range of daily tasks based on human-level instructions. These agents require a foundational understanding of the world, incorporating common sense and knowledge, to interpret such instructions. Moreover, they must possess precise low-level skills for movement and interaction to execute the detailed task plans derived from these instructions. In this work, we address the task of synthesizing continuous human-object interactions for manipulating large objects within contextual environments, guided by human-level instructions. Our goal is to generate synchronized object motion, full-body human motion, and detailed finger motion, all essential for realistic interactions. Our framework consists of a large language model (LLM) planning module and a low-level motion generator. We use LLMs to deduce spatial object relationships and devise a method for accurately determining their positions and orientations in target scene layouts. Additionally, the LLM planner outlines a detailed task plan specifying a sequence of sub-tasks. This task plan, along with the target object poses, serves as input for our low-level motion generator, which seamlessly alternates between navigation and interaction modules. We present the first complete system that can synthesize object motion, full-body motion, and finger motion simultaneously from human-level instructions. Our experiments demonstrate the effectiveness of our high-level planner in generating plausible target layouts and our low-level motion generator in synthesizing realistic interactions for diverse objects. Please refer to our project page for more results: https://hoifhli.github.io/.

Updated: 2024-06-25 17:46:28

标题: 从人类级别指令中的人物-对象交互

摘要: 智能代理需要在环境中自主导航和互动,执行各种基于人类级别指令的日常任务。这些代理需要对世界有基础的理解,融合常识和知识,以解释这些指令。此外,它们必须具有精确的低级技能,用于移动和互动,以执行从这些指令派生的详细任务计划。在这项工作中,我们致力于合成在环境中操纵大型物体的连续人-物互动任务,这些任务受人类级别指令的指导。我们的目标是生成同步的物体运动、全身人体运动和详细的手指运动,这些对于真实互动至关重要。我们的框架包括一个大型语言模型(LLM)规划模块和一个低级运动生成器。我们使用LLMs推断空间物体关系,并设计一种准确确定它们在目标场景布局中位置和方向的方法。此外,LLM规划器概述了一个详细的任务计划,指定了一系列子任务。这个任务计划,连同目标物体的姿势,作为我们的低级运动生成器的输入,它无缝地在导航和互动模块之间交替。我们提出了第一个能够同时从人类级别指令中合成物体运动、全身运动和手指运动的完整系统。我们的实验证明了我们高级规划器在生成可信目标布局方面的有效性,以及我们的低级运动生成器在为不同物体合成真实互动方面的有效性。请参考我们的项目页面获取更多结果:https://hoifhli.github.io/。

更新时间: 2024-06-25 17:46:28

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.17840v1

CaLMQA: Exploring culturally specific long-form question answering across 23 languages

Large language models (LLMs) are commonly used for long-form question answering, which requires them to generate paragraph-length answers to complex questions. While long-form QA has been well-studied in English via many different datasets and evaluation metrics, this research has not been extended to cover most other languages. To bridge this gap, we introduce CaLMQA, a collection of 2.6K complex questions spanning 23 languages, including under-resourced, rarely-studied languages such as Fijian and Kirundi. Our dataset includes both naturally-occurring questions collected from community web forums as well as questions written by native speakers, whom we hire for this purpose. Our process yields diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers. We conduct automatic evaluation across a suite of open- and closed-source models using our novel metric CaLMScore, which detects incorrect language and token repetitions in answers, and observe that the quality of LLM-generated answers degrades significantly for some low-resource languages. We perform human evaluation on a subset of models and see that model performance is significantly worse for culturally specific questions than for culturally agnostic questions. Our findings highlight the need for further research in LLM multilingual capabilities and non-English LFQA evaluation.
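
CaLMScore's exact formulation is not reproduced here; as a rough illustration, one of its two checks, flagging degenerate token repetition, can be approximated with a simple n-gram statistic:

```python
# Simplified stand-in for the repetition check; the incorrect-language check
# would additionally require a language identifier.
def repetition_rate(text: str, n: int = 4) -> float:
    """Fraction of n-grams that are duplicates; high values flag repetition loops."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)

assert repetition_rate("the cat sat " * 20) > 0.9  # degenerate answer is flagged
```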

Updated: 2024-06-25 17:45:26

标题: CaLMQA:探索跨23种语言的具有文化特色的长篇问题回答

摘要: 大型语言模型(LLMs)通常用于长篇问题回答,这要求它们生成复杂问题的段落长度答案。虽然长篇问题回答在英语中已经通过许多不同数据集和评估指标进行了深入研究,但这项研究并未扩展到覆盖大多数其他语言。为了弥补这一差距,我们引入了CaLMQA,这是一个涵盖23种语言的2600个复杂问题集合,包括一些资源匮乏、很少被研究的语言,如斐济语和基隆迪语。我们的数据集包括从社区网络论坛收集的自然出现的问题,以及由我们聘请的母语人士为此编写的问题。我们的过程产生了反映文化主题(如传统、法律、新闻)和母语使用的多样化、复杂问题。我们使用我们的新指标CaLMScore在一系列开源和闭源模型上进行自动评估,该指标检测答案中的错误语言和标记重复,并观察到对于一些资源匮乏的语言,LLM生成的答案质量明显下降。我们对一部分模型进行人工评估,发现对于特定文化问题,模型性能显著较差,而对于文化无关问题则较好。我们的研究结果突显了需要进一步研究LLM多语言能力和非英语长篇问题回答评估的重要性。

更新时间: 2024-06-25 17:45:26

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.17761v1

Adam-mini: Use Fewer Learning Rates To Gain More

We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the number of learning rates in Adam: Instead of assigning an individual learning rate for each parameter using $1/\sqrt{v}$, Adam-mini uses the average of $v$ within a pre-defined parameter block as the learning rate for that block. Such a design is inspired by two empirical findings. First, the Hessian of Transformers exhibits a near-block diagonal structure with different sizes of dense sub-blocks. Second, for each of these dense sub-blocks, there exists a single high-quality learning rate that can outperform Adam, provided that sufficient resources are available to search it out. Adam-mini provides one cost-effective way to find these good learning rates and manage to cut down $\geq$ 90% $v$ in Adam. Empirically, we verify that Adam-mini performs on par or better than AdamW on various language models sized from 125M to 7B for pre-training, supervised fine-tuning, and RLHF. The reduced memory footprint of Adam-mini also alleviates communication overheads among GPUs and CPUs, thereby increasing throughput. For instance, Adam-mini achieves 49.6% higher throughput than AdamW when pre-training Llama2-7B on 2x A800-80GB GPUs, which saves 33% wall-clock time for pre-training.
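
The core mechanic is easy to state in code: keep Adam's first moment per coordinate, but collapse the second moment to one scalar per pre-defined block. A simplified single-tensor sketch (bias correction omitted; `block_slices` is a placeholder for the Hessian-motivated partition described in the abstract):

```python
import torch

def adam_mini_step(p, grad, state, block_slices, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One step of the idea above for a single tensor p.
    state = {"m": torch.zeros_like(p), "v": torch.zeros(len(block_slices))}.
    Adam's per-coordinate v is replaced by one scalar per block, which is
    where the memory saving comes from."""
    state["m"].mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    for k, sl in enumerate(block_slices):
        g2 = grad[sl].pow(2).mean()                  # block-mean of squared gradient
        state["v"][k] = betas[1] * state["v"][k] + (1 - betas[1]) * g2
        p.data[sl] -= lr * state["m"][sl] / (state["v"][k].sqrt() + eps)
```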

Updated: 2024-06-25 17:45:06

标题: Adam-mini:使用更少的学习率获得更多收益

摘要: 我们提出了Adam-mini,这是一种优化器,其性能与AdamW相当或更好,但内存占用量减少了45%至50%。Adam-mini通过减少Adam中学习率的数量来减少内存:与使用$1/\sqrt{v}$为每个参数分配单独学习率不同,Adam-mini使用预定义参数块内$v$的平均值作为该块的学习率。这种设计受到两个经验发现的启发。首先,Transformers的Hessian矩阵表现出近似块对角结构,具有不同尺寸的稠密子块。其次,对于这些稠密子块中的每一个,存在一个高质量的学习率,可以在资源充足的情况下胜过Adam。Adam-mini提供了一种经济有效的方法来找到这些良好的学习率,并设法削减Adam中的$\geq$ 90% $v$。经验证,Adam-mini在各种语言模型(从125M到7B)的预训练、监督微调和RLHF方面的表现与AdamW相当或更好。Adam-mini的减少内存占用量还减轻了GPU和CPU之间的通信开销,从而提高了吞吐量。例如,当在2x A800-80GB GPU上对Llama2-7B进行预训练时,Adam-mini的吞吐量比AdamW高出49.6%,节省了33%的预训练时间。

更新时间: 2024-06-25 17:45:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.16793v2

Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

Recent work on discrete speech tokenization has paved the way for models that can seamlessly perform multiple tasks across modalities, e.g., speech recognition, text to speech, speech to speech translation. Moreover, large language models (LLMs) pretrained from vast text corpora contain rich linguistic information that can improve accuracy in a variety of tasks. In this paper, we present a decoder-only Discrete Multimodal Language Model (DMLM), which can be flexibly applied to multiple tasks (ASR, T2S, S2TT, etc.) and modalities (text, speech, vision). We explore several critical aspects of discrete multi-modal models, including the loss function, weight initialization, mixed training supervision, and codebook. Our results show that DMLM benefits significantly, across multiple tasks and datasets, from a combination of supervised and unsupervised training. Moreover, for ASR, it benefits from initializing DMLM from a pretrained LLM, and from a codebook derived from Whisper activations.

Updated: 2024-06-25 17:44:00

标题: 基于预训练大型语言模型的离散多模态Transformer用于混合监督语音处理

摘要: 最近关于离散语音分词的研究为模型打开了一扇门,可以在多种模态下无缝执行多项任务,例如语音识别、文本转语音、语音到语音的翻译。此外,从大量文本语料库中预训练的大型语言模型(LLMs)包含丰富的语言信息,可以提高各种任务的准确性。在本文中,我们提出了一个仅解码器的离散多模态语言模型(DMLM),可以灵活应用于多种任务(ASR、T2S、S2TT等)和模态(文本、语音、视觉)。我们探讨了离散多模态模型的几个关键方面,包括损失函数、权重初始化、混合训练监督和码书。我们的结果表明,DMLM通过结合监督和无监督训练,在多个任务和数据集上显著受益。此外,对于ASR,从预训练的LLM初始化DMLM以及从Whisper激活派生的码书对其有益。

更新时间: 2024-06-25 17:44:00

领域: cs.CL,cs.LG,eess.AS

下载: http://arxiv.org/abs/2406.06582v2

Interpreting Attention Layer Outputs with Sparse Autoencoders

Decomposing model activations into interpretable components is a key open problem in mechanistic interpretability. Sparse autoencoders (SAEs) are a popular method for decomposing the internal activations of trained transformers into sparse, interpretable features, and have been applied to MLP layers and the residual stream. In this work we train SAEs on attention layer outputs and show that also here SAEs find a sparse, interpretable decomposition. We demonstrate this on transformers from several model families and up to 2B parameters. We perform a qualitative study of the features computed by attention layers, and find multiple families: long-range context, short-range context and induction features. We qualitatively study the role of every head in GPT-2 Small, and estimate that at least 90% of the heads are polysemantic, i.e. have multiple unrelated roles. Further, we show that Sparse Autoencoders are a useful tool that enable researchers to explain model behavior in greater detail than prior work. For example, we explore the mystery of why models have so many seemingly redundant induction heads, use SAEs to motivate the hypothesis that some are long-prefix whereas others are short-prefix, and confirm this with more rigorous analysis. We use our SAEs to analyze the computation performed by the Indirect Object Identification circuit (Wang et al.), validating that the SAEs find causally meaningful intermediate variables, and deepening our understanding of the semantics of the circuit. We open-source the trained SAEs and a tool for exploring arbitrary prompts through the lens of Attention Output SAEs.
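
For concreteness, this is the standard SAE setup as commonly applied to transformer activations, here with attention layer outputs as the inputs; hyperparameters and loss details follow common practice rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Generic SAE: overcomplete ReLU encoder plus linear decoder."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x - self.dec.bias))  # sparse feature activations
        x_hat = self.dec(f)
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    recon = (x - x_hat).pow(2).sum(-1).mean()        # reconstruction error
    sparsity = f.abs().sum(-1).mean()                # L1 term drives feature sparsity
    return recon + l1_coeff * sparsity
```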

Updated: 2024-06-25 17:43:13

标题: 用稀疏自编码器解释注意力层输出

摘要: 将模型激活分解为可解释的组件是机制可解释性中的一个关键开放问题。稀疏自编码器(SAEs)是将训练好的Transformer内部激活分解为稀疏、可解释特征的一种流行方法,并已应用于MLP层和残差流。在这项工作中,我们在注意力层输出上训练SAEs,并展示SAEs在这里同样能找到稀疏、可解释的分解。我们在多个模型家族、参数量高达20亿的Transformer上展示了这一点。 我们对注意力层计算的特征进行了定性研究,并发现了多个类别:长距离上下文特征、短距离上下文特征和归纳特征。我们定性研究了GPT-2 Small中每个头的作用,并估计至少90%的头是多义的,即具有多个不相关的作用。 此外,我们展示了稀疏自编码器是一种有用的工具,可以使研究人员比以往更详细地解释模型行为。例如,我们探讨了模型为何拥有众多看似冗余的归纳头这一谜团,利用SAEs提出其中一些是长前缀归纳头、另一些是短前缀归纳头的假设,并通过更严格的分析加以确认。我们使用我们的SAEs来分析间接宾语识别电路(Wang等人)所执行的计算,验证了SAEs能找到具有因果意义的中间变量,并加深了我们对该电路语义的理解。我们开源了训练好的SAEs以及一个从注意力输出SAE的视角探索任意提示的工具。

更新时间: 2024-06-25 17:43:13

领域: cs.LG

下载: http://arxiv.org/abs/2406.17759v1

Regularization and Optimal Multiclass Learning

The quintessential learning algorithm of empirical risk minimization (ERM) is known to fail in various settings for which uniform convergence does not characterize learning. It is therefore unsurprising that the practice of machine learning is rife with considerably richer algorithmic techniques for successfully controlling model capacity. Nevertheless, no such technique or principle has broken away from the pack to characterize optimal learning in these more general settings. The purpose of this work is to characterize the role of regularization in perhaps the simplest setting for which ERM fails: multiclass learning with arbitrary label sets. Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles: Occam's Razor as embodied by structural risk minimization (SRM), the principle of maximum entropy, and Bayesian reasoning. Most notably, we introduce an optimal learner which relaxes structural risk minimization on two dimensions: it allows the regularization function to be "local" to datapoints, and uses an unsupervised learning stage to learn this regularizer at the outset. We justify these relaxations by showing that they are necessary: removing either dimension fails to yield a near-optimal learner. We also extract from OIGs a combinatorial sequence we term the Hall complexity, which is the first to characterize a problem's transductive error rate exactly. Lastly, we introduce a generalization of OIGs and the transductive learning setting to the agnostic case, where we show that optimal orientations of Hamming graphs -- judged using nodes' outdegrees minus a system of node-dependent credits -- characterize optimal learners exactly. We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and exhibit an optimal learner using maximum entropy programs.

Updated: 2024-06-25 17:42:18

标题: 正则化和最优多类学习

摘要: 经验风险最小化(ERM)这一典型学习算法在各种统一收敛无法刻画学习的设置中已知会失败。因此,机器学习实践中充斥着更丰富的、能成功控制模型容量的算法技术,也就不足为奇。然而,在这些更一般的设置中,尚没有哪种技术或原则能够脱颖而出,刻画最佳学习。 本文的目的是刻画正则化在ERM失败的可能是最简单的设置中的作用:具有任意标签集合的多类学习。利用单包含图(OIGs),我们展示了与经过考验的算法原则相契合的最佳学习算法:以结构风险最小化(SRM)体现的奥卡姆剃刀、最大熵原理和贝叶斯推理。尤其值得注意的是,我们引入了一种最佳学习器,它在两个维度上放松了结构风险最小化:允许正则化函数是数据点"局部"的,并且使用一个无监督学习阶段在一开始学习这个正则化器。我们通过证明这些放松是必要的来论证它们:去除任一维度都无法得到接近最佳的学习器。我们还从OIGs中提取出一种组合序列,我们称之为Hall复杂度,它是第一个能准确刻画问题的传导误差率的量。 最后,我们将OIGs和传导学习设置推广到不可知情形,并证明Hamming图的最佳定向(以节点出度减去一个依赖节点的信用系统来评判)准确刻画了最佳学习器。我们证明了Hall复杂度的不可知版本同样能准确刻画错误率,并给出了一个使用最大熵程序的最佳学习器。

更新时间: 2024-06-25 17:42:18

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2309.13692v2

Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language

We are exposed to much information trying to influence us, such as teaser messages, debates, politically framed news, and propaganda - all of which use persuasive language. With the recent interest in Large Language Models (LLMs), we study the ability of LLMs to produce persuasive text. As opposed to prior work which focuses on particular domains or types of persuasion, we conduct a general study across various domains to measure and benchmark to what degree LLMs produce persuasive text - both when explicitly instructed to rewrite text to be more or less persuasive and when only instructed to paraphrase. To this end, we construct a new dataset, Persuasive-Pairs, of pairs each consisting of a short text and of a text rewritten by an LLM to amplify or diminish persuasive language. We multi-annotate the pairs on a relative scale for persuasive language. This data is not only a valuable resource in itself, but we also show that it can be used to train a regression model to predict a score of persuasive language between text pairs. This model can score and benchmark new LLMs across domains, thereby facilitating the comparison of different LLMs. Finally, we discuss effects observed for different system prompts. Notably, we find that different 'personas' in the system prompt of LLaMA3 change the persuasive language in the text substantially, even when only instructed to paraphrase. These findings underscore the importance of investigating persuasive language in LLM generated text.

Updated: 2024-06-25 17:40:47

标题: 测量和基准大型语言模型生成有说服力语言的能力

摘要: 我们暴露于许多试图影响我们的信息中,例如引人入胜的信息、辩论、政治框架新闻和宣传 - 所有这些都使用有说服力的语言。随着对大型语言模型(LLMs)的近期关注,我们研究了LLMs生成有说服力文本的能力。与以往关注特定领域或类型的说服工作不同,我们进行了跨越各种领域的普遍研究,以衡量和基准测试LLMs生成有说服力文本的程度 - 无论是明确要求重写文本以更具说服力还是仅要求改写。为此,我们构建了一个新数据集,名为Persuasive-Pairs,其中每对包括一段简短文本和LLM重写的文本,以增强或减弱说服力语言。我们在相对刻度上对这些对进行多重注释,评估说服力语言。这些数据不仅本身是有价值的资源,而且我们还展示它可以用来训练回归模型,以预测文本对之间的说服力得分。该模型可以评分和基准测试不同领域的新LLMs,从而促进不同LLMs的比较。最后,我们讨论了对不同系统提示的观察效果。值得注意的是,我们发现LLaMA3系统提示中的不同“人物角色”显着改变了文本中的说服力语言,即使只是要求改写。这些发现强调了调查LLM生成文本中说服力语言的重要性。

更新时间: 2024-06-25 17:40:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17753v1

Enhancing Active Learning for Sentinel 2 Imagery through Contrastive Learning and Uncertainty Estimation

In this paper, we introduce a novel method designed to enhance label efficiency in satellite imagery analysis by integrating semi-supervised learning (SSL) with active learning strategies. Our approach utilizes contrastive learning together with uncertainty estimations via Monte Carlo Dropout (MC Dropout), with a particular focus on Sentinel-2 imagery analyzed using the Eurosat dataset. We explore the effectiveness of our method in scenarios featuring both balanced and unbalanced class distributions. Our results show that the proposed method performs better than several other popular methods in this field, enabling significant savings in labeling effort while maintaining high classification accuracy. These findings highlight the potential of our approach to facilitate scalable and cost-effective satellite image analysis, particularly advantageous for extensive environmental monitoring and land use classification tasks.
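
The uncertainty side of the method can be illustrated with the standard MC Dropout recipe: keep dropout stochastic at inference and score samples by predictive entropy. A generic sketch with placeholder names:

```python
import torch

def mc_dropout_uncertainty(model, x, n_samples: int = 20):
    """Predictive entropy via Monte Carlo Dropout: average the softmax over
    stochastic forward passes with dropout left active."""
    model.train()  # keeps dropout stochastic (beware of BatchNorm in practice)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(0)
    entropy = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(-1)
    return entropy  # query the highest-entropy Sentinel-2 tiles for labeling
```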

Updated: 2024-06-25 17:40:35

标题: 通过对比学习和不确定性估计增强Sentinel 2影像的主动学习

摘要: 在这篇论文中,我们介绍了一种旨在通过将半监督学习(SSL)与主动学习策略相结合来增强卫星图像分析中标签效率的新方法。我们的方法利用对比学习以及通过蒙特卡罗Dropout(MC Dropout)估计不确定性,特别关注使用Eurosat数据集分析的Sentinel-2图像。我们在涉及平衡和不平衡类分布的情景中探讨了我们方法的有效性。我们的结果表明,所提出的方法在这一领域比其他几种流行方法表现更好,能够在保持高分类准确性的同时显著节省标记工作量。这些发现突显了我们方法促进可伸缩和具有成本效益的卫星图像分析的潜力,特别适用于广泛的环境监测和土地利用分类任务。

更新时间: 2024-06-25 17:40:35

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.13285v2

Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation

The proliferation of complex deep learning (DL) models has revolutionized various applications, including computer vision-based solutions, prompting their integration into real-time systems. However, the resource-intensive nature of these models poses challenges for deployment on low-computational power and low-memory devices, like embedded and edge devices. This work empirically investigates the optimization of such complex DL models to analyze their functionality on an embedded device, particularly on the NVIDIA Jetson Nano. It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection. The experimental results reveal that, on average, optimized models exhibit a 16.11% speed improvement over their non-optimized counterparts. This not only emphasizes the critical need to consider hardware constraints and environmental sustainability in model development and deployment but also underscores the pivotal role of model optimization in enabling the widespread deployment of AI-assisted technologies on resource-constrained computational systems. It also serves as proof that prioritizing hardware-specific model optimization leads to efficient and scalable solutions that substantially decrease energy consumption and carbon footprint.
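
A minimal timing harness of the kind such a study relies on looks as follows; this is a generic benchmark loop, not the paper's exact setup:

```python
import time
import torch

def benchmark_latency(model, input_shape=(1, 3, 224, 224), warmup=20, iters=100,
                      device="cuda"):
    """Average wall-clock inference latency in milliseconds."""
    model = model.eval().to(device)
    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(warmup):          # warm-up stabilizes clocks and caches
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()     # GPU work is async; sync before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1000.0
```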

Updated: 2024-06-25 17:34:52

标题: 在NVIDIA Jetson Nano上为实时系统进行深度学习模型基准测试:一项实证调查

摘要: 复杂深度学习(DL)模型的增长已经彻底改变了各种应用,包括基于计算机视觉的解决方案,促使它们整合到实时系统中。然而,这些模型资源密集的特性对于低计算能力和低内存设备的部署,如嵌入式和边缘设备,提出了挑战。本文从实证角度探讨了对这些复杂DL模型的优化,以分析它们在嵌入式设备上的功能,特别是在NVIDIA Jetson Nano上。它评估了经过优化的模型在图像分类和视频动作检测方面的推断速度的有效性。实验结果显示,经过优化的模型平均表现出比未经优化的模型快16.11%。这不仅强调了在模型开发和部署中考虑硬件约束和环境可持续性的关键性需要,还凸显了模型优化在实现AI辅助技术在资源受限计算系统上广泛部署中的关键作用。它也证明了将硬件特定的模型优化作为优先考虑的做法可以带来高效和可扩展的解决方案,从而大幅降低能源消耗和碳足迹。

更新时间: 2024-06-25 17:34:52

领域: cs.AR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.17749v1

A New Perspective on Shampoo's Preconditioner

Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation of the Gauss--Newton component of the Hessian or the covariance matrix of the gradients maintained by Adagrad. We provide an explicit and novel connection between the $\textit{optimal}$ Kronecker product approximation of these matrices and the approximation made by Shampoo. Our connection highlights a subtle but common misconception about Shampoo's approximation. In particular, the $\textit{square}$ of the approximation used by the Shampoo optimizer is equivalent to a single step of the power iteration algorithm for computing the aforementioned optimal Kronecker product approximation. Across a variety of datasets and architectures we empirically demonstrate that this is close to the optimal Kronecker product approximation. Additionally, for the Hessian approximation viewpoint, we empirically study the impact of various practical tricks to make Shampoo more computationally efficient (such as using the batch gradient and the empirical Fisher) on the quality of Hessian approximation.
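
As a concrete reference point, the textbook Shampoo update for a matrix-shaped gradient maintains two factor statistics and preconditions with their inverse fourth roots. A numpy sketch (without blocking, grafting, or other practical tricks):

```python
import numpy as np

def shampoo_precondition(G, L, R, beta=0.99, eps=1e-6):
    """One Shampoo-style step for a matrix gradient G. The abstract's observation
    is that the square of this preconditioner matches one power-iteration step
    toward the optimal Kronecker product approximation."""
    L = beta * L + (1 - beta) * G @ G.T      # left Kronecker-factor statistics
    R = beta * R + (1 - beta) * G.T @ G      # right Kronecker-factor statistics

    def inv_quarter(M):                      # M^{-1/4} via eigendecomposition
        w, V = np.linalg.eigh(M)
        return (V * (w + eps) ** -0.25) @ V.T

    return inv_quarter(L) @ G @ inv_quarter(R), L, R
```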

Updated: 2024-06-25 17:34:51

标题: 关于Shampoo预条件子的新视角

摘要: Shampoo是一种使用Kronecker乘积预条件子的二阶优化算法,最近引起了机器学习社区的日益关注。Shampoo使用的预条件子可以被视为Hessian的Gauss-Newton分量的近似,或者是Adagrad维护的梯度协方差矩阵的近似。我们在这些矩阵的最优Kronecker乘积近似与Shampoo所做的近似之间建立了显式而新颖的联系。这一联系揭示了关于Shampoo近似的一个微妙但常见的误解。特别地,Shampoo优化器所用近似的平方,等价于计算上述最优Kronecker乘积近似的幂迭代算法的单步。通过各种数据集和架构的实证研究,我们证明这接近于最优Kronecker乘积近似。此外,从Hessian近似的观点出发,我们实证研究了各种使Shampoo更具计算效率的实用技巧(例如使用批量梯度和经验Fisher)对Hessian近似质量的影响。

更新时间: 2024-06-25 17:34:51

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2406.17748v1

Probing the effects of broken symmetries in machine learning

Symmetry is one of the most central concepts in physics, and it is no surprise that it has also been widely adopted as an inductive bias for machine-learning models applied to the physical sciences. This is especially true for models targeting the properties of matter at the atomic scale. Both established and state-of-the-art approaches, with almost no exceptions, are built to be exactly equivariant to translations, permutations, and rotations of the atoms. Incorporating symmetries -- rotations in particular -- constrains the model design space and implies more complicated architectures that are often also computationally demanding. There are indications that non-symmetric models can easily learn symmetries from data, and that doing so can even be beneficial for the accuracy of the model. We put a model that obeys rotational invariance only approximately to the test, in realistic scenarios involving simulations of gas-phase, liquid, and solid water. We focus specifically on physical observables that are likely to be affected -- directly or indirectly -- by symmetry breaking, finding negligible consequences when the model is used in an interpolative, bulk, regime. Even for extrapolative gas-phase predictions, the model remains very stable, even though symmetry artifacts are noticeable. We also discuss strategies that can be used to systematically reduce the magnitude of symmetry breaking when it occurs, and assess their impact on the convergence of observables.
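
A simple way to probe such a model, in the spirit of the experiments described, is to measure the spread of its predictions over random rotations of the same configuration. A self-contained sketch with a placeholder energy predictor:

```python
import numpy as np

def rotation_sensitivity(predict_energy, positions, n_rotations=32, rng=None):
    """Spread of predictions over random rotations of an (N, 3) configuration;
    exactly zero for a strictly invariant model."""
    rng = np.random.default_rng(rng)
    energies = []
    for _ in range(n_rotations):
        A = rng.normal(size=(3, 3))
        Q, _ = np.linalg.qr(A)                 # random orthogonal matrix
        if np.linalg.det(Q) < 0:
            Q[:, 0] *= -1.0                    # enforce a proper rotation (det = +1)
        energies.append(predict_energy(positions @ Q.T))
    return float(np.std(energies))
```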

Updated: 2024-06-25 17:34:09

标题: 探究机器学习中破坏对称性的影响

摘要: 对称性是物理学中最核心的概念之一,因此它也被广泛用作物理科学中机器学习模型的归纳偏差,这并不令人意外。这在针对原子尺度物质性质的模型中尤为明显。几乎所有已建立的和最先进的方法,无一例外地被构建为对原子的平移、排列和旋转严格等变。纳入对称性(特别是旋转)会限制模型设计空间,并意味着更复杂、通常计算开销也更大的架构。有迹象表明,非对称模型可以轻松地从数据中学习对称性,而且这样做甚至可能有利于模型的准确性。我们将一个仅近似遵守旋转不变性的模型置于涉及气相、液相和固相水模拟的现实场景中进行测试。我们特别关注可能受到对称性破坏直接或间接影响的物理可观察量,发现当模型用于插值的体相(bulk)区间时,其后果可以忽略不计。即使在外推性的气相预测中,模型仍然非常稳定,尽管对称性伪影是可察觉的。我们还讨论了可用于系统性减少对称性破坏幅度的策略,并评估其对可观察量收敛性的影响。

更新时间: 2024-06-25 17:34:09

领域: physics.chem-ph,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.17747v1

Diverse Part Synthesis for 3D Shape Creation

Methods that use neural networks for synthesizing 3D shapes in the form of a part-based representation have been introduced over the last few years. These methods represent shapes as a graph or hierarchy of parts and enable a variety of applications such as shape sampling and reconstruction. However, current methods do not allow easily regenerating individual shape parts according to user preferences. In this paper, we investigate techniques that allow the user to generate multiple, diverse suggestions for individual parts. Specifically, we experiment with multimodal deep generative models that allow sampling diverse suggestions for shape parts and focus on models which have not been considered in previous work on shape synthesis. To provide a comparative study of these techniques, we introduce a method for synthesizing 3D shapes in a part-based representation and evaluate all the part suggestion techniques within this synthesis method. In our method, which is inspired by previous work, shapes are represented as a set of parts in the form of implicit functions which are then positioned in space to form the final shape. Synthesis in this representation is enabled by a neural network architecture based on an implicit decoder and a spatial transformer. We compare the various multimodal generative models by evaluating their performance in generating part suggestions. Our contribution is to show with qualitative and quantitative evaluations which of the new techniques for multimodal part generation perform the best and that a synthesis method based on the top-performing techniques allows the user to more finely control the parts that are generated in the 3D shapes while maintaining high shape fidelity when reconstructing shapes.

Updated: 2024-06-25 17:33:31

标题: 多样部分合成用于3D形状创建

摘要: 在过去几年中,已经引入了使用神经网络合成3D形状的方法,这些方法以部分表示的形式表示形状。这些方法将形状表示为部件的图形或层次结构,可以实现各种应用,如形状抽样和重建。然而,当前的方法不允许根据用户的偏好轻松地重新生成单个形状部件。在本文中,我们研究了允许用户为单个部件生成多个不同建议的技术。具体来说,我们尝试使用多模态深度生成模型,允许对形状部件进行多样化抽样,并专注于之前形状合成研究中未考虑的模型。为了对这些技术进行比较研究,我们引入了一种在部分表示中合成3D形状的方法,并评估了所有部分建议技术在该合成方法中的应用。在我们的方法中,形状被表示为一组以隐式函数形式的部分,然后在空间中定位以形成最终形状。在这种表示中的合成由基于隐式解码器和空间变换器的神经网络架构实现。我们通过评估它们在生成部分建议方面的表现来比较各种多模态生成模型。我们的贡献是通过定性和定量评估来展示哪些新的多模态部分生成技术表现最佳,并且基于表现最佳的技术的合成方法允许用户更精细地控制在重建形状时生成的3D形状的部分,同时保持高形状保真度。

更新时间: 2024-06-25 17:33:31

领域: cs.GR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2401.09384v2

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

Memorization in language models is typically treated as a homogenous phenomenon, neglecting the specifics of the memorized data. We instead model memorization as the effect of a set of complex factors that describe each sample and relate it to the model and corpus. To build intuition around these factors, we break memorization down into a taxonomy: recitation of highly duplicated sequences, reconstruction of inherently predictable sequences, and recollection of sequences that are neither. We demonstrate the usefulness of our taxonomy by using it to construct a predictive model for memorization. By analyzing dependencies and inspecting the weights of the predictive model, we find that different factors influence the likelihood of memorization differently depending on the taxonomic category.
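
As a toy operationalization of the taxonomy, one can route each memorized sample by corpus duplication and model predictability; the features and thresholds below are illustrative, not the paper's actual criteria:

```python
def memorization_category(duplicate_count: int, model_nll: float,
                          dup_threshold: int = 10, predictable_nll: float = 1.0) -> str:
    """Three-way routing matching the taxonomy in the abstract."""
    if duplicate_count >= dup_threshold:
        return "recitation"       # highly duplicated in the training corpus
    if model_nll <= predictable_nll:
        return "reconstruction"   # inherently predictable (e.g. templated) text
    return "recollection"         # memorized despite being rare and surprising
```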

Updated: 2024-06-25 17:32:16

标题: 背诵、重构、回忆:LMS中的记忆作为一个多方面现象

摘要: 在语言模型中,记忆通常被视为一种同质现象,忽视了记忆数据的具体情况。相反,我们将记忆建模为一组描述每个样本并将其与模型和语料库相关联的复杂因素的效果。为了理解这些因素,我们将记忆分解为一个分类法:高度重复序列的背诵,固有可预测序列的重建,以及既非重复也非可预测序列的回忆。我们通过使用这些分类法构建一个预测性模型来展示其实用性。通过分析依赖关系并检查预测模型的权重,我们发现不同的因素根据分类类别而异地影响记忆的可能性。

更新时间: 2024-06-25 17:32:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17746v1

Light-weight End-to-End Graph Interest Network for CTR Prediction in E-commerce Search

Click-through-rate (CTR) prediction has an essential impact on improving user experience and revenue in e-commerce search. With the development of deep learning, graph-based methods are well exploited to utilize graph structure extracted from user behaviors and other information to help embedding learning. However, most of the previous graph-based methods mainly focus on recommendation scenarios, and therefore their graph structures highly depend on item's sequential information from user behaviors, ignoring query's sequential signal and query-item correlation. In this paper, we propose a new approach named Light-weight End-to-End Graph Interest Network (EGIN) to effectively mine users' search interests and tackle previous challenges. (i) EGIN utilizes query and item's correlation and sequential information from the search system to build a heterogeneous graph for better CTR prediction in e-commerce search. (ii) EGIN's graph embedding learning shares the same training input and is jointly trained with CTR prediction, making the end-to-end framework effortless to deploy in large-scale search systems. The proposed EGIN is composed of three parts: query-item heterogeneous graph, light-weight graph sampling, and multi-interest network. The query-item heterogeneous graph captures correlation and sequential information of query and item efficiently by the proposed light-weight graph sampling. The multi-interest network is well designed to utilize graph embedding to capture various similarity relationships between query and item to enhance the final CTR prediction. We conduct extensive experiments on both public and industrial datasets to demonstrate the effectiveness of the proposed EGIN. At the same time, the training cost of graph learning is relatively low compared with the main CTR prediction task, ensuring efficiency in practical applications.

Updated: 2024-06-25 17:31:04

标题: 轻量级端到端图兴趣网络用于电子商务搜索中的CTR预测

摘要: 点击率(CTR)预测对于改善电子商务搜索中的用户体验和收入具有重要影响。随着深度学习的发展,基于图的方法被充分利用来利用从用户行为和其他信息中提取的图结构来帮助嵌入学习。然而,大多数先前的基于图的方法主要集中在推荐场景上,因此它们的图结构高度依赖于从用户行为中提取的项目的顺序信息,忽略了查询的顺序信号和查询-项目的相关性。在本文中,我们提出了一种名为轻量级端到端图兴趣网络(EGIN)的新方法,以有效挖掘用户的搜索兴趣并解决先前的挑战。 (i) EGIN利用搜索系统中的查询和项目的相关性和顺序信息构建异质图,以更好地预测电子商务搜索中的CTR。 (ii) EGIN的图嵌入学习共享相同的训练输入,并与CTR预测一起进行联合训练,使端到端框架在大规模搜索系统中部署起来轻松。所提出的EGIN由三部分组成:查询-项目异质图、轻量级图采样和多兴趣网络。查询-项目异质图通过提出的轻量级图采样高效地捕获查询和项目的相关性和顺序信息。多兴趣网络被精心设计为利用图嵌入以捕获查询和项目之间的各种相似关系,以增强最终的CTR预测。我们在公共和工业数据集上进行了大量实验,以证明所提出的EGIN的有效性。与主要的CTR预测任务相比,图学习的训练成本相对较低,确保在实际应用中的效率。

更新时间: 2024-06-25 17:31:04

领域: cs.IR,cs.LG,H.3.3

下载: http://arxiv.org/abs/2406.17745v1

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM). However, achieving similar success in 3D models remains a challenge due to issues such as non-unified data formats, lightweight models, and the scarcity of labeled data with diverse masks. To this end, we propose a 3D promptable segmentation model (Point-SAM) focusing on point clouds. Our approach utilizes a transformer-based method, extending SAM to the 3D domain. We leverage part-level and object-level annotations and introduce a data engine to generate pseudo labels from SAM, thereby distilling 2D knowledge into our 3D model. Our model outperforms state-of-the-art models on several indoor and outdoor benchmarks and demonstrates a variety of applications, such as 3D annotation. Codes and demo can be found at https://github.com/zyc00/Point-SAM.

Updated: 2024-06-25 17:28:03

标题: Point-SAM:针对点云的可提示3D分割模型

摘要: Segment Anything Model(SAM)显着推动了图像分割的2D基础模型的发展。然而,在3D模型中取得类似的成功仍然是一个挑战,原因是数据格式不统一、模型轻量化以及标记数据稀缺等问题。因此,我们提出了一种针对点云的3D可提示分割模型(Point-SAM)。我们的方法利用基于Transformer的方法,将SAM扩展到3D领域。我们利用部分级别和对象级别的注释,并引入一个数据引擎,从SAM生成伪标签,从而将2D知识提炼到我们的3D模型中。我们的模型在几个室内和室外基准测试中表现优于最先进的模型,并展示了各种应用,例如3D注释。代码和演示可以在https://github.com/zyc00/Point-SAM 找到。

更新时间: 2024-06-25 17:28:03

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17741v1

Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning

Recent efforts to scale Transformer models have demonstrated rapid progress across a wide range of tasks (Wei et al., 2022). However, fine-tuning these models for downstream tasks is expensive due to their large parameter counts. Parameter-efficient fine-tuning (PEFT) approaches have emerged as a viable alternative by allowing us to fine-tune models by updating only a small number of parameters. In this work, we propose a general framework for parameter efficient fine-tuning (PEFT), based on structured unrestricted-rank matrices (SURM) which can serve as a drop-in replacement for popular approaches such as Adapters and LoRA. Unlike other methods like LoRA, SURMs provides more flexibility in finding the right balance between compactness and expressiveness. This is achieved by using low displacement rank matrices (LDRMs), which hasn't been used in this context before. SURMs remain competitive with baselines, often providing significant quality improvements while using a smaller parameter budget. SURMs achieve 5-7% accuracy gains on various image classification tasks while replacing low-rank matrices in LoRA. It also results in up to 12x reduction of the number of parameters in adapters (with virtually no loss in quality) on the GLUE benchmark.
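
The paper's LDRM construction is not reproduced here, but the appeal of structured matrices with unrestricted rank is easy to demonstrate with a simpler structured family in the same spirit: a Kronecker-product adapter, whose rank can be full even though its parameter count stays tiny. A hedged sketch:

```python
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    """dW = A kron B: only m*m + n*n parameters, yet rank(dW) = rank(A) * rank(B)
    can be full, unlike LoRA's rank-r bottleneck."""
    def __init__(self, d: int, m: int):
        super().__init__()
        assert d % m == 0, "m must divide the feature dimension d"
        self.m, self.n = m, d // m
        self.A = nn.Parameter(torch.zeros(m, m))   # zero init: dW = 0 at start
        self.B = nn.Parameter(torch.randn(self.n, self.n) / self.n ** 0.5)

    def forward(self, x):                          # x: (..., d); returns (A kron B) @ x
        X = x.reshape(-1, self.m, self.n)
        Y = torch.einsum("ik,bkl,jl->bij", self.A, X, self.B)
        return Y.reshape(x.shape)

# Drop-in use, LoRA-style, with the base weight frozen:
#   h = frozen_linear(x) + kron_adapter(x)
```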

Updated: 2024-06-25 17:26:05

标题: 结构化无限制秩矩阵用于参数高效微调

摘要: 最近,对Transformer模型进行规模化的努力显示出了在各种任务上的快速进展(Wei等,2022年)。然而,由于其庞大的参数数量,为下游任务微调这些模型是昂贵的。参数高效微调(PEFT)方法已经成为一种可行的替代方案,可以通过仅更新少量参数来微调模型。在这项工作中,我们提出了一个基于结构化无约束秩矩阵(SURM)的参数高效微调(PEFT)通用框架,可以作为适配器和LoRA等流行方法的直接替代品。与LoRA等其他方法不同,SURMs在寻找紧凑性与表达能力之间的合适平衡方面提供了更多的灵活性。这是通过使用低位移秩矩阵(LDRMs)实现的,此前这类矩阵尚未在该场景中被使用。SURMs保持与基线的竞争力,通常在使用更小的参数预算的情况下提供显著的质量改进。在替换LoRA中的低秩矩阵时,SURMs在各种图像分类任务中实现了5-7%的准确率提升。在GLUE基准测试中,它还使适配器中的参数数量最多减少12倍(几乎没有质量损失)。

更新时间: 2024-06-25 17:26:05

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.17740v1

Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model

Taxonomies, which organize domain concepts into hierarchical structures, are crucial for building knowledge systems and downstream applications. As domain knowledge evolves, taxonomies need to be continuously updated to include new concepts. Previous approaches have mainly focused on adding concepts to the leaf nodes of the existing hierarchical tree, which does not fully utilize the taxonomy's knowledge and is unable to update the original taxonomy structure (usually involving non-leaf nodes). In this paper, we propose a two-stage method called ATTEMPT for taxonomy completion. Our method inserts new concepts into the correct position by finding a parent node and labeling child nodes. Specifically, by combining local nodes with prompts to generate natural sentences, we take advantage of pre-trained language models for hypernym/hyponymy recognition. Experimental results on two public datasets (including six domains) show that ATTEMPT performs best on both taxonomy completion and extension tasks, surpassing existing methods.
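
The first stage, finding a parent, can be illustrated by scoring candidate hypernyms with a pre-trained masked LM over a natural-language prompt. The template and model below are placeholders rather than ATTEMPT's exact choices:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def parent_score(concept: str, candidate_parent: str) -> float:
    """Probability that the masked LM fills the hypernym slot with the candidate."""
    prompt = f"{concept} is a kind of {tok.mask_token}."
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**inputs).logits
    mask_pos = (inputs["input_ids"] == tok.mask_token_id).nonzero()[0, 1]
    probs = torch.softmax(logits[0, mask_pos], dim=-1)
    ids = tok(candidate_parent, add_special_tokens=False)["input_ids"]
    return probs[ids[0]].item()  # single-token candidates only, for simplicity

# Insert the new concept under the highest-scoring parent, then label children.
```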

Updated: 2024-06-25 17:25:02

标题: 先找父节点再标注子节点:一种使用预训练语言模型的两阶段分类法补全方法

摘要: 分类法将领域概念组织成层次结构,对于构建知识系统和下游应用至关重要。随着领域知识的不断发展,分类法需要不断更新以包括新的概念。先前的方法主要集中在将概念添加到现有层次树的叶节点上,这并未充分利用分类法的知识,并且无法更新原始的分类法结构(通常涉及非叶节点)。在本文中,我们提出了一种名为ATTEMPT的两阶段方法用于分类法完成。我们的方法通过找到父节点并标记子节点,将新概念插入到正确的位置。具体而言,通过将本地节点与提示组合以生成自然句子,我们利用预训练语言模型进行上位词/下位词识别。在两个公共数据集上的实验结果(包括六个领域)表明,ATTEMPT在分类法完成和扩展任务上表现最佳,超过了现有方法。

更新时间: 2024-06-25 17:25:02

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17739v1

LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users

While state-of-the-art Large Language Models (LLMs) have shown impressive performance on many tasks, there has been extensive research on undesirable model behavior such as hallucinations and bias. In this work, we investigate how the quality of LLM responses changes in terms of information accuracy, truthfulness, and refusals depending on three user traits: English proficiency, education level, and country of origin. We present extensive experimentation on three state-of-the-art LLMs and two different datasets targeting truthfulness and factuality. Our findings suggest that undesirable behaviors in state-of-the-art LLMs occur disproportionately more for users with lower English proficiency, of lower education status, and originating from outside the US, rendering these models unreliable sources of information towards their most vulnerable users.

Updated: 2024-06-25 17:24:07

标题: LLM的针对性表现不佳对弱势用户的影响尤为严重

摘要: 尽管最先进的大型语言模型(LLM)在许多任务上表现出色,但人们已经进行了大量研究,关注模型行为中的幻觉和偏见等不良现象。在这项工作中,我们调查了LLM响应的质量如何随着信息准确性、真实性和拒绝程度的变化而变化,具体取决于三种用户特征:英语水平、教育水平和国籍。我们在三种最先进的LLM和两个针对真实性和事实性的不同数据集上进行了广泛的实验。我们的研究结果表明,最先进的LLM中的不良行为在英语水平较低、教育水平较低和来自美国以外的用户中发生的比例更高,使得这些模型成为向最脆弱用户提供信息的不可靠来源。

更新时间: 2024-06-25 17:24:07

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.17737v1

The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits

We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(\frac{n}{\Delta^2})$ requires $\Omega(\frac{\log{(1/\Delta)}}{\log\log{(1/\Delta)}})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(\frac{1}{\Delta}))$-pass algorithm of Jin et al. [ICML'21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC'20].

Updated: 2024-06-25 17:20:06

标题: 最佳臂躲避:多臂老虎机纯探索的近似最优多遍流式下界

摘要: 我们通过多遍流式算法为多臂赌博机(MABs)中的纯探索问题给出了近乎最优的样本-遍数权衡:任何达到最优样本复杂度$O(\frac{n}{\Delta^2})$的次线性内存流式算法都需要$\Omega(\frac{\log{(1/\Delta)}}{\log\log{(1/\Delta)}})$遍。在这里,$n$是臂的数量,$\Delta$是最佳臂与次佳臂之间的奖励差距。我们的结果与Jin等人[ICML'21]的$O(\log(\frac{1}{\Delta}))$遍算法相匹配(至低阶项),该算法仅使用$O(1)$内存;这回答了Assadi和Wang[STOC'20]提出的一个开放问题。

更新时间: 2024-06-25 17:20:06

领域: cs.LG,cs.DS

下载: http://arxiv.org/abs/2309.03145v2

Fast gradient-free activation maximization for neurons in spiking neural networks

Elements of neural networks, both biological and artificial, can be described by their selectivity for specific cognitive features. Understanding these features is important for understanding the inner workings of neural networks. For a living system, such as a neuron, whose response to a stimulus is unknown and not differentiable, the only way to reveal these features is through a feedback loop that exposes it to a large set of different stimuli. The properties of these stimuli should be varied iteratively in order to maximize the neuronal response. To utilize this feedback loop for a biological neural network, it is important to run it quickly and efficiently in order to reach the stimuli that maximizes certain neurons' activation with the least number of iterations possible. Here we present a framework with an efficient design for such a loop. We successfully tested it on an artificial spiking neural network (SNN), which is a model that simulates the asynchronous spiking activity of neurons in living brains. Our optimization method for activation maximization is based on the low-rank Tensor Train decomposition of the discrete activation function. The optimization space is the latent parameter space of images generated by SN-GAN or VQ-VAE generative models. To our knowledge, this is the first time that effective AM has been applied to SNNs. We track changes in the optimal stimuli for artificial neurons during training and show that highly selective neurons can form already in the early epochs of training and in the early layers of a convolutional spiking network. This formation of refined optimal stimuli is associated with an increase in classification accuracy. Some neurons, especially in the deeper layers, may gradually change the concepts they are selective for during learning, potentially explaining their importance for model performance.

Updated: 2024-06-25 17:08:56

标题: 尖峰神经网络中神经元的快速无梯度激活最大化

摘要: 神经网络的元素,无论是生物的还是人工的,都可以通过它们对特定认知特征的选择性来描述。理解这些特征对于理解神经网络的内部运作非常重要。对于一个生物系统,比如神经元,其对刺激的反应是未知且不可微分的,揭示这些特征的唯一方法是通过一个反馈循环,将其暴露于大量不同的刺激之下。这些刺激的特性应当被迭代地改变,以最大化神经元的反应。为了将这一反馈循环用于生物神经网络,重要的是快速高效地运行它,以便用尽可能少的迭代次数找到最大化特定神经元激活的刺激。在这里,我们提出了一个为这种循环高效设计的框架。我们成功地将其应用于人工尖峰神经网络(SNN),这是一种模拟生物大脑中神经元异步尖峰活动的模型。我们的激活最大化优化方法基于离散激活函数的低秩张量列车(Tensor Train)分解。优化空间是由SN-GAN或VQ-VAE生成模型生成的图像的潜在参数空间。据我们所知,这是首次将有效的激活最大化(AM)应用于SNN。我们追踪人工神经元在训练过程中最佳刺激的变化,并展示高度选择性的神经元早在训练的早期轮次和卷积尖峰网络的早期层中便可形成。这种精细化最佳刺激的形成与分类准确性的提高相关。一些神经元,特别是较深层中的神经元,可能会在学习过程中逐渐改变它们所选择的概念,这或许解释了它们对模型性能的重要性。

更新时间: 2024-06-25 17:08:56

领域: cs.NE,cs.LG

下载: http://arxiv.org/abs/2401.10748v2

Inducing Riesz and orthonormal bases in $L^2$ via composition operators

We investigate perturbations of orthonormal bases of $L^2$ via a composition operator $C_h$ induced by a mapping $h$. We provide a comprehensive characterization of the mapping $h$ required for the perturbed sequence to form an orthonormal or Riesz basis. Restricting our analysis to differentiable mappings, we reveal that all Riesz bases of the given form are induced by bi-Lipschitz mappings. In addition, we discuss implications of these results for approximation theory, highlighting the potential of using bijective neural networks to construct complete sequences with favorable approximation properties.
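
A short computation shows why bi-Lipschitz mappings are the natural class here. Assuming $h$ is a differentiable bijection with $0 < a \le h' \le b$, the substitution $u = h(x)$ gives (an illustrative bound consistent with the abstract, not the paper's proof):

```latex
\|C_h f\|_{L^2}^2 = \int |f(h(x))|^2 \, dx
                  = \int \frac{|f(u)|^2}{h'\!\left(h^{-1}(u)\right)} \, du
\;\;\Longrightarrow\;\;
\tfrac{1}{b}\,\|f\|_{L^2}^2 \;\le\; \|C_h f\|_{L^2}^2 \;\le\; \tfrac{1}{a}\,\|f\|_{L^2}^2 .
```

Applying this with $f = \sum_n c_n e_n$ for an orthonormal basis $\{e_n\}$ shows that $\{e_n \circ h\}$ satisfies the Riesz inequalities with bounds $1/b$ and $1/a$.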

Updated: 2024-06-25 17:07:01

标题: 通过复合算子在$L^2$中诱导Riesz基与标准正交基

摘要: 我们通过由映射$h$诱导的复合算子$C_h$来研究$L^2$的标准正交基的扰动。我们全面刻画了使扰动序列构成标准正交基或Riesz基所需的映射$h$。将分析限制在可微映射上,我们揭示了该形式的所有Riesz基都是由双Lipschitz映射诱导的。此外,我们讨论了这些结果对逼近理论的意义,强调了使用双射神经网络构建具有良好逼近性质的完备序列的潜力。

更新时间: 2024-06-25 17:07:01

领域: math.FA,cs.LG,cs.NA,math.NA,47B33, 42C15

下载: http://arxiv.org/abs/2406.18613v1

When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning

We investigate the impact of auxiliary learning tasks such as observation reconstruction and latent self-prediction on the representation learning problem in reinforcement learning. We also study how they interact with distractions and observation functions in the MDP. We provide a theoretical analysis of the learning dynamics of observation reconstruction, latent self-prediction, and TD learning in the presence of distractions and observation functions under linear model assumptions. With this formalization, we are able to explain why latent-self prediction is a helpful \emph{auxiliary task}, while observation reconstruction can provide more useful features when used in isolation. Our empirical analysis shows that the insights obtained from our learning dynamics framework predicts the behavior of these loss functions beyond the linear model assumption in non-linear neural networks. This reinforces the usefulness of the linear model framework not only for theoretical analysis, but also practical benefit for applied problems.
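
The latent self-prediction objective discussed above is typically implemented with a stop-gradient target, as in the following generic sketch (architectural details assumed, not taken from the paper):

```python
import torch

def latent_self_prediction_loss(encoder, predictor, s, s_next):
    """Predict the *latent* of the next state from the current latent; the
    target branch carries no gradient, the usual formulation of this auxiliary task."""
    z = encoder(s)
    with torch.no_grad():
        z_next = encoder(s_next)          # target latent, no gradient through it
    return (predictor(z) - z_next).pow(2).sum(-1).mean()

# Total objective would be: td_loss + aux_weight * latent_self_prediction_loss(...)
```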

Updated: 2024-06-25 17:06:57

标题: 何时自我预测有帮助?理解强化学习中的辅助任务

摘要: 我们研究了辅助学习任务(如观察重建和潜在自我预测)对强化学习中表示学习问题的影响。我们还研究了它们如何与MDP中的干扰和观察函数相互作用。我们在线性模型假设下提供了观察重建、潜在自我预测和TD学习在存在干扰和观察函数时的学习动态的理论分析。通过这种形式化,我们能够解释为什么潜在自我预测是一个有用的辅助任务,而观察重建在独立使用时可以提供更有用的特征。我们的实证分析表明,从我们的学习动态框架获得的见解可以在非线性神经网络中预测这些损失函数的行为,从而加强了线性模型框架的实用性,不仅用于理论分析,还用于应用问题的实际受益。

更新时间: 2024-06-25 17:06:57

领域: cs.LG

下载: http://arxiv.org/abs/2406.17718v1

A Temporal Stochastic Bias Correction using a Machine Learning Attention model

Climate models are biased with respect to real-world observations. They usually need to be adjusted before being used in impact studies. The suite of statistical methods that enable such adjustments is called bias correction (BC). However, BC methods currently struggle to adjust temporal biases. Because they mostly disregard the dependence between consecutive time points. As a result, climate statistics with long-range temporal properties, such as heatwave duration and frequency, cannot be corrected accurately. This makes it more difficult to produce reliable impact studies on such climate statistics. This paper offers a novel BC methodology to correct temporal biases. This is made possible by rethinking the philosophy behind BC. We will introduce BC as a time-indexed regression task with stochastic outputs. Rethinking BC enables us to adapt state-of-the-art machine learning (ML) attention models and thereby learn different types of biases, including temporal asynchronicities. With a case study of heatwave duration statistics in Abuja, Nigeria, and Tokyo, Japan, we show more accurate results than current climate model outputs and alternative BC methods.

Updated: 2024-06-25 17:03:22

标题: 使用机器学习注意力模型进行时间序列随机偏差校正

摘要: 气候模型在与现实观测相比存在偏差。通常在用于影响研究之前需要进行调整。使这些调整成为可能的一套统计方法被称为偏差校正(BC)。然而,目前BC方法在调整时间偏差方面存在困难。因为它们大多忽略了连续时间点之间的依赖关系。因此,具有长期时间特性的气候统计数据,如热浪持续时间和频率,无法被准确校正。这使得在这些气候统计数据上进行可靠的影响研究更加困难。本文提出了一种新颖的BC方法来校正时间偏差。通过重新思考BC背后的哲学,这一点成为可能。我们将BC作为一个具有随机输出的时间索引回归任务进行介绍。重新思考BC使我们能够利用最先进的机器学习(ML)注意力模型,并从而学习不同类型的偏差,包括时间不同步性。通过对尼日利亚阿布贾和日本东京的热浪持续时间统计数据的案例研究,我们展示了比当前气候模型输出和替代BC方法更准确的结果。

更新时间: 2024-06-25 17:03:22

领域: cs.LG,physics.ao-ph

下载: http://arxiv.org/abs/2402.14169v5

A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Patients often face difficulties in understanding their hospitalizations, while healthcare workers have limited resources to provide explanations. In this work, we investigate the potential of large language models to generate patient summaries based on doctors' notes and study the effect of training data on the faithfulness and quality of the generated summaries. To this end, we release (i) a rigorous labeling protocol for errors in medical texts and (ii) a publicly available dataset of annotated hallucinations in 100 doctor-written and 100 generated summaries. We show that fine-tuning on hallucination-free data effectively reduces hallucinations from 2.60 to 1.55 per summary for Llama 2, while preserving relevant information. We observe a similar effect on GPT-4 (0.70 to 0.40), when the few-shot examples are hallucination-free. We also conduct a qualitative evaluation using hallucination-free and improved training data. We find that common quantitative metrics do not correlate well with faithfulness and quality. Finally, we test GPT-4 for automatic hallucination detection, which clearly outperforms common baselines.

Updated: 2024-06-25 17:02:10

标题: 一种基于数据的方法:利用大型语言模型生成忠实和高质量的患者总结

摘要: 患者常常在理解他们的住院治疗时遇到困难,而医护人员资源有限,无法提供解释。在这项工作中,我们研究了大型语言模型基于医生笔记生成患者摘要的潜力,并研究了训练数据对生成摘要的忠实度和质量的影响。为此,我们发布了(i)医学文本中错误的严格标注协议和(ii)100份医生撰写的和100份生成的摘要中注释的幻觉的公开可用数据集。我们展示了在无幻觉数据上微调可以有效地将Llama 2每份摘要中的幻觉从2.60减少到1.55,同时保留相关信息。当少样本示例无幻觉时,我们观察到了GPT-4的类似效果(0.70到0.40)。我们还使用无幻觉和改进的训练数据进行了定性评估。我们发现常见的定量指标与忠实度和质量并不相关。最后,我们测试了GPT-4进行自动幻觉检测,其明显优于常见的基线。

更新时间: 2024-06-25 17:02:10

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.15422v2

XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

We present XCube (abbreviated as $\mathcal{X}^3$), a novel generative model for high-resolution sparse 3D voxel grids with arbitrary attributes. Our model can generate millions of voxels with a finest effective resolution of up to $1024^3$ in a feed-forward fashion without time-consuming test-time optimization. To achieve this, we employ a hierarchical voxel latent diffusion model which generates progressively higher resolution grids in a coarse-to-fine manner using a custom framework built on the highly efficient VDB data structure. Apart from generating high-resolution objects, we demonstrate the effectiveness of XCube on large outdoor scenes at scales of 100m$\times$100m with a voxel size as small as 10cm. We observe clear qualitative and quantitative improvements over past approaches. In addition to unconditional generation, we show that our model can be used to solve a variety of tasks such as user-guided editing, scene completion from a single scan, and text-to-3D. The source code and more results can be found at https://research.nvidia.com/labs/toronto-ai/xcube/.

Updated: 2024-06-25 17:01:54

标题: XCube:使用稀疏体素层次结构进行大规模3D生成建模

摘要: 我们提出了XCube(缩写为$\mathcal{X}^3$),这是一个新颖的生成模型,用于具有任意属性的高分辨率稀疏3D体素网格。我们的模型可以以前馈方式生成数百万个体素,在不需要耗时的测试时间优化的情况下,具有高达$1024^3$的最精细有效分辨率。为了实现这一点,我们采用了一种分层体素潜扩散模型,以一种从粗到细的方式使用基于高效VDB数据结构的定制框架逐渐生成更高分辨率的网格。除了生成高分辨率对象外,我们展示了XCube在100m $\times$ 100m尺度的大型户外场景上的效果,体素大小可小至10cm。我们观察到与过去方法相比,清晰的定性和定量改进。除了无条件生成之外,我们展示了我们的模型可以用于解决各种任务,如用户引导编辑、从单次扫描完成场景和文本到3D。源代码和更多结果可以在https://research.nvidia.com/labs/toronto-ai/xcube/找到。

更新时间: 2024-06-25 17:01:54

领域: cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2312.03806v2

InFiConD: Interactive No-code Fine-tuning with Concept-based Knowledge Distillation

The emergence of large-scale pre-trained models has heightened their application in various downstream tasks, yet deployment is a challenge in environments with limited computational resources. Knowledge distillation has emerged as a solution in such scenarios, whereby knowledge from large teacher models is transferred into smaller 'student' models, but this is a non-trivial process that traditionally requires technical expertise in AI/ML. To address these challenges, this paper presents InFiConD, a novel framework that leverages visual concepts to implement the knowledge distillation process and enable subsequent no-code fine-tuning of student models. We develop a novel knowledge distillation pipeline based on extracting text-aligned visual concepts from a concept corpus using multimodal models, and construct highly interpretable linear student models based on visual concepts that mimic a teacher model in a response-based manner. InFiConD's interface allows users to interactively fine-tune the student model by manipulating concept influences directly in the user interface. We validate InFiConD via a robust usage scenario and user study. Our findings indicate that InFiConD's human-in-the-loop and visualization-driven approach enables users to effectively create and analyze student models, understand how knowledge is transferred, and efficiently perform fine-tuning operations. We discuss how this work highlights the potential of interactive and visual methods in making knowledge distillation and subsequent no-code fine-tuning more accessible and adaptable to a wider range of users with domain-specific demands.

Updated: 2024-06-25 16:56:45

标题: InFiConD:基于概念的知识蒸馏的交互式无代码微调

摘要: 大规模预训练模型的出现加剧了它们在各种下游任务中的应用,然而在计算资源有限的环境中部署仍然是一个挑战。知识蒸馏已经成为在这种情况下的解决方案,其中来自大型教师模型的知识被转移到较小的学生模型中,但这是一个非常复杂的过程,传统上需要在人工智能/机器学习方面具有技术专长。为了解决这些挑战,本文提出了InFiConD,这是一个新颖的框架,利用视觉概念实现知识蒸馏过程,并使学生模型随后能够进行无代码微调。我们基于从概念语料库中使用多模态模型提取文本对齐的视觉概念开发了一个新颖的知识蒸馏流水线,并基于视觉概念构建了高度可解释的线性学生模型,以一种基于响应的方式模仿教师模型。InFiConD的界面允许用户通过直接在用户界面中操纵概念影响来交互式地微调学生模型。我们通过一个强大的使用场景和用户研究验证了InFiConD。我们的研究结果表明,InFiConD的人机交互和基于可视化的方法使用户能够有效地创建和分析学生模型,了解知识如何转移,并高效地执行微调操作。我们讨论了这项工作如何突显了交互和可视方法在使知识蒸馏和随后的无代码微调更具可访问性和适应性,以满足具有特定领域需求的更广泛用户群体的潜力。

更新时间: 2024-06-25 16:56:45

领域: cs.LG,cs.AI,cs.HC

下载: http://arxiv.org/abs/2406.17838v1

Compositional Models for Estimating Causal Effects

Many real-world systems can be represented as sets of interacting components. Examples of such systems include computational systems such as query processors, natural systems such as cells, and social systems such as families. Many approaches have been proposed in traditional (associational) machine learning to model such structured systems, including statistical relational models and graph neural networks. Despite this prior work, existing approaches to estimating causal effects typically treat such systems as single units, represent them with a fixed set of variables and assume a homogeneous data-generating process. We study a compositional approach for estimating individual treatment effects (ITE) in structured systems, where each unit is represented by the composition of multiple heterogeneous components. This approach uses a modular architecture to model potential outcomes at each component and aggregates component-level potential outcomes to obtain the unit-level potential outcomes. We discover novel benefits of the compositional approach in causal inference - systematic generalization to estimate counterfactual outcomes of unseen combinations of components and improved overlap guarantees between treatment and control groups compared to the classical methods for causal effect estimation. We also introduce a set of novel environments for empirically evaluating the compositional approach and demonstrate the effectiveness of our approach using both simulated and real-world data.
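
Structurally, the approach amounts to one potential-outcome module per component type plus an aggregator. A sketch with sum-aggregation and placeholder dimensions; the paper's exact architecture is not reproduced:

```python
import torch
import torch.nn as nn

class CompositionalITE(nn.Module):
    """One outcome head per component type; unit-level potential outcomes are
    aggregated from component-level ones."""
    def __init__(self, component_dims: dict[str, int], hidden: int = 64):
        super().__init__()
        self.heads = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim + 1, hidden), nn.ReLU(),
                                nn.Linear(hidden, 1))
            for name, dim in component_dims.items()   # +1 for the treatment bit
        })

    def potential_outcome(self, components, t: float):
        outs = [self.heads[name](torch.cat([x, torch.full((x.shape[0], 1), t)], -1))
                for name, x in components]            # components: [(name, features)]
        return torch.stack(outs).sum(0)               # aggregate to unit level

    def ite(self, components):
        return self.potential_outcome(components, 1.0) - self.potential_outcome(components, 0.0)
```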

Updated: 2024-06-25 16:56:17

标题: 用于估计因果效应的组合模型

摘要: 许多现实世界的系统可以被表示为一组相互作用的组件。这类系统的例子包括计算系统(如查询处理器)、自然系统(如细胞)以及社会系统(如家庭)。传统(关联式)机器学习中已提出许多方法来建模这种结构化系统,包括统计关系模型和图神经网络。尽管有这些先前的工作,现有的因果效应估计方法通常将这些系统视为单个单位,用一组固定的变量来表示它们,并假设同质的数据生成过程。我们研究了一种用于估计结构化系统中个体处理效应(ITE)的组合方法,其中每个单位由多个异质组件的组合表示。这种方法使用模块化架构来建模每个组件的潜在结果,并将组件级潜在结果聚合以获得单位级潜在结果。我们发现了组合方法在因果推断中的新优势:能够系统化泛化,以估计未见组件组合的反事实结果;并且与经典的因果效应估计方法相比,改进了处理组与对照组之间的重叠保证。我们还引入了一组用于实证评估组合方法的新环境,并使用模拟和真实世界数据展示了我们方法的有效性。

更新时间: 2024-06-25 16:56:17

领域: cs.AI,cs.LG,stat.ME

下载: http://arxiv.org/abs/2406.17714v1

Deep Pulse-Signal Magnification for remote Heart Rate Estimation in Compressed Videos

Recent advancements in data-driven approaches for remote photoplethysmography (rPPG) have significantly improved the accuracy of remote heart rate estimation. However, the performance of such approaches worsens considerably under video compression, which is nevertheless necessary to store and transmit video data efficiently. In this paper, we present a novel approach to address the impact of video compression on rPPG estimation, which leverages a pulse-signal magnification transformation to adapt compressed videos to an uncompressed data domain in which the rPPG signal is magnified. We validate the effectiveness of our model by exhaustive evaluations on two publicly available datasets, UCLA-rPPG and UBFC-rPPG, employing both intra- and cross-database performance at several compression rates. Additionally, we assess the robustness of our approach on two additional highly compressed and widely-used datasets, MAHNOB-HCI and COHFACE, which reveal outstanding heart rate estimation results.

Updated: 2024-06-25 16:53:21

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.02652v2

Data curation via joint example selection further accelerates multimodal learning

Data curation is an essential component of large-scale pretraining. In this work, we demonstrate that jointly selecting batches of data is more effective for learning than selecting examples independently. Multimodal contrastive objectives expose the dependencies between data and thus naturally yield criteria for measuring the joint learnability of a batch. We derive a simple and tractable algorithm for selecting such batches, which significantly accelerates training beyond individually-prioritized data points. As performance improves by selecting from larger super-batches, we also leverage recent advances in model approximation to reduce the associated computational overhead. As a result, our approach--multimodal contrastive learning with joint example selection (JEST)--surpasses state-of-the-art models with up to 13$\times$ fewer iterations and 10$\times$ less computation. Essential to the performance of JEST is the ability to steer the data selection process towards the distribution of smaller, well-curated datasets via pretrained reference models, exposing the level of data curation as a new dimension for neural scaling laws.
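
A minimal sketch of the joint-selection idea, under two simplifying assumptions of ours: the pairwise contrastive losses over the super-batch are precomputed for both the learner and the pretrained reference model, and chunks are chosen greedily rather than by the paper's sampling procedure. All names are illustrative, not the authors' API.

```python
import numpy as np

def joint_select(loss_learner, loss_ref, n_chunks, chunk):
    """Greedy joint example selection over a super-batch (simplified sketch).

    loss_learner / loss_ref: (N, N) matrices of pairwise contrastive losses
    under the current learner and a pretrained reference model.  The
    learnability of a sub-batch is its in-batch learner loss minus the
    reference loss, summed over pairs inside the sub-batch; chunks of
    examples are added greedily to maximise it."""
    learnability = loss_learner - loss_ref
    selected, remaining = [], list(range(loss_learner.shape[0]))
    for _ in range(n_chunks):
        # Marginal gain: interaction with the current subset + own diagonal.
        gain = learnability[np.ix_(remaining, selected)].sum(axis=1)
        gain += np.diag(learnability)[remaining]
        picked = [remaining[i] for i in np.argsort(-gain)[:chunk]]
        selected.extend(picked)
        remaining = [i for i in remaining if i not in set(picked)]
    return selected

rng = np.random.default_rng(0)
L_cur, L_ref = rng.uniform(size=(512, 512)), rng.uniform(size=(512, 512))
batch = joint_select(L_cur, L_ref, n_chunks=4, chunk=32)   # 128 of 512 examples
```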

Updated: 2024-06-25 16:52:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17711v1

FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model

Large language models (LLMs) show amazing performance on many domain-specific tasks after fine-tuning with some appropriate data. However, many domain-specific data are privately distributed across multiple owners. Thus, this dilemma raises the interest in how to perform LLM fine-tuning in federated learning (FL). However, confronted with limited computation and communication capacities, FL clients struggle to fine-tune an LLM effectively. To this end, we introduce FedBiOT, a resource-efficient LLM fine-tuning approach to FL. Specifically, our method involves the server generating a compressed LLM and aligning its performance with the full model. Subsequently, the clients fine-tune a lightweight yet important part of the compressed model, referred to as an adapter. Notice that as the server has no access to the private data owned by the clients, the data used for alignment by the server has a different distribution from the one used for fine-tuning by clients. We formulate the problem into a bi-level optimization problem to minimize the negative effect of data discrepancy and derive the updating rules for the server and clients. We conduct extensive experiments on LLaMA-2, empirically showing that the adapter has exceptional performance when reintegrated into the global LLM. The results also indicate that the proposed FedBiOT significantly reduces resource consumption compared to existing benchmarks, all while achieving comparable performance levels.

Updated: 2024-06-25 16:45:47

Categories: cs.LG,cs.CL,cs.DC

Download: http://arxiv.org/abs/2406.17706v1

Can independent Metropolis beat crude Monte Carlo?

Assume that we would like to estimate the expected value of a function $F$ with respect to a density $\pi$. We prove that if $\pi$ is close enough under KL divergence to another density $q$, an independent Metropolis sampler estimator that obtains samples from $\pi$ with proposal density $q$, enriched with a variance reduction computational strategy based on control variates, achieves smaller asymptotic variance than that of the crude Monte Carlo estimator. The control variates construction requires no extra computational effort but assumes that the expected value of $F$ under $q$ is analytically available. We illustrate this result by calculating the marginal likelihood in a linear regression model with prior-likelihood conflict and a non-conjugate prior. Furthermore, we propose an adaptive independent Metropolis algorithm that adapts the proposal density such that its KL divergence with the target is being reduced. We demonstrate its applicability in a Bayesian logistic and Gaussian process regression problems and we rigorously justify our asymptotic arguments under easily verifiable and essentially minimal conditions.
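
As a toy illustration, the sketch below runs an independent Metropolis chain with a Gaussian proposal on an unnormalized heavy-tailed target and adds a simple proposal-based control variate; the target, proposal parameters, and the exact control-variate construction are illustrative stand-ins, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi(x):                      # unnormalised heavy-tailed target
    return -2.0 * np.log1p(0.5 * x ** 2)

def imh_with_cv(F, n, mu_q=0.0, sigma_q=1.5, Eq_F=0.0):
    """Independent Metropolis estimate of E_pi[F] with a proposal-based
    control variate; assumes E_q[F] (= Eq_F) is analytically available."""
    lw = lambda z: log_pi(z) + 0.5 * ((z - mu_q) / sigma_q) ** 2   # log pi/q + const
    x = mu_q
    chain_F, prop_F = np.empty(n), np.empty(n)
    for i in range(n):
        y = mu_q + sigma_q * rng.standard_normal()                 # draw from q
        if np.log(rng.uniform()) < lw(y) - lw(x):                  # accept/reject
            x = y
        chain_F[i], prop_F[i] = F(x), F(y)
    cov = np.cov(chain_F, prop_F)
    c = cov[0, 1] / cov[1, 1]                     # estimated control-variate coefficient
    return chain_F.mean() - c * (prop_F.mean() - Eq_F)

print(imh_with_cv(F=lambda z: z, n=20000))        # true value is 0 for this target
```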

Updated: 2024-06-25 16:38:53

Categories: math.ST,cs.LG,stat.TH

Download: http://arxiv.org/abs/2406.17699v1

Identifying Nonstationary Causal Structures with High-Order Markov Switching Models

Causal discovery in time series is a rapidly evolving field with a wide variety of applications in other areas such as climate science and neuroscience. Traditional approaches assume a stationary causal graph, which can be adapted to nonstationary time series with time-dependent effects or heterogeneous noise. In this work we address nonstationarity via regime-dependent causal structures. We first establish identifiability for high-order Markov Switching Models, which provide the foundations for identifiable regime-dependent causal discovery. Our empirical studies demonstrate the scalability of our proposed approach for high-order regime-dependent structure estimation, and we illustrate its applicability on brain activity data.

Updated: 2024-06-25 16:38:27

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.17698v1

InterVLS: Interactive Model Understanding and Improvement with Vision-Language Surrogates

Deep learning models are widely used in critical applications, highlighting the need for pre-deployment model understanding and improvement. Visual concept-based methods, while increasingly used for this purpose, face challenges: (1) most concepts lack interpretability, (2) existing methods require model knowledge, often unavailable at run time, and (3) there is no no-code method for improving a model after it has been understood. Addressing these, we present InterVLS. The system facilitates model understanding by discovering text-aligned concepts and measuring their influence with model-agnostic linear surrogates. Employing visual analytics, InterVLS offers concept-based explanations and performance insights. It enables users to adjust concept influences to update a model, facilitating no-code model improvement. We evaluate InterVLS in a user study, illustrating its functionality with two scenarios. Results indicate that InterVLS effectively helps users identify concepts influential to a model, gain insights, and adjust concept influence to improve the model. We conclude with a discussion based on our study results.

Updated: 2024-06-25 16:37:48

Categories: cs.AI,cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2311.03547v2

HGTDP-DTA: Hybrid Graph-Transformer with Dynamic Prompt for Drug-Target Binding Affinity Prediction

Drug target binding affinity (DTA) is a key criterion for drug screening. Existing experimental methods are time-consuming and rely on limited structural and domain information. While learning-based methods can model sequence and structural information, they struggle to integrate contextual data and often lack comprehensive modeling of drug-target interactions. In this study, we propose a novel DTA prediction method, termed HGTDP-DTA, which utilizes dynamic prompts within a hybrid Graph-Transformer framework. Our method generates context-specific prompts for each drug-target pair, enhancing the model's ability to capture unique interactions. The introduction of prompt tuning further optimizes the prediction process by filtering out irrelevant noise and emphasizing task-relevant information, dynamically adjusting the input features of the molecular graph. The proposed hybrid Graph-Transformer architecture combines structural information from Graph Convolutional Networks (GCNs) with sequence information captured by Transformers, facilitating the interaction between global and local information. Additionally, we adopted the multi-view feature fusion method to project molecular graph views and affinity subgraph views into a common feature space, effectively combining structural and contextual information. Experiments on two widely used public datasets, Davis and KIBA, show that HGTDP-DTA outperforms state-of-the-art DTA prediction methods in both prediction performance and generalization ability.

Updated: 2024-06-25 16:33:33

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.17697v1

Protecting the 'Stop Using My Data' Right through Blockchain-assisted Evidence Generation

In order to provide personalized services to users, Internet-based platforms collect and utilize user-generated behavioral data. Although the 'stop using my data' right should be a fundamental data right, which allows individuals to request their personal data to be no longer utilized by online platforms, the existing preventive data protection measures (e.g., cryptographic data elimination, differential privacy) are unfortunately not applicable. This work aims to develop the first Evidence Generation Framework for deterring post-acquisition data right violations. We formulated the 'stop using my data' problem, which captures an important facet of the multi-faceted notion of 'right to be forgotten'. We designed and implemented the first blockchain-assisted system to generate evidence for deterring the violations of the 'stop using my data' right. Our system employs a novel two-stage evidence generation protocol whose efficacy is ensured by a newly proposed Lemma. To validate our framework, we conducted a case study on recommendation systems with systematic evaluation experiments using two real-world datasets: the measured success rate exceeds 99%.

Updated: 2024-06-25 16:32:37

Categories: cs.CR

Download: http://arxiv.org/abs/2406.17694v1

From Distributional to Overton Pluralism: Investigating Large Language Model Alignment

The alignment process changes several properties of a large language model's (LLM's) output distribution. We analyze two aspects of post-alignment distributional shift of LLM responses. First, we re-examine previously reported reductions in response diversity post-alignment. Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation. Alignment suppresses irrelevant and unhelpful content while shifting the output distribution toward longer responses that cover information spanning several responses from the base LLM, essentially presenting diverse information in a single response. Finding little evidence that alignment suppresses useful information, it is natural to ask the opposite question: do aligned models surface information that cannot be recovered from base models? Our second investigation shows this is not the case and the behavior of aligned models is recoverable from base models without fine-tuning. A combination of in-context examples and lower-resolution semantic hints about response content can elicit responses from base LLMs that are as similar to alignment-tuned LLM responses as alignment-tuned LLM responses are to each other. Taken together, these results indicate that current alignment techniques capture but do not extend the useful subset of assistant-like base LLM behavior, providing further evidence for the Superficial Alignment Hypothesis. They also show that in-context alignment can go surprisingly far as a strategy for imitating aligned LLMs without fine-tuning. Our code and data is available at https://github.com/thomlake/investigating-alignment.

Updated: 2024-06-25 16:32:33

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.17692v1

Analysis of learning a flow-based generative model from limited sample complexity

We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.

Updated: 2024-06-25 16:32:20

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2310.03575v2

Unified Auto-Encoding with Masked Diffusion

At the core of both successful generative and self-supervised representation learning models there is a reconstruction objective that incorporates some form of image corruption. Diffusion models implement this approach through a scheduled Gaussian corruption process, while masked auto-encoder models do so by masking patches of the image. Despite their different approaches, the underlying similarity in their methodologies suggests a promising avenue for an auto-encoder capable of both de-noising tasks. We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD), that combines patch-based and noise-based corruption techniques within a single auto-encoding framework. Specifically, UMD modifies the diffusion transformer (DiT) training process by introducing an additional noise-free, high masking representation step in the diffusion noising schedule, and utilizes a mixed masked and noised image for subsequent timesteps. By integrating features useful for diffusion modeling and for predicting masked patch tokens, UMD achieves strong performance in downstream generative and representation learning tasks, including linear probing and class-conditional generation. This is achieved without the need for heavy data augmentations, multiple views, or additional encoders. Furthermore, UMD improves over the computational efficiency of prior diffusion based methods in total training time. We release our code at https://github.com/philippe-eecs/small-vision.

Updated: 2024-06-25 16:24:34

Categories: cs.CV,cs.AI,I.2.10

Download: http://arxiv.org/abs/2406.17688v1

PiPar: Pipeline Parallelism for Collaborative Machine Learning

Collaborative machine learning (CML) techniques, such as federated learning, have been proposed to train deep learning models across multiple mobile devices and a server. CML techniques are privacy-preserving as a local model that is trained on each device instead of the raw data from the device is shared with the server. However, CML training is inefficient due to low resource utilization. We identify idling resources on the server and devices due to sequential computation and communication as the principal cause of low resource utilization. A novel framework PiPar that leverages pipeline parallelism for CML techniques is developed to substantially improve resource utilization. A new training pipeline is designed to parallelize the computations on different hardware resources and communication on different bandwidth resources, thereby accelerating the training process in CML. A low overhead automated parameter selection method is proposed to optimize the pipeline, maximizing the utilization of available resources. The experimental results confirm the validity of the underlying approach of PiPar and highlight that when compared to federated learning: (i) the idle time of the server can be reduced by up to 64.1x, and (ii) the overall training time can be accelerated by up to 34.6x under varying network conditions for a collection of six small and large popular deep neural networks and four datasets without sacrificing accuracy. It is also experimentally demonstrated that PiPar achieves performance benefits when incorporating differential privacy methods and operating in environments with heterogeneous devices and changing bandwidths.

Updated: 2024-06-25 16:17:27

Categories: cs.DC,cs.LG

Download: http://arxiv.org/abs/2302.12803v2

Feudal Graph Reinforcement Learning

Graph-based representations and message-passing modular policies constitute prominent approaches to tackling composable control problems in Reinforcement Learning (RL). However, as shown by recent graph deep learning literature, such local message-passing operators can create information bottlenecks and hinder global coordination. The issue becomes more serious in tasks requiring high-level planning. In this work, we propose a novel methodology, named Feudal Graph Reinforcement Learning (FGRL), that addresses such challenges by relying on hierarchical RL and a pyramidal message-passing architecture. In particular, FGRL defines a hierarchy of policies where high-level commands are propagated from the top of the hierarchy down through a layered graph structure. The bottom layers mimic the morphology of the physical system, while the upper layers correspond to higher-order sub-modules. The resulting agents are then characterized by a committee of policies where actions at a certain level set goals for the level below, thus implementing a hierarchical decision-making structure that can naturally implement task decomposition. We evaluate the proposed framework on a graph clustering problem and MuJoCo locomotion tasks; simulation results show that FGRL compares favorably against relevant baselines. Furthermore, an in-depth analysis of the command propagation mechanism provides evidence that the introduced message-passing scheme favors learning hierarchical decision-making policies.

Updated: 2024-06-25 16:16:49

Categories: cs.LG

Download: http://arxiv.org/abs/2304.05099v4

Transformer Normalisation Layers and the Independence of Semantic Subspaces

Recent works have shown that transformers can solve contextual reasoning tasks by internally executing computational graphs called circuits. Circuits often use attention to logically match information from subspaces of the representation, e.g. using position-in-sequence to identify the previous token. In this work, we consider a semantic subspace to be any independent subspace of the latent representation that can fully determine an attention distribution. We show that Pre-Norm, the placement of normalisation layer used by state-of-the-art transformers, violates this ability unless the model learns a strict representation structure of orthogonal spheres. This is because it causes linear subspaces to interfere through their common normalisation factor. Theoretically, we analyse circuit stability by modelling this interference as random noise on the $L_2$-norms of the query/key/value vectors, predicting a phenomenon of circuit collapse when sparse-attention shifts to a different token. Empirically, we investigate the sensitivity of real-world models trained for mathematical addition, observing a 1% rate of circuit collapse when the norms are artificially perturbed by $\lesssim$10%. We contrast Pre-Norm with QKV-Norm, which places normalisation after the attention head's linear operators. Theoretically this relaxes the representational constraints. Empirically we observe comparable in-distribution but worse out-of-distribution performance.

Updated: 2024-06-25 16:16:38

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17837v1

A Moonshot for AI Oracles in the Sciences

Nobel laureate Philip Anderson and Elihu Abrahams once stated that, "even if machines did contribute to normal science, we see no mechanism by which they could create a Kuhnian revolution and thereby establish a new physical law." In this Perspective, we draw upon insights from the philosophies of science and artificial intelligence (AI) to propose necessary conditions of precisely such a mechanism for generating revolutionary mathematical theories. Recent advancements in AI suggest that satisfying the proposed necessary conditions by machines may be plausible; thus, our proposed necessary conditions also define a moonshot challenge. We also propose a heuristic definition of the intelligibility of mathematical theories to accelerate the development of machine theorists.

Updated: 2024-06-25 16:15:57

Categories: cs.AI,cs.CY,math.HO,physics.soc-ph

Download: http://arxiv.org/abs/2406.17836v1

Multi-Modal Conformal Prediction Regions with Simple Structures by Optimizing Convex Shape Templates

Conformal prediction is a statistical tool for producing prediction regions for machine learning models that are valid with high probability. A key component of conformal prediction algorithms is a \emph{non-conformity score function} that quantifies how different a model's prediction is from the unknown ground truth value. Essentially, these functions determine the shape and the size of the conformal prediction regions. While prior work has gone into creating score functions that produce multi-modal prediction regions, such regions are generally too complex for use in downstream planning and control problems. We propose a method that optimizes parameterized \emph{shape template functions} over calibration data, which results in non-conformity score functions that produce prediction regions with minimum volume. Our approach results in prediction regions that are \emph{multi-modal}, so they can properly capture residuals of distributions that have multiple modes, and \emph{practical}, so each region is convex and can be easily incorporated into downstream tasks, such as a motion planner using conformal prediction regions. Our method applies to general supervised learning tasks, while we illustrate its use in time-series prediction. We provide a toolbox and present illustrative case studies of F16 fighter jets and autonomous vehicles, showing an up to $68\%$ reduction in prediction region area compared to a circular baseline region.
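
For orientation, the circular baseline region the paper compares against reduces to standard split conformal prediction with a Euclidean-norm non-conformity score; the sketch below shows that baseline calibration step (the paper's method replaces the fixed ball with an optimized convex shape template, but the quantile computation is the same).

```python
import numpy as np

def conformal_radius(cal_residuals, alpha=0.1):
    """Split conformal calibration for a ball-shaped (circular) region:
    the non-conformity score is the Euclidean norm of the residual, and
    the region around any new prediction is a ball of this radius, valid
    with probability at least 1 - alpha."""
    scores = np.linalg.norm(cal_residuals, axis=1)
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)   # finite-sample correction
    return np.quantile(scores, level)

rng = np.random.default_rng(2)
r = conformal_radius(rng.normal(size=(500, 2)), alpha=0.1)   # ~90% coverage ball
```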

Updated: 2024-06-25 16:10:41

Categories: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2312.07434v2

Locally Differentially Private Distributed Online Learning with Guaranteed Optimality

Distributed online learning is gaining increased traction due to its unique ability to process large-scale datasets and streaming data. To address the growing public awareness and concern on privacy protection, plenty of algorithms have been proposed to enable differential privacy in distributed online optimization and learning. However, these algorithms often face the dilemma of trading learning accuracy for privacy. By exploiting the unique characteristics of online learning, this paper proposes an approach that tackles the dilemma and ensures both differential privacy and learning accuracy in distributed online learning. More specifically, while ensuring a diminishing expected instantaneous regret, the approach can simultaneously ensure a finite cumulative privacy budget, even in the infinite time horizon. To cater for the fully distributed setting, we adopt the local differential-privacy framework, which avoids the reliance on a trusted data curator, and, hence, provides stronger protection than the classic "centralized" (global) differential privacy. To the best of our knowledge, this is the first algorithm that successfully ensures both rigorous local differential privacy and learning accuracy. The effectiveness of the proposed algorithm is evaluated using machine learning tasks, including logistic regression on the the "mushrooms" datasets and CNN-based image classification on the "MNIST" and "CIFAR-10" datasets.
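
The local-perturbation pattern underlying such algorithms can be sketched as follows: each client clips and noises its gradient on-device before anything is shared. The clipping bound, budget schedule, and stepsize below are illustrative placeholders, not the paper's tuned schedules.

```python
import numpy as np

rng = np.random.default_rng(3)

def ldp_sgd_step(w, grad, t, clip=1.0):
    """One locally differentially private online step (sketch).  The client
    clips its gradient and adds Laplace noise *before* anything leaves the
    device, so no trusted curator is needed.  eps_t is summable over t, so
    the cumulative privacy budget stays finite; the growing noise is
    attenuated by the decaying stepsize (schedules here are illustrative)."""
    eps_t = (t + 1) ** -1.1
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))      # bound sensitivity
    g = g + rng.laplace(scale=2.0 * clip / eps_t, size=g.shape)     # Laplace mechanism
    return w - g / np.sqrt(t + 1)                                   # decaying stepsize

w = np.zeros(3)
for t in range(100):
    w = ldp_sgd_step(w, grad=2.0 * (w - np.array([1.0, -2.0, 0.5])), t=t)
```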

Updated: 2024-06-25 16:07:24

Categories: cs.LG,cs.CR,cs.MA

Download: http://arxiv.org/abs/2306.14094v2

LaTable: Towards Large Tabular Models

Tabular data is one of the most ubiquitous modalities, yet the literature on tabular generative foundation models is lagging far behind its text and vision counterparts. Creating such a model is hard, due to the heterogeneous feature spaces of different tabular datasets, tabular metadata (e.g. dataset description and feature headers), and tables lacking prior knowledge (e.g. feature order). In this work we propose LaTable: a novel tabular diffusion model that addresses these challenges and can be trained across different datasets. Through extensive experiments we find that LaTable outperforms baselines on in-distribution generation, and that finetuning LaTable can generate out-of-distribution datasets better with fewer samples. On the other hand, we explore the poor zero-shot performance of LaTable, and what it may teach us about building generative tabular foundation models with better zero- and few-shot generation capabilities.

Updated: 2024-06-25 16:03:50

Categories: cs.LG

Download: http://arxiv.org/abs/2406.17673v1

GLAD: Improving Latent Graph Generative Modeling with Simple Quantization

Exploring graph latent structures has not garnered much attention in the graph generative research field. Yet, exploiting the latent space is as crucial as working in the data space for discrete data such as graphs. However, previous methods either failed to preserve the permutation symmetry of graphs or lacked an effective approach to modeling the latent space appropriately. To mitigate those issues, we propose a simple yet effective discrete latent graph diffusion generative model. Our model, namely GLAD, not only overcomes the drawbacks of existing latent approaches, but also alleviates inherent issues present in diffusion methods applied to the graph space. We validate our generative model on molecular benchmark datasets, on which it demonstrates competitive performance compared with state-of-the-art baselines.

Updated: 2024-06-25 16:01:57

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2403.16883v2

LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic

We introduce LLM-ARC, a neuro-symbolic framework designed to enhance the logical reasoning capabilities of Large Language Models (LLMs), by combining them with an Automated Reasoning Critic (ARC). LLM-ARC employs an Actor-Critic method where the LLM Actor generates declarative logic programs along with tests for semantic correctness, while the Automated Reasoning Critic evaluates the code, runs the tests and provides feedback on test failures for iterative refinement. Implemented using Answer Set Programming (ASP), LLM-ARC achieves a new state-of-the-art accuracy of 88.32% on the FOLIO benchmark which tests complex logical reasoning capabilities. Our experiments demonstrate significant improvements over LLM-only baselines, highlighting the importance of logic test generation and iterative self-refinement. We achieve our best result using a fully automated self-supervised training loop where the Actor is trained on end-to-end dialog traces with Critic feedback. We discuss potential enhancements and provide a detailed error analysis, showcasing the robustness and efficacy of LLM-ARC for complex natural language reasoning tasks.
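
The Actor-Critic loop can be summarized in a few lines; here `llm_generate` and `run_asp_tests` are hypothetical callables standing in for the LLM Actor and the ASP-based Critic, not the paper's actual interface.

```python
def llm_arc(question, llm_generate, run_asp_tests, max_rounds=3):
    """Actor-Critic refinement loop (sketch).  llm_generate(prompt) is
    assumed to return (asp_program, tests); run_asp_tests returns
    (all_passed, failure_report).  Both are hypothetical stand-ins."""
    prompt = f"Write an ASP program and semantic-correctness tests for: {question}"
    program, tests = llm_generate(prompt)                  # LLM Actor
    for _ in range(max_rounds):
        ok, failures = run_asp_tests(program, tests)       # Automated Reasoning Critic
        if ok:
            break
        # Feed the Critic's test failures back for iterative refinement.
        program, tests = llm_generate(
            f"{prompt}\nA previous attempt failed these tests:\n{failures}\nRevise it."
        )
    return program
```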

Updated: 2024-06-25 15:52:15

Categories: cs.CL,cs.AI,cs.LO

Download: http://arxiv.org/abs/2406.17663v1

Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients

Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Structured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves competitive performance to full-rank training and existing projection-based methods. Notably, Grass enables half-precision pretraining of a 13B parameter LLaMA model on a single 40GB A100 GPU--a feat infeasible for previous methods--and yields up to a $2\times$ throughput improvement on an 8-GPU system. Code can be found at https://github.com/aashiqmuhamed/GRASS .
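
A rough sketch of the structured-sparsification idea, under our illustrative assumption that rows of the gradient are sampled with probability proportional to their squared norms and rescaled for unbiasedness; the released code linked above is the reference implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

def grass_project(G, k):
    """Row-sparse gradient compression (illustrative sketch).  Rows are
    sampled with replacement, with probability proportional to their
    squared norms, and rescaled so the compression is unbiased in
    expectation; the optimizer then only keeps the (k x n) compressed
    gradient and the row indices instead of full (m x n) state."""
    p = (G ** 2).sum(axis=1)
    p = p / p.sum()
    rows = rng.choice(G.shape[0], size=k, replace=True, p=p)
    scale = 1.0 / np.sqrt(k * p[rows])[:, None]             # importance rescaling
    return rows, scale * G[rows]

G = rng.normal(size=(1024, 512))
rows, G_small = grass_project(G, k=64)                      # 16x smaller state
```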

Updated: 2024-06-25 15:50:32

Categories: cs.LG

Download: http://arxiv.org/abs/2406.17660v1

DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

Vision-language models (VLMs) have been applied to robot task planning problems, where the robot receives a task in natural language and generates plans based on visual inputs. While current VLMs have demonstrated strong vision-language understanding capabilities, their performance is still far from being satisfactory in planning tasks. At the same time, although classical task planners, such as PDDL-based, are strong in planning for long-horizon tasks, they do not work well in open worlds where unforeseen situations are common. In this paper, we propose a novel task planning and execution framework, called DKPROMPT, which automates VLM prompting using domain knowledge in PDDL for classical planning in open worlds. Results from quantitative experiments show that DKPROMPT outperforms classical planning, pure VLM-based and a few other competitive baselines in task completion rate.

Updated: 2024-06-25 15:49:47

Categories: cs.AI,cs.RO

Download: http://arxiv.org/abs/2406.17659v1

MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection

Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods primarily depend either on dataset-specific initialization of 3D anchors, introducing bias, or utilize dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework, which constructs adaptive 3D output proposals using hybrid anchors from multi-view, multi-scale input. Fixed 2D anchors are combined with depth predictions to form 2.5D anchors, which are projected to obtain 3D proposals. To ensure high efficiency, our proposed Anchor Encoder performs sparse refinement and selects the top-k anchors and features. Moreover, while existing multi-view attention mechanisms rely on projecting reference points to multiple images, our novel Circular Deformable Attention mechanism only projects to a single image but allows reference points to seamlessly attend to adjacent images, improving efficiency without compromising on performance. On the nuScenes val set, it achieves 46.4% mAP and 55.0% NDS with a ResNet101 backbone. MDHA significantly outperforms the baseline, where anchor proposals are modelled as learnable embeddings.

Updated: 2024-06-25 15:46:39

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2406.17654v1

ELIZA Reinterpreted: The world's first chatbot was not intended as a chatbot at all

ELIZA, often considered the world's first chatbot, was written by Joseph Weizenbaum in the early 1960s. Weizenbaum did not intend to invent the chatbot, but rather to build a platform for research into human-machine conversation and the important cognitive processes of interpretation and misinterpretation. His purpose was obscured by ELIZA's fame, resulting in large part from the fortuitous timing of its creation and its escape into the wild. In this paper I provide a rich historical context for ELIZA's creation, demonstrating that ELIZA arose from the intersection of some of the central threads in the technical history of AI. I also briefly discuss how ELIZA escaped into the world, and how its accidental escape, along with several coincidental turns of the programming-language screws, led both to the misapprehension that ELIZA was intended as a chatbot and to the loss of the original ELIZA to history for over 50 years.

Updated: 2024-06-25 15:41:40

Categories: cs.AI,cs.CL,cs.CY,cs.HC

Download: http://arxiv.org/abs/2406.17650v1

Privacy Preserving Reinforcement Learning for Population Processes

We consider the problem of privacy protection in Reinforcement Learning (RL) algorithms that operate over population processes, a practical but understudied setting that includes, for example, the control of epidemics in large populations of dynamically interacting individuals. In this setting, the RL algorithm interacts with the population over $T$ time steps by receiving population-level statistics as state and performing actions which can affect the entire population at each time step. An individual's data can be collected across multiple interactions and their privacy must be protected at all times. We clarify the Bayesian semantics of Differential Privacy (DP) in the presence of correlated data in population processes through a Pufferfish Privacy analysis. We then give a meta algorithm that can take any RL algorithm as input and make it differentially private. This is achieved by taking an approach that uses DP mechanisms to privatize the state and reward signal at each time step before the RL algorithm receives them as input. Our main theoretical result shows that the value-function approximation error when applying standard RL algorithms directly to the privatized states shrinks quickly as the population size and privacy budget increase. This highlights that reasonable privacy-utility trade-offs are possible for differentially private RL algorithms in population processes. Our theoretical findings are validated by experiments performed on a simulated epidemic control problem over large population sizes.
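
The meta-algorithm's interface can be sketched as an environment wrapper that privatizes the population-level statistics before any RL algorithm sees them. The gym-like `step` signature, sensitivities, and per-step budget below are illustrative; the paper's Pufferfish analysis governs how they must actually be set.

```python
import numpy as np

rng = np.random.default_rng(5)

class DPEnvWrapper:
    """Privatise population-level state and reward with the Laplace
    mechanism before any RL algorithm sees them (meta-algorithm sketch).
    Assumes a gym-like env.step(action) -> (state, reward, done); sens_s
    and sens_r are the per-individual sensitivities of the aggregates."""
    def __init__(self, env, eps=1.0, sens_s=1.0, sens_r=1.0):
        self.env, self.eps, self.sens_s, self.sens_r = env, eps, sens_s, sens_r

    def step(self, action):
        state, reward, done = self.env.step(action)
        state = state + rng.laplace(scale=self.sens_s / self.eps, size=np.shape(state))
        reward = reward + rng.laplace(scale=self.sens_r / self.eps)
        return state, reward, done   # the RL agent only ever sees these
```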

Updated: 2024-06-25 15:41:26

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2406.17649v1

The Use of AI-Robotic Systems for Scientific Discovery

The process of developing theories and models and testing them with experiments is fundamental to the scientific method. Automating the entire scientific method then requires not only automation of the induction of theories from data, but also experimentation from design to implementation. This is the idea behind a robot scientist -- a coupled system of AI and laboratory robotics that has agency to test hypotheses with real-world experiments. In this chapter we explore some of the fundamentals of robot scientists in the philosophy of science. We also map the activities of a robot scientist to machine learning paradigms, and argue that the scientific method shares an analogy with active learning. We demonstrate these concepts using examples from previous robot scientists, and also from Genesis: a next generation robot scientist designed for research in systems biology, comprising a micro-fluidic system with 1000 computer-controlled micro-bioreactors and interpretable models based in controlled vocabularies and logic.

Updated: 2024-06-25 15:33:01

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17835v1

Banishing LLM Hallucinations Requires Rethinking Generalization

Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold as it usually does in practice when training on internet scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.

Updated: 2024-06-25 15:31:01

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.17642v1

BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging

Test-time augmentation (TTA) is a well-known technique employed during the testing phase of computer vision tasks. It involves aggregating multiple augmented versions of input data. Combining predictions using a simple average formulation is a common and straightforward approach after performing TTA. This paper introduces a novel framework for optimizing TTA, called BayTTA (Bayesian-based TTA), which is based on Bayesian Model Averaging (BMA). First, we generate a model list associated with different variations of the input data created through TTA. Then, we use BMA to combine model predictions weighted by their respective posterior probabilities. Such an approach allows one to take into account model uncertainty, and thus to enhance the predictive performance of the related machine learning or deep learning model. We evaluate the performance of BayTTA on various public data, including three medical image datasets comprising skin cancer, breast cancer, and chest X-ray images and two well-known gene editing datasets, CRISPOR and GUIDE-seq. Our experimental results indicate that BayTTA can be effectively integrated into state-of-the-art deep learning models used in medical image analysis as well as into some popular pre-trained CNN models such as VGG-16, MobileNetV2, DenseNet201, ResNet152V2, and InceptionRes-NetV2, leading to the enhancement in their accuracy and robustness performance.
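
A minimal sketch of replacing the plain TTA average with Bayesian model averaging: each augmentation variant receives a posterior weight derived from a validation log-likelihood used as an evidence proxy. This weighting scheme is our simplification for illustration, not the paper's exact estimator.

```python
import numpy as np

def baytta_predict(probs_per_aug, val_loglik):
    """Bayesian model averaging over test-time augmentations (sketch).

    probs_per_aug: (A, N, C) class probabilities from A augmentation
    variants of a batch of N inputs.  val_loglik: length-A validation
    log-likelihoods used as an evidence proxy; their softmax gives the
    posterior weights that replace the plain TTA average."""
    w = np.exp(val_loglik - val_loglik.max())
    w = w / w.sum()                                  # approximate posterior weights
    return np.tensordot(w, probs_per_aug, axes=1)    # (N, C) averaged probabilities

rng = np.random.default_rng(6)
p = rng.dirichlet(np.ones(3), size=(4, 8))           # 4 augmentations, 8 inputs, 3 classes
pred = baytta_predict(p, val_loglik=np.array([-10.0, -11.5, -9.8, -12.0]))
```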

Updated: 2024-06-25 15:24:06

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.17640v1

Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP

Contrastive Language-Image Pre-training (CLIP) has manifested remarkable improvements in zero-shot classification and cross-modal vision-language tasks. Yet, from a geometrical point of view, the CLIP embedding space has been found to have a pronounced modality gap. This gap renders the embedding space overly sparse and disconnected, with different modalities being densely distributed in distinct subregions of the hypersphere. In this work, we aim at answering two main questions: 1. Does sharing the parameter space between the multi-modal encoders reduce the modality gap? 2. Can the gap be mitigated by pushing apart the uni-modal embeddings via intra-modality separation? We design AlignCLIP, in order to answer these questions and show that answers to both questions are positive. Through extensive experiments, we show that AlignCLIP achieves noticeable enhancements in the cross-modal alignment of the embeddings, and thereby, reduces the modality gap, while maintaining the performance across several downstream evaluations, such as zero-shot image classification, zero-shot multi-modal retrieval and zero-shot semantic text similarity.

Updated: 2024-06-25 15:24:02

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.17639v1

Aligning Diffusion Models with Noise-Conditioned Perception

Recent advancements in human preference optimization, initially developed for Language Models (LMs), have shown promise for text-to-image Diffusion Models, enhancing prompt alignment, visual appeal, and user preference. Unlike LMs, Diffusion Models typically optimize in pixel or VAE space, which does not align well with human perception, leading to slower and less efficient training during the preference alignment stage. We propose using a perceptual objective in the U-Net embedding space of the diffusion model to address these issues. Our approach involves fine-tuning Stable Diffusion 1.5 and XL using Direct Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and supervised fine-tuning (SFT) within this embedding space. This method significantly outperforms standard latent-space implementations across various metrics, including quality and computational cost. For SDXL, our approach provides 60.8\% general preference, 62.2\% visual appeal, and 52.1\% prompt following against original open-sourced SDXL-DPO on the PartiPrompts dataset, while significantly reducing compute. Our approach not only improves the efficiency and quality of human preference alignment for diffusion models but is also easily integrable with other optimization techniques. The training code and LoRA weights will be available here: https://huggingface.co/alexgambashidze/SDXL\_NCP-DPO\_v0.1
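
For orientation, the preference objective involved is the standard DPO loss; the paper's contribution lies in where it is computed (noise-conditioned U-Net embeddings rather than pixel/VAE space). A sketch of the loss itself:

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * margin), where the
    margin compares the policy's log-likelihood gain on the preferred
    (w) versus dispreferred (l) sample against a frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return np.logaddexp(0.0, -margin)        # numerically stable -log(sigmoid)

loss = dpo_loss(logp_w=-3.2, logp_l=-3.9, ref_logp_w=-3.5, ref_logp_l=-3.4)
```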

Updated: 2024-06-25 15:21:50

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.17636v1

Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels

Computational social science (CSS) practitioners often rely on human-labeled data to fine-tune supervised text classifiers. We assess the potential for researchers to augment or replace human-generated training data with surrogate training labels from generative large language models (LLMs). We introduce a recommended workflow and test this LLM application by replicating 14 classification tasks and measuring performance. We employ a novel corpus of English-language text classification data sets from recent CSS articles in high-impact journals. Because these data sets are stored in password-protected archives, our analyses are less prone to issues of contamination. For each task, we compare supervised classifiers fine-tuned using GPT-4 labels against classifiers fine-tuned with human annotations and against labels from GPT-4 and Mistral-7B with few-shot in-context learning. Our findings indicate that supervised classification models fine-tuned on LLM-generated labels perform comparably to models fine-tuned with labels from human annotators. Fine-tuning models using LLM-generated labels can be a fast, efficient and cost-effective method of building supervised text classifiers.
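
Once the surrogate labels exist, the workflow reduces to ordinary supervised training; here is a minimal scikit-learn sketch (the texts, labels, and upstream LLM labeling step are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder documents and LLM-generated surrogate labels; in the paper's
# workflow the labels would come from prompting a model such as GPT-4.
texts = ["the bill passed the senate", "the striker scored twice"]
llm_labels = ["politics", "sports"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, llm_labels)                     # supervised model, LLM labels
print(clf.predict(["parliament votes on the new budget"]))
```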

Updated: 2024-06-25 15:20:25

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.17633v1

KANQAS: Kolmogorov Arnold Network for Quantum Architecture Search

Quantum architecture search (QAS) is a promising direction for optimization and automated design of quantum circuits towards quantum advantage. Recent techniques in QAS focus on machine learning-based approaches from reinforcement learning, such as deep Q-networks. While multi-layer perceptron (MLP)-based deep Q-networks have been applied to QAS, their interpretability remains challenging due to their high number of parameters. In this work, we evaluate the practicality of Kolmogorov-Arnold Networks (KANs) in quantum architecture search problems, analyzing their efficiency in terms of the probability of success, the frequency of optimal solutions, and their dependence on various degrees of freedom of the network. In a noiseless scenario, the probability of success and the number of optimal quantum circuit configurations that generate multi-qubit maximally entangled states are significantly higher than for MLPs. Moreover, in noisy scenarios, KANs can achieve a better fidelity in approximating a maximally entangled state than MLPs, where the performance of the MLP depends significantly on the choice of activation function. Further investigation reveals that KANs require far fewer learnable parameters than MLPs; however, the average time to execute each episode is much higher for KANs.

Updated: 2024-06-25 15:17:01

Categories: quant-ph,cs.AI,cs.ET,cs.LG

Download: http://arxiv.org/abs/2406.17630v1

Controlling Moments with Kernel Stein Discrepancies

Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant. Notable applications include the diagnosis of approximate MCMC samplers and goodness-of-fit tests for unnormalized statistical models. The present work analyzes the convergence control properties of KSDs. We first show that standard KSDs used for weak convergence control fail to control moment convergence. To address this limitation, we next provide sufficient conditions under which alternative diffusion KSDs control both moment and weak convergence. As an immediate consequence we develop, for each $q > 0$, the first KSDs known to exactly characterize $q$-Wasserstein convergence.
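
For readers unfamiliar with KSDs, the sketch below computes the standard (Langevin) KSD with an inverse multiquadric base kernel for a one-dimensional standard normal target; the diffusion KSDs studied in the paper generalize this construction, and the constants here are illustrative.

```python
import numpy as np

def ksd_imq(x, score, c=1.0, beta=-0.5):
    """V-statistic estimate of KSD^2 for 1-D samples x, given the target's
    score function s(x) = d/dx log p(x). Uses the IMQ kernel (c^2 + d^2)^beta."""
    d = x[:, None] - x[None, :]
    u = c**2 + d**2
    k = u**beta
    dkx = 2 * beta * d * u**(beta - 1)      # dk/dx
    dky = -dkx                              # dk/dy (kernel depends on x - y)
    dkxy = -2 * beta * u**(beta - 2) * (u + 2 * (beta - 1) * d**2)
    s = score(x)
    h = s[:, None] * s[None, :] * k + s[:, None] * dky + s[None, :] * dkx + dkxy
    return h.mean()

rng = np.random.default_rng(0)
score = lambda t: -t                              # score of N(0, 1)
print(ksd_imq(rng.normal(size=500), score))       # small: samples fit target
print(ksd_imq(rng.normal(3, 1, size=500), score)) # large: shifted samples
```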

Updated: 2024-06-25 15:16:17

Categories: stat.ML,cs.LG,stat.CO

Download: http://arxiv.org/abs/2211.05408v4

Video Inpainting Localization with Contrastive Learning

Deep video inpainting is often used maliciously to remove important objects and create fake videos, so blindly identifying the inpainted regions is an important forensic task. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual to learn effective spatiotemporal forensic features. To enhance the discriminative power, supervised contrastive learning is adopted to capture the local inconsistency of inpainted videos by attracting/repelling positive/negative pairs of pristine and forged pixels. A pixel-wise inpainting localization map is produced by a lightweight convolutional decoder trained with a specialized two-stage strategy. To prepare enough training samples, we build a video object segmentation dataset of 2500 videos with pixel-level annotations per frame. Extensive experimental results validate the superiority of ViLocal over the state of the art. Code and dataset will be available at https://github.com/multimediaFor/ViLocal.
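
The supervised contrastive objective can be sketched as follows; this is a generic SupCon-style loss over sampled pixel embeddings with pristine/forged labels, not necessarily ViLocal's exact formulation.

```python
import torch
import torch.nn.functional as F

def pixel_supcon_loss(feats, labels, tau=0.07):
    """Supervised contrastive loss over pixel embeddings.
    feats: (N, D) sampled pixel features; labels: (N,) 0=pristine, 1=inpainted.
    Pulls same-class pixels together, pushes pristine/forged pairs apart."""
    z = F.normalize(feats, dim=1)
    sim = z @ z.t() / tau                          # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim.masked_fill_(self_mask, float('-inf'))     # exclude self-pairs
    pos = ((labels[:, None] == labels[None, :]) & ~self_mask).float()
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # average log-probability of positives for each anchor pixel
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

feats = torch.randn(64, 128, requires_grad=True)
labels = torch.randint(0, 2, (64,))
pixel_supcon_loss(feats, labels).backward()
```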

Updated: 2024-06-25 15:15:54

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2406.17628v1

Querying Labeled Time Series Data with Scenario Programs

In order to ensure autonomous vehicles are safe for on-road deployment, simulation-based testing has become an integral complement to on-road testing. The rise in simulation testing and validation reflects a growing need to verify that AV behavior is consistent with desired outcomes even in edge case scenarios that may seldom or never appear in on-road testing data. This raises a critical question: to what extent are AV failures in simulation consistent with data collected from real-world testing? As a result of the gap between simulated and real sensor data (the sim-to-real gap), failures in simulation can be either spurious (simulation- or simulator-specific issues) or relevant (safety-critical AV system issues). One way to validate whether simulated time series failures are consistent with real-world time series sensor data is to retrieve instances of the failure scenario from a real-world time series dataset, in order to understand AV performance in these scenarios. Adopting this strategy, we propose a formal definition of what constitutes a match between a real-world labeled time series data item and a simulated scenario written in a fragment of the Scenic probabilistic programming language for simulation generation. With this definition of a match, we develop a querying algorithm that identifies the subset of a labeled time series dataset matching a given scenario. To allow this approach to be used to verify the safety of other cyber-physical systems (CPS), we present a definition and algorithm for matching that scale beyond the autonomous vehicle domain. Experiments demonstrate the precision and scalability of the algorithm on a set of challenging and uncommon time series scenarios identified from the nuScenes autonomous driving dataset. We include a full system implementation of the querying algorithm, freely available for use across a wide range of CPS.
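
As a toy illustration of the querying idea (not the paper's formal match semantics over Scenic programs), the sketch below treats a scenario as an ordered list of per-frame predicates and scans labeled trajectories for a contiguous window satisfying them; all names and labels are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    labels: dict    # e.g. {"ego_speed": 12.0, "pedestrian_ahead": True}

def matches(trajectory, predicates):
    # A data item matches if some contiguous window of frames satisfies the
    # predicates in order -- a toy stand-in for the formal match definition.
    n, m = len(trajectory), len(predicates)
    return any(all(predicates[j](trajectory[i + j].labels) for j in range(m))
               for i in range(n - m + 1))

def query(dataset, predicates):
    return [tid for tid, traj in dataset.items() if matches(traj, predicates)]

dataset = {"scene-0001": [Frame({"ego_speed": 10, "pedestrian_ahead": False}),
                          Frame({"ego_speed": 4, "pedestrian_ahead": True})]}
braking_near_pedestrian = [lambda f: f["ego_speed"] > 8,
                           lambda f: f["pedestrian_ahead"] and f["ego_speed"] < 5]
print(query(dataset, braking_near_pedestrian))   # ['scene-0001']
```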

Updated: 2024-06-25 15:15:27

Categories: cs.FL,cs.LG,cs.RO,I.2.9; I.6.4; C.4; F.4; H.3.3

Download: http://arxiv.org/abs/2406.17627v1

Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning

The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to re-train the sparser networks obtained by the iterative magnitude pruning process. An explanation of why the specific initialization proposed by the lottery ticket hypothesis tends to work better in terms of generalization (and training) performance has been lacking. Moreover, the underlying principles of iterative magnitude pruning, such as the pruning of smaller-magnitude weights and the role of the iterative process, lack full understanding and explanation. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss-landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
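
A common iterative magnitude pruning (IMP) recipe referred to above can be sketched as follows; the rewind-to-initialization step is what the lottery ticket hypothesis concerns, and the pruning rate and round count here are illustrative, not the paper's settings.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_magnitude_pruning(model, train_fn, rounds=5, rate=0.2):
    """Sketch of IMP: train, globally prune the smallest-magnitude weights,
    rewind surviving weights to their initial ("lottery") values, repeat.
    `train_fn(model)` is assumed to train the model to convergence."""
    modules = [m for m in model.modules()
               if isinstance(m, (nn.Linear, nn.Conv2d))]
    init = {id(m): m.weight.detach().clone() for m in modules}  # lottery init
    for _ in range(rounds):
        train_fn(model)
        prune.global_unstructured([(m, "weight") for m in modules],
                                  pruning_method=prune.L1Unstructured,
                                  amount=rate)   # prune 20% of what remains
        for m in modules:                        # rewind; the mask persists
            m.weight_orig.data.copy_(init[id(m)])
    return model
```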

Updated: 2024-06-25 15:14:12

Categories: cs.LG

Download: http://arxiv.org/abs/2403.15022v3

CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference

As large language models (LLMs) constantly evolve, ensuring their safety remains a critical research problem. Previous red-teaming approaches for LLM safety have primarily focused on single prompt attacks or goal hijacking. To the best of our knowledge, we are the first to study LLM safety in multi-turn dialogue coreference. We created a dataset of 1,400 questions across 14 categories, each featuring multi-turn coreference safety attacks. We then conducted detailed evaluations on five widely used open-source LLMs. The results indicated that under multi-turn coreference safety attacks, the highest attack success rate was 56% with the LLaMA2-Chat-7b model, while the lowest was 13.9% with the Mistral-7B-Instruct model. These findings highlight the safety vulnerabilities in LLMs during dialogue coreference interactions.

Updated: 2024-06-25 15:13:02

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.17626v1

iWISDM: Assessing instruction following in multimodal models at scale

The ability to perform complex tasks from detailed instructions is a key to many remarkable achievements of our species. As humans, we are not only capable of performing a wide variety of tasks but also very complex ones that may entail hundreds or thousands of steps to complete. Large language models and their more recent multimodal counterparts that integrate textual and visual inputs have achieved unprecedented success in performing complex tasks. Yet, most existing benchmarks are largely confined to single-modality inputs (either text or vision), narrowing the scope of multimodal assessments, particularly for instruction-following in multimodal contexts. To bridge this gap, we introduce the instructed-Virtual VISual Decision Making (iWISDM) environment, engineered to generate a limitless array of vision-language tasks of varying complexity. Using iWISDM, we compiled three distinct benchmarks of instruction-following visual tasks across varying complexity levels and evaluated several newly developed multimodal models on these benchmarks. Our findings establish iWISDM as a robust benchmark for assessing the instructional adherence of both existing and emergent multimodal models, and highlight a large gap between these models' ability to precisely follow instructions and that of humans. The code of iWISDM is available on GitHub at https://github.com/BashivanLab/iWISDM.

Updated: 2024-06-25 15:12:01

Categories: cs.AI

Download: http://arxiv.org/abs/2406.14343v3

Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models

As large language models (LLMs) appear to behave increasingly human-like in text-based interactions, more and more researchers become interested in investigating personality in LLMs. However, the diversity of psychological personality research and the rapid development of LLMs have led to a broad yet fragmented landscape of studies in this interdisciplinary field. Extensive studies across different research focuses, different personality psychometrics, and different LLMs make it challenging to have a holistic overview and further pose difficulties in applying findings to real-world applications. In this paper, we present a comprehensive review by categorizing current studies into three research problems: self-assessment, exhibition, and recognition, based on the intrinsic characteristics and external manifestations of personality in LLMs. For each problem, we provide a thorough analysis and conduct in-depth comparisons of their corresponding solutions. Besides, we summarize research findings and open challenges from current studies and further discuss their underlying causes. We also collect extensive publicly available resources to facilitate interested researchers and developers. Lastly, we discuss the potential future research directions and application scenarios. Our paper is the first comprehensive survey of up-to-date literature on personality in LLMs. By presenting a clear taxonomy, in-depth analysis, promising future directions, and extensive resource collections, we aim to provide a better understanding and facilitate further advancements in this emerging field.

Updated: 2024-06-25 15:08:44

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.17624v1

Univariate Skeleton Prediction in Multivariate Systems Using Transformers

Symbolic regression (SR) methods attempt to learn mathematical expressions that approximate the behavior of an observed system. However, when dealing with multivariate systems, they often fail to identify the functional form that explains the relationship between each variable and the system's response. To begin to address this, we propose an explainable neural SR method that generates univariate symbolic skeletons that aim to explain how each variable influences the system's response. By analyzing multiple sets of data generated artificially, where one input variable varies while others are fixed, relationships are modeled separately for each input variable. The response of such artificial data sets is estimated using a regression neural network (NN). Finally, the multiple sets of input-response pairs are processed by a pre-trained Multi-Set Transformer that solves a problem we termed Multi-Set Skeleton Prediction and outputs a univariate symbolic skeleton. Thus, such skeletons represent explanations of the function approximated by the regression NN. Experimental results demonstrate that this method learns skeleton expressions matching the underlying functions and outperforms two GP-based and two neural SR methods.
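
The artificial data-generation step can be sketched as below: a regression network fit to the observed system is queried on grids where one input varies while the others stay fixed. The subsequent Multi-Set Transformer stage is not reproduced, and the system, sizes, and ranges are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2]   # stand-in "observed" system
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)

def univariate_sets(nn, var, n_sets=4, n_points=50, n_vars=3):
    """Input-response pairs where only `var` varies; others fixed per set."""
    sets = []
    for _ in range(n_sets):
        Xq = np.tile(rng.uniform(-2, 2, size=(1, n_vars)), (n_points, 1))
        Xq[:, var] = np.linspace(-2, 2, n_points)     # sweep one variable
        sets.append((Xq[:, var], nn.predict(Xq)))     # NN estimates response
    return sets   # these multiple sets feed the skeleton-prediction model

sets_for_x0 = univariate_sets(nn, var=0)   # should trace sin-like curves
```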

Updated: 2024-06-25 15:07:06

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17834v1

Fundamental Bounds on Online Strategic Classification

We study the problem of online binary classification where strategic agents can manipulate their observable features in predefined ways, modeled by a manipulation graph, in order to receive a positive classification. We show this setting differs in fundamental ways from non-strategic online classification. For instance, whereas in the non-strategic case, a mistake bound of $\ln|H|$ is achievable via the halving algorithm when the target function belongs to a known class $H$, we show that no deterministic algorithm can achieve a mistake bound $o(\Delta)$ in the strategic setting, where $\Delta$ is the maximum degree of the manipulation graph (even when $|H|=O(\Delta)$). We obtain an algorithm achieving mistake bound $O(\Delta\ln|H|)$. We also extend this to the agnostic setting and obtain an algorithm with a $\Delta$ multiplicative regret, and we show no deterministic algorithm can achieve $o(\Delta)$ multiplicative regret. Next, we study two randomized models based on whether the random choices are made before or after agents respond, and show they exhibit fundamental differences. In the first model, at each round the learner deterministically chooses a probability distribution over classifiers inducing expected values on each vertex (probabilities of being classified as positive), which the strategic agents respond to. We show that any learner in this model has to suffer linear regret. On the other hand, in the second model, while the adversary who selects the next agent must respond to the learner's probability distribution over classifiers, the agent then responds to the actual hypothesis classifier drawn from this distribution. Surprisingly, we show this model is more advantageous to the learner, and we design randomized algorithms that achieve sublinear regret bounds against both oblivious and adaptive adversaries.
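
For contrast with the strategic setting, the classical halving algorithm mentioned above is easy to sketch: predict by majority vote of the remaining consistent hypotheses, then discard every hypothesis that disagrees with the revealed label, so each mistake at least halves the version space. The hypothesis class and stream below are toy assumptions.

```python
def halving(stream, hypotheses):
    """Non-strategic halving: at most log2|H| mistakes if the target is in H."""
    version_space, mistakes = list(hypotheses), 0
    for x, y in stream:
        votes = sum(h(x) for h in version_space)
        pred = votes * 2 >= len(version_space)        # majority vote
        if pred != y:
            mistakes += 1
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

H = [lambda x, t=t: x >= t for t in range(10)]        # threshold classifiers
stream = [(x, x >= 4) for x in [1, 7, 3, 9, 4, 2]]    # target: threshold 4
print(halving(stream, H))
```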

Updated: 2024-06-25 15:06:33

Categories: cs.LG,cs.GT

Download: http://arxiv.org/abs/2302.12355v2

Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG

How novel are texts generated by language models (LMs) relative to their training corpora? In this work, we investigate the extent to which modern LMs generate $n$-grams from their training data, evaluating both (i) the probability LMs assign to complete training $n$-grams and (ii) $n$-novelty, the proportion of $n$-grams generated by an LM that did not appear in the training data (for arbitrarily large $n$). To enable arbitrary-length $n$-gram search over a corpus in constant time, we develop Rusty-DAWG, a novel search tool inspired by indexing of genomic data. We compare the novelty of LM-generated text to human-written text and explore factors that affect generation novelty, focusing on the Pythia models. We find that, for $n > 4$, LM-generated text is less novel than human-written text, though it is more novel for smaller $n$. Larger LMs and more constrained decoding strategies both decrease novelty. Finally, we show that LMs complete $n$-grams with lower loss if they are more frequent in the training data. Overall, our results reveal factors influencing the novelty of LM-generated text, and we release Rusty-DAWG to facilitate further pretraining data research.
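
The quantity being measured is simple to state in code; the naive set-based sketch below is exactly what Rusty-DAWG is designed to avoid recomputing, since it does not scale to arbitrary $n$ over large corpora.

```python
def n_novelty(generated_tokens, corpus_tokens, n):
    """Fraction of n-grams in generated text that never appear in the corpus."""
    grams = lambda toks: {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    corpus_grams = grams(corpus_tokens)
    gen = [tuple(generated_tokens[i:i + n])
           for i in range(len(generated_tokens) - n + 1)]
    return sum(g not in corpus_grams for g in gen) / max(len(gen), 1)

corpus = "the cat sat on the mat".split()
generated = "the cat sat on the rug".split()
print(n_novelty(generated, corpus, n=3))   # 0.25: one of four 3-grams is new
```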

Updated: 2024-06-25 15:02:40

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.13069v2

Aligning Programming Language and Natural Language: Exploring Design Choices in Multi-Modal Transformer-Based Embedding for Bug Localization

Bug localization refers to identifying the source code files, written in a programming language, that are responsible for the unexpected behavior of software described in a bug report, which is written in natural language. As bug localization is labor-intensive, bug localization models are employed to assist software developers. Due to the domain difference between source code files and bug reports, modern bug localization systems based on deep learning models rely heavily on embedding techniques that project bug reports and source code files into a shared vector space. Creating an embedding involves several design choices, but the impact of these choices on the quality of the embedding and the performance of bug localization models remains unexplained in current research. To address this gap, our study evaluated 14 distinct embedding models to gain insights into the effects of various design choices. Subsequently, we developed bug localization models utilizing these embedding models to assess the influence of these choices on the performance of the localization models. Our findings indicate that the pre-training strategies significantly affect the quality of the embedding. Moreover, we discovered that the embedding models' familiarity with the data has a notable impact on the bug localization model's performance. Notably, when the training and testing data are collected from different projects, the performance of the bug localization models exhibits substantial fluctuations.

Updated: 2024-06-25 15:01:39

Categories: cs.SE,cs.AI,cs.LG,D.2; I.2

Download: http://arxiv.org/abs/2406.17615v1

Distributed Training of Large Graph Neural Networks with Variable Communication Rates

Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements. Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs. However, as the graph cannot generally be decomposed into small non-interacting components, data communication between the training machines quickly limits training speeds. Compressing the communicated node activations by a fixed amount improves the training speeds, but lowers the accuracy of the trained GNN. In this paper, we introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model. Based on our theoretical analysis, we derive a variable compression method that converges to a solution equivalent to the full communication case, for all graph partitioning schemes. Our empirical results show that our method attains a comparable performance to the one obtained with full communication. We outperform full communication at any fixed compression ratio for any communication budget.

Updated: 2024-06-25 14:57:38

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2406.17611v1

Diffusion-based Adversarial Purification for Intrusion Detection

The escalating sophistication of cyberattacks has encouraged the integration of machine learning techniques in intrusion detection systems, but the rise of adversarial examples presents a significant challenge. These crafted perturbations mislead ML models, enabling attackers to evade detection or trigger false alerts. As a reaction, adversarial purification has emerged as a compelling solution, particularly with diffusion models showing promising results. However, their purification potential remains unexplored in the context of intrusion detection. This paper demonstrates the effectiveness of diffusion models in purifying adversarial examples in network intrusion detection. Through a comprehensive analysis of the diffusion parameters, we identify optimal configurations maximizing adversarial robustness with minimal impact on normal performance. Importantly, this study reveals insights into the relationship between diffusion noise and diffusion steps, representing a novel contribution to the field. Our experiments are carried out on two datasets and against 5 adversarial attacks. The implementation code is publicly available.
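
A purification step of this kind can be sketched as below, assuming a DDPM-style `denoiser` that predicts the injected noise at step `t`; the number of forward steps `t_star` plays the role of the diffusion-steps parameter analyzed in the paper, and the interface is an assumption rather than the paper's implementation.

```python
import torch

def purify(x_adv, denoiser, betas, t_star):
    """Diffuse the (possibly adversarial) input for t_star steps, then run the
    reverse process; the injected noise is removed along with, ideally, the
    adversarial perturbation. `betas` is the DDPM noise schedule (1-D tensor)."""
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    # forward diffusion to step t_star
    noise = torch.randn_like(x_adv)
    x = abar[t_star].sqrt() * x_adv + (1 - abar[t_star]).sqrt() * noise
    # reverse (ancestral) sampling back to t = 0
    for t in range(t_star, -1, -1):
        eps = denoiser(x, t)   # assumed: predicts the noise at step t
        mean = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x   # purified sample, fed to the intrusion-detection classifier
```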

Updated: 2024-06-25 14:48:28

Categories: cs.CR,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.17606v1

Constructing structured tensor priors for Bayesian inverse problems

Specifying a prior distribution is an essential part of solving Bayesian inverse problems. The prior encodes a belief on the nature of the solution and this regularizes the problem. In this article we completely characterize a Gaussian prior that encodes the belief that the solution is a structured tensor. We first define the notion of (A,b)-constrained tensors and show that they describe a large variety of different structures such as Hankel, circulant, triangular, symmetric, and so on. Then we completely characterize the Gaussian probability distribution of such tensors by specifying its mean vector and covariance matrix. Furthermore, explicit expressions are proved for the covariance matrix of tensors whose entries are invariant under a permutation. These results unlock a whole new class of priors for Bayesian inverse problems. We illustrate how new kernel functions can be designed and efficiently computed and apply our results on two particular Bayesian inverse problems: completing a Hankel matrix from a few noisy measurements and learning an image classifier of handwritten digits. The effectiveness of the proposed priors is demonstrated for both problems. All applications have been implemented as reactive Pluto notebooks in Julia.
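
One concrete way to realize a Gaussian prior supported on an (A,b)-constraint set, as a sketch of the idea rather than the paper's closed-form characterization, is to combine a particular solution with noise projected onto the null space of A; the symmetric 2x2 example below is an illustrative assumption.

```python
import numpy as np

# Constraint: symmetric 2x2 matrices, i.e. x_01 - x_10 = 0 on x = vec(X).
A = np.array([[0.0, 1.0, -1.0, 0.0]])
b = np.array([0.0])

x_part = np.linalg.pinv(A) @ b          # particular solution: A @ x_part = b
P = np.eye(4) - np.linalg.pinv(A) @ A   # projector onto the null space of A

def sample_prior(rng):
    z = rng.standard_normal(4)
    # Gaussian with mean x_part and covariance P P^T, supported on {Ax = b}
    return (x_part + P @ z).reshape(2, 2)

rng = np.random.default_rng(0)
X = sample_prior(rng)
assert np.isclose(X[0, 1], X[1, 0])     # every draw is exactly symmetric
```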

Updated: 2024-06-25 14:40:34

Categories: math.NA,cs.LG,cs.NA,cs.SY,eess.SP,eess.SY,math.ST,stat.TH,15A29, 15A69, 62F15

Download: http://arxiv.org/abs/2406.17597v1

MgNO: Efficient Parameterization of Linear Operators via Multigrid

In this work, we propose a concise neural operator architecture for operator learning. Drawing an analogy with a conventional fully connected neural network, we define the neural operator as follows: the output of the $i$-th neuron in a nonlinear operator layer is defined by $\mathcal O_i(u) = \sigma\left( \sum_j \mathcal W_{ij} u + \mathcal B_{ij}\right)$. Here, $\mathcal W_{ij}$ denotes the bounded linear operator connecting $j$-th input neuron to $i$-th output neuron, and the bias $\mathcal B_{ij}$ takes the form of a function rather than a scalar. Given its new universal approximation property, the efficient parameterization of the bounded linear operators between two neurons (Banach spaces) plays a critical role. As a result, we introduce MgNO, utilizing multigrid structures to parameterize these linear operators between neurons. This approach offers both mathematical rigor and practical expressivity. Additionally, MgNO obviates the need for conventional lifting and projecting operators typically required in previous neural operators. Moreover, it seamlessly accommodates diverse boundary conditions. Our empirical observations reveal that MgNO exhibits superior ease of training compared to other CNN-based models, while also displaying a reduced susceptibility to overfitting when contrasted with spectral-type neural operators. We demonstrate the efficiency and accuracy of our method with consistently state-of-the-art performance on different types of partial differential equations (PDEs).

Updated: 2024-06-25 14:39:52

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2310.19809v2

Learning with Noisy Labels through Learnable Weighting and Centroid Similarity

We introduce a novel method for training machine learning models in the presence of noisy labels, which are prevalent in domains such as medical diagnosis and autonomous driving and have the potential to degrade a model's generalization performance. Inspired by established literature that highlights how deep learning models are prone to overfitting to noisy samples in the later epochs of training, we propose a strategic approach. This strategy leverages the distance to class centroids in the latent space and incorporates a discounting mechanism, aiming to diminish the influence of samples that lie distant from all class centroids. By doing so, we effectively counteract the adverse effects of noisy labels. The foundational premise of our approach is the assumption that samples situated further from their respective class centroid in the initial stages of training are more likely to be associated with noise. Our methodology is grounded in robust theoretical principles and has been validated empirically through extensive experiments on several benchmark datasets. Our results show that our method consistently outperforms the existing state-of-the-art techniques, achieving significant improvements in classification accuracy in the presence of noisy labels. The code for our proposed loss function and supplementary materials is available at https://github.com/wanifarooq/NCOD
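
A minimal version of the centroid-distance discounting idea might look like the following; the cosine-similarity weighting function is an assumption for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def centroid_weighted_ce(embeddings, logits, labels):
    """Sketch: samples far from their class centroid in latent space are
    assumed more likely to carry noisy labels, so their cross-entropy term
    is discounted by a weight in [0, 1]."""
    with torch.no_grad():
        classes = labels.unique()                       # sorted class ids
        cents = torch.stack([embeddings[labels == c].mean(0) for c in classes])
        cents = F.normalize(cents, dim=1)
        z = F.normalize(embeddings, dim=1)
        idx = torch.searchsorted(classes, labels)       # own-class centroid
        w = ((z * cents[idx]).sum(1) + 1) / 2           # cosine sim -> [0, 1]
    return (w * F.cross_entropy(logits, labels, reduction='none')).mean()

emb = torch.randn(32, 16)
logits = torch.randn(32, 5, requires_grad=True)
labels = torch.randint(0, 5, (32,))
centroid_weighted_ce(emb, logits, labels).backward()
```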

Updated: 2024-06-25 14:36:33

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2303.09470v2

Empirical Bayes for Dynamic Bayesian Networks Using Generalized Variational Inference

In this work, we demonstrate the Empirical Bayes approach to learning a Dynamic Bayesian Network. By starting with several point estimates of structure and weights, we can use a data-driven prior to subsequently obtain a model to quantify uncertainty. This approach uses a recent development of Generalized Variational Inference, and indicates the potential of sampling the uncertainty of a mixture of DAG structures as well as a parameter posterior.

Updated: 2024-06-25 14:34:51

Categories: cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2406.17831v1

Learning Dynamic Bayesian Networks from Data: Foundations, First Principles and Numerical Comparisons

In this paper, we present a guide to the foundations of learning Dynamic Bayesian Networks (DBNs) from data in the form of multiple samples of trajectories for some length of time. We present the formalism for a generic as well as a set of common types of DBNs for particular variable distributions. We present the analytical form of the models, with a comprehensive discussion on the interdependence between structure and weights in a DBN model and their implications for learning. Next, we give a broad overview of learning methods and describe and categorize them based on the most important statistical features, and how they treat the interplay between learning structure and weights. We give the analytical form of the likelihood and Bayesian score functions, emphasizing the distinction from the static case. We discuss functions used in optimization to enforce structural requirements. We briefly discuss more complex extensions and representations. Finally we present a set of comparisons in different settings for various distinct but representative algorithms across the variants.

Updated: 2024-06-25 14:28:17

Categories: cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2406.17585v1

Towards Compositional Interpretability for XAI

Artificial intelligence (AI) is currently based largely on black-box machine learning models which lack interpretability. The field of eXplainable AI (XAI) strives to address this major concern, being critical in high-stakes areas such as the finance, legal and health sectors. We present an approach to defining AI models and their interpretability based on category theory. For this we employ the notion of a compositional model, which sees a model in terms of formal string diagrams which capture its abstract structure together with its concrete implementation. This comprehensive view incorporates deterministic, probabilistic and quantum models. We compare a wide range of AI models as compositional models, including linear and rule-based models, (recurrent) neural networks, transformers, VAEs, and causal and DisCoCirc models. Next we give a definition of interpretation of a model in terms of its compositional structure, demonstrating how to analyse the interpretability of a model, and using this to clarify common themes in XAI. We find that what makes the standard 'intrinsically interpretable' models so transparent is brought out most clearly diagrammatically. This leads us to the more general notion of compositionally-interpretable (CI) models, which additionally include, for instance, causal, conceptual space, and DisCoCirc models. We next demonstrate the explainability benefits of CI models. Firstly, their compositional structure may allow the computation of other quantities of interest, and may facilitate inference from the model to the modelled phenomenon by matching its structure. Secondly, they allow for diagrammatic explanations for their behaviour, based on influence constraints, diagram surgery and rewrite explanations. Finally, we discuss many future directions for the approach, raising the question of how to learn such meaningfully structured models in practice.

Updated: 2024-06-25 14:27:03

Categories: cs.AI,cs.LG,cs.LO,math.CT

Download: http://arxiv.org/abs/2406.17583v1

Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations

Ransomware presents a significant and increasing threat to individuals and organizations by encrypting their systems and refusing to release them until a large fee has been paid. To bolster preparedness against potential attacks, organizations commonly conduct red teaming exercises, which involve simulated attacks to assess existing security measures. This paper proposes a novel approach utilizing reinforcement learning (RL) to simulate ransomware attacks. By training an RL agent in a simulated environment mirroring real-world networks, effective attack strategies can be learned quickly, significantly streamlining traditional, manual penetration testing processes. The attack pathways revealed by the RL agent can provide valuable insights to the defense team, helping them identify network weak points and develop more resilient defensive measures. Experimental results on a 152-host example network confirm the effectiveness of the proposed approach, demonstrating the RL agent's capability to discover and orchestrate attacks on high-value targets while evading honeyfiles (decoy files strategically placed to detect unauthorized access).

Updated: 2024-06-25 14:16:40

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.17576v1

Treatment of Statistical Estimation Problems in Randomized Smoothing for Adversarial Robustness

Randomized smoothing is a popular certified defense against adversarial attacks. In its essence, we need to solve a problem of statistical estimation which is usually very time-consuming since we need to perform numerous (usually $10^5$) forward passes of the classifier for every point to be certified. In this paper, we review the statistical estimation problems for randomized smoothing to find out if the computational burden is necessary. In particular, we consider the (standard) task of adversarial robustness where we need to decide if a point is robust at a certain radius or not using as few samples as possible while maintaining statistical guarantees. We present estimation procedures employing confidence sequences enjoying the same statistical guarantees as the standard methods, with the optimal sample complexities for the estimation task and empirically demonstrate their good performance. Additionally, we provide a randomized version of Clopper-Pearson confidence intervals resulting in strictly stronger certificates.
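
For reference, the standard Clopper-Pearson certification step that the paper seeks to make more sample-efficient (and strengthens with a randomized variant) can be sketched as below, in the style of the usual smoothing certificates; the counts are illustrative.

```python
from scipy.stats import beta, norm

def certify(count_top, n, sigma, alpha=0.001):
    """Clopper-Pearson certification for a smoothed classifier.
    count_top: votes for the top class among n noisy forward passes."""
    # one-sided lower confidence bound on p = P(f(x + noise) = top class)
    p_lower = beta.ppf(alpha, count_top, n - count_top + 1)
    if p_lower <= 0.5:
        return None                        # abstain: cannot certify
    return sigma * norm.ppf(p_lower)       # certified L2 radius

# e.g. 99% top-class agreement over 10^5 samples at sigma = 0.5
print(certify(count_top=99000, n=100000, sigma=0.5))
```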

Updated: 2024-06-25 14:00:55

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.17830v1

Multi-property Steering of Large Language Models with Dynamic Activation Composition

Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of optimal parameters to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method successfully maintains high conditioning while minimizing the impact of conditioning on generation fluency.
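
Mechanically, activation steering amounts to adding a property vector to intermediate hidden states with some intensity. The sketch below uses a forward hook with a placeholder dynamic scale (the paper derives its schedule information-theoretically); the layer index and attribute names are assumptions for a Hugging Face-style decoder.

```python
import torch

def make_steering_hook(direction, base_alpha, state):
    """Adds alpha * direction to a layer's hidden states on every forward
    pass; `state["entropy_scale"]` is an assumed per-step modulation signal."""
    direction = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        alpha = base_alpha * state["entropy_scale"]   # dynamic intensity
        hidden = hidden + alpha * direction.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# usage (names assumed):
# state = {"entropy_scale": 1.0}
# handle = model.model.layers[15].register_forward_hook(
#     make_steering_hook(property_vector, 4.0, state))
# ...update state["entropy_scale"] from the previous step's logits each token,
# shrinking the steering strength once the property is already expressed...
```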

Updated: 2024-06-25 14:00:42

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.17563v1

Generalized Graph Prompt: Toward a Unification of Pre-Training and Downstream Tasks on Graphs

Graph neural networks have emerged as a powerful tool for graph representation learning, but their performance heavily relies on abundant task-specific supervision. To reduce labeling requirement, the "pre-train, prompt" paradigms have become increasingly common. However, existing study of prompting on graphs is limited, lacking a universal treatment to appeal to different downstream tasks. In this paper, we propose GraphPrompt, a novel pre-training and prompting framework on graphs. GraphPrompt not only unifies pre-training and downstream tasks into a common task template but also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model in a task-specific manner. To further enhance GraphPrompt in these two stages, we extend it into GraphPrompt+ with two major enhancements. First, we generalize several popular graph pre-training tasks beyond simple link prediction to broaden the compatibility with our task template. Second, we propose a more generalized prompt design that incorporates a series of prompt vectors within every layer of the pre-trained graph encoder, in order to capitalize on the hierarchical information across different layers beyond just the readout layer. Finally, we conduct extensive experiments on five public datasets to evaluate and analyze GraphPrompt and GraphPrompt+.

Updated: 2024-06-25 13:53:57

Categories: cs.LG

Download: http://arxiv.org/abs/2311.15317v3

Extreme Learning Machines for Fast Training of Click-Through Rate Prediction Models

Extreme Learning Machines (ELMs) provide a fast alternative to traditional gradient-based learning in neural networks, offering rapid training and robust generalization capabilities. Their theoretical basis establishes a universal approximation capability. We explore the application of ELMs to the task of Click-Through Rate (CTR) prediction, which is largely unexplored with ELMs due to the high dimensionality of the problem. We introduce an ELM-based model enhanced with embedding layers to improve performance on CTR tasks, a novel addition to the field. Experimental results on benchmark datasets, including Avazu and Criteo, demonstrate that our proposed ELM with embeddings achieves competitive F1 results while significantly reducing training time compared to state-of-the-art models such as Masknet. Our findings show that ELMs can be useful for CTR prediction, especially when fast training is needed.
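
A minimal ELM, which makes the speed argument concrete, is a fixed random hidden layer followed by a closed-form least-squares readout, so training is one linear solve with no backpropagation. The embedding layers of the proposed model are omitted here, and all sizes are illustrative.

```python
import numpy as np

class ELM:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_hidden))   # fixed, never trained
        self.b = rng.normal(size=n_hidden)
        self.beta = None

    def _h(self, X):
        return np.tanh(X @ self.W + self.b)          # random feature map

    def fit(self, X, y):
        H = self._h(X)
        self.beta = np.linalg.lstsq(H, y, rcond=None)[0]  # one linear solve
        return self

    def predict(self, X):
        return self._h(X) @ self.beta                # CTR-style score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(float)
print(ELM(20, 256).fit(X, y).predict(X[:3]))         # threshold at 0.5
```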

Updated: 2024-06-25 13:50:00

Categories: cs.LG,cs.AI,I.2; I.5.1

Download: http://arxiv.org/abs/2406.17828v1

Modularity Based Community Detection in Hypergraphs

In this paper, we propose a scalable community detection algorithm using the hypergraph modularity function, h-Louvain. It is an adaptation of the classical Louvain algorithm in the context of hypergraphs. We observe that a direct application of the Louvain algorithm to optimize the hypergraph modularity function often fails to find meaningful communities. We propose a solution to this issue by adjusting the initial stage of the algorithm via a carefully and dynamically tuned linear combination of the graph modularity function of the corresponding two-section graph and the desired hypergraph modularity function. The process is guided by Bayesian optimization of the hyper-parameters of the proposed procedure. Various experiments on synthetic as well as real-world networks are performed, showing that this process yields improved results in various regimes.

Updated: 2024-06-25 13:49:56

Categories: cs.SI,cs.LG,I.6.5; G.4

Download: http://arxiv.org/abs/2406.17556v1

A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE

Transformer has been adopted for a wide range of tasks and shown to outperform CNNs and RNNs, while suffering from high training cost and computational complexity. To address these issues, a hybrid approach has become a recent research trend, which replaces a part of ResNet with an MHSA (Multi-Head Self-Attention). In this paper, we propose a lightweight hybrid model which uses Neural ODE (Ordinary Differential Equation) as a backbone instead of ResNet for a 12.1$\times$ parameter reduction. For the STL10 dataset, the proposed model achieves 80.15% top-1 accuracy, which is comparable to ResNet50. Then, the proposed model is deployed on a modest-sized FPGA device for edge computing. To further reduce FPGA resource utilization, the model is quantized following the QAT (Quantization Aware Training) scheme instead of PTQ (Post Training Quantization) to suppress the accuracy loss. As a result, an extremely lightweight Transformer-based model can be implemented on resource-limited FPGAs. The weights of the feature extraction network are stored on-chip to minimize the memory transfer overhead, allowing inference to be executed seamlessly and at higher speed. The proposed FPGA implementation achieves a 34.01$\times$ speedup for the backbone and MHSA parts, and it achieves an overall 9.85$\times$ speedup when taking software pre- and post-processing into account. It also achieves an overall 7.10$\times$ higher energy efficiency compared to the ARM Cortex-A53 CPU.

Updated: 2024-06-25 13:49:31

Categories: cs.LG,cs.AR

Download: http://arxiv.org/abs/2401.02721v2

SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages

Question Answering (QA) datasets have been instrumental in developing and evaluating Large Language Model (LLM) capabilities. However, such datasets are scarce for languages other than English due to the cost and difficulties of collection and manual annotation. This means that producing novel models and measuring the performance of multilingual LLMs in low-resource languages is challenging. To mitigate this, we propose $\textbf{S}$yn$\textbf{DAR}$in, a method for generating and validating QA datasets for low-resource languages. We utilize parallel content mining to obtain $\textit{human-curated}$ paragraphs between English and the target language. We use the English data as context to $\textit{generate}$ synthetic multiple-choice (MC) question-answer pairs, which are automatically translated and further validated for quality. Combining these with their designated non-English $\textit{human-curated}$ paragraphs form the final QA dataset. The method allows to maintain the content quality, reduces the likelihood of factual errors, and circumvents the need for costly annotation. To test the method, we created a QA dataset with $1.2$K samples for the Armenian language. The human evaluation shows that $98\%$ of the generated English data maintains quality and diversity in the question types and topics, while the translation validation pipeline can filter out $\sim70\%$ of data with poor quality. We use the dataset to benchmark state-of-the-art LLMs, showing their inability to achieve human accuracy with some model performances closer to random chance. This shows that the generated dataset is non-trivial and can be used to evaluate reasoning capabilities in low-resource language.

Updated: 2024-06-25 13:48:41

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.14425v2

Instance-level quantitative saliency in multiple sclerosis lesion segmentation

In recent years, explainable methods for artificial intelligence (XAI) have tried to reveal and describe models' decision mechanisms in the case of classification tasks. However, XAI for semantic segmentation and in particular for single instances has been little studied to date. Understanding the process underlying automatic segmentation of single instances is crucial to reveal what information was used to detect and segment a given object of interest. In this study, we proposed two instance-level explanation maps for semantic segmentation based on SmoothGrad and Grad-CAM++ methods. Then, we investigated their relevance for the detection and segmentation of white matter lesions (WML), a magnetic resonance imaging (MRI) biomarker in multiple sclerosis (MS). 687 patients diagnosed with MS for a total of 4043 FLAIR and MPRAGE MRI scans were collected at the University Hospital of Basel, Switzerland. Data were randomly split into training, validation and test sets to train a 3D U-Net for MS lesion segmentation. We observed 3050 true positive (TP), 1818 false positive (FP), and 789 false negative (FN) cases. We generated instance-level explanation maps for semantic segmentation, by developing two XAI methods based on SmoothGrad and Grad-CAM++. We investigated: 1) the distribution of gradients in saliency maps with respect to both input MRI sequences; 2) the model's response in the case of synthetic lesions; 3) the amount of perilesional tissue needed by the model to segment a lesion. Saliency maps (based on SmoothGrad) in FLAIR showed positive values inside a lesion and negative in its neighborhood. Peak values of saliency maps generated for these four groups of volumes presented distributions that differ significantly from one another, suggesting a quantitative nature of the proposed saliency. Contextual information of 7mm around the lesion border was required for their segmentation.
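
An instance-level SmoothGrad map of the kind described might be computed as below; the model interface and the aggregation over a single lesion's mask are assumptions for illustration, not the paper's exact definitions.

```python
import torch

def instance_smoothgrad(model, image, instance_mask, n=25, sigma=0.1):
    """Average input gradients of one lesion instance over noisy copies.
    `model` is assumed to map an image batch to per-voxel lesion
    probabilities; `instance_mask` is a boolean mask for one lesion."""
    grads = torch.zeros_like(image)
    for _ in range(n):
        noisy = (image + sigma * torch.randn_like(image)).requires_grad_(True)
        prob = model(noisy.unsqueeze(0))[0]
        # scalar target: total predicted probability inside this instance
        prob[instance_mask].sum().backward()
        grads += noisy.grad
    return grads / n    # signed saliency over input voxels (per sequence)
```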

Updated: 2024-06-25 13:47:06

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.09335v2

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all the model weights -- causing destructive interference between tasks. The resulting effects, such as catastrophic forgetting of earlier tasks, make it challenging to obtain good performance on multiple tasks at the same time. To mitigate this, we propose Lottery Ticket Adaptation (LoTA), a sparse adaptation method that identifies and optimizes only a sparse subnetwork of the model. We evaluate LoTA on a wide range of challenging tasks such as instruction following, reasoning, math, and summarization. LoTA obtains better performance than full fine-tuning and low-rank adaptation (LoRA), and maintains good performance even after training on other tasks -- thus, avoiding catastrophic forgetting. By extracting and fine-tuning over lottery tickets (or sparse task vectors), LoTA also enables model merging over highly dissimilar tasks. Our code is made publicly available at https://github.com/kiddyboots216/lottery-ticket-adaptation.
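
A lottery-ticket-style sparse adaptation loop can be sketched as follows; the mask criterion (largest update magnitudes after one fine-tuning pass) and the sparsity level are assumptions rather than the paper's exact settings.

```python
import torch

def make_ticket_masks(base_state, tuned_state, sparsity=0.99):
    """Keep only the weights that moved most during a first fine-tuning pass;
    the resulting sparse masks define the subnetwork to re-train."""
    masks = {}
    for name, w0 in base_state.items():
        delta = (tuned_state[name] - w0).abs().flatten()
        k = max(int(sparsity * delta.numel()), 1)
        threshold = delta.kthvalue(k).values       # cutoff magnitude
        masks[name] = (delta > threshold).reshape(w0.shape)  # top ~1% kept
    return masks

def masked_sgd_step(model, masks, lr=1e-4):
    """Update only the ticket's weights, leaving the rest frozen."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is not None:
                p -= lr * p.grad * masks[name]
```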

Updated: 2024-06-25 13:46:41

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16797v2

Overcoming the Paradox of Certified Training with Gaussian Smoothing

Training neural networks with high certified accuracy against adversarial examples remains an open problem despite significant efforts. While certification methods can effectively leverage tight convex relaxations for bound computation, in training, these methods perform worse than looser relaxations. Prior work hypothesized that this is caused by the discontinuity and perturbation sensitivity of the loss surface induced by these tighter relaxations. In this work, we show theoretically that Gaussian Loss Smoothing can alleviate both issues. We confirm this empirically by proposing a certified training method combining PGPE, an algorithm computing gradients of a smoothed loss, with different convex relaxations. When using this training method, we observe that tighter bounds indeed lead to strictly better networks. While scaling PGPE training remains challenging due to high computational cost, we show that by using a not theoretically sound, yet much cheaper smoothing approximation, we obtain better certified accuracies than state-of-the-art methods when training on the same network architecture. Our results clearly demonstrate the promise of Gaussian Loss Smoothing for training certifiably robust neural networks.
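
The core smoothed-gradient estimator behind PGPE-style training can be sketched as below; PGPE additionally adapts the sampling variance, so this minimal antithetic-sampling version is a simplification.

```python
import torch

def smoothed_grad(loss_fn, theta, sigma=0.1, n=64):
    """Monte Carlo gradient of the Gaussian-smoothed loss
    E_eps[L(theta + sigma * eps)], using antithetic sample pairs.
    Needs only loss evaluations, never gradients of loss_fn itself."""
    grad = torch.zeros_like(theta)
    for _ in range(n):
        eps = torch.randn_like(theta)
        diff = loss_fn(theta + sigma * eps) - loss_fn(theta - sigma * eps)
        grad += diff / (2 * sigma) * eps
    return grad / n
```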

Updated: 2024-06-25 13:46:24

Categories: cs.LG

Download: http://arxiv.org/abs/2403.07095v2

Aligning Large Language Models by On-Policy Self-Judgment

Existing approaches for aligning large language models with human preferences face a trade-off that requires a separate reward model (RM) for on-policy learning. In this paper, we present a novel alignment framework, SELF-JUDGE, that (1) does on-policy learning and (2) is parameter efficient, as it does not require an additional RM for evaluating the samples used in on-policy learning. To this end, we propose Judge-augmented Supervised Fine-Tuning (JSFT) to train a single model to act as both a policy and a judge. Specifically, we view the pairwise judgment task, choosing the better response from a response pair, as a special case of the instruction-following task. The resulting model can judge preferences of on-the-fly responses from the current policy initialized from itself. Experimental results show the efficacy of SELF-JUDGE, outperforming baselines on preference benchmarks. We also show that rejection sampling by itself can improve performance further without an additional evaluator.

Updated: 2024-06-25 13:39:52

标题: 用政策自我评判对齐大型语言模型

摘要: 现有的用于将大型语言模型与人类偏好对齐的方法面临一个需要为on-policy学习准备单独的奖励模型(RM)的权衡。在本文中,我们提出了一个新颖的对齐框架SELF-JUDGE,它(1)进行on-policy学习并且(2)参数高效,因为它不需要额外的RM来评估用于on-policy学习的样本。为此,我们提出了Judge-augmented Supervised Fine-Tuning(JSFT)来训练一个单一模型,既能充当策略又能充当评判者。具体来说,我们将成对判断任务,从一个响应对中选择更好的响应,视为遵循指示任务的特例。由此产生的模型可以评判来自当前策略的即时响应的偏好,该策略从自身初始化。实验结果显示了SELF-JUDGE的有效性,在偏好基准测试中表现优于基线。我们还展示了拒绝抽样本身可以进一步提高性能,而无需额外的评估者。

更新时间: 2024-06-25 13:39:52

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.11253v3

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

The advent of Large Language Models (LLMs) has drastically enhanced dialogue systems. However, comprehensively evaluating the dialogue abilities of LLMs remains a challenge. Previous benchmarks have primarily focused on single-turn dialogues or provided coarse-grained and incomplete assessments of multi-turn dialogues, overlooking the complexity and fine-grained nuances of real-life dialogues. To address this issue, we introduce MT-Bench-101, specifically designed to evaluate the fine-grained abilities of LLMs in multi-turn dialogues. By conducting a detailed analysis of real multi-turn dialogue data, we construct a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks. We then evaluate 21 popular LLMs based on MT-Bench-101, conducting comprehensive analyses from both ability and task perspectives and observing differing trends in LLMs' performance across dialogue turns within various tasks. Further analysis indicates that neither utilizing common alignment techniques nor chat-specific designs has led to obvious enhancements in the multi-turn abilities of LLMs. Extensive case studies suggest that our designed tasks accurately assess the corresponding multi-turn abilities. The data and code are available at \url{https://github.com/mtbench101/mt-bench-101}.

Updated: 2024-06-25 13:38:41

标题: MT-Bench-101:用于评估多轮对话中大型语言模型的细粒度基准测试

摘要: 大型语言模型(LLM)的出现大大提升了对话系统。然而,全面评估LLM的对话能力仍然是一个挑战。先前的基准主要集中在单轮对话或提供了多轮对话的粗略和不完整的评估,忽略了真实对话的复杂性和细微差别。为解决这一问题,我们引入了MT-Bench-101,专门设计用于评估LLM在多轮对话中的细粒度能力。通过对真实多轮对话数据的详细分析,我们构建了一个包含13个不同任务中1388个多轮对话中4208轮的三级层次能力分类法。然后我们基于MT-Bench-101评估了21个流行的LLM,从能力和任务角度进行全面分析,并观察到LLM在不同任务中对话轮之间表现出不同的趋势。进一步的分析表明,既不使用常见的对齐技术也不使用特定于聊天的设计并没有明显提升LLM的多轮对话能力。广泛的案例研究表明,我们设计的任务能准确评估相应的多轮对话能力。数据和代码可在\url{https://github.com/mtbench101/mt-bench-101} 上找到。

更新时间: 2024-06-25 13:38:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.14762v2

Laminator: Verifiable ML Property Cards using Hardware-assisted Attestations

Regulations increasingly call for various assurances from machine learning (ML) model providers about their training data, training process, and the behavior of resulting models during inference. For better transparency, companies (e.g., Huggingface and Google) have adopted model cards and datasheets which describe different properties of the training datasets and models. In the same vein, we introduce the notion of an inference card to describe the properties of a given inference (e.g., binding output to the model and its corresponding input). We collectively refer to these as ML property cards. A malicious model provider can include false information in ML property cards, raising a need for verifiable ML property cards. We show how to realize them using property attestation: technical mechanisms by which a prover (e.g., a model provider) can attest different ML properties during training and inference to a verifier (e.g., an auditor). However, prior attestation mechanisms based purely on cryptography are often narrowly focused (lacking versatility) and inefficient. There is a need to efficiently attest different types of properties across the ML model training and inference pipeline. Recent developments make it possible to run and even train models inside hardware-assisted trusted execution environments (TEEs), which can provide highly efficient attestation. We propose Laminator, the first framework for verifiable ML property cards, which uses hardware-assisted ML property attestations to efficiently furnish attestations for various ML properties during training and inference. It scales to multiple verifiers, and is independent of the model configuration.

Updated: 2024-06-25 13:36:53

标题: 层压机:利用硬件辅助认证实现可验证的机器学习属性卡

摘要: 监管越来越多地要求机器学习(ML)模型提供者对其训练数据、训练过程以及推断过程中所产生模型的行为提供各种保证。为了更好地透明化,一些公司(如Huggingface和Google)已经采用了模型卡和数据表,描述训练数据集和模型的不同属性。在同一思路下,我们介绍了推断卡的概念,用于描述给定推断的属性(例如,将输出绑定到模型及其相应的输入)。我们将这些统称为ML属性卡。 一个恶意的模型提供者可能在ML属性卡中包含虚假信息,这就需要可验证的ML属性卡。我们展示了如何通过属性认证来实现这一目标:即一种技术机制,证明者(例如模型提供者)可以借助它在训练和推断过程中向验证者(例如审计员)证明不同的ML属性。然而,基于纯密码学的先前认证机制往往过于狭隘(缺乏多功能性)且低效。有必要高效地证明ML模型训练和推断管道中不同类型的属性。 最近的发展使得在硬件辅助的受信任执行环境(TEEs)内运行甚至训练模型成为可能,这可以提供高效的认证。我们提出了Laminator,这是第一个可验证ML属性卡框架,它利用硬件辅助的ML属性认证,为训练和推断高效地提供各种ML属性的认证。它可以扩展到多个验证者,并且独立于模型配置。

更新时间: 2024-06-25 13:36:53

领域: cs.CR

下载: http://arxiv.org/abs/2406.17548v1

DEM: A Method for Certifying Deep Neural Network Classifier Outputs in Aerospace

Software development in the aerospace domain requires adhering to strict, high-quality standards. While there exist regulatory guidelines for commercial software in this domain (e.g., ARP-4754 and DO-178), these do not apply to software with deep neural network (DNN) components. Consequently, it is unclear how to allow aerospace systems to benefit from the deep learning revolution. Our work here seeks to address this challenge with a novel, output-centric approach for DNN certification. Our method employs statistical verification techniques, and has the key advantage of being able to flag specific inputs for which the DNN's output may be unreliable - so that they may be later inspected by a human expert. To achieve this, our method conducts a statistical analysis of the DNN's predictions for other, nearby inputs, in order to detect inconsistencies. This is in contrast to existing techniques, which typically attempt to certify the entire DNN, as opposed to individual outputs. Our method uses the DNN as a black-box, and makes no assumptions about its topology. We hope that this work constitutes another step towards integrating DNNs in safety-critical applications - especially in the aerospace domain, where high standards of quality and reliability are crucial.
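
A minimal sketch of the output-centric idea, assuming a Gaussian input neighborhood and a simple agreement threshold (both illustrative choices, not the paper's exact statistical test):

```python
import numpy as np

def flag_unreliable(model_predict, x, n_neighbors=100, noise=0.02,
                    agreement_threshold=0.9, rng=None):
    """Black-box check in the spirit of DEM: sample inputs near x and flag x
    for human review when the DNN's predictions are inconsistent in its
    neighborhood. Noise scale and threshold are illustrative assumptions."""
    rng = rng or np.random.default_rng(0)
    base = int(np.argmax(model_predict(x)))
    neighbors = x + rng.normal(0.0, noise, size=(n_neighbors,) + x.shape)
    labels = [int(np.argmax(model_predict(n))) for n in neighbors]
    agreement = np.mean([label == base for label in labels])
    return agreement < agreement_threshold  # True => route to a human expert
```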

Updated: 2024-06-25 13:35:13

标题: DEM:一种用于在航空航天中对深度神经网络分类器输出进行认证的方法

摘要: 航空航天领域的软件开发需要遵循严格的高质量标准。虽然该领域存在商业软件的监管指南(如ARP-4754和DO-178),但这些指南并不适用于具有深度神经网络(DNN)组件的软件。因此,如何让航空航天系统受益于深度学习革命尚不清楚。我们的工作旨在通过一种新颖的、以输出为中心的方法来解决这一挑战,用于DNN认证。我们的方法采用统计验证技术,具有一个关键优势,即能够标记DNN输出可能不可靠的特定输入,以便后续由人类专家检查。为实现这一目标,我们的方法对DNN对其他附近输入的预测进行统计分析,以检测不一致性。与现有技术相比,现有技术通常试图对整个DNN进行认证,而不是对单个输出进行认证。我们的方法将DNN作为黑盒使用,不对其拓扑结构做出任何假设。我们希望这项工作是朝着将DNN集成到安全关键应用中迈出的又一步 - 尤其是在航空航天领域,高质量和可靠性标准至关重要。

更新时间: 2024-06-25 13:35:13

领域: cs.SE,cs.LG

下载: http://arxiv.org/abs/2401.02283v3

CDQuant: Accurate Post-training Weight Quantization of Large Pre-trained Models using Greedy Coordinate Descent

Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks. But their deployment is often constrained by their substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-training quantization (PTQ) method, has proven highly effective for compressing LLMs, sparking a wave of research that leverages GPTQ as a core component. Recognizing the pivotal role of GPTQ in the PTQ landscape, we introduce CDQuant, a simple and scalable alternative to GPTQ with improved performance. CDQuant uses coordinate descent to minimize the layer-wise reconstruction loss to achieve high-quality quantized weights. Our algorithm is easy to implement and scales efficiently to models with hundreds of billions of parameters. Through extensive evaluation on the PaLM2 model family, we demonstrate that CDQuant consistently outperforms GPTQ across diverse model sizes and quantization levels. In particular, for INT2 quantization of PaLM2-Otter, CDQuant achieves a 10% reduction in perplexity compared to GPTQ.
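
A toy sketch of coordinate-descent quantization against the layer-wise reconstruction loss ||XW - XW_q||_F^2 (we sweep coordinates cyclically for brevity, whereas CDQuant selects them greedily; the grid, damping, and names are our assumptions):

```python
import numpy as np

def cd_quantize(W, X, grid, n_sweeps=3):
    """Coordinate-descent post-training quantization sketch: minimize the
    layer-wise reconstruction loss one weight at a time, snapping each
    coordinate to the best value on a fixed grid.
    `grid` holds the representable values, e.g. np.linspace(-1, 1, 16) * scale."""
    H = X.T @ X + 1e-6 * np.eye(W.shape[0])            # layer Hessian, damped
    Wq = grid[np.abs(W[..., None] - grid).argmin(-1)]  # nearest-rounding init
    for _ in range(n_sweeps):
        for j in range(W.shape[1]):                    # columns are independent
            for i in range(W.shape[0]):
                # residual from all *other* coordinates of this column
                r = H[i] @ (Wq[:, j] - W[:, j]) - H[i, i] * (Wq[i, j] - W[i, j])
                target = W[i, j] - r / H[i, i]         # unconstrained optimum
                Wq[i, j] = grid[np.abs(grid - target).argmin()]
    return Wq
```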

Updated: 2024-06-25 13:29:14

标题: CDQuant:使用贪婪坐标下降准确地对大型预训练模型进行训练后权重量化

摘要: 大型语言模型(LLMs)最近在各种语言任务中展示出了卓越的性能。但它们的部署常常受到其巨大的计算和存储需求的限制。量化已经成为解决这一挑战的关键技术,可以压缩大型模型而对性能影响最小。最近的GPTQ算法是一种后训练量化(PTQ)方法,已被证明在压缩LLMs方面非常有效,引发了一波利用GPTQ作为核心组件的研究热潮。鉴于GPTQ在PTQ领域中的关键作用,我们引入了CDQuant,这是一个简单且可扩展的GPTQ替代方案,性能更好。CDQuant使用坐标下降来最小化逐层重构损失,以实现高质量的量化权重。我们的算法易于实现,并且能够有效地扩展到具有数百亿个参数的模型。通过对PaLM2模型系列进行广泛评估,我们证明CDQuant在各种模型大小和量化级别上始终优于GPTQ。特别是对于PaLM2-Otter的INT2量化,CDQuant相比于GPTQ实现了10%的perplexity降低。

更新时间: 2024-06-25 13:29:14

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.17542v1

Scoreformer: A Surrogate Model For Large-Scale Prediction of Docking Scores

In this study, we present ScoreFormer, a novel graph transformer model designed to accurately predict molecular docking scores, thereby optimizing high-throughput virtual screening (HTVS) in drug discovery. The architecture integrates Principal Neighborhood Aggregation (PNA) and Learnable Random Walk Positional Encodings (LRWPE), enhancing the model's ability to understand complex molecular structures and their relationship with their respective docking scores. This approach significantly surpasses traditional HTVS methods and recent Graph Neural Network (GNN) models in both recovery and efficiency due to a wider coverage of the chemical space and enhanced performance. Our results demonstrate that ScoreFormer achieves competitive performance in docking score prediction and offers a substantial 1.65-fold reduction in inference time compared to existing models. We evaluated ScoreFormer across multiple datasets under various conditions, confirming its robustness and reliability in identifying potential drug candidates rapidly.

Updated: 2024-06-25 13:25:08

标题: Scoreformer:一种大规模预测对接得分的替代模型

摘要: 在这项研究中,我们提出了ScoreFormer,这是一种新颖的图转换模型,旨在准确预测分子对接评分,从而优化药物发现中的高通量虚拟筛选(HTVS)。该架构集成了主要邻域聚合(PNA)和可学习随机游走位置编码(LRWPE),增强了模型理解复杂分子结构及其与各自对接评分关系的能力。这种方法在恢复和效率方面显著超越传统的HTVS方法和最近的图神经网络(GNN)模型,因为它覆盖了更广泛的化学空间并提高了性能。我们的结果表明,ScoreFormer在对接评分预测方面表现出竞争力,并与现有模型相比推断时间减少了1.65倍。我们在各种条件下跨多个数据集评估了ScoreFormer,确认其在快速识别潜在药物候选物方面的稳健性和可靠性。

更新时间: 2024-06-25 13:25:08

领域: cs.LG,cs.AI,q-bio.BM

下载: http://arxiv.org/abs/2406.09346v2

European Space Agency Benchmark for Anomaly Detection in Satellite Telemetry

Machine learning has vast potential to improve anomaly detection in satellite telemetry which is a crucial task for spacecraft operations. This potential is currently hampered by a lack of comprehensible benchmarks for multivariate time series anomaly detection, especially for the challenging case of satellite telemetry. The European Space Agency Benchmark for Anomaly Detection in Satellite Telemetry (ESA-ADB) aims to address this challenge and establish a new standard in the domain. It is a result of close cooperation between spacecraft operations engineers from the European Space Agency (ESA) and machine learning experts. The newly introduced ESA Anomalies Dataset contains annotated real-life telemetry from three different ESA missions, out of which two are included in ESA-ADB. Results of typical anomaly detection algorithms assessed in our novel hierarchical evaluation pipeline show that new approaches are necessary to address operators' needs. All elements of ESA-ADB are publicly available to ensure its full reproducibility.

Updated: 2024-06-25 13:23:37

标题: 欧洲空间局卫星遥测异常检测基准Benchmark

摘要: 机器学习在改进卫星遥测异常检测方面具有巨大潜力,这对于宇宙飞船操作是至关重要的任务。目前,这一潜力受到多变量时间序列异常检测可理解基准缺乏的限制,特别是对于卫星遥测这种具有挑战性的情况。欧洲空间局卫星遥测异常检测基准(ESA-ADB)旨在应对这一挑战并在该领域建立新的标准。这是欧洲空间局(ESA)的宇宙飞船操作工程师与机器学习专家之间密切合作的结果。新引入的ESA异常数据集包含来自三个不同ESA任务的带注释的实际遥测数据,其中两个包含在ESA-ADB中。我们在新颖的分层评估管道中评估的典型异常检测算法的结果表明,需要新的方法来满足操作员的需求。ESA-ADB的所有元素都是公开可用的,以确保其完全可复制。

更新时间: 2024-06-25 13:23:37

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17826v1

Optimal spanning tree reconstruction in symbolic regression

This paper investigates the problem of regression model generation. A model is a superposition of primitive functions. The model structure is described by a weighted colored graph. Each graph vertex corresponds to some primitive function. An edge assigns a superposition of two functions. The weight of an edge equals the probability of superposition. To generate an optimal model one has to reconstruct its structure from its graph adjacency matrix. The proposed algorithm reconstructs the minimum spanning tree from the weighted colored graph. This paper presents a novel solution based on the prize-collecting Steiner tree algorithm. This algorithm is compared with its alternatives.
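
The minimum-spanning-tree reconstruction step can be sketched with networkx on a toy graph (the vertices, colors, and the 1 - p weight transform are illustrative; the prize-collecting Steiner tree variant needs a dedicated solver and is not shown):

```python
import networkx as nx

# Toy weighted "colored" graph: vertices are primitive functions, edge weights
# are superposition probabilities. To favor high-probability superpositions
# with an MST routine, we minimize 1 - p.
G = nx.Graph()
G.add_node("x", color="variable")
G.add_node("sin", color="unary")
G.add_node("exp", color="unary")
G.add_node("+", color="binary")
for u, v, p in [("x", "sin", 0.9), ("x", "exp", 0.4),
                ("sin", "+", 0.8), ("exp", "+", 0.6)]:
    G.add_edge(u, v, weight=1.0 - p)   # low weight == likely superposition

mst = nx.minimum_spanning_tree(G)      # the reconstructed model structure
print(sorted(mst.edges(data="weight")))
```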

Updated: 2024-06-25 13:22:13

标题: 符号回归中的最佳跨度树重构

摘要: 本文研究了回归模型生成的问题。模型是原始函数的叠加。模型结构由加权彩色图描述。每个图顶点对应于某个原始函数。一条边分配两个函数的叠加。边的权重等于叠加的概率。为了生成最佳模型,必须从其图邻接矩阵中重建其结构。提出的算法从加权彩色图中重建了最小生成树。本文提出了一种基于奖励收集Steiner树算法的新颖解决方案。该算法与其替代方案进行了比较。

更新时间: 2024-06-25 13:22:13

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.18612v1

SincVAE: a New Approach to Improve Anomaly Detection on EEG Data Using SincNet and Variational Autoencoder

Over the past few decades, electroencephalography (EEG) monitoring has become a pivotal tool for diagnosing neurological disorders, particularly for detecting seizures. Epilepsy, one of the most prevalent neurological diseases worldwide, affects approximately 1% of the population. These patients face significant risks, underscoring the need for reliable, continuous seizure monitoring in daily life. Most of the techniques discussed in the literature rely on supervised Machine Learning (ML) methods. However, the challenge of accurately labeling variations in epileptic EEG waveforms complicates the use of these approaches. Additionally, the rarity of ictal events introduces a high imbalance within the data, which could lead to poor prediction performance in supervised learning approaches. Instead, a semi-supervised approach allows training the model only on data not containing seizures, thus avoiding the issues related to data imbalance. This work proposes a semi-supervised approach for detecting epileptic seizures from EEG data, utilizing a novel Deep Learning-based method called SincVAE. This proposal incorporates the learning of an ad-hoc array of bandpass filters as a first layer of a Variational Autoencoder (VAE), potentially eliminating the preprocessing stage where informative band frequencies are identified and isolated. Results indicate that SincVAE improves seizure detection in EEG data and is capable of identifying early seizures during the preictal stage as well as monitoring patients throughout the postictal stage.

Updated: 2024-06-25 13:21:01

标题: SincVAE:使用SincNet和变分自编码器改进脑电图数据异常检测的新方法

摘要: 在过去几十年中,脑电图(EEG)监测已成为诊断神经系统疾病的关键工具,特别是用于检测癫痫发作。全球范围内最常见的神经系统疾病之一是癫痫,影响大约1%的人口。这些患者面临着重大风险,强调了在日常生活中需要可靠、持续的癫痫监测。文献中讨论的大多数技术依赖于监督式机器学习(ML)方法。然而,准确标记癫痫脑电图波形的变化的挑战使得这些方法的使用变得复杂。此外,癫痫发作事件的稀有性在数据中引入了高度不平衡,可能导致监督学习方法中预测性能较差。相反,半监督方法允许仅对不包含癫痫发作数据进行模型训练,从而避免与数据不平衡相关的问题。本文提出了一种利用名为SincVAE的新型基于深度学习的方法进行从EEG数据中检测癫痫发作的半监督方法。该提议将带通滤波器的学习作为变分自动编码器(VAE)的第一层,从而潜在地消除了识别和隔离信息频段的预处理阶段。结果表明,SincVAE改善了EEG数据中的癫痫发作检测,并且能够在癫痫发作前期阶段识别早期发作,以及在发作后期监测患者。

更新时间: 2024-06-25 13:21:01

领域: cs.LG,cs.AI,eess.SP

下载: http://arxiv.org/abs/2406.17537v1

MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions

The integration of neural-network-based systems into clinical practice is limited by challenges related to domain generalization and robustness. The computer vision community established benchmarks such as ImageNet-C as a fundamental prerequisite to measure progress towards those challenges. Similar datasets are largely absent in the medical imaging community which lacks a comprehensive benchmark that spans across imaging modalities and applications. To address this gap, we create and open-source MedMNIST-C, a benchmark dataset based on the MedMNIST+ collection covering 12 datasets and 9 imaging modalities. We simulate task and modality-specific image corruptions of varying severity to comprehensively evaluate the robustness of established algorithms against real-world artifacts and distribution shifts. We further provide quantitative evidence that our simple-to-use artificial corruptions allow for highly performant, lightweight data augmentation to enhance model robustness. Unlike traditional, generic augmentation strategies, our approach leverages domain knowledge, exhibiting significantly higher robustness when compared to widely adopted methods. By introducing MedMNIST-C and open-sourcing the corresponding library allowing for targeted data augmentations, we contribute to the development of increasingly robust methods tailored to the challenges of medical imaging. The code is available at https://github.com/francescodisalvo05/medmnistc-api.

Updated: 2024-06-25 13:20:39

标题: MedMNIST-C:通过模拟真实图像破坏实现全面基准和改进的分类器稳健性

摘要: 神经网络系统集成到临床实践中受到与领域泛化和稳健性相关的挑战的限制。计算机视觉社区建立了诸如ImageNet-C之类的基准,作为衡量取得进展向这些挑战努力的基本先决条件。医学影像社区中类似的数据集在很大程度上缺失,缺乏跨影像模态和应用的全面基准。为了填补这一空白,我们创建并开源了MedMNIST-C,这是基于MedMNIST+收集的基准数据集,涵盖了12个数据集和9种成像模态。我们模拟不同严重程度的任务和模态特定的图像损坏,全面评估现有算法对真实世界工件和分布转移的稳健性。我们进一步提供定量证据,表明我们简单易用的人工损坏允许高性能、轻量级数据增强,以增强模型的稳健性。与传统的通用增强策略不同,我们的方法利用领域知识,与广泛采用的方法相比,表现出更高的稳健性。通过引入MedMNIST-C并开源相应的库,允许有针对性地进行数据增强,我们为开发越来越稳健的方法,以应对医学影像的挑战做出贡献。代码可在以下网址找到:https://github.com/francescodisalvo05/medmnistc-api。

更新时间: 2024-06-25 13:20:39

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.17536v1

Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark

Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to generate and manipulate human language, highlighting their potential across various applications. Evaluating LLMs in languages other than English is crucial for ensuring their linguistic versatility, cultural relevance, and applicability in diverse global contexts, thus broadening their usability and effectiveness. We tackle this challenge by introducing a structured benchmark using the INVALSI tests, a set of well-established assessments designed to measure educational competencies across Italy. Our study makes three primary contributions: Firstly, we adapt the INVALSI benchmark for automated LLM evaluation, which involves rigorous adaptation of the test format to suit automated processing while retaining the essence of the original tests. Secondly, we provide a detailed assessment of current LLMs, offering a crucial reference point for the academic community. Finally, we visually compare the performance of these models against human results. Additionally, researchers are invited to submit their models for ongoing evaluation, ensuring the benchmark remains a current and valuable resource.

Updated: 2024-06-25 13:20:08

标题: Learn or Fail: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark

摘要: 最近大语言模型(LLMs)的进展显著增强了它们生成和操纵人类语言的能力,突显了它们在各种应用中的潜力。评估除英语以外的语言中的LLMs对于确保它们的语言多样性、文化相关性和在不同全球背景下的适用性至关重要,从而拓宽了它们的可用性和有效性。我们通过引入使用INVALSI测试的结构化基准来应对这一挑战,INVALSI测试是一套旨在衡量意大利全国教育能力的既定评估。我们的研究做出了三项主要贡献:首先,我们对INVALSI基准进行了自动化LLM评估的调整,这涉及对测试格式的严格适应以适合自动处理,同时保留了原始测试的本质。其次,我们提供了对当前LLMs的详细评估,为学术界提供了重要的参考点。最后,我们将这些模型的表现与人类结果进行了可视化比较。此外,研究人员被邀请提交他们的模型进行持续评估,确保基准保持为一个当前和有价值的资源。

更新时间: 2024-06-25 13:20:08

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17535v1

Can Large Language Models Understand DL-Lite Ontologies? An Empirical Study

Large language models (LLMs) have shown significant achievements in solving a wide range of tasks. Recently, LLMs' capability to store, retrieve and infer with symbolic knowledge has drawn a great deal of attention, showing their potential to understand structured information. However, it is not yet known whether LLMs can understand Description Logic (DL) ontologies. In this work, we empirically analyze the LLMs' capability of understanding DL-Lite ontologies covering 6 representative tasks from syntactic and semantic aspects. With extensive experiments, we demonstrate both the effectiveness and limitations of LLMs in understanding DL-Lite ontologies. We find that LLMs can understand the formal syntax and model-theoretic semantics of concepts and roles. However, LLMs struggle with understanding TBox NI transitivity and handling ontologies with large ABoxes. We hope that our experiments and analyses provide more insights into LLMs and inspire the development of more faithful knowledge engineering solutions.

Updated: 2024-06-25 13:16:34

标题: 大型语言模型能理解DL-Lite本体吗?一项实证研究

摘要: 大型语言模型(LLMs)在解决各种任务方面已经取得了显著的成就。最近,LLMs存储、检索和推理符号知识的能力引起了广泛关注,显示出它们理解结构化信息的潜力。然而,目前尚不清楚LLMs是否能够理解描述逻辑(DL)本体论。在这项工作中,我们通过实证分析LLMs理解DL-Lite本体论的能力,涵盖了从语法和语义方面的6个代表性任务。通过大量实验,我们展示了LLMs在理解DL-Lite本体论方面的有效性和局限性。我们发现LLMs可以理解概念和角色的形式语法和模型论语义。然而,LLMs在理解TBox NI的传递性和处理具有大型ABoxes的本体论方面存在困难。我们希望我们的实验和分析能够提供更多关于LLMs的见解,并激发建立更加忠实的知识工程解决方案。

更新时间: 2024-06-25 13:16:34

领域: cs.AI,cs.CL,cs.LO

下载: http://arxiv.org/abs/2406.17532v1

Enhancing LLM-Based Human-Robot Interaction with Nuances for Diversity Awareness

This paper presents a system for diversity-aware autonomous conversation leveraging the capabilities of large language models (LLMs). The system adapts to diverse populations and individuals, considering factors like background, personality, age, gender, and culture. The conversation flow is guided by the structure of the system's pre-established knowledge base, while LLMs are tasked with various functions, including generating diversity-aware sentences. Achieving diversity-awareness involves providing carefully crafted prompts to the models, incorporating comprehensive information about users, conversation history, contextual details, and specific guidelines. To assess the system's performance, we conducted both controlled and real-world experiments, measuring a wide range of performance indicators.

Updated: 2024-06-25 13:15:36

标题: 通过细微差别意识提升基于LLM的人机交互

摘要: 本文介绍了一个多样性感知的自主对话系统,利用大型语言模型(LLMs)的能力。该系统适应不同的人群和个体,考虑背景、个性、年龄、性别和文化等因素。对话流程由系统预先建立的知识库的结构引导,同时LLMs负责执行各种功能,包括生成具有多样性意识的句子。实现多样性感知涉及向模型提供精心设计的提示,整合关于用户、对话历史、环境细节和具体指导的综合信息。为了评估系统的性能,我们进行了受控和真实世界的实验,测量了广泛的性能指标。

更新时间: 2024-06-25 13:15:36

领域: cs.RO,cs.AI,cs.HC

下载: http://arxiv.org/abs/2406.17531v1

Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need

We leverage multilevel Monte Carlo (MLMC) to improve the performance of multi-step look-ahead Bayesian optimization (BO) methods that involve nested expectations and maximizations. Often these expectations must be computed by Monte Carlo (MC). The complexity rate of naive MC degrades for nested operations, whereas MLMC is capable of achieving the canonical MC convergence rate for this type of problem, independently of dimension and without any smoothness assumptions. Our theoretical study focuses on the approximation improvements for two- and three-step look-ahead acquisition functions, but, as we discuss, the approach is generalizable in various ways, including beyond the context of BO. Our findings are verified numerically and the benefits of MLMC for BO are illustrated on several benchmark examples. Code is available at https://github.com/Shangda-Yang/MLMCBO.
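
For intuition, here is the generic MLMC telescope on a toy nested-expectation problem (the geometric sample schedule and the coupling are illustrative assumptions, not the paper's BO-specific estimator):

```python
import numpy as np

def mlmc_estimate(sampler, L=4, n0=4096, rng=None):
    """Generic multilevel Monte Carlo telescope:
    E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}], with fewer samples on costlier
    fine levels. `sampler(l, n, rng)` must return n coupled draws of
    (P_l, P_{l-1}), with P_{-1} := 0."""
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for l in range(L + 1):
        n = max(2, n0 // 2**l)              # decay sample count with level
        fine, coarse = sampler(l, n, rng)
        total += np.mean(fine - coarse)     # unbiased level-l correction
    return total

# Toy example: P_l uses 2^(l+1) inner MC samples (a nested expectation),
# coupled by reusing the first half of the inner draws on the coarse level.
def sampler(l, n, rng):
    m = 2 ** (l + 1)
    inner = np.maximum(rng.normal(size=(n, m)), 0.0)  # stand-in inner problem
    fine = inner.mean(axis=1)
    coarse = inner[:, : m // 2].mean(axis=1) if l > 0 else np.zeros(n)
    return fine, coarse

print(mlmc_estimate(sampler))
```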

Updated: 2024-06-25 13:11:33

标题: 加速贝叶斯优化中的前瞻:多层蒙特卡罗就是你所需要的

摘要: 我们利用多层蒙特卡罗(MLMC)来提高涉及嵌套期望和最大化的多步前瞻贝叶斯优化(BO)方法的性能。通常,这些期望必须通过蒙特卡罗(MC)计算。朴素MC的收敛速率在嵌套操作下会变差,而MLMC能够对这类问题实现标准的MC收敛速度,与维度无关,并且不需要任何光滑性假设。我们的理论研究集中在两步和三步前瞻获取函数的近似改进上,但正如我们所讨论的,这种方法在各种方面都是可推广的,包括超出BO的背景。我们的发现经过数值验证,并且MLMC对于BO的好处在几个基准示例中得到了说明。代码可在https://github.com/Shangda-Yang/MLMCBO 上找到。

更新时间: 2024-06-25 13:11:33

领域: stat.ML,cs.LG,math.OC,math.PR,stat.CO,stat.ME

下载: http://arxiv.org/abs/2402.02111v2

In value-based deep reinforcement learning, a pruned network is a good network

Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters.
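
Gradual magnitude pruning itself is compact to sketch (the cubic Zhu-Gupta-style schedule and the layer-wise threshold are standard choices we assume here, not necessarily the authors' exact settings):

```python
import torch

def sparsity_at(step, total_steps, final_sparsity=0.95, start=0.0):
    """Polynomial (cubic) ramp from `start` to `final_sparsity`,
    in the style of Zhu & Gupta (2017) gradual pruning schedules."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (start - final_sparsity) * (1.0 - t) ** 3

def apply_magnitude_pruning(model, sparsity):
    """Zero out the smallest-magnitude weights, layer by layer."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:                 # skip biases / norm parameters
                continue
            k = int(sparsity * p.numel())
            if k == 0:
                continue
            threshold = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > threshold).to(p.dtype))
```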

Updated: 2024-06-25 13:10:06

标题: 基于价值的深度强化学习中,修剪网络是一个好网络

摘要: 最近的研究表明,深度强化学习代理在有效利用其网络参数方面存在困难。我们利用之前对稀疏训练技术优势的洞察,并证明逐渐剪枝技术使基于价值的代理能够最大化参数的有效性。这导致网络产生显著的性能改进,仅使用完整网络参数的一小部分。

更新时间: 2024-06-25 13:10:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.12479v3

Accelerating Electronic Stopping Power Predictions by 10 Million Times with a Combination of Time-Dependent Density Functional Theory and Machine Learning

Knowing the rate at which particle radiation releases energy in a material, the stopping power, is key to designing nuclear reactors, medical treatments, semiconductor and quantum materials, and many other technologies. While the nuclear contribution to stopping power, i.e., elastic scattering between atoms, is well understood in the literature, the route for gathering data on the electronic contribution has for decades remained costly and reliant on many simplifying assumptions, including that materials are isotropic. We establish a method that combines time-dependent density functional theory (TDDFT) and machine learning to reduce the time to assess new materials to mere hours on a supercomputer and provides valuable data on how atomic details influence electronic stopping. Our approach uses TDDFT to compute the electronic stopping contributions to stopping power from first principles in several directions and then machine learning to interpolate to other directions at a cost of 10 million times fewer core-hours. We demonstrate the combined approach in a study of proton irradiation in aluminum and employ it to predict how the depth of maximum energy deposition, the "Bragg Peak," varies depending on incident angle -- a quantity otherwise inaccessible to modelers. The lack of any experimental information requirement makes our method applicable to most materials, and its speed makes it a prime candidate for enabling quantum-to-continuum models of radiation damage. The prospect of reusing valuable TDDFT data for training the model makes our approach appealing for applications in the age of materials data science.

Updated: 2024-06-25 13:09:23

标题: 用时间依赖密度泛函理论和机器学习相结合,将电子阻止本领预测加速1000万倍

摘要: 了解粒子辐射在材料中释放能量的速率,即阻止本领,对于设计核反应堆、医疗治疗、半导体和量子材料以及许多其他技术至关重要。虽然文献中对阻止本领的核贡献(即原子间的弹性散射)已有充分了解,但几十年来,获取电子贡献数据的途径仍然昂贵,且依赖于许多简化假设,包括材料是各向同性的。我们建立了一种方法,将时间相关密度泛函理论(TDDFT)和机器学习相结合,将评估新材料的时间缩短到在超级计算机上仅需几小时,并提供有关原子细节如何影响电子阻止的宝贵数据。我们的方法使用TDDFT从第一性原理计算若干方向上对阻止本领的电子贡献,然后用机器学习插值到其他方向,所需核心小时数减少了1000万倍。我们在铝的质子辐照研究中展示了这种综合方法,并利用它来预测最大能量沉积深度("布拉格峰")如何随入射角变化,而这一量是建模者原本无法获得的。我们的方法不需要任何实验信息,适用于大多数材料,其速度使其成为实现辐射损伤量子到连续模型的首选候选者。重复使用宝贵的TDDFT数据来训练模型的前景使我们的方法在材料数据科学时代的应用中具有吸引力。

更新时间: 2024-06-25 13:09:23

领域: cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2311.00787v2

Bayesian Exploration Networks

Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. However theoretical understanding of model-free approaches is lacking. In this paper, we introduce a novel Bayesian model-free formulation and the first analysis showing that model-free approaches can yield Bayes-optimal policies. We show all existing model-free approaches make approximations that yield policies that can be arbitrarily Bayes-suboptimal. As a first step towards model-free Bayes optimality, we introduce the Bayesian exploration network (BEN) which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but like in variational expectation-maximisation, partial optimisation renders our approach tractable. Empirical results demonstrate that BEN can learn true Bayes-optimal policies in tasks where existing model-free approaches fail.

Updated: 2024-06-25 13:06:13

标题: 贝叶斯探索网络

摘要: 贝叶斯强化学习(RL)为在不确定性下进行序贯决策提供了一种原则性和优雅的方法。值得注意的是,贝叶斯代理人不会面临勘探/利用困境,这是频率方法的一个主要病理。然而,对于无模型方法的理论理解仍然欠缺。在本文中,我们介绍了一种新颖的贝叶斯无模型形式,并首次分析表明无模型方法可以产生贝叶斯最优策略。我们展示了所有现有的无模型方法都进行了近似,从而得到可能是任意贝叶斯次优的策略。作为朝向无模型贝叶斯最优性的第一步,我们引入了贝叶斯探索网络(BEN),该网络使用归一化流来模拟贝尔曼算子中的两种不确定性:通过密度估计模拟aleatoric不确定性,通过变分推理模拟epistemic不确定性。在完全优化的极限情况下,BEN学习真正的贝叶斯最优策略,但与变分期望最大化一样,部分优化使我们的方法可行。实证结果表明,在现有无模型方法失败的任务中,BEN能够学习真正的贝叶斯最优策略。

更新时间: 2024-06-25 13:06:13

领域: cs.LG

下载: http://arxiv.org/abs/2308.13049v4

On the consistency of hyper-parameter selection in value-based deep reinforcement learning

Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.

Updated: 2024-06-25 13:06:09

标题: 关于值基深度强化学习中超参数选择的一致性

摘要: 深度强化学习(深度RL)通过算法设计和谨慎选择超参数在各个领域取得了巨大成功。算法改进通常是基于先前方法的迭代增强,而超参数选择通常继承自先前方法或专门为所提出的技术进行微调。尽管超参数选择对性能具有关键影响,但常常被算法改进所掩盖。本文进行了一项广泛的经验研究,重点关注基于值的深度强化学习代理的超参数选择的可靠性,包括引入一个新的评分来量化各种超参数的一致性和可靠性。我们的研究结果不仅有助于确定哪些超参数是最关键的调整,还有助于澄清哪些调整在不同的训练规则下保持一致。

更新时间: 2024-06-25 13:06:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17523v1

Representation Surgery: Theory and Practice of Affine Steering

Language models often exhibit undesirable behavior, e.g., generating toxic or gender-biased text. In the case of neural language models, an encoding of the undesirable behavior is often present in the model's representations. Thus, one natural (and common) approach to prevent the model from exhibiting undesirable behavior is to steer the model's representations in a manner that reduces the probability of it generating undesirable text. This paper investigates the formal and empirical properties of steering functions, i.e., transformation of the neural language model's representations that alter its behavior. First, we derive two optimal, in the least-squares sense, affine steering functions under different constraints. Our theory provides justification for existing approaches and offers a novel, improved steering approach. Second, we offer a series of experiments that demonstrate the empirical effectiveness of the methods in mitigating bias and reducing toxic generation.
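
For intuition, the unconstrained least-squares affine steering function has a closed form (the paper derives optimal maps under additional constraints; the generic fit and the toy de-meaning target below are ours):

```python
import numpy as np

def fit_affine_steering(Z, Y):
    """Least-squares affine steering function z -> z @ W + b, mapping, e.g.,
    representations of the undesired behavior onto desired counterparts.
    Generic closed form, not the paper's constrained variants."""
    Z1 = np.hstack([Z, np.ones((Z.shape[0], 1))])   # append a bias column
    theta, *_ = np.linalg.lstsq(Z1, Y, rcond=None)
    return theta[:-1], theta[-1]                    # W, b

rng = np.random.default_rng(0)
Z = rng.normal(size=(256, 16))                      # "biased" activations (toy)
Y = Z - Z.mean(0)                                   # toy target: de-meaned reps
W, b = fit_affine_steering(Z, Y)
steered = Z @ W + b
```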

Updated: 2024-06-25 13:00:08

标题: Representation Surgery: 仿射导向的理论与实践

摘要: 语言模型常常表现出不良行为,例如生成有毒或性别偏见的文本。在神经语言模型的情况下,不良行为的编码通常存在于模型的表示中。因此,防止模型表现出不良行为的一种自然(也是常见)方法是引导模型的表示,以减少其生成不良文本的概率。本文研究了引导函数的形式和经验特性,即改变神经语言模型表示的转换,从而改变其行为。首先,我们在不同约束条件下推导出两个最优的、在最小二乘意义下的仿射引导函数。我们的理论为现有方法提供了理论依据,并提供了一种新颖且改进的引导方法。其次,我们提供了一系列实验证明这些方法在减轻偏见和减少有毒生成方面的经验有效性。

更新时间: 2024-06-25 13:00:08

领域: cs.LG,cs.CL,cs.CY

下载: http://arxiv.org/abs/2402.09631v5

Preserving Node Distinctness in Graph Autoencoders via Similarity Distillation

Graph autoencoders (GAEs), as a kind of generative self-supervised learning approach, have shown great potential in recent years. GAEs typically rely on distance-based criteria, such as mean-square-error (MSE), to reconstruct the input graph. However, relying solely on a single reconstruction criterion may lead to a loss of distinctiveness in the reconstructed graph, causing nodes to collapse into similar representations and resulting in sub-optimal performance. To address this issue, we have developed a simple yet effective strategy to preserve the necessary distinctness in the reconstructed graph. Inspired by the knowledge distillation technique, we found that the dual encoder-decoder architecture of GAEs can be viewed as a teacher-student relationship. Therefore, we propose transferring the knowledge of distinctness from the raw graph to the reconstructed graph, achieved through a simple KL constraint. Specifically, we compute pairwise node similarity scores in the raw graph and reconstructed graph. During the training process, the KL constraint is optimized alongside the reconstruction criterion. We conducted extensive experiments across three types of graph tasks, demonstrating the effectiveness and generality of our strategy. This indicates that the proposed approach can be employed as a plug-and-play method to avoid vague reconstructions and enhance overall performance.
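
A sketch of such a similarity-distillation term (cosine similarity, the temperature, and row-wise softmax normalization are our illustrative choices):

```python
import torch
import torch.nn.functional as F

def similarity_kl(x_raw, x_rec, tau=1.0):
    """Distill pairwise node distinctness: match the reconstructed features'
    node-similarity distribution to the raw graph's via a KL term that is
    added to the usual reconstruction loss."""
    def log_sim_dist(x):
        x = F.normalize(x, dim=-1)
        s = x @ x.T / tau                # pairwise cosine similarities
        s.fill_diagonal_(-1e9)           # effectively ignore self-similarity
        return F.log_softmax(s, dim=-1)

    p_raw = log_sim_dist(x_raw)          # teacher: raw graph features
    p_rec = log_sim_dist(x_rec)          # student: reconstructed features
    return F.kl_div(p_rec, p_raw, log_target=True, reduction="batchmean")

# total_loss = reconstruction_mse + lambda_kl * similarity_kl(x, x_hat)
```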

Updated: 2024-06-25 12:54:35

标题: 通过相似度提炼在图自编码器中保持节点的独特性

摘要: 图自动编码器(GAEs)作为一种生成式自监督学习方法,在近年来展现出巨大潜力。GAEs通常依赖于基于距离的标准,比如均方误差(MSE),来重建输入图。然而,仅仅依赖单一的重建标准可能导致重建图中的特点丧失,使节点坍缩成相似的表示,从而导致次优性能。为了解决这个问题,我们开发了一种简单而有效的策略,以保留重建图中的必要特点。受知识蒸馏技术的启发,我们发现GAEs的双编码器-解码器架构可以看作是一种师生关系。因此,我们提出将重建图中的特点知识从原始图传输到重建图中,通过简单的KL约束实现。具体来说,我们计算原始图和重建图中的节点之间的相似性分数。在训练过程中,KL约束与重建标准一起被优化。我们进行了广泛的实验,涵盖了三种类型的图任务,展示了我们策略的有效性和普适性。这表明所提出的方法可以作为一种即插即用的方法,避免模糊的重建并增强整体性能。

更新时间: 2024-06-25 12:54:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17517v1

Benchmarking Mental State Representations in Language Models

While numerous works have assessed the generative performance of language models (LMs) on tasks requiring Theory of Mind reasoning, research into the models' internal representation of mental states remains limited. Recent work has used probing to demonstrate that LMs can represent beliefs of themselves and others. However, these claims are accompanied by limited evaluation, making it difficult to assess how mental state representations are affected by model design and training choices. We report an extensive benchmark with various LM types with different model sizes, fine-tuning approaches, and prompt designs to study the robustness of mental state representations and memorisation issues within the probes. Our results show that the quality of models' internal representations of the beliefs of others increases with model size and, more crucially, with fine-tuning. We are the first to study how prompt variations impact probing performance on theory of mind tasks. We demonstrate that models' representations are sensitive to prompt variations, even when such variations should be beneficial. Finally, we complement previous activation editing experiments on Theory of Mind tasks and show that it is possible to improve models' reasoning performance by steering their activations without the need to train any probe.

Updated: 2024-06-25 12:51:06

标题: 在语言模型中对精神状态表示进行基准测试

摘要: 尽管许多作品已评估了语言模型(LMs)在需要心灵理论推理的任务上的生成性能,但对模型对心理状态的内部表示的研究仍然有限。最近的研究利用探测展示了LMs可以表示自身和他人的信念。然而,这些主张缺乏评估,使得难以评估模型设计和训练选择对心理状态表示的影响。我们报告了一个广泛的基准测试,涉及不同模型大小、微调方法和提示设计的各种LM类型,以研究心理状态表示的稳健性和探针内的记忆问题。我们的结果显示,模型对他人信念的内部表示质量随着模型大小的增加而增加,更重要的是,随着微调的增加。我们是第一个研究提示变化如何影响心灵理论任务中的探针性能。我们证明,即使这些变化应该是有益的,模型的表示也对提示的变化敏感。最后,我们补充了先前在心灵理论任务上的激活编辑实验,并展示了通过引导激活而无需训练任何探针,可以改善模型的推理性能。

更新时间: 2024-06-25 12:51:06

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17513v1

Jigsaw: Supporting Designers to Prototype Multimodal Applications by Chaining AI Foundation Models

Recent advancements in AI foundation models have made it possible for them to be utilized off-the-shelf for creative tasks, including ideating design concepts or generating visual prototypes. However, integrating these models into the creative process can be challenging as they often exist as standalone applications tailored to specific tasks. To address this challenge, we introduce Jigsaw, a prototype system that employs puzzle pieces as metaphors to represent foundation models. Jigsaw allows designers to combine different foundation model capabilities across various modalities by assembling compatible puzzle pieces. To inform the design of Jigsaw, we interviewed ten designers and distilled design goals. In a user study, we showed that Jigsaw enhanced designers' understanding of available foundation model capabilities, provided guidance on combining capabilities across different modalities and tasks, and served as a canvas to support design exploration, prototyping, and documentation.

Updated: 2024-06-25 12:50:34

标题: 拼图:通过链接AI基础模型支持设计师原型设计多模式应用程序

摘要: 最近AI基础模型的进步使得它们可以用于创造性任务,包括构思设计概念或生成视觉原型。然而,将这些模型整合到创意过程中可能具有挑战性,因为它们通常作为针对特定任务定制的独立应用存在。为了解决这一挑战,我们引入了Jigsaw,一个采用拼图片段作为隐喻来代表基础模型的原型系统。Jigsaw允许设计师通过组装兼容的拼图片段来结合不同的基础模型能力,跨越各种形式的模态。为了设计Jigsaw,我们对十位设计师进行了访谈,并提炼了设计目标。在用户研究中,我们发现Jigsaw增强了设计师对可用基础模型能力的理解,提供了关于如何结合不同模态和任务的能力的指导,并作为一个支持设计探索、原型制作和文档编制的画布。

更新时间: 2024-06-25 12:50:34

领域: cs.HC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.08574v2

WAVE: Weight Template for Adaptive Initialization of Variable-sized Models

The expansion of model parameters underscores the significance of pre-trained models; however, the constraints encountered during model deployment necessitate models of variable sizes. Consequently, the traditional pre-training and fine-tuning paradigm fails to address the initialization problem when target models are incompatible with pre-trained models. We tackle this issue from a multitasking perspective and introduce \textbf{WAVE}, which incorporates a set of shared \textbf{W}eight templates for \textbf{A}daptive initialization of \textbf{V}ariable-siz\textbf{E}d Models. During initialization, target models will initialize the corresponding weight scalers tailored to their model size, which are sufficient to learn the connection rules of weight templates based on the Kronecker product from a limited amount of data. For the construction of the weight templates, WAVE utilizes the \textit{Learngene} framework, which structurally condenses common knowledge from ancestry models into weight templates as the learngenes through knowledge distillation. This process allows the integration of pre-trained models' knowledge into structured knowledge according to the rules of weight templates. We provide a comprehensive benchmark for the learngenes, and extensive experiments demonstrate the efficacy of WAVE. The results show that WAVE achieves state-of-the-art performance when initializing models with various depth and width, and even outperforms the direct pre-training of $n$ entire models, particularly for smaller models, saving approximately $n\times$ and $5\times$ in computational and storage resources, respectively. WAVE simultaneously achieves the most efficient knowledge transfer across a series of datasets, specifically achieving an average improvement of 1.8\% and 1.2\% on 7 downstream datasets.
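
Schematically, the Kronecker-product connection rule means one shared template can initialize layers of many sizes (shapes and names below are illustrative assumptions):

```python
import torch

def build_weight(template, scaler):
    """WAVE-style sketch: a target layer's weight is the Kronecker product of
    a shared weight template and a small, model-specific weight scaler, so one
    template serves many widths and depths. Only the scaler would be learned
    from a limited amount of data."""
    return torch.kron(scaler, template)

template = torch.randn(64, 64)       # shared across models (the "learngene")
scaler_small = torch.randn(4, 4)     # connection rule for a 256x256 layer
scaler_large = torch.randn(12, 12)   # connection rule for a 768x768 layer
print(build_weight(template, scaler_small).shape)   # torch.Size([256, 256])
print(build_weight(template, scaler_large).shape)   # torch.Size([768, 768])
```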

Updated: 2024-06-25 12:43:33

标题: WAVE:用于自适应初始化变尺寸模型的权重模板

摘要: 模型参数的扩展强调了预训练模型的重要性;然而,在模型部署过程中遇到的约束需要具有可变尺寸的模型。因此,传统的预训练和微调范式无法解决目标模型与预训练模型不兼容时的初始化问题。我们从多任务的角度解决这个问题,并引入了\textbf{WAVE},它包括一组共享的\textbf{W}eight模板,用于\textbf{A}daptive初始化\textbf{V}ariable-siz\textbf{E}d模型。在初始化过程中,目标模型将初始化相应的权重标量,适合其模型大小,这足以从有限数量的数据中学习基于Kronecker乘积的权重模板的连接规则。为了构建权重模板,WAVE利用\textit{Learngene}框架,通过知识蒸馏将祖先模型中的共同知识结构化为权重模板作为learngenes。这个过程允许根据权重模板的规则将预训练模型的知识整合为结构化的知识。我们为learngenes提供了全面的基准测试,广泛的实验证明了WAVE的有效性。结果显示,WAVE在初始化具有不同深度和宽度的模型时实现了最先进的性能,并且在小型模型方面甚至优于直接预训练$n$个完整模型,尤其节省了约$n\times$和$5\times$的计算和存储资源。WAVE同时实现了跨一系列数据集的最有效的知识传递,特别是在7个下游数据集上平均提高了1.8\%和1.2\%。

更新时间: 2024-06-25 12:43:33

领域: cs.LG

下载: http://arxiv.org/abs/2406.17503v1

Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Prior attempts have quantified the privacy risks of language models (LMs) via MIAs, but there is still no consensus on whether existing MIA algorithms can cause remarkable privacy leakage on practical Large Language Models (LLMs). Existing MIAs designed for LMs can be classified into two categories: reference-free and reference-based attacks. They are both based on the hypothesis that training records consistently attain a higher probability of being sampled. Nevertheless, this hypothesis heavily relies on the overfitting of target models, which is mitigated by multiple regularization methods and the generalization of LLMs. The reference-based attack, which measures a more reliable membership signal by comparing the probability discrepancy between the target model and a reference model, seems to achieve promising effectiveness on LLMs. However, the performance of the reference-based attack is highly dependent on a reference dataset that closely resembles the training dataset, which is usually inaccessible in practical scenarios. Overall, existing MIAs are unable to effectively unveil privacy leakage over practical fine-tuned LLMs that are overfitting-free and private. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal, probabilistic variation, which is based on memorization rather than overfitting. Furthermore, we introduce a self-prompt approach, which constructs the dataset for fine-tuning the reference model by prompting the target LLM itself. In this manner, the adversary can collect a dataset with a similar distribution from public APIs.
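
A schematic of the scoring logic, assuming callables that return a model's token-averaged log-likelihood and a precomputed set of paraphrased neighbors (the self-prompt fine-tuning of the reference model is elided):

```python
import numpy as np

def probabilistic_variation(logp, record, neighbors):
    """How much the record's log-likelihood pokes above its local
    neighborhood (paraphrases / perturbed copies). `logp` is any callable
    returning a model's token-averaged log-likelihood for a text."""
    return logp(record) - np.mean([logp(n) for n in neighbors])

def spv_mia_score(target_logp, ref_logp, record, neighbors):
    """Calibrate the target model's variation against a reference model that
    was fine-tuned on text sampled from the target model itself (the
    self-prompt step). Larger scores suggest membership; the decision
    threshold would be tuned on known non-members. A schematic of the attack,
    not the authors' implementation."""
    return (probabilistic_variation(target_logp, record, neighbors)
            - probabilistic_variation(ref_logp, record, neighbors))
```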

Updated: 2024-06-25 12:36:02

标题: Fine-tuned大型语言模型通过自提示校准的实用会员推断攻击

摘要: Membership Inference Attacks (MIA)旨在推断目标数据记录是否已被用于模型训练。先前的尝试通过MIA量化了语言模型(LMs)的隐私风险,但目前对于现有MIA算法是否会在实际大型语言模型(LLMs)上造成显著的隐私泄露尚无共识。针对LMs设计的现有MIAs可以分为两类:无参考和基于参考的攻击。它们都基于这样一个假设:训练记录一直具有更高的被采样概率。然而,这一假设严重依赖于目标模型的过拟合,而过拟合会通过多种正则化方法和LLMs的泛化来缓解。基于参考的攻击通过比较目标模型和参考模型之间的概率差异来衡量更可靠的成员信号,在LLMs中似乎取得了令人期待的有效性。然而,基于参考的攻击的性能高度依赖于一个与训练数据集密切相似的参考数据集,这在实际场景中通常是无法访问的。总的来说,现有的MIAs无法有效揭示无过拟合且保护隐私的实际微调LLMs上的隐私泄露。我们提出了一种基于自校准概率变异(SPV-MIA)的成员推断攻击。具体来说,由于LLMs中的记忆在训练过程中是不可避免的,并且发生在过拟合之前,我们引入了一个更可靠的成员信号,即概率变异,它基于记忆而不是过拟合。此外,我们引入了一种自提示方法,通过提示目标LLM本身来构建用于微调参考模型的数据集。通过这种方式,对手可以从公共API中收集一个具有类似分布的数据集。

更新时间: 2024-06-25 12:36:02

领域: cs.CL,cs.CR,cs.LG

下载: http://arxiv.org/abs/2311.06062v3

A Probabilistic Fluctuation based Membership Inference Attack for Diffusion Models

Membership Inference Attack (MIA) identifies whether a record exists in a machine learning model's training set by querying the model. MIAs on the classic classification models have been well-studied, and recent works have started to explore how to transplant MIA onto generative models. Our investigation indicates that existing MIAs designed for generative models mainly depend on the overfitting in target models. However, overfitting can be avoided by employing various regularization techniques, whereas existing MIAs demonstrate poor performance in practice. Unlike overfitting, memorization is essential for deep learning models to attain optimal performance, making it a more prevalent phenomenon. Memorization in generative models leads to an increasing trend in the probability distribution of generating records around the member record. Therefore, we propose a Probabilistic Fluctuation Assessing Membership Inference Attack (PFAMI), a black-box MIA that infers memberships by detecting these trends via analyzing the overall probabilistic fluctuations around given records. We conduct extensive experiments across multiple generative models and datasets, which demonstrate PFAMI can improve the attack success rate (ASR) by about 27.9% when compared with the best baseline.

Updated: 2024-06-25 12:34:46

标题: 一种基于概率波动的扩散模型成员推断攻击

摘要: Membership Inference Attack (MIA)通过查询模型来确定记录是否存在于机器学习模型的训练集中。经典分类模型上的MIA已经得到广泛研究,最近的研究开始探索如何将MIA转移到生成模型上。我们的调查表明,现有针对生成模型设计的MIA主要依赖于目标模型中的过拟合。然而,过拟合可以通过采用各种正则化技术来避免,而现有的MIA在实践中表现不佳。与过拟合不同,记忆对于深度学习模型实现最佳性能至关重要,使其成为更为普遍的现象。生成模型中的记忆导致生成记录的概率分布在成员记录周围呈增长趋势。因此,我们提出一种基于概率波动的成员推断攻击(PFAMI),这是一种黑盒MIA,通过分析给定记录周围的整体概率波动来检测这些趋势,从而推断成员身份。我们在多个生成模型和数据集上进行了广泛的实验,结果表明与最佳基线相比,PFAMI可以将攻击成功率(ASR)提高约27.9%。

更新时间: 2024-06-25 12:34:46

领域: cs.LG,cs.AI,cs.CR,cs.CV

下载: http://arxiv.org/abs/2308.12143v4

High-Dimension Human Value Representation in Large Language Models

The widespread application of Large Language Models (LLMs) across various tasks and fields has necessitated the alignment of these models with human values and preferences. Given various approaches of human value alignment, ranging from Reinforcement Learning with Human Feedback (RLHF) to constitutional learning, etc., there is an urgent need to understand the scope and nature of human values injected into these models before their release. There is also a need for model alignment without a costly large-scale human annotation effort. We propose UniVaR, a high-dimensional representation of human value distributions in LLMs, orthogonal to model architecture and training data. Trained from the value-relevant output of eight multilingual LLMs and tested on the output from four multilingual LLMs, namely LlaMA2, ChatGPT, JAIS and Yi, we show that UniVaR is a powerful tool to compare the distribution of human values embedded in different LLMs with different language sources. Through UniVaR, we explore how different LLMs prioritize various values in different languages and cultures, shedding light on the complex interplay between human values and language modeling.

Updated: 2024-06-25 12:23:00

标题: 大型语言模型中的高维人类价值表征

摘要: 大型语言模型(LLMs)在各种任务和领域的广泛应用需要将这些模型与人类价值观和偏好进行对齐。鉴于人类价值观对齐的各种方法,从强化学习与人类反馈(RLHF)到宪法学习等,迫切需要在发布之前了解注入这些模型的人类价值观的范围和性质。此外,还需要进行模型对齐,而无需进行昂贵的大规模人工标注。我们提出了UniVaR,一种高维度表示人类价值分布在LLMs中的方法,与模型架构和训练数据正交。通过对八个多语言LLMs的与价值相关的输出进行训练,并在四个多语言LLMs的输出上进行测试,即LlaMA2、ChatGPT、JAIS和Yi,我们表明UniVaR是一个强大的工具,可以比较不同LLMs中嵌入的人类价值分布与不同语言来源。通过UniVaR,我们探讨不同LLMs如何在不同语言和文化中优先考虑各种价值观,揭示人类价值观与语言建模之间的复杂相互作用。

更新时间: 2024-06-25 12:23:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.07900v2

BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO

We present BricksRL, a platform designed to democratize access to robotics for reinforcement learning research and education. BricksRL facilitates the creation, design, and training of custom LEGO robots in the real world by interfacing them with the TorchRL library for reinforcement learning agents. The integration of TorchRL with the LEGO hubs, via Bluetooth bidirectional communication, enables state-of-the-art reinforcement learning training on GPUs for a wide variety of LEGO builds. This offers a flexible and cost-efficient approach for scaling and also provides a robust infrastructure for robot-environment-algorithm communication. We present various experiments across tasks and robot configurations, providing built plans and training results. Furthermore, we demonstrate that inexpensive LEGO robots can be trained end-to-end in the real world to achieve simple tasks, with training times typically under 120 minutes on a normal laptop. Moreover, we show how users can extend the capabilities, exemplified by the successful integration of non-LEGO sensors. By enhancing accessibility to both robotics and reinforcement learning, BricksRL establishes a strong foundation for democratized robotic learning in research and educational settings.

Updated: 2024-06-25 12:17:44

标题: BricksRL:一个用LEGO实现机器人和强化学习研究与教育民主化的平台

摘要: 我们介绍了BricksRL,这是一个旨在推动强化学习研究和教育中对机器人的普及的平台。BricksRL通过将LEGO机器人与TorchRL强化学习代理库进行接口设计,促进了在现实世界中创建、设计和训练定制LEGO机器人的过程。通过蓝牙双向通信将TorchRL与LEGO集线器集成,可以在GPU上进行LEGO构建的各种最先进的强化学习训练。这提供了一种灵活且成本效益高的扩展方法,同时为机器人-环境-算法通信提供了坚实的基础设施。我们展示了在各种任务和机器人配置上的不同实验,提供了构建计划和训练结果。此外,我们证明了廉价的LEGO机器人可以在现实世界中端到端地进行训练,从而实现简单的任务,训练时间通常不超过普通笔记本电脑上的120分钟。此外,我们展示了用户如何扩展功能,例如成功集成非LEGO传感器。通过提高对机器人和强化学习的可访问性,BricksRL在研究和教育环境中为民主化的机器人学习奠定了坚实的基础。

更新时间: 2024-06-25 12:17:44

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2406.17490v1

Embedding Ontologies via Incorporating Extensional and Intensional Knowledge

Ontologies contain rich knowledge within domain, which can be divided into two categories, namely extensional knowledge and intensional knowledge. Extensional knowledge provides information about the concrete instances that belong to specific concepts in the ontology, while intensional knowledge details inherent properties, characteristics, and semantic associations among concepts. However, existing ontology embedding approaches fail to take both extensional knowledge and intensional knowledge into fine consideration simultaneously. In this paper, we propose a novel ontology embedding approach named EIKE (Extensional and Intensional Knowledge Embedding) by representing ontologies in two spaces, called extensional space and intensional space. EIKE presents a unified framework for embedding instances, concepts and their relations in an ontology, applying a geometry-based method to model extensional knowledge and a pretrained language model to model intensional knowledge, which can capture both structure information and textual information. Experimental results show that EIKE significantly outperforms state-of-the-art methods in three datasets for both triple classification and link prediction, indicating that EIKE provides a more comprehensive and representative perspective of the domain.

Updated: 2024-06-25 12:08:41

标题: 通过合并外延知识和内涵知识嵌入本体论

摘要: 本体包含领域内丰富的知识,可分为两类,即外延知识和内涵知识。外延知识提供了关于本体中特定概念所属具体实例的信息,而内涵知识详细描述了概念之间的固有属性、特征和语义关联。然而,现有的本体嵌入方法未能同时充分考虑外延知识和内涵知识。本文提出了一种新颖的本体嵌入方法,命名为EIKE(外延和内涵知识嵌入),通过在两个空间中表示本体,即外延空间和内涵空间。EIKE提出了一个统一的框架,用于嵌入本体中的实例、概念及其关系,应用基于几何的方法来建模外延知识,以及预训练的语言模型来建模内涵知识,从而能够捕捉结构信息和文本信息。实验结果表明,EIKE在三个数据集中在三元分类和链接预测方面明显优于最先进的方法,表明EIKE提供了更全面和代表性的领域视角。

更新时间: 2024-06-25 12:08:41

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.01677v3

LLMs Are Few-Shot In-Context Low-Resource Language Learners

In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering a crucial avenue for narrowing the gap between high-resource and low-resource languages. Nonetheless, only a handful of works have explored ICL for low-resource languages, with most of them focusing on relatively high-resource languages, such as French and Spanish. In this work, we extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource and 7 relatively higher-resource languages. Our study not only assesses the effectiveness of ICL with LLMs in low-resource languages but also identifies the shortcomings of in-context label alignment, and introduces a more effective alternative: query alignment. Moreover, we provide valuable insights into various facets of ICL for low-resource languages. Our study underscores the significance of few-shot, semantically relevant in-context information for enhancing LLMs' understanding of low-resource languages, both by closing the language gap in the target language and by aligning the semantics between the targeted low-resource language and a high-resource language that the model is proficient in. Our work highlights the importance of advancing ICL research, particularly for low-resource languages. Our code is publicly released at https://github.com/SamuelCahyawijaya/in-context-alignment

Updated: 2024-06-25 11:54:23

标题: LLMs 是少样本上下文低资源语言学习者

摘要: 上下文学习(ICL)使大型语言模型(LLMs)能够仅凭简短的上下文信息在代表性不足的语言中执行多样化任务,为缩小高资源语言和低资源语言之间的差距提供了重要途径。然而,目前只有少数几项研究探讨了用于低资源语言的ICL,其中大多数侧重于相对高资源的语言,如法语和西班牙语。在本研究中,我们广泛研究了ICL及其跨语言变体(X-ICL)在25种低资源语言和7种相对高资源语言上的应用。我们的研究不仅评估了LLMs在低资源语言中使用ICL的有效性,还确定了上下文标签对齐的不足之处,并引入了一个更有效的替代方案:查询对齐。此外,我们为低资源语言的ICL的各个方面提供了有价值的见解。我们的研究强调了少样本、语义相关的上下文信息对于提升LLMs低资源语言理解质量的重要性:一方面缩小目标语言的语言差距,另一方面在目标低资源语言与模型擅长的高资源语言之间对齐语义。我们的工作强调了推进ICL研究的重要性,特别是对于低资源语言。我们的代码已公开发布在https://github.com/SamuelCahyawijaya/in-context-alignment。

更新时间: 2024-06-25 11:54:23

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.16512v5

Towards Federated Low-Rank Adaptation with Rank-Heterogeneous Communication

Low-rank adaptation (LoRA) is an attractive alternative to adapting full weights for the federated fine-tuning of large pretrained models, which can significantly reduce the memory and communication burden. In principle, federated LoRA can provide an effective means to allocate different resources to each client by tuning the rank for each client, which can be useful in achieving a better communication-performance tradeoff. We find, however, that the empirical performance of LoRA is highly unstable with respect to such rank heterogeneity, severely limiting its applicability to scenarios where it is desirable or even required to allocate nonuniform communication bandwidth to each client due to constrained total bandwidth. Our investigation reveals that the root cause of this instability is the zero-padding-based aggregation strategy adopted in conventional federated LoRA frameworks, which causes the information from high-rank clients to get diluted during the aggregation process. To address this issue, we propose a new replication-based padding strategy, which allows us to better leverage the information from clients with high-quality datasets. This method ensures that valuable information from high-rank clients is retained during the aggregation process, accelerating the convergence speed and enhancing the overall prediction quality of the global model.
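
To contrast with zero-padding, a schematic of replication-based padding for heterogeneous-rank LoRA factors (the tiling rule and scale correction are our guess at a reasonable instantiation, not necessarily the paper's exact formulation):

```python
import numpy as np

def pad_replicate(M, target_rank):
    """Pad a rank-r LoRA factor (d x r) to d x R by repeating its columns,
    rescaling so the padded low-rank product keeps a comparable magnitude,
    instead of diluting it with zero columns. Schematic only."""
    d, r = M.shape
    reps = int(np.ceil(target_rank / r))
    padded = np.tile(M, (1, reps))[:, :target_rank]
    return padded * np.sqrt(r / target_rank)   # keep A @ B.T scale comparable

def aggregate(factors, weights):
    """FedAvg over clients' padded LoRA factors of heterogeneous ranks."""
    R = max(f.shape[1] for f in factors)
    padded = [pad_replicate(f, R) for f in factors]
    return sum(w * p for w, p in zip(weights, padded)) / sum(weights)
```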

Updated: 2024-06-25 11:49:33

标题: 朝向具有等级异构通信的联合低秩适应

摘要: 低秩适应(LoRA)是调整大型预训练模型的联邦微调的一种有吸引力的替代方法,可以显著减少内存和通信负担。原则上,联邦LoRA可以通过为每个客户端调整等级来提供有效的资源分配方式,这对于实现更好的通信性能权衡非常有用。然而,我们发现,LoRA的实证表现在等级异质性方面非常不稳定,严重限制了将非均匀通信带宽分配给每个客户端的场景的适用性。我们的调查揭示了这种不稳定性的根本原因是常规联邦LoRA框架中采用的基于零填充的聚合策略,导致高等级客户端的信息在聚合过程中被稀释。为了解决这个问题,我们提出了一种新的基于复制的填充策略,这使我们能够更好地利用具有高质量数据集的客户端的信息。这种方法确保了来自高等级客户端的宝贵信息在聚合过程中得以保留,加快了收敛速度,并提高了全局模型的整体预测质量。

更新时间: 2024-06-25 11:49:33

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2406.17477v1

Efficient 3D Molecular Generation with Flow Matching and Scale Optimal Transport

Generative models for 3D drug design have gained prominence recently for their potential to design ligands directly within protein pockets. Current approaches, however, often suffer from very slow sampling times or generate molecules with poor chemical validity. Addressing these limitations, we propose Semla, a scalable E(3)-equivariant message passing architecture. We further introduce a molecular generation model, SemlaFlow, which is trained using flow matching along with scale optimal transport, a novel extension of equivariant optimal transport. Our model produces state-of-the-art results on benchmark datasets with just 100 sampling steps. Crucially, SemlaFlow samples high quality molecules with as few as 20 steps, corresponding to a two order-of-magnitude speed-up compared to state-of-the-art, without sacrificing performance. Furthermore, we highlight limitations of current evaluation methods for 3D generation and propose new benchmark metrics for unconditional molecular generators. Finally, using these new metrics, we compare our model's ability to generate high quality samples against current approaches and further demonstrate SemlaFlow's strong performance.
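
The flow-matching regression at the heart of training is compact (generic conditional flow matching with linear paths; the paper's scale optimal transport additionally determines how noise samples x0 are paired and scaled against data x1 before this step):

```python
import torch

def flow_matching_loss(v_theta, x0, x1):
    """Conditional flow matching with linear interpolation paths: regress the
    network velocity v_theta(x_t, t) onto the target velocity (x1 - x0).
    x0: noise samples, x1: data samples, both of shape (batch, features);
    for molecules, features would be atom coordinates / types."""
    t = torch.rand(x0.shape[0], 1)          # one time per sample in [0, 1)
    xt = (1 - t) * x0 + t * x1              # point on the straight path
    target = x1 - x0                        # constant velocity along the path
    return ((v_theta(xt, t) - target) ** 2).mean()
```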

Updated: 2024-06-25 11:42:09

标题: 高效的三维分子生成方法:流匹配和尺度最优输运

摘要: 最近,为了直接设计蛋白质口袋内的配体,3D药物设计的生成模型逐渐受到关注。然而,目前的方法往往在采样时间非常缓慢或生成具有较差化学有效性的分子。为了解决这些限制,我们提出了一种可扩展的E(3)-等变消息传递架构Semla。我们进一步引入了一种分子生成模型SemlaFlow,该模型使用流匹配以及尺度最优输运进行训练,这是等变最优输运的一种新扩展。我们的模型在基准数据集上仅需100个采样步骤就能产生最先进的结果。关键的是,SemlaFlow在仅20个步骤的情况下就能生成高质量的分子,相比于最先进技术,速度提高了两个数量级,而且性能没有受损。此外,我们强调了当前对3D生成的评估方法的局限性,并提出了无条件分子生成器的新基准指标。最后,利用这些新指标,我们比较了我们模型生成高质量样本的能力与当前方法,并进一步展示了SemlaFlow的强大性能。

更新时间: 2024-06-25 11:42:09

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2406.07266v2

Performative Debias with Fair-exposure Optimization Driven by Strategic Agents in Recommender Systems

Data bias, e.g., popularity bias, impairs the dynamics of two-sided markets within recommender systems. This overshadows the less visible but potentially intriguing long-tail items that could capture user interest. Despite the abundance of research surrounding this issue, it still poses challenges and remains a hot topic in academic circles. Along this line, in this paper, we develop a re-ranking approach in dynamic settings with fair-exposure optimization driven by strategic agents. Designed for the producer side, the execution of agents assumes that content creators can modify item features based on strategic incentives to maximize their exposure. This iterative process entails an end-to-end optimization, employing differentiable ranking operators that simultaneously target accuracy and fairness. Joint objectives ensure the performance of recommendations while enhancing the visibility of tail items. We also leverage the performative nature of predictions to illustrate how strategic learning influences content creators to shift towards fairness efficiently, thereby incentivizing features of tail items. Through comprehensive experiments on both public and industrial datasets, we substantiate the effectiveness and dominance of the proposed method, especially in unveiling the potential of tail items.

Updated: 2024-06-25 11:41:50

标题: 在推荐系统中由战略代理驱动的公平暴露优化执行去偏见

摘要: 数据偏见,例如流行度偏见,损害了推荐系统中双边市场的动态。这掩盖了不那么显眼但潜在引人入胜、可能吸引用户兴趣的长尾商品。尽管围绕这个问题有大量的研究,但它仍然存在挑战,并且仍然是学术界的热门话题。沿着这一思路,在本文中,我们开发了一种在动态环境中由战略代理驱动、具有公平曝光优化的重新排名方法。该方法面向生产者端:代理模拟内容创作者,假设他们可以根据战略激励修改物品特征,以最大化其曝光。这个迭代过程涉及端到端的优化,采用可微分排名算子,同时兼顾准确性和公平性。联合目标在确保推荐性能的同时增强长尾商品的可见性。我们还利用预测的执行性(performative)性质来说明战略学习如何促使内容创作者有效地转向公平,从而激励长尾商品的特征。通过对公共和工业数据集的全面实验,我们证明了所提出方法的有效性和优势,特别是在揭示长尾商品潜力方面。

更新时间: 2024-06-25 11:41:50

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2406.17475v1

Transformer-based Named Entity Recognition with Combined Data Representation

This study examines transformer-based models and their effectiveness in named entity recognition tasks. The study investigates data representation strategies, including single, merged, and context, which respectively use one sentence, multiple sentences, and sentences joined with attention to context per vector. Analysis shows that training models with a single strategy may lead to poor performance on different data representations. To address this limitation, the study proposes a combined training procedure that utilizes all three strategies to improve model stability and adaptability. The results of this approach are presented and discussed for four languages (English, Polish, Czech, and German) across various datasets, demonstrating the effectiveness of the combined strategy.
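As a rough illustration, the helpers below build the three representations from a list of tokenized sentences; the marker-based "context" variant is a simplification of the attention-based joining described above.

def single(sents, i):
    # one sentence per training vector
    return sents[i]

def merged(sents, i, window=1):
    # neighbouring sentences concatenated into one vector
    lo, hi = max(0, i - window), min(len(sents), i + window + 1)
    return [tok for s in sents[lo:hi] for tok in s]

def context(sents, i, window=1):
    # neighbours included, with the focus sentence explicitly marked
    lo, hi = max(0, i - window), min(len(sents), i + window + 1)
    out = []
    for j in range(lo, hi):
        out += (["<focus>"] + sents[j] + ["</focus>"]) if j == i else sents[j]
    return out

The combined training procedure then simply mixes batches drawn from all three views of the same corpus, so one model stays stable under any of them.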

Updated: 2024-06-25 11:41:16

标题: 基于Transformer的具有组合数据表示的命名实体识别

摘要: 这项研究探讨了基于变压器的模型及其在命名实体识别任务中的有效性。研究调查了数据表示策略,包括单一、合并和上下文,分别使用一个句子、多个句子以及通过关注上下文连接的句子来表示。分析表明,使用单一策略训练模型可能导致在不同数据表示上表现不佳。为解决这一局限性,该研究提出了一种综合训练程序,利用所有三种策略来提高模型的稳定性和适应性。该方法的结果在四种语言(英语、波兰语、捷克语和德语)的各种数据集上进行了展示和讨论,展示了综合策略的有效性。

更新时间: 2024-06-25 11:41:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17474v1

TSynD: Targeted Synthetic Data Generation for Enhanced Medical Image Classification

The usage of medical image data for the training of large-scale machine learning approaches is particularly challenging due to its scarce availability and the costly generation of data annotations, typically requiring the engagement of medical professionals. The rapid development of generative models allows tackling this problem by leveraging large amounts of realistic synthetically generated data for the training process. However, randomly choosing synthetic samples might not be an optimal strategy. In this work, we investigate the targeted generation of synthetic training data in order to improve the accuracy and robustness of image classification. Our approach aims to guide the generative model to synthesize data with high epistemic uncertainty, since large measures of epistemic uncertainty indicate underrepresented data points in the training set. During image generation, we feed images reconstructed by an autoencoder into the classifier and compute the mutual information over the class-probability distribution as a measure of uncertainty. We alter the feature space of the autoencoder through an optimization process with the objective of maximizing the classifier uncertainty on the decoded image. By training on such data, we improve performance on several classification tasks as well as robustness against test-time data augmentations and adversarial attacks.
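A minimal PyTorch sketch of that targeted loop follows, assuming MC-dropout supplies the stochastic class-probability samples; encoder, decoder, and classifier are placeholders for the paper's actual models.

import torch

def mutual_information(probs):
    """probs: (mc_samples, batch, classes); BALD-style H[E p] - E H[p]."""
    mean_p = probs.mean(0)
    h_mean = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(-1)
    mean_h = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    return h_mean - mean_h

def generate_uncertain(encoder, decoder, classifier, x, steps=20, lr=0.1, mc=8):
    z = encoder(x).detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    classifier.train()                    # keep dropout active for MC sampling
    for _ in range(steps):
        x_hat = decoder(z)
        probs = torch.stack([classifier(x_hat).softmax(-1) for _ in range(mc)])
        loss = -mutual_information(probs).mean()   # ascend on epistemic uncertainty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z).detach()            # synthetic samples to add to training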

Updated: 2024-06-25 11:38:46

标题: TSynD:针对增强医学图像分类的目标合成数据生成

摘要: 将医学图像数据用于大规模机器学习方法的训练特别具有挑战性,因为其稀缺可用性和昂贵的数据标注生成,通常需要医疗专业人员的参与。生成模型的快速发展允许通过利用大量逼真合成的数据来解决这个问题。然而,随机选择合成样本可能不是最佳策略。在这项工作中,我们研究了有针对性地生成合成训练数据,以提高图像分类的准确性和鲁棒性。因此,我们的方法旨在引导生成模型合成具有高认知不确定性的数据,因为大量认知不确定性指示训练集中代表性不足的数据点。在图像生成过程中,我们将由自动编码器重建的图像馈送到分类器中,并计算类概率分布上的互信息作为不确定性的衡量。我们通过优化过程改变自动编码器的特征空间,目标是最大化解码图像上的分类器不确定性。通过在这些数据上进行训练,我们在多个分类任务上改善了性能,并提高了对测试时数据增强和对抗性攻击的鲁棒性。

更新时间: 2024-06-25 11:38:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17473v1

From Text to Test: AI-Generated Control Software for Materials Science Instruments

Large language models (LLMs) are transforming the landscape of chemistry and materials science. Recent examples of LLM-accelerated experimental research include virtual assistants for parsing synthesis recipes from the literature, or using the extracted knowledge to guide synthesis and characterization. Despite these advancements, their application is constrained to labs with automated instruments and control software, leaving much of materials science reliant on manual processes. Here, we demonstrate the rapid deployment of a Python-based control module for a Keithley 2400 electrical source measure unit using ChatGPT-4. Through iterative refinement, we achieved effective instrument management with minimal human intervention. Additionally, a user-friendly graphical user interface (GUI) was created, effectively linking all instrument controls to interactive screen elements. Finally, we integrated this AI-crafted instrument control software with a high-performance stochastic optimization algorithm to facilitate rapid and automated extraction of electronic device parameters related to semiconductor charge transport mechanisms from current-voltage (IV) measurement data. This integration resulted in a comprehensive open-source toolkit for semiconductor device characterization and analysis using IV curve measurements. We demonstrate the application of these tools by acquiring, analyzing, and parameterizing IV data from a Pt/Cr$_2$O$_3$:Mg/$\beta$-Ga$_2$O$_3$ heterojunction diode, a novel stack for high-power and high-temperature electronic devices. This approach underscores the powerful synergy between LLMs and the development of instruments for scientific inquiry, showcasing a path for further acceleration in materials science.
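To make the setting concrete, a compressed pyvisa-based IV sweep for a Keithley 2400 is sketched below; the VISA address is a placeholder and the SCPI strings follow the instrument's reference manual, but this is an illustration rather than the paper's released toolkit.

import pyvisa

def iv_sweep(resource="GPIB0::24::INSTR", v_start=-1.0, v_stop=1.0,
             points=21, compliance_amps=0.01):
    rm = pyvisa.ResourceManager()
    smu = rm.open_resource(resource)
    smu.write("*RST")
    smu.write(":SOUR:FUNC VOLT")                     # source voltage
    smu.write(':SENS:FUNC "CURR"')                   # measure current
    smu.write(f":SENS:CURR:PROT {compliance_amps}")  # compliance limit
    smu.write(":OUTP ON")
    data = []
    for i in range(points):
        v = v_start + i * (v_stop - v_start) / (points - 1)
        smu.write(f":SOUR:VOLT {v}")
        reading = smu.query(":READ?")                # "voltage,current,..."
        data.append((v, float(reading.split(",")[1])))
    smu.write(":OUTP OFF")
    smu.close()
    return data

The IV pairs returned here are exactly what the stochastic parameter-extraction stage consumes.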

Updated: 2024-06-25 11:34:15

标题: 从文本到测试:材料科学仪器的人工智能生成控制软件

摘要: 大型语言模型(LLMs)正在改变化学和材料科学的格局。最近LLM加速实验研究的例子包括从文献中解析合成配方的虚拟助手,或者利用提取的知识指导合成和表征。尽管取得了这些进展,它们的应用仍受限于具有自动化仪器和控制软件的实验室,许多材料科学工作仍依赖于手动过程。在这里,我们展示了如何借助ChatGPT-4快速部署一个基于Python的Keithley 2400源测量单元控制模块。通过迭代改进,我们以最少的人为干预实现了对仪器的有效管理。此外,我们创建了一个用户友好的图形用户界面(GUI),有效地将所有仪器控制链接到交互式屏幕元素。最后,我们将这个由人工智能设计的仪器控制软件与高性能随机优化算法集成,以促进从电流-电压(IV)测量数据中快速自动提取与半导体电荷传输机制相关的电子器件参数。由此形成了一个全面的开源工具包,用于基于IV曲线测量对半导体器件进行表征和分析。我们通过从Pt/Cr$_2$O$_3$:Mg/$\beta$-Ga$_2$O$_3$异质结二极管获取、分析和参数化IV数据来展示这些工具的应用,该二极管是一种面向高功率和高温电子器件的新型叠层。这种方法强调了LLMs与科学研究仪器开发之间的强大协同作用,展示了材料科学进一步加速的路径。

更新时间: 2024-06-25 11:34:15

领域: cond-mat.mtrl-sci,cs.AI

下载: http://arxiv.org/abs/2406.16224v2

Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning

Leveraging the computing and sensing capabilities of vehicles, vehicular federated learning (VFL) has been applied to edge training for connected vehicles. The dynamic and interconnected nature of vehicular networks presents unique opportunities to harness direct vehicle-to-vehicle (V2V) communications, enhancing VFL training efficiency. In this paper, we formulate a stochastic optimization problem to optimize the VFL training performance, considering the energy constraints and mobility of vehicles, and propose a V2V-enhanced dynamic scheduling (VEDS) algorithm to solve it. The model aggregation requirements of VFL and the limited transmission time due to mobility result in a stepwise objective function, which presents challenges in solving the problem. We thus propose a derivative-based drift-plus-penalty method to convert the long-term stochastic optimization problem to an online mixed integer nonlinear programming (MINLP) problem, and provide a theoretical analysis to bound the performance gap between the online solution and the offline optimal solution. Further analysis of the scheduling priority reduces the original problem into a set of convex optimization problems, which are efficiently solved using the interior-point method. Experimental results demonstrate that compared with the state-of-the-art benchmarks, the proposed algorithm enhances the image classification accuracy on the CIFAR-10 dataset by 3.18% and reduces the average displacement errors on the Argoverse trajectory prediction dataset by 10.21%.

Updated: 2024-06-25 11:15:53

标题: 面向车辆间通信增强的联邦学习的动态调度

摘要: 通过利用车辆的计算和感知能力,车辆联合学习(VFL)已被应用于连接车辆的边缘训练。车辆网络的动态和互联性特质提供了利用直接车辆间通信(V2V)增强VFL训练效率的独特机会。在本文中,我们制定了一个随机优化问题,以优化VFL训练性能,考虑到车辆的能源限制和移动性,并提出了一个V2V增强动态调度(VEDS)算法来解决这个问题。VFL的模型聚合需求和由于移动性所导致的有限传输时间导致了一个分步的目标函数,这在解决问题时存在挑战。因此,我们提出了一个基于导数的漂移加罚方法,将长期随机优化问题转换为在线混合整数非线性规划(MINLP)问题,并提供了一个理论分析来界定在线解和离线最优解之间的性能差距。进一步分析调度优先级将原始问题简化为一组凸优化问题,这些问题可以通过内点法有效地解决。实验结果表明,与最先进的基准相比,所提出的算法将CIFAR-10数据集上的图像分类准确率提高了3.18%,并将Argoverse轨迹预测数据集上的平均位移误差降低了10.21%。

更新时间: 2024-06-25 11:15:53

领域: cs.LG,cs.AI,cs.DC,cs.IT,math.IT

下载: http://arxiv.org/abs/2406.17470v1

Early learning of the optimal constant solution in neural networks and humans

Deep neural networks learn increasingly complex functions over the course of training. Here, we show both empirically and theoretically that learning of the target function is preceded by an early phase in which networks learn the optimal constant solution (OCS) - that is, initial model responses mirror the distribution of target labels, while entirely ignoring information provided in the input. Using a hierarchical category learning task, we derive exact solutions for learning dynamics in deep linear networks trained with bias terms. Even when initialized to zero, this simple architectural feature induces substantial changes in early dynamics. We identify hallmarks of this early OCS phase and illustrate how these signatures are observed in deep linear networks and larger, more complex (and nonlinear) convolutional neural networks solving a hierarchical learning task based on MNIST and CIFAR10. We explain these observations by proving that deep linear networks necessarily learn the OCS during early learning. To further probe the generality of our results, we train human learners over the course of three days on the category learning task. We then identify qualitative signatures of this early OCS phase in terms of the dynamics of true negative (correct-rejection) rates. Surprisingly, we find the same early reliance on the OCS in the behaviour of human learners. Finally, we show that learning of the OCS can emerge even in the absence of bias terms and is equivalently driven by generic correlations in the input data. Overall, our work suggests the OCS as a universal learning principle in supervised, error-corrective learning, and the mechanistic reasons for its prevalence.
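The "optimal constant solution" itself is easy to state in code: the constant prediction minimizing cross-entropy while ignoring the input is just the empirical label distribution, which early network outputs are shown to track.

import numpy as np

def optimal_constant_solution(labels, num_classes):
    """labels: integer array of training targets."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()   # predict the class priors for every input

y = np.array([0, 0, 1, 2, 2, 2])
print(optimal_constant_solution(y, 3))   # -> [0.333 0.167 0.5]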

Updated: 2024-06-25 11:12:52

标题: 神经网络和人类早期学习最佳常数解

摘要: 深度神经网络在训练过程中学习越来越复杂的函数。在这里,我们在实证和理论上展示,学习目标函数之前存在一个早期阶段,网络在这个阶段学习到最优常数解(OCS)-也就是说,初始模型的响应反映了目标标签的分布,完全忽略了输入提供的信息。通过一个层级类别学习任务,我们推导了使用偏置项训练的深度线性网络学习动态的确切解。即使初始化为零,这个简单的架构特征也会在早期动态中引起显著变化。我们确定了这个早期OCS阶段的特征,并说明这些特征如何在解决基于MNIST和CIFAR10的层级学习任务的深度线性网络和更大、更复杂(非线性)的卷积神经网络中观察到。我们通过证明深度线性网络在早期学习阶段必然学习到OCS来解释这些观察结果。为了进一步探究我们结果的普适性,我们训练人类学习者在三天的时间内进行类别学习任务。然后,我们通过真阴性(正确拒绝)率的动态识别了这个早期OCS阶段的定性特征。令人惊讶的是,我们发现人类学习者的行为中也存在对OCS的早期依赖。最后,我们展示了即使在没有偏置项的情况下,OCS的学习也可能出现,并且同样受输入数据中的通用相关性驱动。总的来说,我们的工作认为OCS是监督式、错误修正学习中的一种普遍学习原则,并解释了其普遍性的机制原因。

更新时间: 2024-06-25 11:12:52

领域: cs.LG

下载: http://arxiv.org/abs/2406.17467v1

Enhancing Tool Retrieval with Iterative Feedback from Large Language Models

Tool learning aims to enhance and expand large language models' (LLMs) capabilities with external tools, which has gained significant attention recently. Current methods have shown that LLMs can effectively handle a certain amount of tools through in-context learning or fine-tuning. However, in real-world scenarios, the number of tools is typically extensive and irregularly updated, emphasizing the necessity for a dedicated tool retrieval component. Tool retrieval is nontrivial due to the following challenges: 1) complex user instructions and tool descriptions; 2) misalignment between tool retrieval and tool usage models. To address the above issues, we propose to enhance tool retrieval with iterative feedback from the large language model. Specifically, we prompt the tool usage model, i.e., the LLM, to provide feedback for the tool retriever model in multi-round, which could progressively improve the tool retriever's understanding of instructions and tools and reduce the gap between the two standalone components. We build a unified and comprehensive benchmark to evaluate tool retrieval models. The extensive experiments indicate that our proposed approach achieves advanced performance in both in-domain evaluation and out-of-domain evaluation.
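Schematically, the multi-round loop looks like the following; retriever, llm_feedback, and the judgement fields are hypothetical stand-ins meant only to show the control flow, not the paper's interfaces.

def train_with_feedback(retriever, llm_feedback, instructions, tools, rounds=3):
    for _ in range(rounds):
        batch = []
        for inst in instructions:
            candidates = retriever.top_k(inst, tools, k=10)
            # the tool-usage LLM judges the candidates and refines the
            # instruction so the needed tool functionality is explicit
            judgement = llm_feedback(inst, candidates)
            batch.append((judgement.refined_instruction, judgement.preferred_tools))
        retriever.finetune(batch)   # progressively align retriever with the LLM
    return retriever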

Updated: 2024-06-25 11:12:01

标题: 利用大型语言模型的迭代反馈来增强工具检索

摘要: 工具学习旨在通过外部工具增强和扩展大型语言模型(LLMs)的能力,最近引起了广泛关注。当前的方法表明,LLMs可以通过上下文学习或微调有效地处理一定数量的工具。然而,在现实世界的场景中,工具的数量通常是庞大的,并且不规则更新,强调了需要一个专门的工具检索组件。由于以下挑战,工具检索并不容易:1)复杂的用户说明和工具描述;2)工具检索和工具使用模型之间的不匹配。为了解决上述问题,我们提出通过大型语言模型的迭代反馈来增强工具检索。具体来说,我们促使工具使用模型,即LLM,在多轮中为工具检索模型提供反馈,这可以逐步提高工具检索对说明和工具的理解,并减少两个独立组件之间的差距。我们建立了一个统一和全面的基准来评估工具检索模型。广泛的实验表明,我们提出的方法在领域内评估和领域外评估中均取得了先进的性能。

更新时间: 2024-06-25 11:12:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17465v1

The Tree of Diffusion Life: Evolutionary Embeddings to Understand the Generation Process of Diffusion Models

Diffusion models generate high-quality samples by corrupting data with Gaussian noise and iteratively reconstructing it with deep learning, slowly transforming noisy images into refined outputs. Understanding this data evolution is important for interpretability but is complex due to its high-dimensional evolutionary nature. While traditional dimensionality reduction methods like t-distributed stochastic neighborhood embedding (t-SNE) aid in understanding high-dimensional spaces, they neglect evolutionary structure preservation. Hence, we propose Tree of Diffusion Life (TDL), a method to understand data evolution in the generative process of diffusion models. TDL samples a diffusion model's generative space via instances with varying prompts and employs image encoders to extract semantic meaning from these samples, projecting them to an intermediate space. It employs a novel evolutionary embedding algorithm that explicitly encodes the iterations while preserving the high-dimensional relations, facilitating the visualization of data evolution. This embedding leverages three metrics: a standard t-SNE loss to group semantically similar elements, a displacement loss to group elements from the same iteration step, and an instance alignment loss to align elements of the same instance across iterations. We present rectilinear and radial layouts to represent iterations, enabling comprehensive exploration. We assess various feature extractors and highlight TDL's potential with prominent diffusion models like GLIDE and Stable Diffusion with different prompt sets. TDL simplifies understanding data evolution within diffusion models, offering valuable insights into their functioning.

Updated: 2024-06-25 11:05:26

标题: 扩散生命之树:进化嵌入以理解扩散模型生成过程

摘要: 扩散模型通过用高斯噪声破坏数据并通过深度学习迭代重建数据,从而生成高质量样本,逐渐将嘈杂图像转化为精细的输出。理解这种数据演变对于可解释性至关重要,但由于其高维演化性质而复杂。虽然传统的降维方法如 t-分布随机邻域嵌入(t-SNE)有助于理解高维空间,但它们忽略了演化结构的保持。因此,我们提出了Tree of Diffusion Life(TDL),一种用于理解扩散模型生成过程中数据演变的方法。TDL通过具有不同提示的实例在扩散模型的生成空间中取样,并利用图像编码器从这些样本中提取语义含义,将它们投影到一个中间空间。它采用一种新颖的演化嵌入算法,明确编码迭代过程,同时保持高维关系,便于数据演变的可视化。这种嵌入利用三个指标:标准的 t-SNE 损失用于将语义相似的元素分组,位移损失用于将同一迭代步骤的元素分组,实例对齐损失用于跨迭代对齐同一实例的元素。我们提供了直线式和径向布局来表示迭代,实现全面探索。我们评估了各种特征提取器,并突出了 TDL 与诸如 GLIDE 和 Stable Diffusion 等不同提示集的知名扩散模型的潜力。TDL简化了对扩散模型内数据演变的理解,为了解其功能提供了宝贵的见解。

更新时间: 2024-06-25 11:05:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17462v1

Towards Unbiased Calibration using Meta-Regularization

Model miscalibration has been frequently identified in modern deep neural networks. Recent work aims to improve model calibration directly through a differentiable calibration proxy. However, the calibration produced is often biased due to the binning mechanism. In this work, we propose to learn better-calibrated models via meta-regularization, which has two components: (1) gamma network (gamma-net), a meta learner that outputs sample-wise gamma values (continuous variable) for Focal loss for regularizing the backbone network; (2) smooth expected calibration error (SECE), a Gaussian-kernel based, unbiased, and differentiable surrogate to ECE that enables the smooth optimization of gamma-net. We evaluate the effectiveness of the proposed approach in regularizing neural networks towards improved and unbiased calibration on three computer vision datasets. We empirically demonstrate that: (a) learning sample-wise gamma as continuous variables can effectively improve calibration; (b) SECE smoothly optimizes gamma-net towards unbiased and robust calibration with respect to the binning schemes; and (c) the combination of gamma-net and SECE achieves the best calibration performance across various calibration metrics while retaining very competitive predictive performance as compared to multiple recently proposed methods.
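A plausible differentiable Gaussian-kernel surrogate for ECE, in the spirit of SECE, can be written as below; the exact estimator in the paper may differ, but the point is that kernel smoothing removes the hard bin boundaries that bias standard ECE.

import torch

def smooth_ece(confidence, correct, num_anchors=15, bandwidth=0.05):
    """confidence: (N,) max softmax probabilities; correct: (N,) 0/1 floats."""
    anchors = torch.linspace(0.0, 1.0, num_anchors, device=confidence.device)
    k = torch.exp(-0.5 * ((anchors[:, None] - confidence[None, :]) / bandwidth) ** 2)
    density = k.sum(1) / k.sum()                         # sample mass near each anchor
    w = k / k.sum(dim=1, keepdim=True).clamp_min(1e-12)  # per-anchor averaging weights
    avg_conf = (w * confidence[None, :]).sum(1)          # smoothed confidence
    avg_acc = (w * correct[None, :]).sum(1)              # smoothed accuracy
    return (density * (avg_conf - avg_acc).abs()).sum()

Because every operation above is differentiable in the confidences, such a surrogate can sit inside the meta-objective that trains gamma-net.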

Updated: 2024-06-25 11:00:05

标题: 朝向使用元正则化的无偏校准

摘要: 模型误校准在现代深度神经网络中十分常见。最近的工作旨在通过可微的校准代理直接改善模型校准。然而,由于分箱机制,产生的校准往往存在偏差。在这项工作中,我们提出通过元正则化学习更好校准的模型,该方法包括两个组成部分:(1)伽马网络(gamma-net),一个元学习器,用于为Focal损失输出样本级伽马值(连续变量),用于正则化骨干网络;(2)平滑期望校准误差(SECE),基于高斯核的、无偏的、可微的ECE代理,使得gamma-net的平滑优化成为可能。我们在三个计算机视觉数据集上评估了所提出方法在正则化神经网络朝着改善和无偏校准方面的有效性。我们从经验上证明:(a)学习样本级gamma作为连续变量可以有效改善校准;(b)SECE可以平滑地优化gamma-net,实现对于分箱方案的无偏和稳健校准;(c)gamma-net和SECE的组合在各种校准度量上实现了最佳的校准性能,同时与多种最近提出的方法相比,保持了非常有竞争力的预测性能。

更新时间: 2024-06-25 11:00:05

领域: cs.LG

下载: http://arxiv.org/abs/2303.15057v3

Embodied Question Answering via Multi-LLM Systems

Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries. In the existing literature, EQA has exclusively been studied in single-agent scenarios, where exploration can be time-consuming and costly. In this work, we consider EQA in a multi-agent framework involving multiple large language models (LLM) based agents independently answering queries about a household environment. To generate one answer for each query, we use the individual responses to train a Central Answer Model (CAM) that aggregates responses for a robust answer. Using CAM, we observe a $50\%$ higher EQA accuracy when compared against aggregation methods for ensemble LLM, such as voting schemes and debates. CAM does not require any form of agent communication, alleviating it from the associated costs. We ablate CAM with various nonlinear (neural network, random forest, decision tree, XGBoost) and linear (logistic regression classifier, SVM) algorithms. Finally, we present a feature importance analysis for CAM via permutation feature importance (PFI), quantifying CAMs reliance on each independent agent and query context.
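A toy version of the Central Answer Model fits in a few lines: stack each agent's discrete answer as a feature and learn the final answer from it. The feature encoding here is a made-up stand-in for the paper's setup.

from sklearn.ensemble import RandomForestClassifier
import numpy as np

# rows: queries; columns: answer id from each of 4 independent LLM agents
X = np.array([[1, 1, 0, 1], [0, 0, 0, 2], [2, 1, 2, 2], [1, 0, 1, 1]])
y = np.array([1, 0, 2, 1])           # ground-truth answer ids

cam = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(cam.predict([[1, 2, 1, 1]]))   # aggregated answer for a new query

Unlike voting, such a model can learn that some agents are reliable only for some kinds of queries, which is where the reported gain over ensembling heuristics comes from.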

Updated: 2024-06-25 10:50:09

标题: 通过多种LLM系统实现具身化问答

摘要: 具身问答(EQA)是一个重要的问题,涉及代理人探索环境以回答用户查询。在现有文献中,EQA仅在单一代理场景中进行研究,其中探索可能耗时且昂贵。在这项工作中,我们考虑在一个多代理框架中进行EQA,涉及多个基于大型语言模型(LLM)的独立代理回答有关家庭环境的查询。为了为每个查询生成一个答案,我们使用各自的响应来训练一个中央答案模型(CAM),该模型聚合响应以获得稳健的答案。使用CAM时,与集成LLM的聚合方法(如投票方案和辩论)相比,我们观察到50%更高的EQA准确性。CAM不需要任何形式的代理通信,减轻了相关成本。我们使用各种非线性(神经网络、随机森林、决策树、XGBoost)和线性(逻辑回归分类器、支持向量机)算法来消融CAM。最后,我们通过排列特征重要性(PFI)为CAM提供特征重要性分析,量化CAM对每个独立代理和查询上下文的依赖。

更新时间: 2024-06-25 10:50:09

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.10918v3

Improving Grammatical Error Correction via Contextual Data Augmentation

Nowadays, data augmentation through synthetic data has been widely used in the field of Grammatical Error Correction (GEC) to alleviate the problem of data scarcity. However, these synthetic data are mainly used in the pre-training phase rather than the data-limited fine-tuning phase due to inconsistent error distribution and noisy labels. In this paper, we propose a synthetic data construction method based on contextual augmentation, which can ensure an efficient augmentation of the original data with a more consistent error distribution. Specifically, we combine rule-based substitution with model-based generation, using the generative model to generate a richer context for the extracted error patterns. Besides, we also propose a relabeling-based data cleaning method to mitigate the effects of noisy labels in synthetic data. Experiments on CoNLL14 and BEA19-Test show that our proposed augmentation method consistently and substantially outperforms strong baselines and achieves the state-of-the-art level with only a few synthetic data.

Updated: 2024-06-25 10:49:56

标题: 通过上下文数据增强改进语法错误纠正

摘要: 如今,通过合成数据进行数据增强在语法错误纠正(GEC)领域被广泛使用,以缓解数据稀缺问题。然而,由于错误分布不一致和标签噪声,这些合成数据主要用于预训练阶段,而非数据有限的微调阶段。本文提出了一种基于上下文增强的合成数据构建方法,可以确保以更一致的错误分布对原始数据进行高效增强。具体来说,我们将基于规则的替换与基于模型的生成相结合,利用生成模型为提取的错误模式生成更丰富的上下文。此外,我们还提出了一种基于重新标注的数据清理方法,以减轻合成数据中噪声标签的影响。在CoNLL14和BEA19-Test上的实验表明,我们提出的增强方法始终且显著优于强基线,并仅使用少量合成数据即可达到最先进水平。

更新时间: 2024-06-25 10:49:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17456v1

High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall Data

This paper introduces a novel formulation of the clustering problem, namely the Minimum Sum-of-Squares Clustering of Infinitely Tall Data (MSSC-ITD), and presents HPClust, an innovative set of hybrid parallel approaches for its effective solution. By utilizing modern high-performance computing techniques, HPClust enhances key clustering metrics: effectiveness, computational efficiency, and scalability. In contrast to vanilla data parallelism, which only accelerates processing time through the MapReduce framework, our approach unlocks superior performance by leveraging the multi-strategy competitive-cooperative parallelism and intricate properties of the objective function landscape. Unlike other available algorithms that struggle to scale, our algorithm is inherently parallel in nature, improving solution quality through increased scalability and parallelism, and outperforming even advanced algorithms designed for small and medium-sized datasets. Our evaluation of HPClust, featuring four parallel strategies, demonstrates its superiority over traditional and cutting-edge methods by offering better performance in the key metrics. These results also show that parallel processing not only enhances the clustering efficiency, but the accuracy as well. Additionally, we explore the balance between computational efficiency and clustering quality, providing insights into optimal parallel strategies based on dataset specifics and resource availability. This research advances our understanding of parallelism in clustering algorithms, demonstrating that a judicious hybridization of advanced parallel approaches yields optimal results for MSSC-ITD. Experiments on synthetic data further confirm HPClust's exceptional scalability and robustness to noise.

Updated: 2024-06-25 10:49:06

标题: 高性能混合算法用于无限高数据的最小平方和聚类

摘要: 本文介绍了一种新颖的聚类问题形式,即无限高数据的最小平方和聚类(MSSC-ITD),并提出了HPClust,一组创新的混合并行方法,用于有效解决该问题。通过利用现代高性能计算技术,HPClust增强了关键的聚类指标:效果、计算效率和可扩展性。与普通数据并行不同,后者仅通过MapReduce框架加快处理时间,我们的方法通过利用多策略竞争合作并行以及目标函数景观的复杂性质,解锁了更优越的性能。与其他现有算法相比,这些算法往往难以扩展,我们的算法本质上是并行的,通过增加可扩展性和并行性来改善解决方案质量,并且优于为小型和中型数据集设计的先进算法。我们对HPClust的评估包括四种并行策略,证明了它在关键指标上的优越性,比传统和尖端方法表现更好。这些结果还表明,并行处理不仅增强了聚类效率,还提高了准确性。此外,我们探索了计算效率和聚类质量之间的平衡,根据数据集的特定情况和资源可用性,提供了关于最佳并行策略的见解。这项研究推进了我们对聚类算法中并行性的理解,表明对MSSC-ITD采用先进并行方法的明智混合能够产生最佳结果。对合成数据的实验进一步证实了HPClust在可扩展性和对噪声的稳健性方面的卓越表现。

更新时间: 2024-06-25 10:49:06

领域: cs.DC,cs.LG,math.OC

下载: http://arxiv.org/abs/2311.04517v5

Pseudo Labelling for Enhanced Masked Autoencoders

Masked Image Modeling (MIM)-based models, such as SdAE, CAE, GreenMIM, and MixAE, have explored different strategies to enhance the performance of Masked Autoencoders (MAE) by modifying prediction, loss functions, or incorporating additional architectural components. In this paper, we propose an enhanced approach that boosts MAE performance by integrating pseudo labelling for both class and data tokens, alongside replacing the traditional pixel-level reconstruction with token-level reconstruction. This strategy uses cluster assignments as pseudo labels to promote instance-level discrimination within the network, while token reconstruction requires generation of discrete tokens capturing local context. The targets for pseudo labelling and reconstruction need to be generated by a teacher network. To disentangle the generation of target pseudo labels and the reconstruction of the token features, we decouple the teacher into two distinct models, where one serves as a labelling teacher and the other as a reconstruction teacher. This separation proves empirically superior to a single teacher, while having negligible impact on throughput and memory consumption. Incorporating pseudo-labelling as an auxiliary task has demonstrated notable improvements in ImageNet-1K and other downstream tasks, including classification, semantic segmentation, and detection.

Updated: 2024-06-25 10:41:45

标题: 伪标记用于增强遮罩自动编码器

摘要: 基于Masked Image Modeling(MIM)的模型,如SdAE、CAE、GreenMIM和MixAE,通过修改预测、损失函数或整合额外的架构组件,探索了增强Masked Autoencoders(MAE)性能的不同策略。本文提出了一种增强方法,通过将伪标记(pseudo labelling)集成到类别和数据标记中,以及用标记级别的重建替代传统的像素级重建,来提升MAE性能。该策略使用聚类分配作为伪标签,促进网络内部实例级别的区分,同时标记重建需要生成包含局部上下文的离散标记。伪标记和重建的目标需要由一个教师网络生成。为了解开目标伪标签的生成和标记特征的重建,我们将教师分为两个独立的模型,一个用作标记教师,另一个用作重建教师。这种分离在经验上优于单个教师,同时对吞吐量和内存消耗几乎没有影响。将伪标记作为辅助任务整合进来,在ImageNet-1K和其他下游任务中,包括分类、语义分割和检测,已经展示了显著的改进。

更新时间: 2024-06-25 10:41:45

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17450v1

WRDScore: New Metric for Evaluation of Natural Language Generation Models

The problem of natural language generation, and, more specifically, method name prediction, is hard to evaluate on test data: a suitable metric must account for the many semantically and syntactically valid ways a single method can be named. Measuring the direct overlap between the predicted and reference (true) sequences cannot capture these subtleties. Other existing embedding-based metrics either do not measure precision and recall or impose strict, unrealistic assumptions on both sequences. To address these issues, we propose a new metric that, on the one hand, is very simple and lightweight, and, on the other hand, is able to calculate precision and recall without resorting to any assumptions while obtaining good performance with respect to the human judgement.
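One assumption-free way to get embedding-based precision and recall, in the spirit described above, is soft token matching; WRDScore's exact formulation may weight tokens differently, so treat this purely as an illustration.

import numpy as np

def soft_precision_recall(pred_emb, ref_emb):
    """pred_emb: (m, d), ref_emb: (n, d) L2-normalized token embeddings."""
    sim = pred_emb @ ref_emb.T            # (m, n) cosine similarities
    precision = sim.max(axis=1).mean()    # best match for each predicted token
    recall = sim.max(axis=0).mean()       # best match for each reference token
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1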

Updated: 2024-06-25 10:41:43

标题: WRDScore:自然语言生成模型评估的新指标

摘要: 自然语言生成问题,尤其是方法名称预测,在测试数据上难以评估:一个合适的度量标准需要考虑单个方法在语义和语法上都可以有多种合理命名方式。直接衡量预测序列与参考(真实)序列之间的重叠无法捕捉这些微妙之处。其他现有基于嵌入的度量标准要么不衡量精确率和召回率,要么对两个序列都施加严格且不切实际的假设。为了解决这些问题,我们提出了一种新的度量标准,一方面非常简单轻量,另一方面能够在不依赖任何假设的情况下计算精确率和召回率,同时与人类判断具有良好的一致性。

更新时间: 2024-06-25 10:41:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19220v3

DE-COP: Detecting Copyrighted Content in Language Models Training Data

How can we detect if copyrighted content was used in the training process of a language model, considering that the training data is typically undisclosed? We are motivated by the premise that a language model is likely to identify verbatim excerpts from its training text. We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. DE-COP's core approach is to probe an LLM with multiple-choice questions, whose options include both verbatim text and their paraphrases. We construct BookTection, a benchmark with excerpts from 165 books published prior and subsequent to a model's training cutoff, along with their paraphrases. Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP also achieves an average accuracy of 72% for detecting suspect books on fully black-box models where prior methods give approximately 4% accuracy. The code and datasets are available at https://github.com/LeiLiLab/DE-COP.
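The probe itself is simple to sketch; lm_logprob is a hypothetical scoring function (prompt, completion) -> log-probability, and the prompt wording is illustrative.

def decop_probe(lm_logprob, options):
    """options: 4 strings, one verbatim book excerpt and three paraphrases."""
    prompt = ("Which of the following passages appears verbatim in the book?\n"
              + "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
              + "\nAnswer:")
    scores = [lm_logprob(prompt, f" {chr(65 + i)}") for i in range(len(options))]
    return max(range(len(scores)), key=scores.__getitem__)

If, across many excerpts, the model picks the verbatim option far more often than the 25% chance level, the book was likely part of its training data.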

Updated: 2024-06-25 10:33:41

标题: DE-COP:在语言模型训练数据中检测版权内容

摘要: 我们如何检测语言模型的训练过程中是否使用了受版权保护的内容,考虑到训练数据通常是不公开的?我们的动机是基于一个假设,即语言模型可能会识别其训练文本中的直接摘录。我们提出了DE-COP,一种确定是否在训练中包含受版权保护内容的方法。DE-COP的核心方法是用多项选择题向语言模型提出问题,选项包括直接文本及其释义。我们构建了BookTection,一个基准测试,包含165本出版在模型训练截止日期之前和之后的书籍摘录,以及它们的释义。我们的实验表明,DE-COP在可获取logits的模型上,检测性能(AUC)比先前最佳方法提高了9.6%。此外,DE-COP还在完全黑箱模型上实现了对可疑书籍的平均准确率为72%,而先前的方法仅提供约4%的准确率。代码和数据集可在https://github.com/LeiLiLab/DE-COP 上获得。

更新时间: 2024-06-25 10:33:41

领域: cs.CL,cs.LG,I.2

下载: http://arxiv.org/abs/2402.09910v2

Evaluating ML-Based Anomaly Detection Across Datasets of Varied Integrity: A Case Study

Cybersecurity remains a critical challenge in the digital age, with network traffic flow anomaly detection being a key pivotal instrument in the fight against cyber threats. In this study, we address the prevalent issue of data integrity in network traffic datasets, which are instrumental in developing machine learning (ML) models for anomaly detection. We introduce two refined versions of the CICIDS-2017 dataset, NFS-2023-nTE and NFS-2023-TE, processed using NFStream to ensure methodologically sound flow expiration and labeling. Our research contrasts the performance of the Random Forest (RF) algorithm across the original CICIDS-2017, its refined counterparts WTMC-2021 and CRiSIS-2022, and our NFStream-generated datasets, in both binary and multi-class classification contexts. We observe that the RF model exhibits exceptional robustness, achieving consistent high-performance metrics irrespective of the underlying dataset quality, which prompts a critical discussion on the actual impact of data integrity on ML efficacy. Our study underscores the importance of continual refinement and methodological rigor in dataset generation for network security research. As the landscape of network threats evolves, so must the tools and techniques used to detect and analyze them.

Updated: 2024-06-25 10:27:26

标题: 评估基于机器学习的异常检测在不同数据集上的表现:案例研究

摘要: 网络安全在数字时代仍然是一个关键挑战,网络流量异常检测是对抗网络威胁的关键工具。在这项研究中,我们解决了网络流量数据集中数据完整性的普遍问题,这对于开发用于异常检测的机器学习(ML)模型至关重要。我们引入了两个经过精细处理的CICIDS-2017数据集的改进版本,NFS-2023-nTE和NFS-2023-TE,使用NFStream确保方法论上的流过期和标记。我们的研究比较了原始CICIDS-2017、其改进版本WTMC-2021和CRiSIS-2022以及我们通过NFStream生成的数据集在二进制和多类分类背景下的随机森林(RF)算法的性能。我们观察到RF模型表现出卓越的稳健性,无论基础数据集的质量如何,都能实现一致的高性能指标,这引发了关于数据完整性对ML效能实际影响的关键讨论。我们的研究强调了网络安全研究中数据集生成持续完善和方法论严谨的重要性。随着网络威胁的演变,用于检测和分析网络威胁的工具和技术也必须不断更新。

更新时间: 2024-06-25 10:27:26

领域: cs.LG,cs.NI

下载: http://arxiv.org/abs/2401.16843v2

Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. We provide a lower bound matching our upper bound up to a polynomial factor. Our proof relies on the correspondence of the solutions of entropy-regularized Markov decision processes with gradient flows of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. Further, this correspondence allows us to identify the limit of the gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of the Kakade gradient flow which corresponds to a time-continuous version of the natural policy gradient method. We use this to show that for entropy-regularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations improving existing sublinear guarantees.
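In standard notation, the regularized objective and the shape of the stated value bound read as follows; the constants $C, c > 0$ are problem-specific, and this paraphrases the rates above rather than the paper's exact theorem.

\[
V_\tau^\star \;=\; \max_\pi \, \mathbb{E}_\pi\Big[\sum_{t \ge 0} \gamma^t \big( r(s_t, a_t) + \tau\, \mathcal{H}(\pi(\cdot \mid s_t)) \big)\Big],
\qquad
V^\star - V(\pi_\tau^\star) \;\le\; C\, e^{-c/\tau}.
\]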

Updated: 2024-06-25 10:26:49

标题: 离散折现马尔可夫决策过程中熵正则化误差的基本尖锐估计

摘要: 我们研究了对无限时间段离散折现马尔可夫决策过程进行熵正则化引入的误差。我们展示了这种误差在逆正则化强度下以指数方式减少,无论是在加权KL-散度还是价值方面,都具有特定问题指数。我们提供了一个下界,与我们的上界匹配,最多相差一个多项式因子。我们的证明依赖于熵正则化马尔可夫决策过程的解与在自然策略梯度方法中常见的黎曼度量下未正则化奖励的梯度流之间的对应关系。此外,这种对应关系使我们能够确定梯度流的极限为广义最大熵最优策略,从而表征了Kakade梯度流的隐式偏差,该偏差对应于自然策略梯度方法的时间连续版本。我们利用这一点表明,对于熵正则化的自然策略梯度方法,总误差以迭代次数的平方根指数方式下降,改善了现有的亚线性保证。

更新时间: 2024-06-25 10:26:49

领域: math.OC,cs.LG,cs.SY,eess.SY,37N40, 65K05, 90C05, 90C40, 90C53

下载: http://arxiv.org/abs/2406.04163v2

Low-Cost Privacy-Aware Decentralized Learning

This paper introduces ZIP-DL, a novel privacy-aware decentralized learning (DL) algorithm that exploits correlated noise to provide strong privacy protection against a local adversary while yielding efficient convergence guarantees for a low communication cost. The progressive neutralization of the added noise during the distributed aggregation process results in ZIP-DL fostering a high model accuracy under privacy guarantees. ZIP-DL further uses a single communication round between each gradient descent, thus minimizing communication overhead. We provide theoretical guarantees for both convergence speed and privacy guarantees, thereby making ZIP-DL applicable to practical scenarios. Our extensive experimental study shows that ZIP-DL significantly outperforms the state-of-the-art in terms of vulnerability/accuracy trade-off. In particular, ZIP-DL (i) reduces the efficacy of linkability attacks by up to 52 percentage points compared to baseline DL, (ii) improves accuracy by up to 37 percent w.r.t. the state-of-the-art privacy-preserving mechanism operating under the same threat model as ours, when configured to provide the same protection against membership inference attacks, and (iii) reduces communication by up to 10.5x against the same competitor for the same level of protection.
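The core idea of correlated noise is easy to demonstrate: give each edge of the communication graph an antisymmetric noise share, so individual models are masked but the network average is untouched. This toy NumPy snippet shows the principle only, not ZIP-DL's actual noise schedule or convergence machinery.

import numpy as np

def noisy_shares(models, edges, sigma=0.1, rng=np.random.default_rng(0)):
    shares = [m.copy() for m in models]
    for i, j in edges:
        eta = rng.normal(0.0, sigma, size=models[0].shape)
        shares[i] += eta          # +eta on one endpoint
        shares[j] -= eta          # -eta on the other: cancels in the average
    return shares

models = [np.ones(3) * k for k in range(3)]
shares = noisy_shares(models, [(0, 1), (1, 2), (0, 2)])
print(np.mean(shares, axis=0), np.mean(models, axis=0))  # identical means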

Updated: 2024-06-25 10:20:49

标题: 低成本隐私保护意识的去中心化学习

摘要: 本文介绍了ZIP-DL,一种新颖的隐私感知去中心化学习(DL)算法,利用相关噪声提供强大的隐私保护,同时以低通信成本实现高效收敛保证。在分布式聚合过程中逐渐中和增加的噪声,使得ZIP-DL在隐私保证下促进了高模型准确性。ZIP-DL进一步利用了梯度下降之间的单一通信轮次,从而最大程度地减少了通信开销。我们为收敛速度和隐私保证提供了理论保证,从而使ZIP-DL适用于实际场景。我们的广泛实验研究表明,与最先进技术相比,ZIP-DL在易受攻击性/准确性平衡方面表现显著优于其他方法。具体来说,ZIP-DL(i)相对于基线DL将可追溯性攻击的效果降低了高达52个百分点,(ii)在配置为提供相同对抗成员推断攻击保护的情况下,相对于操作在相同威胁模型下的最先进隐私保护机制,将准确性提高了高达37%,(iii)相对于相同对手,在相同保护水平下将通信减少了高达10.5倍。

更新时间: 2024-06-25 10:20:49

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2403.11795v2

Mind the Graph When Balancing Data for Fairness or Robustness

Failures of fairness or robustness in machine learning predictive settings can be due to undesired dependencies between covariates, outcomes and auxiliary factors of variation. A common strategy to mitigate these failures is data balancing, which attempts to remove those undesired dependencies. In this work, we define conditions on the training distribution for data balancing to lead to fair or robust models. Our results display that, in many cases, the balanced distribution does not correspond to selectively removing the undesired dependencies in a causal graph of the task, leading to multiple failure modes and even interference with other mitigation techniques such as regularization. Overall, our results highlight the importance of taking the causal graph into account before performing data balancing.

Updated: 2024-06-25 10:16:19

标题: 为公平性或鲁棒性平衡数据时注意因果图

摘要: 在机器学习预测设置中,公平性或鲁棒性的失败可能是由于协变量、结果和辅助变化因素之间存在意外的依赖关系。缓解这些失败的常见策略是数据平衡,试图消除这些意外的依赖关系。在这项工作中,我们定义了对训练分布的条件,使数据平衡能够产生公平或鲁棒的模型。我们的结果显示,在许多情况下,平衡分布并不对应于有选择性地消除任务的因果图中的意外依赖关系,导致多种失败模式甚至干扰其他缓解技术如正则化。总的来说,我们的结果凸显了在进行数据平衡之前考虑因果图的重要性。

更新时间: 2024-06-25 10:16:19

领域: cs.LG

下载: http://arxiv.org/abs/2406.17433v1

A Critical Analysis of the Theoretical Framework of the Extreme Learning Machine

Despite the many successful applications of the Extreme Learning Machine (ELM), we show that its underlying foundational principles do not have a rigorous mathematical justification. Specifically, we refute the proofs of two of its main statements, and we construct a dataset that provides a counterexample to the ELM learning algorithm; we explain its design, which leads to many such counterexamples. Finally, we provide alternative statements of the foundations, which justify the efficiency of ELM in some theoretical cases.

Updated: 2024-06-25 10:06:07

标题: 极限学习机理论框架的批判性分析

摘要: 尽管极限学习机(ELM)有许多成功的应用,我们表明其基本原理缺乏严格的数学依据。具体来说,我们反驳了其中两个主要论断的证明,并构建了一个数据集,为ELM学习算法提供了反例,同时解释了其构造方式,由此可以产生许多这样的反例。最后,我们提供了基础理论的替代陈述,这些陈述在某些理论情况下证明了ELM的效率。

更新时间: 2024-06-25 10:06:07

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2406.17427v1

CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems

Cooperative Multi-Agent Reinforcement Learning (CMARL) strategies are well known to be vulnerable to adversarial perturbations. Previous works on adversarial attacks have primarily focused on white-box attacks that directly perturb the states or actions of victim agents, often in scenarios with a limited number of attacks. However, gaining complete access to victim agents in real-world environments is exceedingly difficult. To create more realistic adversarial attacks, we introduce a novel method that involves injecting traitor agents into the CMARL system. We model this problem as a Traitor Markov Decision Process (TMDP), where traitors cannot directly attack the victim agents but can influence their formation or positioning through collisions. In TMDP, traitors are trained using the same MARL algorithm as the victim agents, with their reward function set as the negative of the victim agents' reward. Despite this, the training efficiency for traitors remains low because it is challenging for them to directly associate their actions with the victim agents' rewards. To address this issue, we propose the Curiosity-Driven Adversarial Attack (CuDA2) framework. CuDA2 enhances the efficiency and aggressiveness of attacks on the specified victim agents' policies while maintaining the optimal policy invariance of the traitors. Specifically, we employ a pre-trained Random Network Distillation (RND) module, where the extra reward generated by the RND module encourages traitors to explore states unencountered by the victim agents. Extensive experiments on various scenarios from SMAC demonstrate that our CuDA2 framework offers comparable or superior adversarial attack capabilities compared to other baselines.
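The curiosity bonus comes from a standard Random Network Distillation module, sketched below in PyTorch: the predictor's error against a frozen random target network is large exactly on states the victims' trajectories never visit, which is what pushes traitors to explore them.

import torch
import torch.nn as nn

class RND(nn.Module):
    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)   # the target stays random and frozen

    def intrinsic_reward(self, obs):
        err = (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
        return err                    # also serves as the predictor's loss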

Updated: 2024-06-25 09:59:31

标题: CuDA2:将叛徒代理纳入合作多智能体系统的方法

摘要: 合作式多智能体强化学习(CMARL)策略众所周知容易受到敌对扰动的影响。先前的研究主要集中在白盒攻击上,直接扰乱受害代理的状态或行动,通常在攻击数量有限的情况下。然而,在现实环境中完全获取受害代理的访问权限非常困难。为了创建更真实的敌对攻击,我们引入了一种新的方法,即将叛徒代理注入到CMARL系统中。我们将这个问题建模为叛徒马尔可夫决策过程(TMDP),其中叛徒不能直接攻击受害代理,但可以通过碰撞影响它们的形成或位置。在TMDP中,叛徒使用与受害代理相同的MARL算法进行训练,其奖励函数设定为受害代理奖励的负值。尽管如此,叛徒的训练效率仍然较低,因为他们很难直接将自己的行动与受害代理的奖励联系起来。为解决这个问题,我们提出了Curiosity-Driven Adversarial Attack(CuDA2)框架。CuDA2增强了对指定受害代理政策的攻击效率和侵略性,同时保持了叛徒的最优政策不变性。具体来说,我们采用了一个预训练的随机网络蒸馏(RND)模块,其中RND模块产生的额外奖励鼓励叛徒探索受害代理未遇到的状态。通过在SMAC的各种场景上进行广泛实验,我们的CuDA2框架相比其他基准线提供了可比或更优越的敌对攻击能力。

更新时间: 2024-06-25 09:59:31

领域: cs.LG,cs.AI,cs.CR,cs.MA

下载: http://arxiv.org/abs/2406.17425v1

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-context applications. To bridge this gap, we propose a novel long-context benchmark, Loong, aligning with realistic scenarios through extended multi-document question answering (QA). Unlike typical document QA, in Loong's test cases, each document is relevant to the final answer, ignoring any document will lead to the failure of the answer. Furthermore, Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning, to facilitate a more realistic and comprehensive evaluation of long-context understanding. Extensive experiments indicate that existing long-context language models still exhibit considerable potential for enhancement. Retrieval augmented generation (RAG) achieves poor performance, demonstrating that Loong can reliably assess the model's long-context modeling capabilities.

Updated: 2024-06-25 09:42:56

标题: 不留下任何文件:使用扩展的多文档QA基准测试长上下文LLM

摘要: 长文本建模能力引起了广泛关注,导致出现了具有超长上下文窗口的大型语言模型(LLMs)。与此同时,用于评估长上下文LLMs的基准逐渐赶上。然而,现有的基准采用无关的噪音文本来人为延长测试案例的长度,与长上下文应用的真实场景背道而驰。为了弥补这一差距,我们提出了一个新颖的长上下文基准Loong,通过扩展的多文档问答(QA)来与现实场景保持一致。与典型的文档问答不同,在Loong的测试案例中,每个文档都与最终答案相关,忽略任何文档将导致答案失败。此外,Loong引入了四种类型的任务,涵盖不同上下文长度范围:聚光定位、比较、聚类和推理链,以促进对长上下文理解的更为真实和全面的评估。大量实验表明,现有的长上下文语言模型仍然具有相当大的增强潜力。检索增强生成(RAG)表现不佳,证明Loong能够可靠地评估模型的长上下文建模能力。

更新时间: 2024-06-25 09:42:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17419v1

SE-VGAE: Unsupervised Disentangled Representation Learning for Interpretable Architectural Layout Design Graph Generation

Despite the suitability of graphs for capturing the relational structures inherent in architectural layout designs, there is a notable dearth of research on interpreting architectural design space using graph-based representation learning and exploring architectural design graph generation. Concurrently, disentangled representation learning in graph generation faces challenges such as node permutation invariance and representation expressiveness. To address these challenges, we introduce an unsupervised disentangled representation learning framework, Style-based Edge-augmented Variational Graph Auto-Encoder (SE-VGAE), aiming to generate architectural layout in the form of attributed adjacency multi-graphs while prioritizing representation disentanglement. The framework is designed with three alternative pipelines, each integrating a transformer-based edge-augmented encoder, a latent space disentanglement module, and a style-based decoder. These components collectively facilitate the decomposition of latent factors influencing architectural layout graph generation, enhancing generation fidelity and diversity. We also provide insights into optimizing the framework by systematically exploring graph feature augmentation schemes and evaluating their effectiveness for disentangling architectural layout representation through extensive experiments. Additionally, we contribute a new benchmark large-scale architectural layout graph dataset extracted from real-world floor plan images to facilitate the exploration of graph data-based architectural design representation space interpretation. This study pioneered disentangled representation learning for architectural layout graph generation. The code and dataset of this study will be open-sourced.

Updated: 2024-06-25 09:40:47

标题: SE-VGAE:用于可解释性建筑布局设计图生成的无监督解缠表示学习

摘要: 尽管图适合捕捉建筑布局设计中固有的关系结构,但在使用基于图的表示学习解释建筑设计空间以及探索建筑设计图生成方面,研究明显不足。同时,图生成中的解耦表示学习面临节点排列不变性和表示表达能力等挑战。为解决这些挑战,我们引入了一种无监督的解耦表征学习框架,即基于样式的边增强变分图自动编码器(SE-VGAE),旨在以属性邻接多图的形式生成建筑布局,同时优先考虑表征解耦。该框架设计了三种替代流水线,每种都集成了基于变压器的边增强编码器、潜在空间解耦模块和基于样式的解码器。这些组件共同促进了影响建筑布局图生成的潜在因素的分解,增强了生成的逼真度和多样性。我们还通过系统地探索图特征增强方案并评估其对建筑布局表示解耦的有效性,为优化该框架提供了见解。此外,我们贡献了一个新的基于真实世界平面图图像提取的大规模建筑布局图数据集,以促进基于图数据的建筑设计表示空间的探索。本研究开创了建筑布局图生成的解耦表征学习。本研究的代码和数据集将开源。

更新时间: 2024-06-25 09:40:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17418v1

A Numerical Proof of Shell Model Turbulence Closure

The development of turbulence closure models, parametrizing the influence of small non-resolved scales on the dynamics of large resolved ones, is an outstanding theoretical challenge with vast applicative relevance. We present a closure, based on deep recurrent neural networks, that quantitatively reproduces, within statistical errors, Eulerian and Lagrangian structure functions and the intermittent statistics of the energy cascade, including those of subgrid fluxes. To achieve high-order statistical accuracy, and thus a stringent statistical test, we employ shell models of turbulence. Our results encourage the development of similar approaches for 3D Navier-Stokes turbulence.

Updated: 2024-06-25 09:40:14

标题: 壳模型湍流封闭的数值证明

摘要: 湍流封闭模型的发展,对于参数化未解决小尺度对大尺度动力学的影响,是一项重要的理论挑战,具有广泛的应用意义。我们提出了一种基于深度递归神经网络的封闭模型,可以在统计误差范围内定量重现欧拉和拉格朗日结构函数以及能量级联的间歇统计,包括亚网格通量的统计。为了实现高阶统计精度,从而进行严格的统计测试,我们采用了湍流壳模型。我们的结果鼓励类似方法用于3D Navier-Stokes湍流的发展。

更新时间: 2024-06-25 09:40:14

领域: physics.flu-dyn,cond-mat.stat-mech,cs.LG,nlin.CD,physics.comp-ph

下载: http://arxiv.org/abs/2202.09289v2

Classification with neural networks with quadratic decision functions

Neural networks with quadratic decision functions have been introduced as alternatives to standard neural networks with affine linear ones. They are advantageous when the objects or classes to be identified are compact and of basic geometries like circles, ellipses, etc. In this paper we investigate the use of such ansatz functions for classification. In particular, we test and compare the algorithm on the MNIST dataset for classification of handwritten digits and for classification of subspecies. We also show that the implementation can be based on the neural network structures in the software TensorFlow and Keras, respectively.
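Since the abstract points to TensorFlow and Keras, a minimal custom layer computing a per-class quadratic decision function x^T A x + b^T x + c might look as follows; the exact parametrization in the paper may differ.

import tensorflow as tf

class QuadraticDecision(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.A = self.add_weight(shape=(self.units, d, d), initializer="glorot_uniform")
        self.b = self.add_weight(shape=(self.units, d), initializer="zeros")
        self.c = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, x):
        quad = tf.einsum("bi,uij,bj->bu", x, self.A, x)   # x^T A_u x
        lin = tf.einsum("bi,ui->bu", x, self.b)           # b_u^T x
        return quad + lin + self.c

Each unit's decision boundary is then a quadric, which is why compact classes such as circles and ellipses can be separated by a single layer.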

Updated: 2024-06-25 09:37:40

标题: 用具有二次决策函数的神经网络进行分类

摘要: 具有二次决策函数的神经网络已被引入作为标准神经网络的替代品,后者具有仿射线性决策函数。当要识别的对象或类别是紧凑且具有基本几何形状(如圆、椭圆等)时,这些网络是有优势的。本文研究了这种假设函数在分类中的应用。具体来说,我们在MNIST数据集上测试和比较了该算法,用于手写数字的分类和亚种的分类。我们还展示了,该实现可以基于Tensorflow和Keras软件中的神经网络结构。

更新时间: 2024-06-25 09:37:40

领域: cs.LG,cs.NA,math.NA,49N45, 41A30, 65XX, 68TXX

下载: http://arxiv.org/abs/2401.10710v2

Variable Layer-Wise Quantization: A Simple and Effective Approach to Quantize LLMs

We present a simple variable quantization approach that quantizes different layers of a large language model (LLM) at different bit levels. Specifically, we quantize the most important layers to higher bit precision and less important layers to lower bits to achieve floating point quantization levels. We propose two effective strategies to measure the importance of layers within LLMs: the first measures the importance of a layer based on how different its output embeddings are from the input embeddings (the higher the better); the second estimates the importance of a layer using the number of layer weights that are much larger than average (the smaller the better). We show that quantizing different layers at varying bits according to our importance scores results in minimal performance drop with a far more compressed model size. Finally, we present several practical key takeaways from our variable layer-wise quantization experiments: (a) LLM performance under variable quantization remains close to the original model until 25-50% of layers are moved in lower quantization using our proposed ordering but only until 5-10% if moved using no specific ordering; (b) Quantizing LLMs to lower bits performs substantially better than pruning unless extreme quantization (2-bit) is used; and (c) Layer-wise quantization to lower bits works better in the case of larger LLMs with more layers compared to smaller LLMs with fewer layers. The code used to run the experiments is available at: https://github.com/RazvanDu/LayerwiseQuant.
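The two importance scores and the resulting bit assignment can be sketched directly; the thresholds and the 4/8-bit menu below are illustrative choices, not the paper's tuned settings.

import numpy as np

def importance_by_embedding_change(in_emb, out_emb):
    """Larger input-to-output change => more important (the higher the better)."""
    cos = (in_emb * out_emb).sum(-1) / (
        np.linalg.norm(in_emb, axis=-1) * np.linalg.norm(out_emb, axis=-1))
    return 1.0 - cos.mean()

def importance_by_outliers(weights, z=3.0):
    """Fewer much-larger-than-average weights => more important (the smaller the better)."""
    w = np.abs(np.asarray(weights)).ravel()
    return -(w > w.mean() + z * w.std()).mean()

def assign_bits(scores, frac_low=0.4, high=8, low=4):
    order = np.argsort(scores)                 # ascending: least important first
    bits = np.full(len(scores), high)
    bits[order[: int(frac_low * len(scores))]] = low
    return bits

Quantizing layers in this importance order is what lets 25-50% of the layers drop to lower bits before performance moves, versus 5-10% with no ordering.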

Updated: 2024-06-25 09:37:15

标题: 可变逐层量化:一种简单有效的LLM量化方法

摘要: 我们提出了一种简单的可变量化方法,该方法以不同的比特级别量化大型语言模型(LLM)的不同层。具体而言,我们将最重要的层量化为更高的比特精度,将不太重要的层量化为更低的比特,以实现浮点量化水平。我们提出了两种有效的策略来衡量LLM中各层的重要性:第一种策略根据一个层的输出嵌入与输入嵌入有多大不同来衡量该层的重要性(差异越大越好);第二种策略根据层权重中远大于平均值的数量来估计该层的重要性(数量越小越好)。我们展示了根据我们的重要性评分以不同比特量化不同层会导致性能下降最小,但模型大小更小。最后,我们从可变层量化实验中得出了几个实用的关键结论:(a)在使用我们提出的顺序时,LLM在可变量化下的性能保持接近原始模型,直到25-50%的层降为更低比特,但如果没有使用特定顺序,只能在5-10%的情况下;(b)将LLM量化为更低比特的性能比修剪要好得多,除非使用极端量化(2比特);以及(c)与层数较少的较小LLM相比,对较大LLM进行层间量化至更低比特效果更好。用于运行实验的代码可在https://github.com/RazvanDu/LayerwiseQuant上找到。

更新时间: 2024-06-25 09:37:15

领域: cs.CL,cs.AI,cs.LG,I.2.7; I.2.0

下载: http://arxiv.org/abs/2406.17415v1

Telecom Language Models: Must They Be Large?

The increasing interest in Large Language Models (LLMs) within the telecommunications sector underscores their potential to revolutionize operational efficiency. However, the deployment of these sophisticated models is often hampered by their substantial size and computational demands, raising concerns about their viability in resource-constrained environments. Addressing this challenge, recent advancements have seen the emergence of small language models that surprisingly exhibit performance comparable to their larger counterparts in many tasks, such as coding and common-sense reasoning. Phi-2, a compact yet powerful model, exemplifies this new wave of efficient small language models. This paper conducts a comprehensive evaluation of Phi-2's intrinsic understanding of the telecommunications domain. Recognizing the scale-related limitations, we enhance Phi-2's capabilities through a Retrieval-Augmented Generation approach, meticulously integrating an extensive knowledge base specifically curated with telecom standard specifications. The enhanced Phi-2 model demonstrates a profound improvement in accuracy, answering questions about telecom standards with a precision that closely rivals the more resource-intensive GPT-3.5. The paper further explores the refined capabilities of Phi-2 in addressing problem-solving scenarios within the telecom sector, highlighting its potential and limitations.

Updated: 2024-06-25 09:28:43

标题: 电信语言模型:它们一定要庞大吗?

摘要: 电信行业对大型语言模型(LLMs)日益增长的兴趣,突显了它们在革新运营效率方面的潜力。然而,这些复杂模型的部署常常受到它们庞大体积和计算需求的限制,引发了对它们在资源受限环境中可行性的担忧。为了应对这一挑战,最近出现了一些小型语言模型,它们令人惊讶地在编码和常识推理等许多任务中表现出与更大模型相媲美的性能。Phi-2是一个紧凑而强大的模型,体现了这一新一波高效小型语言模型的特点。本文对Phi-2在电信领域的内在理解进行了全面评估。鉴于规模相关的限制,我们通过检索增强生成方法提升了Phi-2的能力,精心整合了一个专门为电信标准规范策划的广泛知识库。增强后的Phi-2模型展现出了在准确性方面的显著提升,以接近更消耗资源的GPT-3.5的精确度回答关于电信标准的问题。本文进一步探讨了Phi-2在解决电信行业中的问题场景中的精细能力,突出了其潜力和限制。

更新时间: 2024-06-25 09:28:43

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.04666v2

Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training

Existing speculative decoding methods typically require additional model structure and training processes to assist the model for draft token generation. This makes the migration of acceleration methods to the new model more costly and more demanding on device memory. To address this problem, we propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning stage of the large language model. The training method simply introduces some noise at the input for the model to learn the denoising task. It significantly enhances the parallel decoding capability of the model without affecting the original task capability. In addition, we propose a tree-based retrieval-augmented Jacobi (TR-Jacobi) decoding strategy to further improve the inference speed of MSN models. Experiments in both the general and code domains have shown that MSN can improve inference speed by 2.3-2.7x times without compromising model performance. The MSN model also achieves comparable acceleration ratios to the SOTA model with additional model structure on Spec-Bench.
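The training change is small enough to sketch with a HuggingFace-style interface (assumed here): corrupt a contiguous span of input embeddings and keep the ordinary LM loss, so the same objective doubles as a denoising task.

import torch

def msn_step(model, input_ids, noise_std=0.1, span=16):
    emb = model.get_input_embeddings()(input_ids).clone()
    start = torch.randint(0, max(1, emb.size(1) - span), (1,)).item()
    emb[:, start:start + span] += noise_std * torch.randn_like(emb[:, start:start + span])
    out = model(inputs_embeds=emb, labels=input_ids)
    return out.loss   # next-token loss on noisy inputs teaches denoising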

Updated: 2024-06-25 09:25:39

标题: 制造一些噪声:通过嘈杂训练释放语言模型的并行推理能力

摘要: 现有的推测解码方法通常需要额外的模型结构和训练过程来帮助模型生成草稿令牌。这使得加速方法向新模型的迁移成本更高,对设备内存的要求也更高。为解决这一问题,我们提出了Make Some Noise(MSN)训练框架,作为大型语言模型监督微调阶段的替代方案。该训练方法简单地在输入中引入一些噪音,使模型学习去噪任务。这显著增强了模型的并行解码能力,而不会影响原始任务能力。此外,我们提出了一种基于树的检索增强雅可比(TR-Jacobi)解码策略,进一步提高MSN模型的推理速度。在通用和代码领域的实验表明,MSN可以提高推理速度2.3-2.7倍,而不会影响模型性能。MSN模型在Spec-Bench上也实现了与具有额外模型结构的SOTA模型相当的加速比率。

更新时间: 2024-06-25 09:25:39

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.17404v1

GradCheck: Analyzing classifier guidance gradients for conditional diffusion sampling

To sample from an unconditionally trained Denoising Diffusion Probabilistic Model (DDPM), classifier guidance adds conditional information during sampling, but the gradients from classifiers, especially those not trained on noisy images, are often unstable. This study conducts a gradient analysis comparing robust and non-robust classifiers, as well as multiple gradient stabilization techniques. Experimental results demonstrate that these techniques significantly improve the quality of class-conditional samples for non-robust classifiers by providing more stable and informative classifier guidance gradients. The findings highlight the importance of gradient stability in enhancing the performance of classifier guidance, especially on non-robust classifiers.
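For reference, classifier guidance with one common stabilization (gradient-norm clipping) looks like this; the study compares several such techniques, and the tensor shapes below assume image inputs.

import torch

def guided_eps(x_t, t, y, eps_model, classifier, scale=1.0, max_norm=1.0):
    x_t = x_t.detach().requires_grad_(True)
    logp = classifier(x_t, t).log_softmax(-1)
    sel = logp[torch.arange(len(y)), y].sum()
    grad = torch.autograd.grad(sel, x_t)[0]            # d log p(y | x_t) / d x_t
    norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
    grad = grad * (max_norm / norm).clamp(max=1.0)     # clip unstable gradients
    # the usual sqrt(1 - alpha_bar_t) factor is folded into `scale` here
    return eps_model(x_t, t) - scale * grad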

Updated: 2024-06-25 09:23:25

标题: GradCheck:分析分类器引导梯度以进行条件扩散采样

摘要: 为了从无条件训练的去噪扩散概率模型(DDPM)中取样,分类器指导在取样过程中增加了条件信息,但是来自分类器的梯度,特别是那些没有在嘈杂图像上进行训练的梯度,往往是不稳定的。本研究进行了一个梯度分析,比较了稳健和非稳健分类器,以及多种梯度稳定技术。实验结果表明,这些技术通过提供更稳定和信息丰富的分类器指导梯度,显著改善了非稳健分类器的类条件样本的质量。研究结果强调了梯度稳定性在增强分类器指导性能方面的重要性,特别是对于非稳健分类器来说。

更新时间: 2024-06-25 09:23:25

领域: cs.LG

下载: http://arxiv.org/abs/2406.17399v1

AI for the prediction of early stages of Alzheimer's disease from neuroimaging biomarkers -- A narrative review of a growing field

Objectives: The objectives of this narrative review are to summarize the current state of AI applications in neuroimaging for early Alzheimer's disease (AD) prediction and to highlight the potential of AI techniques in improving early AD diagnosis, prognosis, and management. Methods: We conducted a narrative review of studies using AI techniques applied to neuroimaging data for early AD prediction. We examined single-modality studies using structural MRI and PET imaging, as well as multi-modality studies integrating multiple neuroimaging techniques and biomarkers. Furthermore, we reviewed longitudinal studies that model AD progression and identify individuals at risk of rapid decline. Results: Single-modality studies using structural MRI and PET imaging have demonstrated high accuracy in classifying AD and predicting progression from mild cognitive impairment (MCI) to AD. Multi-modality studies, integrating multiple neuroimaging techniques and biomarkers, have shown improved performance and robustness compared to single-modality approaches. Longitudinal studies have highlighted the value of AI in modeling AD progression and identifying individuals at risk of rapid decline. However, challenges remain in data standardization, model interpretability, generalizability, clinical integration, and ethical considerations. Conclusion: AI techniques applied to neuroimaging data have the potential to improve early AD diagnosis, prognosis, and management. Addressing challenges related to data standardization, model interpretability, generalizability, clinical integration, and ethical considerations is crucial for realizing the full potential of AI in AD research and clinical practice. Collaborative efforts among researchers, clinicians, and regulatory agencies are needed to develop reliable, robust, and ethical AI tools that can benefit AD patients and society.

Updated: 2024-06-25 09:22:53

标题: 人工智能用于从神经影像生物标志物预测早期阿尔茨海默病的研究--一个不断发展的领域的叙事性综述

摘要: 目的:本叙述性综述的目的是总结目前在神经影像学中人工智能应用于早期阿尔茨海默病(AD)预测的现状,并突出人工智能技术在改善早期AD诊断、预后和管理方面的潜力。 方法:我们对应用人工智能技术于神经影像数据用于早期AD预测的研究进行了叙述性综述。我们检查了使用结构性MRI和PET成像的单模态研究,以及整合多种神经影像技术和生物标志物的多模态研究。此外,我们还综述了模拟AD进展并识别存在快速下降风险的个体的纵向研究。 结果:使用结构性MRI和PET成像的单模态研究已经证明在分类AD和预测从轻度认知障碍(MCI)到AD的进展方面具有高准确性。整合多种神经影像技术和生物标志物的多模态研究相比于单模态方法表现出更好的性能和稳健性。纵向研究突出了AI在模拟AD进展和识别存在快速下降风险的个体方面的价值。然而,在数据标准化、模型可解释性、泛化能力、临床整合和伦理考虑方面仍存在挑战。 结论:应用于神经影像数据的人工智能技术有望改善早期AD诊断、预后和管理。解决与数据标准化、模型可解释性、泛化能力、临床整合和伦理考虑有关的挑战对于实现AD研究和临床实践中人工智能的全部潜力至关重要。研究人员、临床医生和监管机构之间的合作努力是必要的,以开发可靠、稳健和符合伦理要求的人工智能工具,从而使AD患者和社会受益。

更新时间: 2024-06-25 09:22:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17822v1

Explainable Online Unsupervised Anomaly Detection for Cyber-Physical Systems via Causal Discovery from Time Series

Online unsupervised detection of anomalies is crucial to guarantee the correct operation of cyber-physical systems and the safety of humans interacting with them. State-of-the-art approaches based on deep learning via neural networks achieve outstanding performance at anomaly recognition, evaluating the discrepancy between a normal model of the system (with no anomalies) and the real-time stream of sensor time series. However, large training data and time are typically required, and explainability is still a challenge to identify the root of the anomaly and implement predictive maintenance. In this paper, we use causal discovery to learn a normal causal graph of the system, and we evaluate the persistency of causal links during real-time acquisition of sensor data to promptly detect anomalies. On two benchmark anomaly detection datasets, we show that our method has higher training efficiency, outperforms the accuracy of state-of-the-art neural architectures and correctly identifies the sources of >10 different anomalies. The code is at https://github.com/Isla-lab/causal_anomaly_detection.

Updated: 2024-06-25 09:10:46

标题: 基于时间序列因果发现的网络物理系统可解释在线无监督异常检测

摘要: 在线无监督异常检测对于确保网络物理系统的正确运行和与之交互的人类的安全至关重要。基于深度学习和神经网络的最新方法在异常识别方面取得了出色的表现,通过评估系统的正常模型(没有异常)与实时传感器时间序列之间的差异来实现。然而,通常需要大量的训练数据和时间,并且解释性仍然是识别异常根源并实施预测性维护的挑战。在本文中,我们使用因果发现来学习系统的正常因果图,并在实时获取传感器数据期间评估因果链接的持久性以及及时检测异常。在两个基准异常检测数据集上,我们展示了我们的方法具有更高的训练效率,优于最先进的神经网络架构的准确性,并正确识别了超过10种不同异常的来源。代码位于https://github.com/Isla-lab/causal_anomaly_detection。

更新时间: 2024-06-25 09:10:46

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.09871v3

Double Momentum Method for Lower-Level Constrained Bilevel Optimization

Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Recently, many hypergradient methods have been proposed as effective solutions for solving large-scale problems. However, current hypergradient methods for lower-level constrained bilevel optimization (LCBO) problems rely on very restrictive assumptions, namely that the optimality conditions satisfy differentiability and invertibility conditions, and they lack a solid analysis of the convergence rate. What's worse, existing methods require double-loop updates, which are sometimes less efficient. To solve this problem, in this paper, we propose a new hypergradient for LCBO leveraging the theory of the nonsmooth implicit function theorem instead of using the restrictive assumptions. In addition, we propose a \textit{single-loop single-timescale} algorithm based on the double-momentum method and adaptive step size method and prove it can return a $(\delta, \epsilon)$-stationary point with $\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$ iterations. Experiments on two applications demonstrate the effectiveness of our proposed method.

Updated: 2024-06-25 09:05:22

标题: 用于下层约束双层优化的双动量方法

摘要: 双层优化(BO)最近在许多机器学习应用中备受关注,因为它能够捕捉这些问题固有的嵌套结构。最近,许多超梯度方法已被提出作为解决大规模问题的有效解决方案。然而,目前用于下层约束双层优化(LCBO)问题的超梯度方法需要非常严格的假设,即最优性条件满足可微性和可逆性条件,并且缺乏对收敛速度的深入分析。更糟糕的是,现有方法要求进行双循环更新,有时效率较低。为解决这个问题,在本文中,我们提出了一种新的LCBO超梯度方法,利用非光滑隐式函数定理的理论,而不是使用这些限制性假设。此外,我们提出了一种基于双动量方法和自适应步长方法的\textit{单循环单时间尺度}算法,并证明它可以在$\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$次迭代中返回一个$(\delta, \epsilon)$-稳定点。两个应用的实验表明了我们提出的方法的有效性。

更新时间: 2024-06-25 09:05:22

领域: math.OC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.17386v1

FedPop: Federated Population-based Hyperparameter Tuning

Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their "training-after-tuning" framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both client and server sides. Compared with prior tuning methods, FedPop employs an online "tuning-while-training" framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP tuning methods for FL.
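
The "tuning-while-training" loop can be sketched as follows; `train_round` and `evaluate` are hypothetical stand-ins for the FL system, and the perturbation is a generic population-based heuristic rather than FedPop's exact evolutionary operators:

```python
# A population of HP configurations evolves online while the global model
# trains: the worst member periodically copies and perturbs the best one.
import random

def perturb(hp):
    """Explore: randomly scale the continuous HPs of a copied configuration."""
    return {k: v * random.choice([0.8, 1.25]) if isinstance(v, float) else v
            for k, v in hp.items()}

def population_based_tuning(population, train_round, evaluate, rounds=50):
    # population: list of dicts, e.g. {"client_lr": 0.1, "server_lr": 1.0}
    scores = [0.0] * len(population)
    best = 0
    for _ in range(rounds):
        for i, hp in enumerate(population):
            train_round(hp)               # one FL round under this config
            scores[i] = evaluate(hp)      # validation score of global model
        best = max(range(len(population)), key=lambda i: scores[i])
        worst = min(range(len(population)), key=lambda i: scores[i])
        # Exploit + explore, without a separate "training-after-tuning" phase.
        population[worst] = perturb(dict(population[best]))
    return population[best]
```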

Updated: 2024-06-25 09:04:08

标题: FedPop:联邦式基于人口的超参数调整

摘要: 联邦学习(FL)是一种分布式机器学习(ML)范例,多个客户端协作训练ML模型,而无需将本地数据集中化。与传统的ML流程类似,FL中的客户端本地优化和服务器聚合过程对超参数(HP)的选择敏感。尽管对于集中式ML进行了大量的超参数调整研究,但是当应用于FL时,这些方法产生的结果并不理想。主要原因是它们的“调整后训练”框架不适用于具有有限客户端计算能力的FL。虽然已经提出了一些在FL中进行HP调整的方法,但这些方法仅限于客户端本地更新的HP。在这项工作中,我们提出了一种新颖的HP调整算法,称为联邦基于种群的超参数调整(FedPop),以解决这个重要但具有挑战性的问题。FedPop采用基于种群的进化算法来优化HP,可以适应客户端和服务器两侧的各种HP类型。与先前的调整方法相比,FedPop采用在线的“调整同时训练”框架,提供了计算效率,并能够探索更广泛的HP搜索空间。我们在常见的FL基准测试和复杂的现实世界FL数据集上进行的实证验证表明了所提出方法的有效性,其大大优于当前最先进的FL超参数调整方法。

更新时间: 2024-06-25 09:04:08

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2308.08634v2

SoK: Facial Deepfake Detectors

Deepfakes have rapidly emerged as a profound and serious threat to society, primarily due to their ease of creation and dissemination. This situation has triggered an accelerated development of deepfake detection technologies. However, many existing detectors rely heavily on lab-generated datasets for validation, which may not effectively prepare them for novel, emerging, and real-world deepfake techniques. In this paper, we conduct an extensive and comprehensive review and analysis of the latest state-of-the-art deepfake detectors, evaluating them against several critical criteria. These criteria facilitate the categorization of these detectors into 4 high-level groups and 13 fine-grained sub-groups, all aligned with a unified standard conceptual framework. This classification and framework offer deep and practical insights into the factors that affect detector efficacy. We assess the generalizability of 16 leading detectors across various standard attack scenarios, including black-box, white-box, and gray-box settings. Our systematized analysis and experimentation lay the groundwork for a deeper understanding of deepfake detectors and their generalizability, paving the way for future research focused on creating detectors adept at countering various attack scenarios. Additionally, this work offers insights for developing more proactive defenses against deepfakes.

Updated: 2024-06-25 09:02:42

标题: SoK: 面部深度伪造检测器

摘要: Deepfakes(深度伪造)迅速成为对社会的深刻和严重威胁,主要是由于它们易于制作和传播。这种情况引发了对深度伪造检测技术的加速发展。然而,许多现有的检测器在验证时严重依赖于实验室生成的数据集,这可能无法使它们有效应对新颖、新兴和真实世界的深度伪造技术。在本文中,我们对最新的深度伪造检测器进行了广泛而全面的审查和分析,并依据几个关键标准对它们进行评估。这些标准有助于将这些检测器分类为4个高级组别和13个细粒度子组别,所有这些都与一个统一的标准概念框架一致。这种分类和框架提供了深入而实用的见解,涉及影响检测器效力的因素。我们评估了在各种标准攻击场景下(包括黑盒、白盒和灰盒设置)的16个主要检测器的泛化能力。我们系统化的分析和实验为更深入地理解深度伪造检测器及其泛化能力奠定了基础,为未来研究开辟了道路,重点是创建能够应对各种攻击场景的检测器。此外,这项工作为开发更积极的对抗深度伪造的防御提供了见解。

更新时间: 2024-06-25 09:02:42

领域: cs.CV,cs.CR,cs.LG

下载: http://arxiv.org/abs/2401.04364v2

Forget but Recall: Incremental Latent Rectification in Continual Learning

Intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which hinders remembering past knowledge. To mitigate this issue, existing Continual Learning (CL) approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks. This paper investigates an unexplored CL direction for incremental learning called Incremental Latent Rectification or ILR. In a nutshell, ILR learns to propagate with correction (or rectify) the representation from the current trained DNN backward to the representation space of the old task, where performing predictive decisions is easier. This rectification process only employs a chain of small representation mapping networks, called rectifier units. Empirical experiments on several continual learning benchmarks, including CIFAR10, CIFAR100, and Tiny ImageNet, demonstrate the effectiveness and potential of this novel CL direction compared to existing representative CL methods.
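
A hedged sketch of the rectifier-unit idea; the residual-MLP design and the backward chaining below are one plausible reading of the description, not the paper's code:

```python
# Small "rectifier units" map the current backbone's features back toward the
# representation space of an earlier task, where that task's head still works.
import torch
import torch.nn as nn

class RectifierUnit(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.map = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, h):
        return h + self.map(h)   # residual correction of the representation

class IncrementalLatentRectification(nn.Module):
    def __init__(self, backbone, dim, num_classes_per_task):
        super().__init__()
        self.backbone = backbone
        self.rectifiers = nn.ModuleList()   # grows by one unit per new task
        self.heads = nn.ModuleList(
            [nn.Linear(dim, c) for c in num_classes_per_task])

    def add_task(self, dim):
        self.rectifiers.append(RectifierUnit(dim))

    def forward(self, x, task_id):
        h = self.backbone(x)
        # Propagate backward through the chain until task_id's space is reached.
        for unit in reversed(self.rectifiers[task_id:]):
            h = unit(h)
        return self.heads[task_id](h)
```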

Updated: 2024-06-25 08:57:47

标题: 遗忘但回忆:持续学习中的增量潜在矫正

摘要: 持续学习不断变化的数据流的内在能力是深度神经网络(DNNs)的一个理想特性。然而,当前的DNNs存在灾难性遗忘的问题,这阻碍了记忆过去的知识。为了缓解这个问题,现有的持续学习(CL)方法要么保留示例进行重放,要么规范学习,要么为新任务分配专门的容量。本文探讨了一个未被开发的增量学习CL方向,称为增量潜在矫正或ILR。简而言之,ILR学习将当前训练的DNN的表示带校正地(即矫正地)向后传播到旧任务的表示空间,在那里进行预测性决策更容易。这个矫正过程仅使用一系列小的表示映射网络,称为矫正单元。对包括CIFAR10、CIFAR100和Tiny ImageNet在内的几个持续学习基准进行的实证实验表明,与现有代表性CL方法相比,这种新颖的CL方向具有有效性和潜力。

更新时间: 2024-06-25 08:57:47

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.17381v1

On the numerical reliability of nonsmooth autodiff: a MaxPool case study

This paper considers the reliability of automatic differentiation (AD) for neural networks involving the nonsmooth MaxPool operation. We investigate the behavior of AD across different precision levels (16, 32, 64 bits) and convolutional architectures (LeNet, VGG, and ResNet) on various datasets (MNIST, CIFAR10, SVHN, and ImageNet). Although AD can be incorrect, recent research has shown that it coincides with the derivative almost everywhere, even in the presence of nonsmooth operations (such as MaxPool and ReLU). On the other hand, in practice, AD operates with floating-point numbers (not real numbers), and there is, therefore, a need to explore subsets on which AD can be numerically incorrect. These subsets include a bifurcation zone (where AD is incorrect over reals) and a compensation zone (where AD is incorrect over floating-point numbers but correct over reals). Using SGD for the training process, we study the impact of different choices of the nonsmooth Jacobian for the MaxPool function on the precision of 16 and 32 bits. These findings suggest that nonsmooth MaxPool Jacobians with lower norms help maintain stable and efficient test accuracy, whereas those with higher norms can result in instability and decreased performance. We also observe that the influence of MaxPool's nonsmooth Jacobians on learning can be reduced by using batch normalization, Adam-like optimizers, or increasing the precision level.
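
A small experiment in the spirit of the paper, showing how autodiff commits to one element of MaxPool's subdifferential at a tie, and how lower storage precision turns near-ties into exact ties (tensor shapes and values are illustrative):

```python
import torch
import torch.nn.functional as F

def maxpool_grad(eps, store_dtype):
    # Store the values at the given precision first: in float16 a near-tie
    # like 1.0 vs 1.0001 rounds to an exact tie before pooling even runs.
    vals = torch.tensor([1.0, 1.0 + eps, 0.0, 0.0], dtype=store_dtype)
    x = vals.float().reshape(1, 1, 2, 2).clone().requires_grad_(True)
    y = F.max_pool2d(x, kernel_size=2)        # a single 2x2 pooling window
    y.sum().backward()
    return x.grad.flatten().tolist()

print(maxpool_grad(1e-4, torch.float32))  # fp32 resolves the near-tie
print(maxpool_grad(1e-4, torch.float16))  # fp16 rounds it into an exact tie
print(maxpool_grad(0.0,  torch.float32))  # exact tie: autodiff silently
                                          # picks one valid subgradient
```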

Updated: 2024-06-25 08:55:16

标题: 关于非光滑自动微分数值可靠性的研究:MaxPool案例研究

摘要: 这篇论文考虑了涉及非光滑MaxPool操作的神经网络的自动微分(AD)的可靠性。我们研究了在不同精度级别(16、32、64位)和卷积架构(LeNet、VGG和ResNet)以及各种数据集(MNIST、CIFAR10、SVHN和ImageNet)上的AD行为。尽管AD可能不正确,但最近的研究表明,即使在存在非光滑操作(如MaxPool和ReLU)的情况下,它几乎在任何地方都与导数相符。另一方面,在实践中,AD操作的是浮点数(而不是实数),因此需要探索AD可能在数值上不正确的子集。这些子集包括一个分叉区域(在该区域上AD在实数上不正确)和一个补偿区域(在该区域上AD在浮点数上不正确但在实数上正确)。使用SGD进行训练过程时,我们研究了16位和32位精度下对MaxPool函数的非光滑Jacobian的不同选择对精度的影响。这些发现表明,具有较低范数的非光滑MaxPool Jacobians有助于维持稳定和高效的测试准确性,而具有较高范数的Jacobians可能导致不稳定和性能下降。我们还观察到,使用批量归一化、类似Adam的优化器或增加精度级别可以减少MaxPool的非光滑Jacobian对学习的影响。

更新时间: 2024-06-25 08:55:16

领域: cs.LG,cs.NA,math.NA,math.OC,stat.ML

下载: http://arxiv.org/abs/2401.02736v2

Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

Recent synthetic speech detectors leveraging the Transformer model have superior performance compared to the convolutional neural network counterparts. This improvement could be due to the powerful modeling ability of the multi-head self-attention (MHSA) in the Transformer model, which learns the temporal relationship of each input token. However, artifacts of synthetic speech can be located in specific regions of both frequency channels and temporal segments, while MHSA neglects this temporal-channel dependency of the input sequence. In this work, we proposed a Temporal-Channel Modeling (TCM) module to enhance MHSA's capability for capturing temporal-channel dependencies. Experimental results on the ASVspoof 2021 show that with only 0.03M additional parameters, the TCM module can outperform the state-of-the-art system by 9.25% in EER. Further ablation study reveals that utilizing both temporal and channel information yields the most improvement for detecting synthetic speech.

Updated: 2024-06-25 08:50:43

标题: 在多头自注意力中的时间通道建模用于合成语音检测

摘要: 最近利用Transformer模型的合成语音检测器在性能上优于对应的卷积神经网络。这种改进可能源于Transformer模型中多头自注意力(MHSA)强大的建模能力,它学习各输入标记之间的时间关系。然而,合成语音的伪像可能位于频率通道和时间段的特定区域,而MHSA忽视了输入序列的这种时间-通道依赖性。在这项工作中,我们提出了一个Temporal-Channel Modeling(TCM)模块,以增强MHSA捕获时间-通道依赖性的能力。在ASVspoof 2021上的实验结果表明,仅仅通过0.03M的额外参数,TCM模块可以在EER方面比最先进的系统提高9.25%。进一步的消融研究揭示了同时利用时间和通道信息对于检测合成语音带来最大的改进。

更新时间: 2024-06-25 08:50:43

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.17376v1

Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression

We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask what is the minimum sample complexity to efficiently (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate for the regression vector that achieves non-trivial prediction error on the $n$ samples. Information-theoretically this can be achieved using $\Theta(k \log (d/k))$ samples. Yet, despite its prominence in the literature, there is no polynomial-time algorithm known to achieve the same guarantees using less than $\Theta(d)$ samples without additional restrictions on the model. Similarly, existing hardness results are either restricted to the proper setting, in which the estimate must be sparse as well, or only apply to specific algorithms. We give evidence that efficient algorithms for this task require at least (roughly) $\Omega(k^2)$ samples. In particular, we show that an improper learning algorithm for sparse linear regression can be used to solve sparse PCA problems (with a negative spike) in their Wishart form, in regimes in which efficient algorithms are widely believed to require at least $\Omega(k^2)$ samples. We complement our reduction with low-degree and statistical query lower bounds for the sparse PCA problems from which we reduce. Our hardness results apply to the (correlated) random design setting in which the covariates are drawn i.i.d. from a mean-zero Gaussian distribution with unknown covariance.

Updated: 2024-06-25 08:50:33

标题: 稀疏线性回归中不当学习的计算统计差距

摘要: 我们研究了稀疏线性回归中不恰当学习的计算统计差距。更具体地,给定从维度为$d$的$k$-稀疏线性模型中获得的$n$个样本,我们问:要高效地(在关于$d$、$k$和$n$的多项式时间内)找到一个可能稠密的回归向量估计、使其在$n$个样本上实现非平凡的预测误差,所需的最小样本复杂度是多少。从信息理论上讲,这可以通过使用$\Theta(k \log (d/k))$个样本来实现。然而,尽管该问题在文献中很突出,但没有已知的多项式时间算法可以在不对模型施加额外限制的情况下,使用少于$\Theta(d)$个样本来实现相同的保证。同样,现有的困难结果要么仅限于适当的设置(其中估计也必须是稀疏的),要么仅适用于特定算法。 我们提供证据表明,这一任务的高效算法至少需要(大致)$\Omega(k^2)$个样本。特别地,我们展示了一个用于稀疏线性回归的不恰当学习算法可以用于解决Wishart形式的带负尖峰的稀疏PCA问题,而在这些情形中,人们普遍认为高效算法至少需要$\Omega(k^2)$个样本。我们用针对所规约到的稀疏PCA问题的低次多项式与统计查询下界来补充我们的规约。 我们的困难结果适用于(相关的)随机设计设置,其中协变量从具有未知协方差的零均值高斯分布中独立同分布地抽取。

更新时间: 2024-06-25 08:50:33

领域: cs.LG,cs.CC,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2402.14103v2

Generalizability of experimental studies

Experimental studies are a cornerstone of machine learning (ML) research. A common, but often implicit, assumption is that the results of a study will generalize beyond the study itself, e.g. to new data. That is, there is a high probability that repeating the study under different conditions will yield similar results. Despite the importance of the concept, the problem of measuring generalizability remains open. This is probably due to the lack of a mathematical formalization of experimental studies. In this paper, we propose such a formalization and develop a quantifiable notion of generalizability. This notion allows to explore the generalizability of existing studies and to estimate the number of experiments needed to achieve the generalizability of new studies. To demonstrate its usefulness, we apply it to two recently published benchmarks to discern generalizable and non-generalizable results. We also publish a Python module that allows our analysis to be repeated for other experimental studies.

Updated: 2024-06-25 08:49:07

标题: 实验研究的泛化能力

摘要: 实验研究是机器学习(ML)研究的基石。一个常见但常常隐含的假设是,研究结果将会推广到研究本身之外,例如推广到新数据。也就是说,在不同条件下重复该研究很可能会产生类似的结果。尽管这一概念十分重要,如何衡量泛化能力的问题仍未解决。这可能是因为缺乏对实验研究的数学形式化。在本文中,我们提出了这样一个形式化,并发展了一个可量化的泛化能力概念。这个概念可以用来探索现有研究的泛化能力,并估计为使新研究具备泛化能力所需的实验数量。为了证明其有用性,我们将其应用于最近发布的两个基准测试,以区分可泛化和不可泛化的结果。我们还发布了一个Python模块,使我们的分析可以在其他实验研究上重复进行。

更新时间: 2024-06-25 08:49:07

领域: cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2406.17374v1

Direct Multi-Turn Preference Optimization for Language Agents

Adapting Large Language Models (LLMs) for agent tasks is critical in developing language agents. Direct Preference Optimization (DPO) is a promising technique for this adaptation with the alleviation of compounding errors, offering a means to directly optimize Reinforcement Learning (RL) objectives. However, applying DPO to multi-turn tasks presents challenges due to the inability to cancel the partition function. Overcoming this obstacle involves making the partition function independent of the current state and addressing length disparities between preferred and dis-preferred trajectories. In this light, we replace the policy constraint with the state-action occupancy measure constraint in the RL objective and add length normalization to the Bradley-Terry model, yielding a novel loss function named DMPO for multi-turn agent tasks with theoretical explanations. Extensive experiments on three multi-turn agent task datasets confirm the effectiveness and superiority of the DMPO loss.
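
A hedged sketch of the length-normalized Bradley-Terry objective at the heart of such losses; this is a simplification for intuition, not the exact DMPO loss from the paper:

```python
import torch
import torch.nn.functional as F

def length_normalized_preference_loss(logp_w, len_w, logp_l, len_l, beta=0.1):
    """
    logp_w / logp_l: summed log-probabilities the policy assigns to the
    preferred / dis-preferred multi-turn trajectories.
    len_w / len_l:   trajectory lengths; dividing by them lets rollouts of
                     different lengths be compared fairly.
    """
    margin = beta * (logp_w / len_w - logp_l / len_l)
    return -F.logsigmoid(margin).mean()

# Toy usage with made-up log-probs for a batch of two preference pairs:
logp_w = torch.tensor([-40.0, -55.0]); len_w = torch.tensor([20.0, 25.0])
logp_l = torch.tensor([-90.0, -80.0]); len_l = torch.tensor([30.0, 20.0])
print(length_normalized_preference_loss(logp_w, len_w, logp_l, len_l))
```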

Updated: 2024-06-25 08:44:24

标题: 语言代理的直接多轮偏好优化

摘要: 将大型语言模型(LLMs)适配到代理任务对于开发语言代理至关重要。直接偏好优化(DPO)是一种有前途的技术,可以在减轻复合错误的同时完成这种适配,提供一种直接优化强化学习(RL)目标的方法。然而,将DPO应用于多轮任务存在挑战,因为配分函数无法消去。克服这一障碍需要使配分函数独立于当前状态,并解决首选和非首选轨迹之间的长度差异。有鉴于此,我们将RL目标中的策略约束替换为状态-动作占用度约束,并向Bradley-Terry模型添加长度归一化,从而得到一种用于多轮代理任务、名为DMPO的新型损失函数,并给出了理论解释。对三个多轮代理任务数据集进行的大量实验证实了DMPO损失的有效性和优越性。

更新时间: 2024-06-25 08:44:24

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.14868v2

Automatically Adaptive Conformal Risk Control

Science and technology have a growing need for effective mechanisms that ensure reliable, controlled performance from black-box machine learning algorithms. These performance guarantees should ideally hold conditionally on the input; that is, the performance guarantees should hold, at least approximately, no matter what the input. However, beyond stylized discrete groupings such as ethnicity and gender, the right notion of conditioning can be difficult to define. For example, in problems such as image segmentation, we want the uncertainty to reflect the intrinsic difficulty of the test sample, but this may be difficult to capture via a conditioning event. Building on the recent work of Gibbs et al. [2023], we propose a methodology for achieving approximate conditional control of statistical risks (the expected value of loss functions) by adapting to the difficulty of test samples. Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning. We apply this framework to various regression and segmentation tasks, enabling finer-grained control over model performance and demonstrating that by continuously monitoring and adjusting these parameters, we can achieve superior precision compared to conventional risk-control methods.
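
For intuition, a minimal sketch of the marginal conformal risk control baseline that this line of work extends toward conditional, input-adaptive control; the loss construction and the toy data below are assumptions:

```python
# Choose the smallest threshold whose calibration risk, inflated by the usual
# finite-sample correction, stays below the target level alpha.
import numpy as np

def calibrate_lambda(losses_at_lambda, alpha):
    """
    losses_at_lambda: (n_cal, n_lambdas) array; entry [i, j] is the bounded
    loss of calibration sample i when the predictor uses threshold lambda_j,
    with losses non-increasing in lambda and bounded by B = 1.
    """
    n = losses_at_lambda.shape[0]
    risk = losses_at_lambda.mean(axis=0)
    bound = (n * risk + 1.0) / (n + 1)       # finite-sample corrected risk
    valid = np.where(bound <= alpha)[0]
    return valid[0] if valid.size else None  # index of smallest valid lambda

rng = np.random.default_rng(0)
lambdas = np.linspace(0, 1, 21)
# Toy losses that shrink as lambda grows (e.g. larger prediction sets):
losses = np.clip(rng.uniform(size=(500, 1)) - lambdas[None, :], 0, 1)
print(lambdas[calibrate_lambda(losses, alpha=0.1)])
```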

Updated: 2024-06-25 08:29:32

标题: 自动自适应共形风险控制

摘要: 科学技术对于确保黑匣子机器学习算法可靠、可控的性能有着日益增长的需求。这些性能保证理想情况下应该是有条件地成立的——也就是说,无论输入是什么,性能保证至少应该近似成立。然而,在除种族和性别等简化的离散分组之外,正确的条件概念可能很难定义。例如,在图像分割等问题中,我们希望不确定性能够反映测试样本的固有难度,但这可能很难通过一个条件事件捕捉到。借鉴Gibbs等人最近的工作,我们提出了一种方法论,通过适应测试样本的难度来实现对统计风险(损失函数的期望值)的近似条件控制。我们的框架超越了基于用户提供的条件事件的传统条件风险控制,而是通过算法和数据驱动确定适当的函数类来进行条件控制。我们将这一框架应用于各种回归和分割任务,实现对模型性能的更细粒度控制,并证明通过持续监控和调整这些参数,我们可以实现比传统风险控制方法更优越的精度。

更新时间: 2024-06-25 08:29:32

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17819v1

Efficiently Predicting Mutational Effect on Homologous Proteins by Evolution Encoding

Predicting protein properties is paramount for biological and medical advancements. Current protein engineering mutates a typical protein, called the wild-type, to construct a family of homologous proteins and study their properties. Yet, existing methods easily neglect subtle mutations, failing to capture the effect on the protein properties. To this end, we propose EvolMPNN, Evolution-aware Message Passing Neural Network, an efficient model to learn evolution-aware protein embeddings. EvolMPNN samples sets of anchor proteins, computes evolutionary information by means of residues and employs a differentiable evolution-aware aggregation scheme over these sampled anchors. This way, EvolMPNN can efficiently utilise a novel message-passing method to capture the mutation effect on proteins with respect to the anchor proteins. Afterwards, the aggregated evolution-aware embeddings are integrated with sequence embeddings to generate final comprehensive protein embeddings. Our model performs up to 6.4% better than state-of-the-art methods and attains 36X inference speedup in comparison with large pre-trained models. Code and models are available at https://github.com/zhiqiangzhongddu/EvolMPNN.

Updated: 2024-06-25 08:26:33

标题: 通过进化编码高效预测同源蛋白质的突变效应

摘要: 预测蛋白质属性对于生物学和医学的进步至关重要。当前的蛋白质工程通过对一种典型蛋白质(称为野生型)进行突变,构建一族同源蛋白质并研究它们的属性。然而,现有方法很容易忽略细微的突变,无法捕捉其对蛋白质属性的影响。为此,我们提出了EvolMPNN(Evolution-aware Message Passing Neural Network,进化感知消息传递神经网络),一种用于学习进化感知蛋白质嵌入的高效模型。EvolMPNN对锚蛋白集合进行采样,基于残基计算进化信息,并在这些采样的锚上采用可微分的进化感知聚合方案。这样,EvolMPNN可以有效地利用一种新颖的消息传递方法,捕捉蛋白质相对于锚蛋白的突变效应。随后,聚合得到的进化感知嵌入与序列嵌入集成,生成最终的全面蛋白质嵌入。我们的模型比最先进的方法提高了高达6.4%,并且与大型预训练模型相比,推理速度提高了36倍。代码和模型可在https://github.com/zhiqiangzhongddu/EvolMPNN 上找到。

更新时间: 2024-06-25 08:26:33

领域: cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2402.13418v2

Harnessing Large Language Models as Post-hoc Correctors

As Machine Learning (ML) models grow in size and demand higher-quality training data, the expenses associated with re-training and fine-tuning these models are escalating rapidly. Inspired by recent impressive achievements of Large Language Models (LLMs) in different fields, this paper delves into the question: can LLMs efficiently improve an ML's performance at a minimal cost? We show that, through our proposed training-free framework LlmCorr, an LLM can work as a post-hoc corrector to propose corrections for the predictions of an arbitrary ML model. In particular, we form a contextual knowledge database by incorporating the dataset's label information and the ML model's predictions on the validation dataset. Leveraging the in-context learning capability of LLMs, we ask the LLM to summarise the instances in which the ML model makes mistakes and the correlation between primary predictions and true labels. Following this, the LLM can transfer its acquired knowledge to suggest corrections for the ML model's predictions. Our experimental results on text analysis and the challenging molecular predictions show that LlmCorr improves the performance of a number of models by up to 39%.
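
A hedged sketch of the corrector loop; `llm` is a hypothetical text-in/text-out callable and the prompt wording is illustrative, not the paper's template:

```python
# Build a contextual knowledge base from validation cases (mistakes first),
# then ask the LLM to confirm or correct each new ML prediction in-context.
def build_context_db(val_inputs, val_preds, val_labels, max_items=20):
    items = list(zip(val_inputs, val_preds, val_labels))
    mistakes = [it for it in items if it[1] != it[2]]
    correct = [it for it in items if it[1] == it[2]]
    return (mistakes + correct)[:max_items]

def correct_prediction(llm, context_db, test_input, ml_prediction):
    examples = "\n".join(
        f"input: {x}\nmodel predicted: {p}\ntrue label: {y}"
        for x, p, y in context_db)
    prompt = (
        "An ML model makes predictions; here are validation cases showing "
        "where it errs:\n" + examples +
        f"\n\nNew input: {test_input}\nModel prediction: {ml_prediction}\n"
        "If the prediction looks wrong given the pattern of errors above, "
        "output a corrected label; otherwise repeat the prediction.")
    return llm(prompt)
```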

Updated: 2024-06-25 08:26:19

标题: 利用大型语言模型作为事后校正器

摘要: 随着机器学习(ML)模型规模的增长和对高质量训练数据的需求增加,与重新训练和微调这些模型相关的费用迅速上升。受到最近大型语言模型(LLMs)在不同领域取得的令人印象深刻的成就的启发,本文探讨了一个问题:LLMs能否以最小的成本有效地改善ML的性能?我们展示了,通过我们提出的无需训练的框架LlmCorr,一个LLM可以作为后期校正器为任意ML模型的预测提出校正建议。具体来说,我们通过将数据集的标签信息和ML模型对验证数据集的预测结合起来形成一个上下文知识数据库。利用LLMs的上下文学习能力,我们要求LLM总结ML模型犯错的情况以及主要预测与真实标签之间的相关性。随后,LLM可以将其获得的知识转移,为ML模型的预测提出校正建议。我们在文本分析和具有挑战性的分子预测上的实验结果表明,LlmCorr可以将多个模型的性能提高高达39%。

更新时间: 2024-06-25 08:26:19

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2402.13414v2

PatentEval: Understanding Errors in Patent Generation

In this work, we introduce a comprehensive error typology specifically designed for evaluating two distinct tasks in machine-generated patent texts: claims-to-abstract generation, and the generation of the next claim given previous ones. We have also developed a benchmark, PatentEval, for systematically assessing language models in this context. Our study includes a comparative analysis, annotated by humans, of various models. These range from those specifically adapted during training for tasks within the patent domain to the latest general-purpose large language models (LLMs). Furthermore, we explored and evaluated some metrics to approximate human judgments in patent text evaluation, analyzing the extent to which these metrics align with expert assessments. These approaches provide valuable insights into the capabilities and limitations of current language models in the specialized field of patent text generation.

Updated: 2024-06-25 08:23:03

标题: PatentEval:理解专利生成中的错误

摘要: 在这项工作中,我们引入了一种专门设计用于评估机器生成专利文本中两个不同任务的全面错误分类法:从权利要求到摘要的生成,以及根据先前的权利要求生成下一个权利要求。我们还开发了一个名为PatentEval的基准,用于系统评估语言模型在这种情况下的表现。我们的研究包括一个由人类注释的各种模型的比较分析。这些模型从专门在专利领域的训练中适应任务的模型到最新的通用大型语言模型(LLMs)都有涵盖。此外,我们还探讨和评估了一些指标来近似专利文本评估中的人类判断,分析这些指标在多大程度上与专家评估相一致。这些方法为当前语言模型在专利文本生成领域的能力和局限性提供了宝贵的见解。

更新时间: 2024-06-25 08:23:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06589v2

Development of a digital tool for monitoring the behaviour of pre-weaned calves using accelerometer neck-collars

Automatic monitoring of calf behaviour is a promising way of assessing animal welfare from their first week on farms. This study aims to (i) develop machine learning models from accelerometer data to classify the main behaviours of pre-weaned calves and (ii) set up a digital tool for monitoring the behaviour of pre-weaned calves from the models' prediction. Thirty pre-weaned calves were equipped with a 3-D accelerometer attached to a neck-collar for two months and filmed simultaneously. The behaviours were annotated, resulting in 27.4 hours of observation aligned with the accelerometer data. The time-series were then split into 3 seconds windows. Two machine learning models were tuned using data from 80% of the calves: (i) a Random Forest model to classify between active and inactive behaviours using a set of 11 hand-crafted features [model 1] and (ii) a RidgeClassifierCV model to classify between lying, running, drinking milk and other behaviours using ROCKET features [model 2]. The performance of the models was tested using data from the remaining 20% of the calves. Model 1 achieved a balanced accuracy of 0.92. Model 2 achieved a balanced accuracy of 0.84. Behavioural metrics such as daily activity ratio and episodes of running, lying, drinking milk, and other behaviours expressed over time were deduced from the predictions. All the development was finally embedded into a Python dashboard so that the individual calf metrics could be displayed directly from the raw accelerometer files.
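
A rough reconstruction of the model-1 pipeline under stated assumptions: the paper's 11 hand-crafted features are not listed in the abstract, so the features below are common accelerometer choices, and the data are synthetic:

```python
# 3-second windows of 3-axis accelerometer data, a few hand-crafted features
# per window, and a Random Forest for active-vs-inactive classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

def window_features(w):
    """w: (n_samples, 3) accelerometer window -> feature vector."""
    mag = np.linalg.norm(w, axis=1)
    return np.concatenate([w.mean(0), w.std(0),          # per-axis stats
                           [mag.mean(), mag.std(),        # movement intensity
                            np.abs(np.diff(mag)).mean()]])  # jerkiness

rng = np.random.default_rng(0)
# Toy data: 400 windows of 75 samples (25 Hz * 3 s), labeled active/inactive.
windows = rng.normal(size=(400, 75, 3))
labels = rng.integers(0, 2, size=400)
windows[labels == 1] *= 3.0            # "active" windows move more

X = np.stack([window_features(w) for w in windows])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(balanced_accuracy_score(y_te, clf.predict(X_te)))
```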

Updated: 2024-06-25 08:11:22

标题: 使用加速度计颈圈监测断奶前小牛行为的数字工具的开发

摘要: 自动监测小牛行为是一种有前途的方式,可以从小牛到达农场的第一周起评估其福利。本研究旨在(i)从加速度计数据中开发机器学习模型,以分类断奶前小牛的主要行为,以及(ii)基于模型预测建立一个用于监测断奶前小牛行为的数字工具。三十头断奶前小牛佩戴了固定在颈圈上的三轴加速度计,持续两个月,并同时进行了拍摄。行为被注释,得到与加速度计数据对齐的27.4小时观察记录。然后将时间序列分割为3秒窗口。使用80%小牛的数据调整了两个机器学习模型:(i)一个随机森林模型,使用一组11个手工特征对活动和静止行为进行分类[model 1],以及(ii)一个RidgeClassifierCV模型,使用ROCKET特征对躺卧、奔跑、饮奶和其他行为进行分类[model 2]。使用剩余20%小牛的数据测试了模型的性能。模型1实现了0.92的平衡准确度。模型2实现了0.84的平衡准确度。从预测中推断出每日活动比率以及奔跑、躺卧、饮奶和其他行为随时间的发生次数等行为指标。最终将所有开发内容嵌入到一个Python仪表板中,以便可以直接从原始加速度计文件中显示个体小牛的指标。

更新时间: 2024-06-25 08:11:22

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2406.17352v1

Temporal Prototype-Aware Learning for Active Voltage Control on Power Distribution Networks

Active Voltage Control (AVC) on the Power Distribution Networks (PDNs) aims to stabilize the voltage levels to ensure efficient and reliable operation of power systems. With the increasing integration of distributed energy resources, recent efforts have explored employing multi-agent reinforcement learning (MARL) techniques to realize effective AVC. Existing methods mainly focus on the acquisition of short-term AVC strategies, i.e., only learning AVC within the short-term training trajectories of a singular diurnal cycle. However, due to the dynamic nature of load demands and renewable energy, the operation states of real-world PDNs may exhibit significant distribution shifts across varying timescales (e.g., daily and seasonal changes). This can render those short-term strategies suboptimal or even obsolete when performing continuous AVC over extended periods. In this paper, we propose a novel temporal prototype-aware learning method, abbreviated as TPA, to learn time-adaptive AVC under short-term training trajectories. At the heart of TPA are two complementary components, namely multi-scale dynamic encoder and temporal prototype-aware policy, that can be readily incorporated into various MARL methods. The former component integrates a stacked transformer network to learn underlying temporal dependencies at different timescales of the PDNs, while the latter implements a learnable prototype matching mechanism to construct a dedicated AVC policy that can dynamically adapt to the evolving operation states. Experimental results on the AVC benchmark with different PDN sizes demonstrate that the proposed TPA surpasses the state-of-the-art counterparts not only in terms of control performance but also by offering model transferability. Our code is available at https://github.com/Canyizl/TPA-for-AVC.

Updated: 2024-06-25 08:07:00

标题: 时间原型感知学习在配电网络上的主动电压控制中的应用

摘要: 电力配电网络(PDNs)上的主动电压控制(AVC)旨在稳定电压水平,以确保电力系统的高效和可靠运行。随着分布式能源资源的不断集成,最近的努力探索了采用多智能体强化学习(MARL)技术来实现有效的AVC。现有方法主要集中在获取短期AVC策略,即仅在单一昼夜周期的短期训练轨迹内学习AVC。然而,由于负载需求和可再生能源的动态性质,实际PDNs的运行状态可能在不同时间尺度(例如,日常和季节性变化)上表现出显著的分布变化。这可能导致在长时间内连续进行AVC时,这些短期策略变得次优甚至过时。在本文中,我们提出了一种新颖的时间原型感知学习方法,简称为TPA,以在短期训练轨迹下学习时间自适应的AVC。TPA的核心是两个互补组件,即多尺度动态编码器和时间原型感知策略,可以轻松整合到各种MARL方法中。前者组件集成了一个堆叠的Transformer网络,以学习PDNs不同时间尺度上的潜在时间依赖关系,而后者实现了一个可学习的原型匹配机制,以构建一个专用的AVC策略,可以动态适应不断变化的运行状态。对具有不同PDN规模的AVC基准的实验结果表明,所提出的TPA不仅在控制性能方面超越了最先进的对手,而且还提供了模型可转移性。我们的代码可在https://github.com/Canyizl/TPA-for-AVC找到。

更新时间: 2024-06-25 08:07:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17818v1

Semantic Deep Hiding for Robust Unlearnable Examples

Ensuring data privacy and protection has become paramount in the era of deep learning. Unlearnable examples are proposed to mislead the deep learning models and prevent data from unauthorized exploration by adding small perturbations to data. However, such perturbations (e.g., noise, texture, color change) predominantly impact low-level features, making them vulnerable to common countermeasures. In contrast, semantic images with intricate shapes have a wealth of high-level features, making them more resilient to countermeasures and potential for producing robust unlearnable examples. In this paper, we propose a Deep Hiding (DH) scheme that adaptively hides semantic images enriched with high-level features. We employ an Invertible Neural Network (INN) to invisibly integrate predefined images, inherently hiding them with deceptive perturbations. To enhance data unlearnability, we introduce a Latent Feature Concentration module, designed to work with the INN, regularizing the intra-class variance of these perturbations. To further boost the robustness of unlearnable examples, we design a Semantic Images Generation module that produces hidden semantic images. By utilizing similar semantic information, this module generates similar semantic images for samples within the same classes, thereby enlarging the inter-class distance and narrowing the intra-class distance. Extensive experiments on CIFAR-10, CIFAR-100, and an ImageNet subset, against 18 countermeasures, reveal that our proposed method exhibits outstanding robustness for unlearnable examples, demonstrating its efficacy in preventing unauthorized data exploitation.

Updated: 2024-06-25 08:05:42

标题: 语义深度隐藏以获得强大的不可学习示例

摘要: 在深度学习时代,确保数据隐私和保护变得至关重要。不可学习的示例被提出用来误导深度学习模型,通过向数据添加小扰动来防止数据被未经授权地探索。然而,这种扰动(如噪音、纹理、颜色变化)主要影响低级特征,使它们容易受到常见对抗措施的攻击。相比之下,具有复杂形状的语义图像具有丰富的高级特征,使它们更能抵抗对抗措施,并具有产生强大不可学习示例的潜力。在本文中,我们提出了一种深度隐藏(DH)方案,用于自适应隐藏富含高级特征的语义图像。我们采用一种可逆神经网络(INN)来隐形地整合预定义的图像,从本质上隐藏它们并添加欺骗性扰动。为了增强数据的不可学习性,我们引入了一个潜在特征集中模块,旨在与INN一起工作,规范这些扰动的类内方差。为了进一步提高不可学习示例的稳健性,我们设计了一个语义图像生成模块,用于生成隐藏的语义图像。通过利用相似的语义信息,该模块为同一类别内的样本生成相似的语义图像,从而扩大类间距离并缩小类内距离。针对CIFAR-10、CIFAR-100和ImageNet子集的广泛实验,对抗18种对抗措施,表明我们提出的方法在不可学习示例方面表现出色,展示了它在防止未经授权数据开发方面的有效性。

更新时间: 2024-06-25 08:05:42

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2406.17349v1

Stacked Confusion Reject Plots (SCORE)

Machine learning is more and more applied in critical application areas like health and driver assistance. To minimize the risk of wrong decisions, in such applications it is necessary to consider the certainty of a classification to reject uncertain samples. An established tool for this are reject curves that visualize the trade-off between the number of rejected samples and classification performance metrics. We argue that common reject curves are too abstract and hard to interpret by non-experts. We propose Stacked Confusion Reject Plots (SCORE) that offer a more intuitive understanding of the used data and the classifier's behavior. We present example plots on artificial Gaussian data to document the different options of SCORE and provide the code as a Python package.
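
A minimal sketch of what such a plot could look like; the package's actual API may differ, and the certainty measure and toy data here are assumptions:

```python
# For each rejection threshold, stack the counts of TP/FP/FN/TN among the
# kept samples together with the number of rejected samples.
import numpy as np
import matplotlib.pyplot as plt

def score_plot(y_true, y_prob, thresholds=np.linspace(0, 0.5, 26)):
    y_pred = (y_prob >= 0.5).astype(int)
    certainty = np.abs(y_prob - 0.5)           # distance from decision border
    counts = {k: [] for k in ("TP", "FP", "FN", "TN", "rejected")}
    for t in thresholds:
        keep = certainty >= t                   # reject uncertain samples
        counts["TP"].append(np.sum(keep & (y_pred == 1) & (y_true == 1)))
        counts["FP"].append(np.sum(keep & (y_pred == 1) & (y_true == 0)))
        counts["FN"].append(np.sum(keep & (y_pred == 0) & (y_true == 1)))
        counts["TN"].append(np.sum(keep & (y_pred == 0) & (y_true == 0)))
        counts["rejected"].append(np.sum(~keep))
    plt.stackplot(thresholds, list(counts.values()), labels=list(counts.keys()))
    plt.xlabel("rejection threshold"); plt.ylabel("number of samples")
    plt.legend(); plt.show()

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_prob = np.clip(y_true * 0.4 + rng.uniform(size=1000) * 0.6, 0, 1)
score_plot(y_true, y_prob)
```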

Updated: 2024-06-25 07:59:29

标题: 叠加混淆拒绝图(SCORE)

摘要: 机器学习越来越多地应用于健康和驾驶员辅助等关键应用领域。为了最大限度地减少错误决策的风险,在这些应用中有必要考虑分类的确定性,以拒绝不确定的样本。拒绝曲线是一种用于可视化拒绝样本数量和分类性能指标之间权衡的成熟工具。我们认为常见的拒绝曲线过于抽象,非专业人员难以理解。我们提出了Stacked Confusion Reject Plots(SCORE),以更直观地理解所用数据和分类器的行为。我们在人造高斯数据上展示了示例图,以说明SCORE的不同选项,并提供了作为Python包的代码。

更新时间: 2024-06-25 07:59:29

领域: cs.LG

下载: http://arxiv.org/abs/2406.17346v1

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Recent advancements in diffusion models, particularly the trend of architectural transformation from UNet-based Diffusion to Diffusion Transformer (DiT), have significantly improved the quality and scalability of image synthesis. Despite the incredible generative quality, the large computational requirements of these large-scale models significantly hinder the deployments in real-world scenarios. Post-training Quantization (PTQ) offers a promising solution by compressing model sizes and speeding up inference for the pretrained models while eliminating model retraining. However, we have observed the existing PTQ frameworks exclusively designed for both ViT and conventional Diffusion models fall into biased quantization and result in remarkable performance degradation. In this paper, we find that the DiTs typically exhibit considerable variance in terms of both weight and activation, which easily runs out of the limited numerical representations. To address this issue, we devise Q-DiT, which seamlessly integrates three techniques: fine-grained quantization to manage substantial variance across input channels of weights and activations, an automatic search strategy to optimize the quantization granularity and mitigate redundancies, and dynamic activation quantization to capture the activation changes across timesteps. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of the proposed Q-DiT. Specifically, when quantizing DiT-XL/2 to W8A8 on ImageNet 256x256, Q-DiT achieves a remarkable reduction in FID by 1.26 compared to the baseline. Under a W4A8 setting, it maintains high fidelity in image generation, showcasing only a marginal increase in FID and setting a new benchmark for efficient, high-quality quantization in diffusion transformers. Code is available at \href{https://github.com/Juanerx/Q-DiT}{https://github.com/Juanerx/Q-DiT}.
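
A hedged sketch of the first ingredient, fine-grained (group-wise) quantization; the group size and symmetric 8-bit scheme below are illustrative, not the paper's exact procedure:

```python
# Per-group quantization along the input-channel dimension keeps the scale
# local, which tames the large weight variance described above.
import torch

def quantize_groupwise(w, group_size=64, n_bits=8):
    """Symmetric per-group quantization along the last (input-channel) dim."""
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    qmax = 2 ** (n_bits - 1) - 1
    scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(g / scale), -qmax - 1, qmax)
    return (q * scale).reshape(out_f, in_f)    # dequantized weights

w = torch.randn(512, 1024)
w_q = quantize_groupwise(w, group_size=64)
print((w - w_q).abs().max())   # smaller groups -> smaller quantization error
```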

Updated: 2024-06-25 07:57:27

标题: Q-DiT:扩散Transformer的精确后训练量化

摘要: 最近在扩散模型中取得的进展,特别是从基于UNet的扩散转变为扩散Transformer(DiT)的趋势,显著提高了图像合成的质量和可扩展性。尽管具有令人难以置信的生成质量,但这些大规模模型的巨大计算要求显著阻碍了它们在现实场景中的部署。后训练量化(PTQ)通过压缩模型大小、加快预训练模型的推理速度并消除模型重训练,提供了一个有希望的解决方案。然而,我们观察到,现有的PTQ框架专门设计用于ViT和传统扩散模型,容易陷入偏向量化,导致显著的性能下降。在本文中,我们发现DiTs在权重和激活方面通常表现出相当大的变化,很容易超出有限的数值表示范围。为了解决这个问题,我们设计了Q-DiT,它无缝集成了三种技术:细粒度量化以管理权重和激活的输入通道之间的巨大差异,自动搜索策略以优化量化粒度并减少冗余,以及动态激活量化以捕捉跨时间步的激活变化。对ImageNet数据集的大量实验表明了所提出的Q-DiT的有效性。具体来说,在ImageNet 256x256上将DiT-XL/2量化为W8A8时,与基准相比,Q-DiT将FID显著降低了1.26。在W4A8设置下,它在图像生成方面保持了高保真度,仅展示了FID的轻微增加,并为扩散Transformer中高效、高质量量化设定了一个新的基准。代码可在\href{https://github.com/Juanerx/Q-DiT}{https://github.com/Juanerx/Q-DiT}上找到。

更新时间: 2024-06-25 07:57:27

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17343v1

Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds

In the field of 2D image generation modeling and representation learning, Masked Generative Encoder (MAGE) has demonstrated the synergistic potential between generative modeling and representation learning. Inspired by this, we propose Point-MAGE to extend this concept to point cloud data. Specifically, this framework first utilizes a Vector Quantized Variational Autoencoder (VQVAE) to reconstruct a neural field representation of 3D shapes, thereby learning discrete semantic features of point patches. Subsequently, by combining the masking model with variable masking ratios, we achieve synchronous training for both generation and representation learning. Furthermore, our framework seamlessly integrates with existing point cloud self-supervised learning (SSL) models, thereby enhancing their performance. We extensively evaluate the representation learning and generation capabilities of Point-MAGE. In shape classification tasks, Point-MAGE achieved an accuracy of 94.2% on the ModelNet40 dataset and 92.9% (+1.3%) on the ScanObjectNN dataset. Additionally, it achieved new state-of-the-art performance in few-shot learning and part segmentation tasks. Experimental results also confirmed that Point-MAGE can generate detailed and high-quality 3D shapes in both unconditional and conditional settings.

Updated: 2024-06-25 07:57:03

标题: 掩蔽生成提取器:点云的协同表示和3D生成

摘要: 在二维图像生成建模和表示学习领域,掩蔽生成编码器(MAGE)展示了生成建模和表示学习之间的协同潜力。受此启发,我们提出了Point-MAGE,将这一概念扩展到点云数据。具体来说,该框架首先利用矢量量化变分自动编码器(VQVAE)重建三维形状的神经场表示,从而学习点块的离散语义特征。随后,通过将掩蔽模型与可变掩蔽比率相结合,我们实现了生成和表示学习的同步训练。此外,我们的框架与现有的点云自监督学习(SSL)模型无缝集成,从而提升了它们的性能。我们对Point-MAGE的表示学习和生成能力进行了广泛评估。在形状分类任务中,Point-MAGE在ModelNet40数据集上达到了94.2%的准确率,在ScanObjectNN数据集上达到了92.9%(+1.3%)。此外,它在少样本学习和部件分割任务中实现了新的最先进性能。实验结果还证实,Point-MAGE能够在无条件和有条件设置下生成详细和高质量的三维形状。

更新时间: 2024-06-25 07:57:03

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17342v1

Generative Modelling of Structurally Constrained Graphs

Graph diffusion models have emerged as state-of-the-art techniques in graph generation, yet integrating domain knowledge into these models remains challenging. Domain knowledge is particularly important in real-world scenarios, where invalid generated graphs hinder deployment in practical applications. Unconstrained and conditioned graph generative models fail to guarantee such domain-specific structural properties. We present ConStruct, a novel framework that allows for hard-constraining graph diffusion models to incorporate specific properties, such as planarity or acyclicity. Our approach ensures that the sampled graphs remain within the domain of graphs that verify the specified property throughout the entire trajectory in both the forward and reverse processes. This is achieved by introducing a specific edge-absorbing noise model and a new projector operator. ConStruct demonstrates versatility across several structural and edge-deletion invariant constraints and achieves state-of-the-art performance for both synthetic benchmarks and attributed real-world datasets. For example, by leveraging planarity in digital pathology graph datasets, the proposed method outperforms existing baselines and enhances generated data validity by up to 71.1 percentage points.
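
The projector mechanism can be illustrated for the planarity constraint with networkx; this is a toy rendition of the idea (accept an edge only if the property still holds), not the paper's implementation:

```python
# During reverse diffusion, a candidate edge is only inserted if the graph
# with that edge still verifies the property; otherwise it stays absorbed.
import networkx as nx

def project_edges(n_nodes, candidate_edges, is_valid=None):
    if is_valid is None:
        is_valid = lambda g: nx.check_planarity(g)[0]   # property: planarity
    g = nx.empty_graph(n_nodes)
    for u, v in candidate_edges:         # e.g. edges proposed by the denoiser
        g.add_edge(u, v)
        if not is_valid(g):
            g.remove_edge(u, v)          # reject: would leave the domain
    return g

g = project_edges(5, [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (0, 3),
                      (0, 4), (1, 3), (1, 4), (2, 4)])  # K5 is non-planar
print(g.number_of_edges())               # 9: one edge had to be rejected
```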

Updated: 2024-06-25 07:54:32

标题: 受结构约束的图的生成建模

摘要: 图扩散模型已成为图生成中的最先进技术,然而将领域知识整合到这些模型中仍然具有挑战性。在现实世界的场景中,领域知识尤为重要,因为生成的无效图会阻碍在实际应用中的部署。不受限制和条件化的图生成模型无法保证这种领域特定的结构性质。我们提出了ConStruct,这是一个新颖的框架,允许硬约束图扩散模型以整合特定属性,比如平面性或无环性。我们的方法确保抽样的图在整个轨迹过程中保持在验证指定属性的图领域内,无论是在正向还是逆向过程中。这是通过引入特定的边吸收噪声模型和一个新的投影算子来实现的。ConStruct展现了在几种结构性和边删除不变约束中的多功能性,并在合成基准和带属性的真实世界数据集上实现了最先进的性能。例如,通过利用数字病理学图数据集中的平面性,所提出的方法胜过现有基线,并将生成数据的有效性提高了高达71.1个百分点。

更新时间: 2024-06-25 07:54:32

领域: cs.LG

下载: http://arxiv.org/abs/2406.17341v1

Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection

Current medical image classification efforts mainly aim for higher average performance, often neglecting the balance between different classes. This can lead to significant differences in recognition accuracy between classes and obvious recognition weaknesses. Without the support of massive data, deep learning faces challenges in fine-grained classification of fatty liver. In this paper, we propose an innovative deep learning framework that combines feature decoupling and adaptive adversarial training. Firstly, we employ two iteratively compressed decouplers to decouple, under supervision, the common features and the specific features related to fatty liver in abdominal ultrasound images. Subsequently, the decoupled features are concatenated with the original image after transforming the color space and are fed into the classifier. During adversarial training, we adaptively adjust the perturbation and balance the adversarial strength by the accuracy of each class. The model will eliminate recognition weaknesses by correctly classifying adversarial samples, thus improving recognition robustness. Finally, the accuracy of our method improved by 4.16%, achieving 82.95%. As demonstrated by extensive experiments, our method is a generalized learning framework that can be directly used to eliminate the recognition weaknesses of any classifier while improving its average performance. Code is available at https://github.com/HP-ML/MICCAI2024.

Updated: 2024-06-25 07:50:09

标题: 用于脂肪肝疾病检测的鲁棒优化深度特征解耦网络

摘要: 目前的医学图像分类工作主要旨在提高平均性能,往往忽视了不同类别之间的平衡。这可能导致不同类别之间的识别准确度差异显著,以及明显的识别弱点。在没有大量数据支持的情况下,深度学习在脂肪肝的细粒度分类中面临挑战。本文提出了一种创新的深度学习框架,结合了特征解耦和自适应对抗训练。首先,我们使用两个迭代压缩的解耦器来监督解耦腹部超声图像中与脂肪肝相关的公共特征和特定特征。随后,在转换颜色空间后,将解耦特征与原始图像连接起来,并输入分类器。在对抗训练过程中,我们通过每个类别的准确度自适应调整扰动并平衡对抗强度。该模型将通过正确分类对抗样本来消除识别弱点,从而提高识别的鲁棒性。最终,我们的方法的准确率提高了4.16%,达到了82.95%。通过大量实验证明,我们的方法是一个通用的学习框架,可以直接用于消除任何分类器的识别弱点,同时提高其平均性能。代码可在https://github.com/HP-ML/MICCAI2024 上找到。

更新时间: 2024-06-25 07:50:09

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.17338v1

TabVFL: Improving Latent Representation in Vertical Federated Learning

Autoencoders are popular neural networks that are able to compress high dimensional data to extract relevant latent information. TabNet is a state-of-the-art neural network model designed for tabular data that utilizes an autoencoder architecture for training. Vertical Federated Learning (VFL) is an emerging distributed machine learning paradigm that allows multiple parties to train a model collaboratively on vertically partitioned data while maintaining data privacy. The existing design of training autoencoders in VFL is to train a separate autoencoder in each participant and aggregate the latent representation later. This design could potentially break important correlations between feature data of participating parties, as each autoencoder is trained on locally available features while disregarding the features of others. In addition, traditional autoencoders are not specifically designed for tabular data, which is ubiquitous in VFL settings. Moreover, the impact of client failures during training on the model robustness is under-researched in the VFL scene. In this paper, we propose TabVFL, a distributed framework designed to improve latent representation learning using the joint features of participants. The framework (i) preserves privacy by mitigating potential data leakage with the addition of a fully-connected layer, (ii) conserves feature correlations by learning one latent representation vector, and (iii) provides enhanced robustness against client failures during training phase. Extensive experiments on five classification datasets show that TabVFL can outperform the prior work design, with 26.12% of improvement on f1-score.

Updated: 2024-06-25 07:46:30

标题: TabVFL:改善垂直联邦学习中的潜在表示

摘要: 自动编码器是一种流行的神经网络,能够压缩高维数据以提取相关的潜在信息。TabNet是一种为表格数据设计的最先进的神经网络模型,利用自动编码器架构进行训练。垂直联邦学习(VFL)是一种新兴的分布式机器学习范式,允许多个参与方在垂直分区的数据上协作训练模型,同时保持数据隐私。现有的VFL中训练自动编码器的设计是在每个参与者中分别训练一个自动编码器,然后再聚合潜在表示。这种设计可能会破坏参与方特征数据之间的重要相关性,因为每个自动编码器是在本地可用特征上训练的,而忽略了其他参与方的特征。此外,传统的自动编码器并没有专门为在VFL设置中普遍使用的表格数据而设计。此外,在VFL场景中关于训练期间客户端故障对模型鲁棒性的影响尚未得到深入研究。在本文中,我们提出了TabVFL,一个分布式框架,旨在通过利用参与者的联合特征来改进潜在表示学习。该框架(i)通过添加一个全连接层来减少潜在数据泄漏,以保护隐私,(ii)通过学习一个潜在表示向量来保留特征之间的相关性,(iii)在训练阶段提供增强的鲁棒性,以应对客户端故障。对五个分类数据集的大量实验表明,TabVFL可以优于先前的设计,f1得分提高了26.12%。

更新时间: 2024-06-25 07:46:30

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2404.17990v2

A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems

Since the creation of the Web, recommender systems (RSs) have been an indispensable mechanism in information filtering. State-of-the-art RSs primarily depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables. To prevent over-parameterized embedding tables from harming scalability, both academia and industry have seen increasing efforts in compressing RS embeddings. However, despite the prosperity of lightweight embedding-based RSs (LERSs), a wide diversity is seen in evaluation protocols, resulting in obstacles when relating LERS performance to real-world usability. Moreover, despite the common goal of lightweight embeddings, LERSs are evaluated with a single choice between the two main recommendation tasks -- collaborative filtering and content-based recommendation. This lack of discussions on cross-task transferability hinders the development of unified, more scalable solutions. Motivated by these issues, this study investigates various LERSs' performance, efficiency, and cross-task transferability via a thorough benchmarking process. Additionally, we propose an efficient embedding compression method using magnitude pruning, which is an easy-to-deploy yet highly competitive baseline that outperforms various complex LERSs. Our study reveals the distinct performance of LERSs across the two tasks, shedding light on their effectiveness and generalizability. To support edge-based recommendations, we tested all LERSs on a Raspberry Pi 4, where the efficiency bottleneck is exposed. Finally, we conclude this paper with critical summaries of LERS performance, model selection suggestions, and underexplored challenges around LERSs for future research. To encourage future research, we publish source codes and artifacts at \href{this link}{https://github.com/chenxing1999/recsys-benchmark}.
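
The magnitude-pruning baseline is simple enough to sketch; the global threshold below is an assumption, and a sparse storage format would be needed to realize actual memory savings:

```python
# Zero out the smallest-magnitude entries of the embedding table.
import torch

def magnitude_prune(embedding_table, sparsity=0.8):
    flat = embedding_table.abs().flatten()
    k = int(sparsity * flat.numel())
    threshold = flat.kthvalue(k).values          # k-th smallest magnitude
    mask = embedding_table.abs() > threshold
    return embedding_table * mask, mask

emb = torch.nn.Embedding(10000, 64)
pruned, mask = magnitude_prune(emb.weight.data, sparsity=0.8)
print(f"kept {mask.float().mean().item():.1%} of parameters")
```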

Updated: 2024-06-25 07:45:00

标题: 对基于轻量级嵌入的推荐系统的全面性能基准测试

摘要: 自从Web诞生以来,推荐系统(RSs)已成为信息过滤中不可或缺的机制。最先进的RSs主要依赖于由嵌入向量编码的分类特征,导致嵌入表过大。为防止过参数化的嵌入表影响可扩展性,学术界和工业界都在加大压缩RS嵌入的努力。然而,尽管基于轻量级嵌入的RSs(LERSs)蓬勃发展,其评估协议却存在广泛的差异,导致在将LERS性能与实际可用性联系起来时遇到障碍。此外,尽管轻量级嵌入是共同目标,LERSs在评估时只在协同过滤和基于内容的推荐这两个主要推荐任务中选择其一。对跨任务可转移性缺乏讨论阻碍了统一、更可扩展解决方案的发展。受到这些问题的激励,本研究通过彻底的基准测试过程调查了各种LERSs的性能、效率和跨任务可转移性。此外,我们提出了一种使用幅度修剪的高效嵌入压缩方法,这是一种易于部署但极具竞争力的基线,其性能优于各种复杂的LERSs。我们的研究揭示了LERSs在两个任务中的不同表现,阐明了其有效性和泛化能力。为支持基于边缘设备的推荐,我们在Raspberry Pi 4上测试了所有LERSs,暴露了其效率瓶颈。最后,我们对LERS的性能进行了关键总结,给出了模型选择建议,并指出了围绕LERS的尚未充分探索的挑战,供未来研究参考。为鼓励未来研究,我们在\href{this link}{https://github.com/chenxing1999/recsys-benchmark}上发布源代码和工件。

更新时间: 2024-06-25 07:45:00

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2406.17335v1

CoSMo: a Framework to Instantiate Conditioned Process Simulation Models

Process simulation is gaining attention for its ability to assess potential performance improvements and risks associated with business process changes. The existing literature presents various techniques, generally grounded in process models discovered from event log data or built upon deep learning algorithms. These techniques have specific strengths and limitations. Traditional data-driven approaches offer increased interpretability, while deep learning-based excel at generalizing changes across large event logs. However, the practical application of deep learning faces challenges related to managing stochasticity and integrating information for what-if analysis. This paper introduces a novel recurrent neural architecture tailored to discover COnditioned process Simulation MOdels (CoSMo) based on user-based constraints or any other nature of a-priori knowledge. This architecture facilitates the simulation of event logs that adhere to specific constraints by incorporating declarative-based rules into the learning phase as an attempt to fill the gap of incorporating information into deep learning models to perform what-if analysis. Experimental validation illustrates CoSMo's efficacy in simulating event logs while adhering to predefined declarative conditions, emphasizing both control-flow and data-flow perspectives.

Updated: 2024-06-25 07:44:31

标题: CoSMo:一个实例化条件化过程模拟模型的框架

摘要: 过程模拟因其能够评估业务流程变更带来的潜在绩效改善和风险而受到关注。现有文献提出了各种技术,通常基于从事件日志数据中发现的过程模型,或建立在深度学习算法之上。这些技术各有优势和局限。传统的数据驱动方法提供了更高的可解释性,而基于深度学习的方法在大型事件日志上泛化变更方面表现出色。然而,深度学习的实际应用面临管理随机性和整合信息以进行假设分析(what-if analysis)的挑战。本文介绍了一种新颖的递归神经架构,用于发现基于用户约束或任何其他先验知识的条件化过程模拟模型(CoSMo)。该架构通过在学习阶段整合基于声明性的规则,支持模拟遵循特定约束的事件日志,以此填补将信息整合进深度学习模型以进行假设分析的空白。实验验证表明,CoSMo能在遵循预定义声明性条件的同时模拟事件日志,并兼顾控制流和数据流两个视角。

更新时间: 2024-06-25 07:44:31

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2303.17879v4

Straight-Through meets Sparse Recovery: the Support Exploration Algorithm

The {\it straight-through estimator} (STE) is commonly used to optimize quantized neural networks, yet its contexts of effective performance are still unclear despite empirical successes. To make a step forward in this comprehension, we apply STE to a well-understood problem: {\it sparse support recovery}. We introduce the {\it Support Exploration Algorithm} (SEA), a novel algorithm promoting sparsity, and we analyze its performance in support recovery (a.k.a. model selection) problems. SEA explores more supports than the state-of-the-art, leading to superior performance in experiments, especially when the columns of $A$ are strongly coherent. The theoretical analysis considers recovery guarantees when the linear measurements matrix $A$ satisfies the {\it Restricted Isometry Property} (RIP). The sufficient conditions of recovery are comparable but more stringent than those of the state-of-the-art in sparse support recovery. Their significance lies mainly in their applicability to an instance of the STE.
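
A simplified sketch of the straight-through mechanism behind SEA: the forward pass only ever uses a top-k projection while gradients accumulate in a dense variable, letting the iteration explore many supports (the step size and problem sizes are illustrative):

```python
import numpy as np

def ste_support_recovery(A, y, k, lr=0.5, iters=300):
    n = A.shape[1]
    x_dense = np.zeros(n)                     # dense "exploration" variable
    support = np.arange(k)
    for _ in range(iters):
        support = np.argsort(np.abs(x_dense))[-k:]
        x_sparse = np.zeros(n)
        x_sparse[support] = x_dense[support]  # forward: k-sparse projection
        grad = A.T @ (A @ x_sparse - y)       # straight-through backward:
        x_dense -= lr * grad                  # the gradient hits the dense var
    return set(support.tolist()), x_sparse

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 200)) / np.sqrt(60)  # roughly unit-norm columns
x_true = np.zeros(200)
x_true[[3, 50, 120]] = [1.0, -2.0, 1.5]
support, x_hat = ste_support_recovery(A, A @ x_true, k=3)
print(sorted(support))                        # ideally {3, 50, 120}
```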

Updated: 2024-06-25 07:42:54

标题: 直通遇上稀疏恢复:支持探索算法

摘要: 直通估计器(STE)通常用于优化量化神经网络,尽管在实证成功的情况下,其有效性表现的背景仍然不清楚。为了在这方面迈出一步,我们将STE应用于一个广泛理解的问题:稀疏支持恢复。我们引入了支持探索算法(SEA),这是一种促进稀疏性的新算法,并分析其在支持恢复(也称为模型选择)问题中的表现。SEA探索的支持比现有技术更多,导致在实验中表现出更优异的性能,特别是当矩阵A的列之间高度相干时。理论分析考虑了当线性测量矩阵A满足受限等距性质(RIP)时的恢复保证。恢复的充分条件与现有技术在稀疏支持恢复方面相比是可比的,但更严格。它们的重要性主要在于它们适用于STE的一个实例。

更新时间: 2024-06-25 07:42:54

领域: cs.LG,cs.AI,math.OC,math.ST,stat.TH

下载: http://arxiv.org/abs/2301.13584v3

Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning

As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approaches have either ignored the admission control of VNRs, which has a potential impact on long-term performances, or not fully exploited the temporal and topological features of the physical network and VNRs. In this paper, we propose a deep Hierarchical Reinforcement Learning approach to learn a joint Admission Control and Resource Allocation policy for VNE, named HRL-ACRA. Specifically, the whole VNE process is decomposed into an upper-level policy for deciding whether to admit the arriving VNR or not and a lower-level policy for allocating resources of the physical network to meet the requirement of VNR through the HRL approach. Considering the proximal policy optimization as the basic training algorithm, we also adopt the average reward method to address the infinite horizon problem of the upper-level agent and design a customized multi-objective intrinsic reward to alleviate the sparse reward issue of the lower-level agent. Moreover, we develop a deep feature-aware graph neural network to capture the features of VNR and physical network and exploit a sequence-to-sequence model to generate embedding actions iteratively. Finally, extensive experiments are conducted in various settings, and show that HRL-ACRA outperforms state-of-the-art baselines in terms of both the acceptance ratio and long-term average revenue. Our code is available at \url{https://github.com/GeminiLight/hrl-acra}.

Updated: 2024-06-25 07:42:30

标题: 虚拟网络嵌入的联合接入控制和资源分配:基于层次深度强化学习的方法

摘要: 作为网络虚拟化中的一个关键资源管理问题,虚拟网络嵌入(VNE)旨在将物理网络的有限资源分配给具有不同资源需求的顺序到达的虚拟网络请求(VNRs)。由于这是一个NP困难的组合优化问题,许多努力已经进行,以提供可行的解决方案。然而,大多数现有方法要么忽略了VNRs的准入控制,这可能会对长期性能产生影响,要么没有充分利用物理网络和VNRs的时间和拓扑特征。在本文中,我们提出了一种深度分层强化学习方法,用于学习一种用于VNE的联合准入控制和资源分配策略,称为HRL-ACRA。具体而言,整个VNE过程被分解为一个用于决定是否接纳到达的VNR的上层策略和一个用于通过HRL方法分配物理网络资源以满足VNR需求的下层策略。考虑到近端策略优化作为基本训练算法,我们还采用平均奖励方法来解决上层代理的无限视野问题,并设计了一个定制的多目标内在奖励来缓解下层代理的稀疏奖励问题。此外,我们开发了一个深度特征感知图神经网络来捕捉VNR和物理网络的特征,并利用一个序列到序列模型来迭代生成嵌入动作。最后,在各种设置中进行了广泛的实验,结果表明HRL-ACRA在接受比和长期平均收入方面优于现有技术基线。我们的代码可在\url{https://github.com/GeminiLight/hrl-acra}获取。

更新时间: 2024-06-25 07:42:30

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2406.17334v1

Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers within the original feature space can undermine the reliability of the resulting sample similarity graph. It also fails to precisely depict the label correlation due to the existence of unknown labels. Besides, these methods only consider the discriminative power of selected features, while neglecting their redundancy. In this paper, we propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues. Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select discriminative yet irrelevant features and maintain consistency between predicted and ground-truth labels in labeled data, simultaneously. Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance. Extensive experimental results demonstrate the superiority of the proposed Access-MFS over other state-of-the-art methods.

Updated: 2024-06-25 07:25:23

标题: 自适应协同相关性学习的半监督多标签特征选择

摘要: 最近发展了半监督多标签特征选择方法,以解决高维多标签数据中存在某些样本缺失标签的维度灾难问题。尽管已经做出了许多努力,但大多数现有方法都使用预定义的图方法来捕捉样本相似性或标签相关性。通过这种方式,原始特征空间中的噪声和异常值可能会破坏生成的样本相似性图的可靠性。此外,由于存在未知标签,它也无法精确描述标签相关性。此外,这些方法只考虑所选特征的识别能力,而忽视它们的冗余性。在本文中,我们提出了一种基于自适应协同相关学习的半监督多标签特征选择(Access-MFS)方法来解决这些问题。具体而言,引入了一个配备扩展的不相关约束的广义回归模型,以选择具有鉴别能力但不相关的特征,并在有标签数据中维持预测和地面真实标签之间的一致性。然后,将实例相关性和标签相关性整合到所提出的回归模型中,以自适应地学习样本相似性图和标签相似性图,从而相互增强特征选择性能。广泛的实验结果表明,所提出的Access-MFS方法优于其他最先进的方法。

更新时间: 2024-06-25 07:25:23

领域: cs.LG

下载: http://arxiv.org/abs/2406.12193v2

Dual-Space Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance between the output distributions of the two models so that more knowledge can be transferred. However, in the current white-box KD framework, the output distributions are from the respective output spaces of the two models, using their own prediction heads. We argue that the space discrepancy will lead to low similarity between the teacher model and the student model on both representation and distribution levels. Furthermore, this discrepancy also hinders the KD process between models with different vocabularies, which is common for current LLMs. To address these issues, we propose a dual-space knowledge distillation (DSKD) framework that unifies the output spaces of the two models for KD. On the basis of DSKD, we further develop a cross-model attention mechanism, which can automatically align the representations of the two models with different vocabularies. Thus, our framework is not only compatible with various distance functions for KD (e.g., KL divergence) like the current framework, but also supports KD between any two LLMs regardless of their vocabularies. Experiments on task-agnostic instruction-following benchmarks show that DSKD significantly outperforms the current white-box KD framework with various distance functions, and also surpasses existing KD methods for LLMs with different vocabularies.

Updated: 2024-06-25 07:25:15

标题: 大型语言模型的双空间知识蒸馏

摘要: 知识蒸馏(KD)被认为是一种很有前途的解决方案,可以通过将知识转移给更小的模型来压缩大型语言模型(LLMs)。在这个过程中,白盒KD方法通常会最小化两个模型的输出分布之间的距离,以便能够转移更多的知识。然而,在当前的白盒KD框架中,输出分布来自两个模型的各自输出空间,使用它们自己的预测头。我们认为,空间差异会导致教师模型和学生模型在表示和分布水平上具有较低的相似性。此外,这种差异也阻碍了具有不同词汇的模型之间的KD过程,这在当前的LLMs中很常见。为了解决这些问题,我们提出了一个双空间知识蒸馏(DSKD)框架,统一了两个模型的输出空间以进行KD。基于DSKD,我们进一步开发了一种跨模型注意力机制,可以自动对齐具有不同词汇的两个模型的表示。因此,我们的框架不仅与当前框架一样兼容各种KD距离函数(如KL散度),而且还支持任何两个LLMs之间的KD,无论它们的词汇是什么。对任务不可知的指令遵循基准的实验表明,DSKD在各种距离函数下明显优于当前的白盒KD框架,并且也优于现有的具有不同词汇的LLMs的KD方法。

更新时间: 2024-06-25 07:25:15

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17328v1

The State-Action-Reward-State-Action Algorithm in Spatial Prisoner's Dilemma Game

Cooperative behavior is prevalent in both human society and nature. Understanding the emergence and maintenance of cooperation among self-interested individuals remains a significant challenge in evolutionary biology and social sciences. Reinforcement learning (RL) provides a suitable framework for studying evolutionary game theory as it can adapt to environmental changes and maximize expected benefits. In this study, we employ the State-Action-Reward-State-Action (SARSA) algorithm as the decision-making mechanism for individuals in evolutionary game theory. Initially, we apply SARSA to imitation learning, where agents select neighbors to imitate based on rewards. This approach allows us to observe behavioral changes in agents without independent decision-making abilities. Subsequently, SARSA is utilized for primary agents to independently choose cooperation or betrayal with their neighbors. We evaluate the impact of SARSA on cooperation rates by analyzing variations in rewards and the distribution of cooperators and defectors within the network.
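
For reference, the textbook SARSA update that serves as the decision mechanism; the environment below is a made-up placeholder for the spatial game's neighbor interactions:

```python
import random

def sarsa(env_step, n_states, n_actions, episodes=1000,
          alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def policy(s):   # epsilon-greedy over the current Q-values
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return max(range(n_actions), key=lambda a: Q[s][a])

    for _ in range(episodes):
        s, a = 0, policy(0)
        done = False
        while not done:
            s2, r, done = env_step(s, a)
            a2 = policy(s2)
            # On-policy update: bootstrap with the action actually taken next.
            Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])
            s, a = s2, a2
    return Q

# Toy environment: one state, action 0 = "cooperate", action 1 = "defect";
# made-up payoffs standing in for the spatial game's neighbor interactions.
def env_step(s, a):
    reward = 3.0 if a == 0 else 1.0
    return 0, reward, random.random() < 0.1   # episode ends stochastically

print(sarsa(env_step, n_states=1, n_actions=2))
```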

Updated: 2024-06-25 07:21:35

标题: 空间囚徒困境游戏中的状态-动作-奖励-状态-动作算法

摘要: 合作行为在人类社会和自然界中普遍存在。理解自私个体之间合作的出现和维持仍然是进化生物学和社会科学中的一项重要挑战。强化学习(RL)提供了一个适合研究进化博弈论的框架,因为它可以适应环境变化并最大化预期利益。在这项研究中,我们采用状态-动作-奖励-状态-动作(SARSA)算法作为进化博弈论中个体的决策机制。最初,我们将SARSA应用于模仿学习,其中代理根据奖励选择邻居进行模仿。这种方法使我们能够观察代理的行为变化,而不需要独立的决策能力。随后,SARSA被用于主要代理独立选择与邻居合作还是背叛。我们通过分析奖励的变化和网络内合作者和叛徒的分布,评估了SARSA对合作率的影响。

更新时间: 2024-06-25 07:21:35

领域: cs.AI

下载: http://arxiv.org/abs/2406.17326v1

FAIIR: Building Toward A Conversational AI Agent Assistant for Youth Mental Health Service Provision

The world's healthcare systems and mental health agencies face both a growing demand for youth mental health services, alongside a simultaneous challenge of limited resources. Here, we focus on frontline crisis support, where Crisis Responders (CRs) engage in conversations for youth mental health support and assign an issue tag to each conversation. In this study, we develop FAIIR (Frontline Assistant: Issue Identification and Recommendation), an advanced tool leveraging an ensemble of domain-adapted and fine-tuned transformer models trained on a large conversational dataset comprising 780,000 conversations. The primary aim is to reduce the cognitive burden on CRs, enhance the accuracy of issue identification, and streamline post-conversation administrative tasks. We evaluate FAIIR on both retrospective and prospective conversations, emphasizing human-in-the-loop design with active CR engagement for model refinement, consensus-building, and overall assessment. Our results indicate that FAIIR achieves an average AUCROC of 94%, a sample average F1-score of 64%, and a sample average recall score of 81% on the retrospective test set. We also demonstrate the robustness and generalizability of the FAIIR tool during the silent testing phase, with less than a 2% drop in all performance metrics. Notably, CRs' responses exhibited an overall agreement of 90.9% with FAIIR's predictions. Furthermore, expert agreement with FAIIR surpassed their agreement with the original labels. To conclude, our findings indicate that assisting with the identification of issues of relevance helps reduce the burden on CRs, ensuring that appropriate resources can be provided and that active rescues and mandatory reporting can take place in critical situations requiring immediate de-escalation.

Updated: 2024-06-25 07:18:14

标题: FAIIR:为青少年心理健康服务提供构建以对话AI代理助手

摘要: 世界的卫生保健系统和心理健康机构面临着青少年心理健康服务需求增长和有限资源挑战并存的情况。在这里,我们关注前线危机支持,危机响应者(CRs)参与青少年心理健康支持对话并为每个对话分配一个问题标签。在这项研究中,我们开发了FAIIR(前线助手:问题识别和建议),这是一种先进工具,利用了一个由78万个对话组成的大型对话数据集训练的一组经过领域适应和微调的Transformer模型。主要目标是减轻CRs的认知负担,提高问题识别的准确性,并简化对话后的行政任务。我们在回顾性和前瞻性对话中评估了FAIIR,强调了人在回路设计,让CRs积极参与模型细化、共识建立和整体评估。我们的结果表明,FAIIR在回顾性测试集上实现了94%的平均AUCROC、64%的样本平均F1分数和81%的样本平均召回分数。我们还在静默测试阶段展示了FAIIR工具的稳健性和泛化能力,所有性能指标的下降均不到2%。值得注意的是,CRs的响应与FAIIR的预测整体一致性达到了90.9%。此外,专家与FAIIR的一致性超过了他们与原始标签的一致性。总之,我们的研究结果表明,协助识别相关问题有助于减轻CRs的负担,确保能够提供适当的资源,并在需要立即缓解的紧急情况下进行积极营救和强制性报告。

更新时间: 2024-06-25 07:18:14

领域: cs.AI

下载: http://arxiv.org/abs/2405.18553v3

XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images

Reflected or scattered light produces artefacts in astronomical observations that can negatively impact the scientific study. Hence, automated detection of these artefacts is highly beneficial, especially with the increasing amounts of data gathered. Machine learning methods are well-suited to this problem, but currently there is a lack of annotated data to train such approaches to detect artefacts in astronomical observations. In this work, we present a dataset of images from the XMM-Newton space telescope Optical Monitoring camera showing different types of artefacts. We hand-annotated a sample of 1000 images with artefacts, which we use to train automated ML methods. We further demonstrate techniques tailored for accurate detection and masking of artefacts using instance segmentation. We adopt a hybrid approach, combining knowledge from both convolutional neural networks (CNNs) and transformer-based models and use their advantages in segmentation. The presented method and dataset will advance artefact detection in astronomical observations by providing a reproducible baseline. All code and data are made available (https://github.com/ESA-Datalabs/XAMI-model and https://github.com/ESA-Datalabs/XAMI-dataset).

Updated: 2024-06-25 07:14:15

标题: XAMI -- 用于XMM-Newton光学图像中伪影检测的基准数据集

摘要: 反射或散射光在天文观测中产生的伪影可能会对科学研究产生负面影响。因此,自动检测这些伪影是非常有益的,尤其是随着数据量的增加。机器学习方法非常适合解决这个问题,但目前缺乏已标记的数据来训练这种方法以检测天文观测中的伪影。在这项工作中,我们展示了来自XMM-Newton空间望远镜光学监测相机的图像数据集,其中包含不同类型的伪影。我们手动标注了1000张带有伪影的图像样本,用于训练自动化机器学习方法。我们进一步展示了使用实例分割来准确检测和遮蔽伪影的技术。我们采用混合方法,结合卷积神经网络(CNNs)和基于Transformer的模型的知识,并利用它们在分割中的优势。所提出的方法和数据集将通过提供可重现的基准来推进天文观测中的伪影检测。所有代码和数据均可在以下网址获得(https://github.com/ESA-Datalabs/XAMI-model 和 https://github.com/ESA-Datalabs/XAMI-dataset)。

更新时间: 2024-06-25 07:14:15

领域: cs.CV,astro-ph.IM,cs.LG

下载: http://arxiv.org/abs/2406.17323v1

ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data

In settings where only a budgeted amount of labeled data can be afforded, active learning seeks to devise query strategies for selecting the most informative data points to be labeled, aiming to enhance learning algorithms' efficiency and performance. Numerous such query strategies have been proposed and compared in the active learning literature. However, the community still lacks standardized benchmarks for comparing the performance of different query strategies. This particularly holds for the combination of query strategies with different learning algorithms into active learning pipelines and examining the impact of the learning algorithm choice. To close this gap, we propose ALPBench, which facilitates the specification, execution, and performance monitoring of active learning pipelines. It has built-in measures to ensure evaluations are done reproducibly, saving exact dataset splits and hyperparameter settings of used algorithms. In total, ALPBench consists of 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. To demonstrate its usefulness and broad compatibility with various learning algorithms and query strategies, we conduct an exemplary study evaluating 9 query strategies paired with 8 learning algorithms in 2 different settings. We provide ALPBench here: https://github.com/ValentinMargraf/ActiveLearningPipelines.
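To illustrate the kind of pipeline ALPBench benchmarks, here is a generic pool-based active learning loop with least-confidence uncertainty sampling. This is not ALPBench's API; the learner, query strategy, and budget are placeholder assumptions.

# Generic pool-based active learning loop (sketch, not ALPBench's API).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def uncertainty_sampling(model, X_pool, batch_size=10):
    proba = model.predict_proba(X_pool)
    confidence = proba.max(axis=1)          # least-confident points first
    return np.argsort(confidence)[:batch_size]

def active_learning(X, y, labeled_idx, budget=100, batch_size=10):
    labeled = set(labeled_idx)
    model = RandomForestClassifier(random_state=0)
    while budget > 0:
        pool = np.array(sorted(set(range(len(X))) - labeled))
        model.fit(X[list(labeled)], y[list(labeled)])
        picked = pool[uncertainty_sampling(model, X[pool], batch_size)]
        labeled.update(picked.tolist())     # oracle reveals labels here
        budget -= batch_size
    return model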

Updated: 2024-06-25 07:14:14

标题: ALPBench:一个针对表格数据上主动学习流水线的基准测试

摘要: 在只能承担有限标记数据预算的情况下,主动学习旨在设计查询策略,以选择最具信息量的数据点进行标记,以增强学习算法的效率和性能。在主动学习文献中已经提出并比较了许多这样的查询策略。然而,学术界仍然缺乏用于比较不同查询策略性能的标准基准。这尤其适用于将不同学习算法的查询策略组合到主动学习管道中,并研究学习算法选择的影响。为了弥补这一差距,我们提出了ALPBench,它可以促进主动学习管道的规范、执行和性能监控。它具有内置措施,可以确保评估是可重现的,保存所使用算法的确切数据集拆分和超参数设置。总体上,ALPBench包括86个真实世界的表格分类数据集和5个主动学习设置,产生430个主动学习问题。为了展示其用途和与各种学习算法和查询策略的广泛兼容性,我们进行了一项示范性研究,评估了9种查询策略与8种学习算法在2种不同设置中的配对。我们在这里提供ALPBench: https://github.com/ValentinMargraf/ActiveLearningPipelines。

更新时间: 2024-06-25 07:14:14

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17322v1

Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples

In this article, we propose a novel algorithm for deep reinforcement learning named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning through splitting Q-values into state values and action advantages. We require that an offline expert assesses the value of a state in a coarse manner using three discrete values. An expert network is designed in addition to the Q-network, which updates each time following the regular offline minibatch update whenever the expert example buffer is not empty. Using the board game Othello, we compare our algorithm with the baseline Q-learning algorithm, which is a combination of Double Q-learning and Dueling Q-learning. Our results show that Expert Q-learning is indeed useful and more resistant to the overestimation bias. The baseline Q-learning algorithm exhibits unstable and suboptimal behavior in non-deterministic settings, whereas Expert Q-learning demonstrates more robust performance with higher scores, illustrating that our algorithm is indeed suitable to integrate state values from expert examples into Q-learning.
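A minimal PyTorch sketch of the structure described above: the dueling-style split of Q-values into a state value and action advantages, plus a separate expert head supervised by the coarse three-valued state assessments. Layer sizes and the exact loss wiring are illustrative assumptions.

# Sketch of the Q-value split with an auxiliary expert head (PyTorch).
# Hidden sizes and the 3-class coarse target are illustrative.
import torch
import torch.nn as nn

class ExpertDuelingQ(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)
        # Expert head: coarse state value in {bad, neutral, good}.
        self.expert = nn.Linear(hidden, 3)

    def forward(self, s):
        h = self.trunk(s)
        a = self.advantage(h)
        q = self.value(h) + a - a.mean(dim=-1, keepdim=True)
        return q, self.expert(h)

# The expert head is trained with cross-entropy on expert examples
# whenever the expert buffer is non-empty, alongside the usual TD loss.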

Updated: 2024-06-25 07:08:34

标题: 专家Q学习:利用离线专家示例的粗糙状态值进行深度强化学习

摘要: 在这篇文章中,我们提出了一种名为专家Q-learning的深度强化学习算法。专家Q-learning受到Dueling Q-learning的启发,旨在通过将Q值分为状态值和动作优势,将半监督学习纳入强化学习中。我们要求一个离线专家以粗略方式使用三个离散值评估状态的价值。除了Q网络之外,我们设计了一个专家网络,每当专家示例缓冲区不为空时,它会在常规离线小批量更新之后更新。我们使用棋盘游戏奥赛洛来将我们的算法与基准Q-learning算法进行比较,该算法是双Q-learning和Dueling Q-learning的组合。我们的结果表明,专家Q-learning确实有用,并且更能抵抗高估偏差。基准Q-learning算法在非确定性环境中表现不稳定且次优,而专家Q-learning表现出更稳健的性能,得分更高,说明我们的算法确实适合将专家示例中的状态值整合到Q-learning中。

更新时间: 2024-06-25 07:08:34

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2106.14642v5

A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models

Watermarking techniques offer a promising way to identify machine-generated content via embedding covert information into the contents generated from language models. A challenge in the domain lies in preserving the distribution of original generated content after watermarking. Our research extends and improves upon existing watermarking framework, placing emphasis on the importance of a \textbf{Di}stribution-\textbf{P}reserving (DiP) watermark. Contrary to the current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking (distribution-preserving), is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens (resilient). DiPmark operates by selecting a random set of tokens prior to the generation of a word, then modifying the token distribution through a distribution-preserving reweight function to enhance the probability of these selected tokens during the sampling process. Extensive empirical evaluation on various language models and tasks demonstrates our approach's distribution-preserving property, accessibility, and resilience, making it an effective solution for watermarking tasks that demand impeccable quality preservation.
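To make the select-then-reweight mechanics concrete, here is a heavily simplified sketch: a token subset is chosen deterministically from the context, and the next-token distribution is reweighted toward it. Note that the naive multiplicative boost below is NOT distribution-preserving; DiPmark's actual reweight function is specifically constructed to preserve the distribution, so this sketch only illustrates the mechanics, not the paper's method.

# Simplified context-keyed token selection and reweighting (sketch).
import hashlib
import numpy as np

def select_tokens(context_ids, vocab_size, fraction=0.5):
    key = str(list(context_ids)).encode()
    seed = int(hashlib.sha256(key).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.choice(vocab_size, size=int(fraction * vocab_size),
                      replace=False)

def reweight(probs, selected, boost=1.5):
    w = np.ones_like(probs)
    w[selected] = boost          # favor the watermark token set
    p = probs * w
    return p / p.sum()           # renormalize before sampling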

Updated: 2024-06-25 07:08:17

标题: 一种适用于大型语言模型的具有弹性和可访问性的分布保持水印

摘要: 水印技术提供了一种有前途的方法,通过将隐蔽信息嵌入从语言模型生成的内容中来识别机器生成的内容。该领域的一个挑战在于在水印后保持原始生成内容的分布。我们的研究对现有的水印框架进行了扩展和改进,强调了\textbf{Di}stribution-\textbf{P}reserving(DiP)水印的重要性。与当前的策略相反,我们提出的DiPmark在水印过程中同时保持了原始令牌分布(分布保持),可以在没有语言模型API和提示的情况下检测(可访问性),并且经证明对令牌的中等更改具有鲁棒性(弹性)。DiPmark通过在生成单词之前选择一组随机令牌,然后通过分布保持的重新加权函数修改令牌分布,以增强在抽样过程中这些选定令牌的概率来运行。对各种语言模型和任务的广泛实证评估展示了我们方法的分布保持特性、可访问性和鲁棒性,使其成为需要无可挑剔的质量保留的水印任务的有效解决方案。

更新时间: 2024-06-25 07:08:17

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2310.07710v2

Detecting Misuse of Security APIs: A Systematic Review

Security Application Programming Interfaces (APIs) are crucial for ensuring software security. However, their misuse introduces vulnerabilities, potentially leading to severe data breaches and substantial financial loss. Complex API design, inadequate documentation, and insufficient security training often lead to unintentional misuse by developers. The software security community has devised and evaluated several approaches to detecting security API misuse to help developers and organizations. This study rigorously reviews the literature on detecting misuse of security APIs to gain a comprehensive understanding of this critical domain. Our goal is to identify and analyze security API misuses, the detection approaches developed, and the evaluation methodologies employed along with the open research avenues to advance the state-of-the-art in this area. Employing the systematic literature review (SLR) methodology, we analyzed 69 research papers. Our review has yielded (a) identification of 6 security API types; (b) classification of 30 distinct misuses; (c) categorization of detection techniques into heuristic-based and ML-based approaches; and (d) identification of 10 performance measures and 9 evaluation benchmarks. The review reveals a lack of coverage of detection approaches in several areas. We recommend that future efforts focus on aligning security API development with developers' needs and advancing standardized evaluation methods for detection technologies.

Updated: 2024-06-25 07:01:49

标题: 检测安全API滥用:一项系统性审查

摘要: 安全应用程序编程接口(API)对确保软件安全至关重要。然而,它们的错误使用会引入漏洞,可能导致严重的数据泄露和大量的财务损失。复杂的API设计、不足的文档和不足的安全培训通常会导致开发人员不经意地错误使用。软件安全社区已经设计并评估了几种方法来检测安全API的错误使用,以帮助开发人员和组织。本研究严格审查了有关检测安全API错误使用的文献,以全面了解这一关键领域。我们的目标是识别和分析安全API的错误使用、开发的检测方法以及采用的评估方法,以及推进这一领域的最新技术的开放研究方向。采用系统文献综述(SLR)方法,我们分析了69篇研究论文。我们的综述结果包括:(a)识别了6种安全API类型;(b)对30种不同的错误使用进行分类;(c)将检测技术分为基于启发式和基于机器学习的方法;以及(d)确定了10个性能指标和9个评估基准。综述揭示了在几个领域缺乏检测方法的覆盖。我们建议未来的努力应着重于使安全API的开发与开发人员的需求保持一致,并推进检测技术的标准化评估方法。

更新时间: 2024-06-25 07:01:49

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2306.08869v2

A review of unsupervised learning in astronomy

This review summarizes popular unsupervised learning methods, and gives an overview of their past, current, and future uses in astronomy. Unsupervised learning aims to organise the information content of a dataset, in such a way that knowledge can be extracted. Traditionally this has been achieved through dimensionality reduction techniques that aid the ranking of a dataset, for example through principal component analysis or by using auto-encoders, or simpler visualisation of a high dimensional space, for example through the use of a self organising map. Other desirable properties of unsupervised learning include the identification of clusters, i.e. groups of similar objects, which has traditionally been achieved by the k-means algorithm and more recently through density-based clustering such as HDBSCAN. More recently, complex frameworks have emerged, that chain together dimensionality reduction and clustering methods. However, no dataset is fully unknown. Thus, nowadays a lot of research has been directed towards self-supervised and semi-supervised methods that stand to gain from both supervised and unsupervised learning.
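The "chain dimensionality reduction with clustering" pattern the review describes can be sketched in a few lines; the dataset, component count, and cluster size below are placeholder assumptions (the hdbscan package is assumed installed).

# PCA followed by HDBSCAN, the chained pattern discussed above (sketch).
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import hdbscan  # pip install hdbscan

def reduce_and_cluster(features, n_components=10, min_cluster_size=25):
    X = StandardScaler().fit_transform(features)
    Z = PCA(n_components=n_components).fit_transform(X)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(Z)   # label -1 marks noise points
    return Z, labels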

Updated: 2024-06-25 06:57:47

标题: 天文学中无监督学习的综述

摘要: 这篇综述总结了流行的无监督学习方法,并概述了它们在天文学中过去、现在和未来的应用。无监督学习旨在以一种有利于提取知识的方式组织数据集的信息内容。传统上,这通过降维技术实现,这些技术有助于对数据集进行排名,例如通过主成分分析或使用自动编码器,或者通过对高维空间进行简化的可视化,例如通过使用自组织映射。无监督学习的其他理想特性包括识别簇,即类似对象的群组,传统上通过k均值算法实现,最近通过基于密度的聚类如HDBSCAN实现。最近出现了复杂的框架,将降维和聚类方法链接在一起。然而,没有任何数据集是完全未知的。因此,现在很多研究都致力于自监督和半监督方法,这些方法能够从监督学习和无监督学习中获益。

更新时间: 2024-06-25 06:57:47

领域: astro-ph.IM,cs.LG

下载: http://arxiv.org/abs/2406.17316v1

Improving Realized LGD Approximation: A Novel Framework with XGBoost for Handling Missing Cash-Flow Data

The scope for the accurate calculation of the Loss Given Default (LGD) parameter is comprehensive in terms of financial data. In this research, we aim to explore methods for improving the approximation of realized LGD in conditions of limited access to the cash-flow data. We enhance the performance of the method which relies on the differences between exposure values (delta outstanding approach) by employing machine learning (ML) techniques. The research utilizes the data from the mortgage portfolio of one of the European countries and assumes a close resemblance to similar economic contexts. It incorporates non-financial variables and macroeconomic data related to the housing market, improving the accuracy of loss severity approximation. The proposed methodology attempts to mitigate the country-specific (related to the local legal) or portfolio-specific factors in aim to show the general advantage of applying ML techniques, rather than case-specific relation. We developed an XGBoost model that does not rely on cash-flow data yet enhances the accuracy of realized LGD estimation compared to results obtained with the delta outstanding approach. A novel aspect of our work is the detailed exploration of the delta outstanding approach and the methodology for addressing conditions of limited access to cash-flow data through machine learning models.
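A minimal sketch of the ML side of such an approach: an XGBoost regressor predicting realized LGD from exposure, collateral, and housing-market features, with no cash-flow inputs. Feature construction, hyperparameters, and the evaluation split are illustrative assumptions, not the paper's configuration.

# XGBoost regressor for realized LGD without cash-flow data (sketch).
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def fit_lgd_model(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    model = xgb.XGBRegressor(n_estimators=500, max_depth=5,
                             learning_rate=0.05,
                             objective="reg:squarederror")
    model.fit(X_tr, y_tr)
    print("test MAE:", mean_absolute_error(y_te, model.predict(X_te)))
    return model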

Updated: 2024-06-25 06:41:09

标题: 改进实现的LGD近似值:一种新的框架,使用XGBoost处理缺失现金流数据

摘要: 就金融数据而言,准确计算违约损失率(LGD)参数所涉及的范围十分广泛。本研究旨在探索在现金流数据获取受限的情况下改进实现LGD逼近的方法。我们通过采用机器学习(ML)技术,提高了依赖于敞口价值之差(delta outstanding方法)的方法的性能。该研究利用了欧洲某国的抵押贷款组合数据,并假定其与类似经济环境高度相近。它结合了与房屋市场相关的非金融变量和宏观经济数据,提高了损失严重程度逼近的准确性。所提出的方法试图减轻特定国家(与当地法律相关)或特定组合因素的影响,以展示应用ML技术的普遍优势,而非特定案例的关系。我们开发了一个不依赖现金流数据、却比delta outstanding方法更准确地估计实现LGD的XGBoost模型。我们工作的一个新颖之处在于对delta outstanding方法的详细探讨,以及通过机器学习模型应对现金流数据获取受限条件的方法。

更新时间: 2024-06-25 06:41:09

领域: q-fin.RM,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.17308v1

Knowledge Crosswords: Geometric Knowledge Reasoning with Large Language Models

We propose Knowledge Crosswords, a geometric knowledge reasoning benchmark consisting of incomplete knowledge networks bounded by structured factual constraints, where LLMs are tasked with inferring the missing facts to meet all constraints. The novel setting of geometric knowledge reasoning necessitates new LM abilities beyond existing atomic/linear multi-hop QA, such as backtracking, verifying facts and constraints, reasoning with uncertainty, and more. Knowledge Crosswords contains 2,101 individual problems, covering diverse knowledge domains, and is further divided into three difficulty levels. We conduct extensive experiments to evaluate existing LLMs and approaches on Knowledge Crosswords. Results demonstrate that baseline approaches struggle with larger knowledge networks and semantically-equivalent entity distractors. In light of their limitations, we propose two new approaches, Staged Prompting and Verify-All, to augment LLMs' abilities for error-aware backtracking and constraint verification. Our Verify-All significantly outperforms prior methods and is more robust towards problems in the hard subset. Further analysis shows that geometric knowledge reasoning poses new challenges to LLMs' knowledge abilities, particularly in robustness towards varying option orders, complex structural constraints in knowledge networks, "none of the above" scenarios, and more.

Updated: 2024-06-25 06:25:41

标题: 知识填字游戏:利用大型语言模型进行几何知识推理

摘要: 我们提出了知识填字游戏(Knowledge Crosswords),这是一个由结构化事实约束限制的不完整知识网络组成的几何知识推理基准,LLMs的任务是推断缺失的事实以满足所有约束条件。几何知识推理的新颖设置需要超出现有原子/线性多跳QA的新LM能力,例如回溯,验证事实和约束,推理不确定性等。知识填字游戏包含2,101个独立问题,涵盖多种知识领域,并进一步分为三个难度级别。我们进行了大量实验,评估现有的LLMs和方法在知识填字游戏上的表现。结果表明,基线方法在处理更大的知识网络和语义等价实体干扰物时存在困难。鉴于它们的局限性,我们提出了两种新方法,分阶段提示和全部验证,以增强LLMs在错误感知回溯和约束验证方面的能力。我们的全部验证方法明显优于先前的方法,并对困难子集中的问题更具鲁棒性。进一步分析显示,几何知识推理对LLMs的知识能力提出了新的挑战,特别是在对各种选项顺序的鲁棒性,知识网络中的复杂结构约束,“以上都不是”情况等方面。

更新时间: 2024-06-25 06:25:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2310.01290v2

Approximation Theory of Tree Tensor Networks: Tensorized Multivariate Functions

We study the approximation of multivariate functions with tensor networks (TNs). The main conclusion of this work is an answer to the following two questions: ``What are the approximation capabilities of TNs?" and "What is an appropriate model class of functions that can be approximated with TNs?" To answer the former, we show that TNs can (near to) optimally replicate $h$-uniform and $h$-adaptive approximation, for any smoothness order of the target function. Tensor networks thus exhibit universal expressivity w.r.t. isotropic, anisotropic and mixed smoothness spaces that is comparable with more general neural networks families such as deep rectified linear unit (ReLU) networks. Put differently, TNs have the capacity to (near to) optimally approximate many function classes -- without being adapted to the particular class in question. To answer the latter, as a candidate model class we consider approximation classes of TNs and show that these are (quasi-)Banach spaces, that many types of classical smoothness spaces are continuously embedded into said approximation classes and that TN approximation classes are themselves not embedded in any classical smoothness space.

Updated: 2024-06-25 06:24:52

标题: 树张量网络的逼近理论:张量化多变量函数

摘要: 我们研究了使用张量网络(TNs)来逼近多元函数。这项工作的主要结论回答了以下两个问题:“TNs的逼近能力是什么?”和“哪种函数模型类适合使用TNs进行逼近?”为了回答前者,我们表明TNs可以(接近)最优地复制$h$-均匀和$h$-自适应逼近,对于目标函数的任何光滑度顺序。因此,张量网络在各向同性、各向异性和混合光滑度空间方面表现出与更一般的神经网络家族(如深度修正线性单元(ReLU)网络)可比拟的通用表达能力。换句话说,TNs具有能力(接近)最优地逼近许多函数类别,而无需针对特定类别进行调整。 为了回答后者,作为候选模型类,我们考虑TNs的逼近类,并表明这些类是(拟)巴拿赫空间,许多类型的经典光滑度空间被连续嵌入到这些逼近类中,并且TN的逼近类本身并不被任何经典光滑度空间所嵌入。

更新时间: 2024-06-25 06:24:52

领域: math.FA,cs.LG,cs.NA,math.NA,41A65, 41A15, 41A10 (primary), 68T05, 42C40, 65D99 (secondary)

下载: http://arxiv.org/abs/2101.11932v5

Towards Hypermedia Environments for Adaptive Coordination in Industrial Automation

Electromechanical systems manage physical processes through a network of interconnected components. Today, programming the interactions required for coordinating these components is largely a manual process. This process is time-consuming and requires manual adaptation when system features change. To overcome this issue, we use autonomous software agents that process semantic descriptions of the system to determine coordination requirements and constraints; on this basis, they then interact with one another to control the system in a decentralized and coordinated manner. Our core insight is that coordination requirements between individual components are, ultimately, largely due to underlying physical interdependencies between the components, which can be (and, in many cases, already are) semantically modeled in automation projects. Agents then use hypermedia to discover, at run time, the plans and protocols required for enacting the coordination. A key novelty of our approach is the use of hypermedia-driven interaction: it reduces coupling in the system and enables its run-time adaptation as features change.

Updated: 2024-06-25 06:21:52

标题: 朝向工业自动化中自适应协调的超媒体环境

摘要: 机电系统通过一个互连的组件网络来管理物理过程。如今,协调这些组件所需交互的编程在很大程度上仍是手动过程。这个过程耗时,且在系统特性改变时需要手动适应。为了克服这个问题,我们使用自主软件代理处理系统的语义描述,以确定协调需求和约束;在此基础上,它们相互交互,以去中心化且协调的方式控制系统。我们的核心见解是,单个组件之间的协调需求最终主要源于组件之间底层的物理相互依赖,而这些依赖可以(并且在许多情况下已经)在自动化项目中被语义建模。代理随后利用超媒体在运行时发现实施协调所需的计划和协议。我们方法的一个关键创新是使用超媒体驱动的交互:它减少了系统中的耦合,并使系统能够在特性改变时进行运行时适应。

更新时间: 2024-06-25 06:21:52

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2406.17816v1

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

We propose the stochastic optimal path which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. At last, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at \url{https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP}.
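The core objects can be stated compactly. With notation assumed here (a path $y$, edge weights $w_{uv}$, temperature $\lambda$), the Gibbs distribution over DAG paths and its Gumbel-max characterization, which the message-passing (Gumbel propagation) exploits, read:

% Gibbs distribution over paths of a directed acyclic graph:
\[
  p(y) \;=\; \frac{\exp\!\big(\phi(y)/\lambda\big)}
                  {\sum_{y' \in \mathcal{Y}} \exp\!\big(\phi(y')/\lambda\big)},
  \qquad \phi(y) = \sum_{(u,v) \in y} w_{uv}.
\]
% Gumbel-max view: perturbing each path score with i.i.d. Gumbel noise
% and taking the argmax yields an exact sample from p(y):
\[
  y^\ast = \arg\max_{y \in \mathcal{Y}} \big(\phi(y)/\lambda + G_y\big),
  \quad G_y \sim \mathrm{Gumbel}(0,1)
  \;\Longrightarrow\; y^\ast \sim p(y).
\]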

Updated: 2024-06-25 06:13:38

标题: 用Gumbel传播的潜在最优路径进行变分贝叶斯动态规划

摘要: 我们提出了一种通过概率软化解决经典最优路径问题的随机最优路径。这种统一方法将各种DP问题转化为有向无环图,其中所有路径都遵循Gibbs分布。我们通过Gumbel分布的性质展示了Gibbs分布与消息传递算法的等价性,并提供了进行潜在路径的变分贝叶斯推断所需的所有要素,即贝叶斯动态规划(BDP)。我们展示了在变分自动编码器(VAEs)的潜在空间中使用BDP,并提出了捕捉结构稀疏最优路径作为潜在变量的BDP-VAE。这使得基于未观察到的结构信息的生成任务能够进行端到端训练。最后,我们验证了我们方法的行为,并展示了其在两个真实世界应用中的适用性:文本转语音和歌声合成。我们的实现代码可在\url{https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP}中找到。

更新时间: 2024-06-25 06:13:38

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2306.02568v3

Towards Efficient and Scalable Training of Differentially Private Deep Learning

Differentially private stochastic gradient descent (DP-SGD) is the standard algorithm for training machine learning models under differential privacy (DP). The major drawback of DP-SGD is the drop in utility which prior work has comprehensively studied. However, in practice another major drawback that hinders the large-scale deployment is the significantly higher computational cost. We conduct a comprehensive empirical study to quantify the computational cost of training deep learning models under DP and benchmark methods that aim at reducing the cost. Among these are more efficient implementations of DP-SGD and training with lower precision. Finally, we study the scaling behaviour using up to 80 GPUs.
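For reference, the DP-SGD step being benchmarked is standard: clip each per-example gradient to norm C, sum, and add Gaussian noise scaled by C times the noise multiplier. A plain-numpy sketch follows; batching, sampling, and privacy accounting are omitted, and the hyperparameters are illustrative.

# One DP-SGD step: per-example clipping + Gaussian noise (sketch).
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:                  # one gradient per example
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    g_sum = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=g_sum.shape)
    g_priv = (g_sum + noise) / len(per_example_grads)
    return params - lr * g_priv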

Updated: 2024-06-25 06:04:58

标题: 朝着高效和可扩展的差分隐私深度学习训练

摘要: 差分隐私随机梯度下降(DP-SGD)是在差分隐私(DP)下训练机器学习模型的标准算法。DP-SGD的主要缺点是效用下降,先前的研究已经全面研究过。然而,在实践中,另一个阻碍大规模部署的主要缺点是显着更高的计算成本。我们进行了一项全面的实证研究,以量化在DP下训练深度学习模型的计算成本,并对旨在减少成本的方法进行基准测试。其中包括更高效的DP-SGD实现和使用更低精度进行训练。最后,我们使用高达80个GPU来研究扩展行为。

更新时间: 2024-06-25 06:04:58

领域: cs.LG,cs.CR,cs.DC

下载: http://arxiv.org/abs/2406.17298v1

SUM: Saliency Unification through Mamba for Visual Attention Modeling

Visual attention modeling, important for interpreting and prioritizing visual stimuli, plays a significant role in applications such as marketing, multimedia, and robotics. Traditional saliency prediction models, especially those based on Convolutional Neural Networks (CNNs) or Transformers, achieve notable success by leveraging large-scale annotated datasets. However, the current state-of-the-art (SOTA) models that use Transformers are computationally expensive. Additionally, separate models are often required for each image type, lacking a unified approach. In this paper, we propose Saliency Unification through Mamba (SUM), a novel approach that integrates the efficient long-range dependency modeling of Mamba with U-Net to provide a unified model for diverse image types. Using a novel Conditional Visual State Space (C-VSS) block, SUM dynamically adapts to various image types, including natural scenes, web pages, and commercial imagery, ensuring universal applicability across different data types. Our comprehensive evaluations across five benchmarks demonstrate that SUM seamlessly adapts to different visual characteristics and consistently outperforms existing models. These results position SUM as a versatile and powerful tool for advancing visual attention modeling, offering a robust solution universally applicable across different types of visual content.

Updated: 2024-06-25 05:54:07

标题: SUM:通过Mamba实现视觉注意力建模的显著性统一

摘要: 视觉注意力建模对解释和优先处理视觉刺激至关重要,在营销、多媒体和机器人等应用中发挥着重要作用。传统显著性预测模型,特别是基于卷积神经网络(CNN)或Transformer的模型,通过利用大规模标注数据集取得了显著成功。然而,目前基于Transformer的最新模型在计算方面非常昂贵。此外,通常需要针对每种图像类型使用单独的模型,缺乏统一的方法。本文提出了一种名为SUM(Saliency Unification through Mamba)的新方法,该方法将Mamba的高效长距离依赖建模与U-Net集成,为不同图像类型提供了统一的模型。通过一种新颖的条件视觉状态空间(C-VSS)块,SUM能够动态适应各种图像类型,包括自然场景、网页和商业图像,确保在不同数据类型之间具有普遍适用性。我们在五个基准测试中进行了全面评估,结果表明SUM能够无缝适应不同的视觉特征,并始终优于现有模型。这些结果将SUM定位为推进视觉注意力建模的多功能强大工具,为不同类型的视觉内容提供了一个稳健的解决方案。

更新时间: 2024-06-25 05:54:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17815v1

BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks

Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents significant memory challenges, often requiring a prohibitive amount of GPU memory that may not be readily available. Existing methods such as low-rank adaptation (LoRA) add trainable low-rank matrix factorizations, altering the training dynamics and limiting the model's parameter search to a low-rank subspace. GaLore, a more recent method, employs Gradient Low-Rank Projection to reduce the memory footprint, in the full parameter training setting. However GaLore can only be applied to a subset of the LLM layers that satisfy the "reversibility" property, thus limiting their applicability. In response to these challenges, we introduce BlockLLM, an approach inspired by block coordinate descent. Our method carefully selects and updates a very small subset of the trainable parameters without altering any part of its architecture and training procedure. BlockLLM achieves state-of-the-art performance in both finetuning and pretraining tasks, while reducing the memory footprint of the underlying optimization process. Our experiments demonstrate that fine-tuning with only less than 5% of the parameters, BlockLLM achieves state-of-the-art perplexity scores on the GLUE benchmarks. On Llama model pretrained on C4 dataset, BlockLLM is able to train with significantly less memory than the state-of-the-art, while still maintaining competitive performance.
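A hedged sketch of the block-coordinate idea: rank parameter tensors by their current gradient norms and unfreeze only the top few percent. This is a simplification in the spirit of BlockLLM, not the paper's exact selection criterion; the keep fraction is an illustrative assumption.

# Block-coordinate-style parameter selection (PyTorch sketch).
import torch

def select_blocks(model, keep_fraction=0.05):
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            scores[name] = p.grad.norm().item()
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = set(ranked[: max(1, int(keep_fraction * len(ranked)))])
    for name, p in model.named_parameters():
        p.requires_grad_(name in keep)   # freeze everything not selected
    return keep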

Updated: 2024-06-25 05:45:12

标题: BlockLLM:通过选择和优化正确的坐标块来实现LLM的内存高效适应

摘要: 训练大型语言模型(LLMs)进行预训练或适应新任务和领域已经变得越来越关键,因为它们的应用范围不断扩大。然而,随着模型和数据规模的增长,训练过程面临重大的内存挑战,通常需要大量的GPU内存,这种内存可能不容易获得。现有方法如低秩适应(LoRA)添加可训练的低秩矩阵分解,改变训练动态,并将模型参数搜索限制在低秩子空间。最近的方法GaLore采用Gradient Low-Rank Projection来减少内存占用,在完整参数训练设置下。然而,GaLore只能应用于满足“可逆性”属性的LLM层的子集,从而限制了它们的适用性。为了应对这些挑战,我们提出了BlockLLM,这是一种受块坐标下降启发的方法。我们的方法精心选择并更新一小部分可训练参数,而不改变其任何部分的架构和训练过程。BlockLLM在微调和预训练任务中实现了最先进的性能,同时减少了底层优化过程的内存占用。我们的实验表明,仅使用不到5%的参数进行微调,BlockLLM在GLUE基准测试中实现了最先进的困惑度得分。在C4数据集上预训练的Llama模型上,BlockLLM能够以显著更少的内存进行训练,同时仍保持竞争性能。

更新时间: 2024-06-25 05:45:12

领域: cs.LG

下载: http://arxiv.org/abs/2406.17296v1

MatText: Do Language Models Need More than Text & Scale for Materials Modeling?

Effectively representing materials as text has the potential to leverage the vast advancements of large language models (LLMs) for discovering new materials. While LLMs have shown remarkable success in various domains, their application to materials science remains underexplored. A fundamental challenge is the lack of understanding of how to best utilize text-based representations for materials modeling. This challenge is further compounded by the absence of a comprehensive benchmark to rigorously evaluate the capabilities and limitations of these text representations in capturing the complexity of material systems. To address this gap, we propose MatText, a suite of benchmarking tools and datasets designed to systematically evaluate the performance of language models in modeling materials. MatText encompasses nine distinct text-based representations for material systems, including several novel representations. Each representation incorporates unique inductive biases that capture relevant information and integrate prior physical knowledge about materials. Additionally, MatText provides essential tools for training and benchmarking the performance of language models in the context of materials science. These tools include standardized dataset splits for each representation, probes for evaluating sensitivity to geometric factors, and tools for seamlessly converting crystal structures into text. Using MatText, we conduct an extensive analysis of the capabilities of language models in modeling materials. Our findings reveal that current language models consistently struggle to capture the geometric information crucial for materials modeling across all representations. Instead, these models tend to leverage local information, which is emphasized in some of our novel representations. Our analysis underscores MatText's ability to reveal shortcomings of text-based methods for materials design.

Updated: 2024-06-25 05:45:07

标题: MatText:语言模型在材料建模中是否需要文本和规模之外的东西?

摘要: 将材料有效地表示为文本具有利用大型语言模型(LLMs)发现新材料的潜力。虽然LLMs在各个领域已经取得了显著成功,但它们在材料科学中的应用仍未被充分探索。一个基本挑战在于,人们尚不清楚如何最好地利用基于文本的表示来进行材料建模。由于缺乏全面的基准来严格评估这些文本表示在捕捉材料系统复杂性方面的能力与局限,这一挑战进一步加剧。为了解决这一差距,我们提出了MatText,一个旨在系统评估语言模型在建模材料方面性能的基准工具和数据集套件。MatText包括九种用于材料系统的不同基于文本的表示,包括几种新颖的表示。每种表示都包含独特的归纳偏置,以捕捉相关信息并整合有关材料的先验物理知识。此外,MatText提供了用于在材料科学背景下训练和基准测试语言模型性能的基本工具。这些工具包括每种表示的标准数据集拆分、用于评估对几何因素敏感性的探针,以及将晶体结构无缝转换为文本的工具。使用MatText,我们对语言模型在建模材料方面的能力进行了广泛分析。我们的研究结果显示,当前的语言模型在各种表示中一致难以捕捉对材料建模至关重要的几何信息。相反,这些模型倾向于利用局部信息,这在我们的一些新颖表示中得到了强调。我们的分析强调了MatText揭示基于文本的方法在材料设计中不足之处的能力。

更新时间: 2024-06-25 05:45:07

领域: cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2406.17295v1

A Survey on Safe Multi-Modal Learning System

In the rapidly evolving landscape of artificial intelligence, multimodal learning systems (MMLS) have gained traction for their ability to process and integrate information from diverse modality inputs. Their expanding use in vital sectors such as healthcare has made safety assurance a critical concern. However, the absence of systematic research into their safety is a significant barrier to progress in this field. To bridge the gap, we present the first taxonomy that systematically categorizes and assesses MMLS safety. This taxonomy is structured around four fundamental pillars that are critical to ensuring the safety of MMLS: robustness, alignment, monitoring, and controllability. Leveraging this taxonomy, we review existing methodologies, benchmarks, and the current state of research, while also pinpointing the principal limitations and gaps in knowledge. Finally, we discuss unique challenges in MMLS safety. In illuminating these challenges, we aim to pave the way for future research, proposing potential directions that could lead to significant advancements in the safety protocols of MMLS.

Updated: 2024-06-25 05:42:43

标题: 一个关于安全多模学习系统的调查

摘要: 在人工智能快速发展的领域中,多模态学习系统(MMLS)因其能够处理和整合来自不同模态输入的信息而受到关注。它们在医疗保健等重要领域的不断扩张已经使安全保障成为一个关键关注点。然而,对它们安全性的系统研究的缺乏是这一领域进展的重要障碍。为了弥合这一差距,我们提出了第一个系统地对MMLS安全性进行分类和评估的分类法。这个分类法围绕着四个关键的支柱结构,这些支柱对确保MMLS的安全性至关重要:健壮性、对齐性、监控和可控性。利用这个分类法,我们回顾了现有的方法论、基准和当前研究的现状,同时也指出了主要的限制和知识空白。最后,我们讨论了MMLS安全性中的独特挑战。通过阐明这些挑战,我们旨在为未来研究铺平道路,提出可能的方向,这可能会导致MMLS安全协议的重大进展。

更新时间: 2024-06-25 05:42:43

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2402.05355v4

Hyperbolic Knowledge Transfer in Cross-Domain Recommendation System

Cross-Domain Recommendation (CDR) seeks to utilize knowledge from different domains to alleviate the problem of data sparsity in the target recommendation domain, and it has been gaining more attention in recent years. Although there have been notable advancements in this area, most current methods represent users and items in Euclidean space, which is not ideal for handling long-tail distributed data in recommendation systems. Additionally, adding data from other domains can worsen the long-tail characteristics of the entire dataset, making it harder to train CDR models effectively. Recent studies have shown that hyperbolic methods are particularly suitable for modeling long-tail distributions, which has led us to explore hyperbolic representations for users and items in CDR scenarios. However, due to the distinct characteristics of the different domains, applying hyperbolic representation learning to CDR tasks is quite challenging. In this paper, we introduce a new framework called Hyperbolic Contrastive Learning (HCTS), designed to capture the unique features of each domain while enabling efficient knowledge transfer between domains. We achieve this by embedding users and items from each domain separately and mapping them onto distinct hyperbolic manifolds with adjustable curvatures for prediction. To improve the representations of users and items in the target domain, we develop a hyperbolic contrastive learning module for knowledge transfer. Extensive experiments on real-world datasets demonstrate that hyperbolic manifolds are a promising alternative to Euclidean space for CDR tasks.
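For intuition, the hyperbolic geometry involved can be sketched with the Poincare-ball distance under a curvature parameter c, the kind of adjustable-curvature manifold the paper maps each domain onto. The formula below is the standard Poincare-ball distance; vector shapes and c are illustrative, and inputs must lie strictly inside the ball of radius 1/sqrt(c).

# Poincare-ball distance with curvature parameter c > 0 (sketch).
import numpy as np

def poincare_dist(u, v, c=1.0):
    diff2 = np.sum((u - v) ** 2)
    denom = (1 - c * np.sum(u * u)) * (1 - c * np.sum(v * v))
    arg = 1 + 2 * c * diff2 / denom
    return np.arccosh(arg) / np.sqrt(c)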

Updated: 2024-06-25 05:35:02

标题: 跨领域推荐系统中的双曲型知识转移

摘要: 跨领域推荐(CDR)旨在利用不同领域的知识来缓解目标推荐领域中数据稀疏的问题,并且近年来受到了越来越多的关注。尽管在这个领域已经取得了显著进展,但大多数当前方法表示用户和项目在欧几里得空间中,这对于处理推荐系统中的长尾分布数据并不理想。此外,添加来自其他领域的数据可能会加剧整个数据集的长尾特征,使得更难有效训练CDR模型。最近的研究表明,双曲方法特别适合建模长尾分布,这促使我们探索在CDR场景中为用户和项目引入双曲表示。然而,由于不同领域的独特特征,将双曲表示学习应用于CDR任务是非常具有挑战性的。在本文中,我们介绍了一个名为Hyperbolic Contrastive Learning(HCTS)的新框架,旨在捕捉每个领域的独特特征,同时实现领域之间的有效知识传递。我们通过分别嵌入每个领域的用户和项目,并将它们映射到具有可调曲率的不同双曲流形上进行预测。为了改善目标领域中用户和项目的表示,我们开发了一个双曲对比学习模块用于知识传递。对真实世界数据集的大量实验证明,双曲流形是CDR任务的一种有前途的替代方案。

更新时间: 2024-06-25 05:35:02

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2406.17289v1

Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models

Accurate assessment of personality traits is crucial for effective psycho-counseling, yet traditional methods like self-report questionnaires are time-consuming and biased. This study examines whether Large Language Models (LLMs) can predict the Big Five personality traits directly from counseling dialogues and introduces an innovative framework to perform the task. Our framework applies role-play and questionnaire-based prompting to condition LLMs on counseling sessions, simulating client responses to the Big Five Inventory. We evaluated our framework on 853 real-world counseling sessions, finding a significant correlation between LLM-predicted and actual Big Five traits, proving the validity of the framework. Moreover, ablation studies highlight the importance of role-play simulations and task simplification via questionnaires in enhancing prediction accuracy. Meanwhile, our fine-tuned Llama3-8B model, utilizing Direct Preference Optimization with Supervised Fine-Tuning, achieves a 130.95\% improvement, surpassing the state-of-the-art Qwen1.5-110B by 36.94\% in personality prediction validity. In conclusion, LLMs can predict personality based on counseling dialogues. Our code and model are publicly available at \url{https://github.com/kuri-leo/BigFive-LLM-Predictor}, providing a valuable tool for future research in computational psychometrics.

Updated: 2024-06-25 05:30:55

标题: 使用大型语言模型在中国辅导对话中预测大五人格特质

摘要: 准确评估个性特征对于有效的心理咨询至关重要,然而传统的方法如自我报告问卷耗时且存在偏见。本研究检验了大型语言模型(LLMs)能否直接从咨询对话中预测大五人格特征,并引入了一个创新的框架来执行此任务。我们的框架通过角色扮演和基于问卷的提示来调整LLMs对咨询会话进行条件化,模拟客户对大五人格清单的回应。我们在853个真实世界的咨询会话中评估了我们的框架,发现LLM预测的大五特质与实际特质之间存在显著相关性,证明了框架的有效性。此外,消融研究突显了角色扮演模拟和通过问卷简化任务在提高预测准确性方面的重要性。同时,我们经过微调的Llama3-8B模型利用直接偏好优化与监督微调,实现了130.95\%的改进,在人格预测效度上以36.94\%的优势超越了最先进的Qwen1.5-110B。总之,LLMs可以根据咨询对话预测个性。我们的代码和模型可以在\url{https://github.com/kuri-leo/BigFive-LLM-Predictor}公开获取,为计算心理测量学的未来研究提供了有价值的工具。

更新时间: 2024-06-25 05:30:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17287v1

EON-1: A Brain-Inspired Processor for Near-Sensor Extreme Edge Online Feature Extraction

For Edge AI applications, deploying online learning and adaptation on resource-constrained embedded devices can deal with fast sensor-generated streams of data in changing environments. However, since maintaining low-latency and power-efficient inference is paramount at the Edge, online learning and adaptation on the device should impose minimal additional overhead for inference. With this goal in mind, we explore energy-efficient learning and adaptation on-device for streaming-data Edge AI applications using Spiking Neural Networks (SNNs), which follow the principles of brain-inspired computing, such as high-parallelism, neuron co-located memory and compute, and event-driven processing. We propose EON-1, a brain-inspired processor for near-sensor extreme edge online feature extraction, that integrates a fast online learning and adaptation algorithm. We report results of only 1% energy overhead for learning, by far the lowest overhead when compared to other SoTA solutions, while attaining comparable inference accuracy. Furthermore, we demonstrate that EON-1 is up for the challenge of low-latency processing of HD and UHD streaming video in real-time, with learning enabled.

Updated: 2024-06-25 05:23:41

标题: EON-1:一种用于传感器附近极端边缘在线特征提取的脑启发式处理器

摘要: 对于边缘人工智能应用来说,在资源受限的嵌入式设备上部署在线学习和调整可以处理在不断变化的环境中产生的快速传感器生成的数据流。然而,在边缘,保持低延迟和高效能推断至关重要,因此设备上的在线学习和调整应该对推断施加最小的额外开销。为了实现这一目标,我们探索了使用脉冲神经网络(SNNs)的流数据边缘人工智能应用的能效学习和调整,这些网络遵循了大脑启发式计算的原则,如高并行性、神经元共位内存和计算,以及事件驱动处理。我们提出了EON-1,一种用于近传感器极限边缘在线特征提取的脑启发处理器,集成了快速在线学习和调整算法。我们报告了学习方面仅1%的能量开销,与其他最新解决方案相比远低,同时获得了可比较的推断准确度。此外,我们证明了EON-1能够应对实时处理高清和超高清流视频的低延迟挑战,并且支持学习功能。

更新时间: 2024-06-25 05:23:41

领域: cs.NE,cs.AI,cs.ET,cs.LG

下载: http://arxiv.org/abs/2406.17285v1

Distance Recomputator and Topology Reconstructor for Graph Neural Networks

This paper introduces novel methodologies, the Distance Recomputator and Topology Reconstructor, aimed at enhancing Graph Neural Networks (GNNs). The Distance Recomputator dynamically recalibrates node distances within k-hop neighborhoods using a dynamic encoding scheme, thereby improving the accuracy and adaptability of node representations. Concurrently, the Topology Reconstructor adjusts local graph structures based on computed "similarity distances," optimizing network configurations for improved learning outcomes. These methods address the limitations of static node representations and fixed aggregation schemes in traditional GNNs, offering a more nuanced approach to modeling complex and dynamic graph topologies. Furthermore, our experimental evaluations demonstrate significant performance advantages over existing methods across various benchmark datasets. The proposed Distance Recomputator and Topology Reconstructor not only enhance node relationship modeling accuracy but also optimize information aggregation efficiency through an asynchronous aggregation mechanism. This approach proves particularly effective in scenarios involving dynamic or large-scale graphs, showcasing the methods' robustness and applicability in real-world graph learning tasks.

Updated: 2024-06-25 05:12:51

标题: 图神经网络的距离重计算器和拓扑重构器

摘要: 本文介绍了一种旨在增强图神经网络(GNNs)的新方法,距离重新计算器和拓扑重构器。距离重新计算器使用动态编码方案在k-hop邻域内动态重新校准节点距离,从而提高节点表示的准确性和适应性。同时,拓扑重构器根据计算的“相似距离”调整本地图结构,优化网络配置以改善学习结果。这些方法解决了传统GNNs中静态节点表示和固定聚合方案的局限性,提供了更细致的建模复杂和动态图拓扑的方法。 此外,我们的实验评估显示,与各种基准数据集上的现有方法相比,提出的距离重新计算器和拓扑重构器具有显著的性能优势。这两种方法不仅提高了节点关系建模的准确性,还通过异步聚合机制优化了信息聚合效率。这种方法在涉及动态或大规模图的情况下特别有效,展示了这些方法在真实世界图学习任务中的稳健性和适用性。

更新时间: 2024-06-25 05:12:51

领域: cs.LG

下载: http://arxiv.org/abs/2406.17281v1

Distribution Learnability and Robustness

We examine the relationship between learnability and robust (or agnostic) learnability for the problem of distribution learning. We show that, contrary to other learning settings (e.g., PAC learning of function classes), realizable learnability of a class of probability distributions does not imply its agnostic learnability. We go on to examine what type of data corruption can disrupt the learnability of a distribution class and what is such learnability robust against. We show that realizable learnability of a class of distributions implies its robust learnability with respect to only additive corruption, but not against subtractive corruption. We also explore related implications in the context of compression schemes and differentially private learnability.
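The two corruption models being contrasted can be paraphrased in total-variation terms; notation here is assumed and the constant is schematic, not the paper's exact statement:

% With corruption rate \eta, additive corruption mixes in adversarial mass:
\[
  \tilde{p} \;=\; (1-\eta)\,p + \eta\,q, \qquad q \text{ arbitrary},
\]
% while subtractive corruption instead removes an \eta-fraction of p's
% mass. Robust learnability asks the learner to output \hat{p} with
\[
  d_{\mathrm{TV}}(\hat{p},\,p) \;\le\; C\,\eta + \varepsilon
\]
% with high probability; the result above says realizable learnability
% yields this guarantee under additive, but not subtractive, corruption.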

Updated: 2024-06-25 05:09:54

标题: 分布可学习性与鲁棒性

摘要: 我们研究了分布学习问题中可学习性与鲁棒(或不可知)可学习性之间的关系。我们展示了与其他学习设置(例如,函数类的PAC学习)相反,一类概率分布的可实现可学习性并不意味着其不可知可学习性。我们继续研究了什么类型的数据损坏可以破坏分布类的可学习性以及这种可学习性对抗哪种损坏是具有鲁棒性的。我们展示了一类分布的真实可学习性意味着其相对于仅加性损坏具有鲁棒性,但不对抗减性损坏。 在压缩方案和差分隐私可学习性的背景下,我们还探讨了相关的影响。

更新时间: 2024-06-25 05:09:54

领域: stat.ML,cs.DS,cs.IT,cs.LG,math.IT,math.ST,stat.TH

下载: http://arxiv.org/abs/2406.17814v1

Learning Decentralized Multi-Biped Control for Payload Transport

Payload transport over flat terrain via multi-wheel robot carriers is well-understood, highly effective, and configurable. In this paper, our goal is to provide similar effectiveness and configurability for transport over rough terrain that is more suitable for legs rather than wheels. For this purpose, we consider multi-biped robot carriers, where wheels are replaced by multiple bipedal robots attached to the carrier. Our main contribution is to design a decentralized controller for such systems that can be effectively applied to varying numbers and configurations of rigidly attached bipedal robots without retraining. We present a reinforcement learning approach for training the controller in simulation that supports transfer to the real world. Our experiments in simulation provide quantitative metrics showing the effectiveness of the approach over a wide variety of simulated transport scenarios. In addition, we demonstrate the controller in the real-world for systems composed of two and three Cassie robots. To our knowledge, this is the first example of a scalable multi-biped payload transport system.

Updated: 2024-06-25 05:08:44

标题: 学习用于载荷运输的分散式多双足控制

摘要: 通过多轮机器人搬运器在平坦地形上进行载荷运输已被充分理解、高效且可配置。本文的目标是在更适合用腿而非轮子的崎岖地形上提供类似的效率和可配置性。为此,我们考虑多双足机器人搬运器,其中车轮被多个连接到搬运器的双足机器人取代。我们的主要贡献是为此类系统设计了一个去中心化控制器,可以有效应用于不同数量和配置的刚性连接双足机器人而无需重新训练。我们提出了一种在模拟中训练控制器的强化学习方法,支持向真实世界迁移。我们在模拟中的实验提供了定量指标,展示了该方法在各种模拟运输场景中的有效性。此外,我们在由两个和三个Cassie机器人组成的真实世界系统中演示了该控制器。据我们所知,这是可扩展多双足载荷运输系统的第一个示例。

更新时间: 2024-06-25 05:08:44

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.17279v1

NExT-GPT: Any-to-Any Multimodal LLM

While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities. As we humans always perceive the world and communicate with people through various modalities, developing any-to-any MM-LLMs capable of accepting and delivering content in any modality becomes essential to human-level AI. To fill the gap, we present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT. We connect an LLM with multimodal adaptors and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. By leveraging the existing well-trained highly-performing encoders and decoders, NExT-GPT is tuned with only a small amount of parameter (1%) of certain projection layers, which not only benefits low-cost training and also facilitates convenient expansion to more potential modalities. Moreover, we introduce a modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for MosIT, based on which NExT-GPT is empowered with complex cross-modal semantic understanding and content generation. Overall, our research showcases the promising possibility of building an AI agent capable of modeling universal modalities, paving the way for more human-like AI research in the community. Project page: https://next-gpt.github.io/

Updated: 2024-06-25 05:01:09

标题: NExT-GPT: 任意-任意多模态LLM

摘要: 最近,多模式大型语言模型(MM-LLMs)取得了令人兴奋的进展,但它们大多受限于仅具有输入端多模态理解的能力,而无法在多种模态下产生内容。由于我们人类总是通过各种方式感知世界并与人交流,因此开发能够接受和提供任何模态内容的任意至任意MM-LLMs对于达到人类水平的人工智能至关重要。为了填补这一空白,我们提出了一个端到端通用任意至任意MM-LLM系统NExT-GPT。我们将一个LLM与多模态适配器和不同的扩散解码器连接起来,使NExT-GPT能够以任意文本、图像、视频和音频的组合感知输入并生成输出。通过利用现有训练良好且性能出色的编码器和解码器,NExT-GPT仅调整了少量参数(某些投射层的1%),这不仅有利于低成本训练,还方便地扩展到更多潜在模态。此外,我们引入了一种模态切换指令调整(MosIT)并手动策划了一个高质量的MosIT数据集,基于该数据集,NExT-GPT具有复杂的跨模态语义理解和内容生成能力。总的来说,我们的研究展示了建立一种能够建模通用模态的AI代理的有前景的可能性,为社区中更加类人的AI研究铺平了道路。项目页面:https://next-gpt.github.io/

更新时间: 2024-06-25 05:01:09

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2309.05519v3

Securing Voice Authentication Applications Against Targeted Data Poisoning

Deep neural network-based voice authentication systems are promising biometric verification techniques that uniquely identify biological characteristics to verify a user. However, they are particularly susceptible to targeted data poisoning attacks, where attackers replace legitimate users' utterances with their own. We propose an enhanced framework using real-world datasets, considering realistic attack scenarios. The results show that the proposed approach is robust, providing accurate authentications even when only a small fraction (5% of the dataset) is poisoned.

Updated: 2024-06-25 04:52:37

标题: 保护语音认证应用程序免受有针对性的数据毒化攻击

摘要: 基于深度神经网络的语音认证系统是一种有前景的生物特征验证技术,可以唯一识别生物特征以验证用户身份。但是,它们特别容易受到有针对性的数据毒化攻击的影响,攻击者会用自己的话语替换合法用户的话语。我们提出了一个增强框架,利用现实世界数据集考虑了现实的攻击场景。结果显示,所提出的方法是稳健的,即使只有很小一部分数据集(5%)受到毒化,也能提供准确的认证。

更新时间: 2024-06-25 04:52:37

领域: cs.CR

下载: http://arxiv.org/abs/2406.17277v1

Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

On tabular data, a significant body of literature has shown that current deep learning (DL) models perform at best similarly to Gradient Boosted Decision Trees (GBDTs), while significantly underperforming them on outlier data. However, these works often study idealized problem settings which may fail to capture complexities of real-world scenarios. We identify a natural tabular data setting where DL models can outperform GBDTs: tabular Learning-to-Rank (LTR) under label scarcity. Tabular LTR applications, including search and recommendation, often have an abundance of unlabeled data, and scarce labeled data. We show that DL rankers can utilize unsupervised pretraining to exploit this unlabeled data. In extensive experiments over both public and proprietary datasets, we show that pretrained DL rankers consistently outperform GBDT rankers on ranking metrics -- sometimes by as much as 38% -- both overall and on outliers.

Updated: 2024-06-25 04:41:56

标题: 预训练深度模型在标签稀缺情况下的学习排序中优于GBDTs

摘要: 关于表格数据,大量文献表明目前的深度学习(DL)模型在最佳情况下表现与梯度提升决策树(GBDTs)相似,但在异常数据上明显表现不佳。然而,这些研究往往研究了理想化的问题设置,可能无法捕捉现实场景的复杂性。我们确定了一个自然的表格数据设置,深度学习模型可以胜过GBDTs:在标签稀缺情况下的表格学习排序(LTR)。表格LTR应用,包括搜索和推荐,通常有大量未标记数据和稀少标记数据。我们展示了DL排序器可以利用无监督预训练来利用这些未标记数据。通过对公共和专有数据集进行广泛实验,我们展示预训练的DL排序器在排名指标上一贯优于GBDT排序器--有时甚至高达38%--无论是整体还是在异常值上。

更新时间: 2024-06-25 04:41:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2308.00177v4

Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?

Text summarization, a key natural language generation (NLG) task, is vital in various domains. However, the high cost of inaccurate summaries in risk-critical applications, particularly those involving human-in-the-loop decision-making, raises concerns about the reliability of uncertainty estimation on text summarization (UE-TS) evaluation methods. This concern stems from the dependency of uncertainty model metrics on diverse and potentially conflicting NLG metrics. To address this issue, we introduce a comprehensive UE-TS benchmark incorporating 31 NLG metrics across four dimensions. The benchmark evaluates the uncertainty estimation capabilities of two large language models and one pre-trained language model on three datasets, with human-annotation analysis incorporated where applicable. We also assess the performance of 14 common uncertainty estimation methods within this benchmark. Our findings emphasize the importance of considering multiple uncorrelated NLG metrics and diverse uncertainty estimation methods to ensure reliable and efficient evaluation of UE-TS techniques.

Updated: 2024-06-25 04:41:17

标题: 我们能信任文本摘要中不确定性估计方法的性能评估吗?

摘要: 文本摘要是一项关键的自然语言生成(NLG)任务,在各个领域都至关重要。然而,在风险关键应用中,特别是涉及人在决策中的应用中,不准确摘要的高成本引发了对文本摘要中不确定性估计(UE-TS)评估方法可靠性的担忧。这种担忧源于不确定性模型指标对各种可能相互冲突的NLG指标的依赖。为了解决这个问题,我们引入了一个涵盖了四个维度的31个NLG指标的全面UE-TS基准。该基准评估了两个大语言模型和一个预训练语言模型在三个数据集上的不确定性估计能力,并在适用的情况下包括了人工注释分析。我们还评估了该基准中14种常见的不确定性估计方法的性能。我们的发现强调了考虑多个不相关的NLG指标和多样化的不确定性估计方法的重要性,以确保对UE-TS技术进行可靠和高效的评估。

更新时间: 2024-06-25 04:41:17

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.17274v1

Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $f_{\mathrm{NL}}$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against simulations with and without $f_{\mathrm{NL}}$ and systematics, showing superior performance of the neural network treatment. The neural network with a set of nine imaging property maps passes our systematic null test criteria, and is chosen as the fiducial treatment. Assuming the universality relation, we find $f_{\mathrm{NL}} = 34^{+24(+50)}_{-44(-73)}$ at 68\%(95\%) confidence. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. We study how the regression method biases the measured angular power-spectrum and degrades the $f_{\mathrm{NL}}$ constraining power. The use of the nine maps more than doubles the uncertainty compared to using only the three primary maps in the regression. Our results thus motivate the development of more efficient methods that avoid over-correction, protect large-scale clustering information, and preserve constraining power. Additionally, our results encourage further studies of $f_{\mathrm{NL}}$ with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics and lessen the degradation in the $f_{\mathrm{NL}}$ uncertainty.

Updated: 2024-06-25 04:39:44

标题: 光度DESI明亮红星系大尺度聚集的本地原始非高斯性

摘要: 我们利用来自暗能量光谱仪(DESI)成像巡天的明亮红色星系的角聚类来限制本地原始非高斯性参数$f_{\mathrm{NL}}$。我们的样本包括超过1200万个目标,覆盖了14,000平方度的天空,红移范围为$0.2< z < 1.35$。我们确定银河系消光、巡天深度和视宁度是系统误差的主要来源,并采用线性回归和人工神经网络来减轻大尺度上的非宇宙学过剩聚类。我们的方法经过了含有与不含$f_{\mathrm{NL}}$和系统误差的模拟测试,结果显示出神经网络处理的优越性能。使用九幅成像属性图的神经网络通过了我们的系统性零检验标准,并被选为基准处理。假设普遍性关系,我们发现$f_{\mathrm{NL}} = 34^{+24(+50)}_{-44(-73)}$,置信水平为68\%(95\%)。我们进行了一系列稳健性测试(例如,对成像、赤纬或所用尺度进行切割),结果显示获得的约束是一致的。我们研究了回归方法如何使测量的角功率谱产生偏差并降低$f_{\mathrm{NL}}$的约束能力。与在回归中仅使用三幅主要图相比,使用九幅图使不确定性增加了一倍以上。因此,我们的结果鼓励开发更有效的方法,避免过度校正,保护大尺度聚类信息,并保持约束能力。此外,我们的结果鼓励利用DESI光谱样本进一步研究$f_{\mathrm{NL}}$,其中纳入3D聚类模式应有助于分离成像系统误差,并减轻$f_{\mathrm{NL}}$不确定性的退化。

更新时间: 2024-06-25 04:39:44

领域: astro-ph.CO,cs.LG,physics.comp-ph,physics.data-an

下载: http://arxiv.org/abs/2307.01753v3

Modeling Emotions and Ethics with Large Language Models

This paper explores the integration of human-like emotions and ethical considerations into Large Language Models (LLMs). We first model eight fundamental human emotions, presented as opposing pairs, and employ collaborative LLMs to reinterpret and express these emotions across a spectrum of intensity. Our focus extends to embedding a latent ethical dimension within LLMs, guided by a novel self-supervised learning algorithm with human feedback (SSHF). This approach enables LLMs to perform self-evaluations and adjustments concerning ethical guidelines, enhancing their capability to generate content that is not only emotionally resonant but also ethically aligned. The methodologies and case studies presented herein illustrate the potential of LLMs to transcend mere text and image generation, venturing into the realms of empathetic interaction and principled decision-making, thereby setting a new precedent in the development of emotionally aware and ethically conscious AI systems.

Updated: 2024-06-25 04:36:08

标题: 使用大型语言模型对情绪和道德进行建模

摘要: 本文探讨了将类人情感和道德考虑融入大型语言模型(LLMs)的整合。我们首先对八种基本人类情感建模,以相互对立的成对形式呈现,并使用协作LLMs在不同强度范围内重新诠释和表达这些情感。我们的焦点扩展到在LLMs中嵌入一个潜在的道德维度,由一种新颖的带有人类反馈的自监督学习算法(SSHF)指导。这种方法使LLMs能够进行关于道德准则的自我评估和调整,增强它们生成内容的能力,不仅在情感上共鸣,而且在道德上保持一致。本文介绍的方法和案例研究展示了LLMs超越纯文本和图像生成的潜力,涉足共情互动和原则性决策领域,从而在开发具有情感意识和道德意识的AI系统方面树立了新的先例。

更新时间: 2024-06-25 04:36:08

领域: cs.CL,cs.AI,I.2.0

下载: http://arxiv.org/abs/2404.13071v2

A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR

Recent works have shown promising results in connecting speech encoders to large language models (LLMs) for speech recognition. However, several limitations persist, including limited fine-tuning options, a lack of mechanisms to enforce speech-text alignment, and high insertion errors especially in domain mismatch conditions. This paper presents a comprehensive solution to address these issues. We begin by investigating more thoughtful fine-tuning schemes. Next, we propose a matching loss to enhance alignment between modalities. Finally, we explore training and inference methods to mitigate high insertion errors. Experimental results on the Librispeech corpus demonstrate that partially fine-tuning the encoder and LLM using parameter-efficient methods, such as LoRA, is the most cost-effective approach. Additionally, the matching loss improves modality alignment, enhancing performance. The proposed training and inference methods significantly reduce insertion errors.

Updated: 2024-06-25 04:35:50

标题: 一个全面的解决方案,用于连接语音编码器和大型语言模型进行自动语音识别

摘要: 最近的研究表明,在将语音编码器连接到大型语言模型(LLMs)进行语音识别方面取得了有希望的结果。然而,仍然存在一些限制,包括有限的微调选项,缺乏强制语音文本对齐的机制,以及在领域不匹配条件下尤其高的插入错误。本文提出了一个全面的解决方案来解决这些问题。我们首先调查了更加周到的微调方案。接下来,我们提出了一个匹配损失来增强模态之间的对齐。最后,我们探讨了训练和推理方法来减少高插入错误。在Librispeech语料库上的实验结果表明,使用参数高效的方法,如LoRA,部分微调编码器和LLM是最具成本效益的方法。此外,匹配损失提高了模态对齐,增强了性能。所提出的训练和推理方法显著减少了插入错误。

更新时间: 2024-06-25 04:35:50

领域: cs.LG

下载: http://arxiv.org/abs/2406.17272v1

Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning

Contrastive representation learning is crucial in time series analysis as it alleviates the issues of data noise and incompleteness as well as the sparsity of the supervision signal. However, existing contrastive learning frameworks usually focus on intra-temporal features, which fails to fully exploit the intricate nature of time series data. To address this issue, we propose DE-TSMCL, an innovative distillation enhanced framework for long sequence time series forecasting. Specifically, we design a learnable data augmentation mechanism which adaptively learns whether to mask a timestamp to obtain optimized sub-sequences. Then, we propose a contrastive learning task with momentum update to explore inter-sample and intra-temporal correlations of time series to learn the underlying structure feature on the unlabeled time series. Meanwhile, we design a supervised task to learn more robust representations and facilitate the contrastive learning process. Finally, we jointly optimize the above two tasks. By developing model loss from multiple tasks, we can learn effective representations for the downstream forecasting task. Extensive experiments, in comparison with state-of-the-art methods, demonstrate the effectiveness of DE-TSMCL, where the maximum improvement reaches 27.3%.
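The "momentum update" in such contrastive schemes is typically the MoCo-style exponential moving average of a key encoder toward the query encoder; a minimal PyTorch sketch follows, with the momentum coefficient m as an illustrative assumption.

# Standard momentum (EMA) update of the key encoder (PyTorch sketch).
import torch

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.99):
    for q_param, k_param in zip(query_encoder.parameters(),
                                key_encoder.parameters()):
        k_param.data.mul_(m).add_(q_param.data, alpha=1 - m)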

Updated: 2024-06-25 04:34:38

标题: 用动量对比学习增强的蒸馏时间序列预测网络

摘要: 对比表示学习在时间序列分析中至关重要,因为它缓解了数据噪声、不完整性以及监督信号稀疏性的问题。然而,现有的对比学习框架通常集中在时间内部特征上,未能充分利用时间序列数据的复杂性。为了解决这个问题,我们提出了DE-TSMCL,这是一个创新的蒸馏增强框架,用于长序列时间序列预测。具体地,我们设计了一个可学习的数据增强机制,它自适应地学习是否屏蔽一个时间戳以获得优化的子序列。然后,我们提出了一个对比学习任务,带有动量更新,来探索时间序列的样本间和时间内部相关性,从而学习未标记时间序列的潜在结构特征。同时,我们设计了一个监督任务来学习更健壮的表示,并促进对比学习过程。最后,我们联合优化上述两个任务。通过从多个任务中开发模型损失,我们可以学习出用于下游预测任务的有效表示。与最新技术进行广泛实验比较,充分展示了DE-TSMCL的有效性,其中最大改进可达27.3%。

更新时间: 2024-06-25 04:34:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2401.17802v2

CEST-KAN: Kolmogorov-Arnold Networks for CEST MRI Data Analysis

Purpose: This study aims to propose and investigate the feasibility of using Kolmogorov-Arnold Network (KAN) for CEST MRI data analysis (CEST-KAN). Methods: CEST MRI data were acquired from twelve healthy volunteers at 3T. Data from ten subjects were used for training, while the remaining two were reserved for testing. The performance of multi-layer perceptron (MLP) and KAN models with the same network settings were evaluated and compared to the conventional multi-pool Lorentzian fitting (MPLF) method in generating water and multiple CEST contrasts, including amide, relayed nuclear Overhauser effect (rNOE), and magnetization transfer (MT). Results: The water and CEST maps generated by both MLP and KAN were visually comparable to the MPLF results. However, the KAN model demonstrated higher accuracy in extrapolating the CEST fitting metrics, as evidenced by the smaller validation loss during training and smaller absolute error during testing. Voxel-wise correlation analysis showed that all four CEST fitting metrics generated by KAN consistently exhibited higher Pearson coefficients than the MLP results, indicating superior performance. Moreover, the KAN models consistently outperformed the MLP models in varying hidden layer numbers despite longer training time. Conclusion: In this study, we demonstrated for the first time the feasibility of utilizing KAN for CEST MRI data analysis, highlighting its superiority over MLP in this task. The findings suggest that CEST-KAN has the potential to be a robust and reliable post-analysis tool for CEST MRI in clinical settings.

Updated: 2024-06-25 04:28:09

标题: CEST-KAN:用于CEST MRI数据分析的科尔莫戈洛夫-阿诺德网络

摘要: 目的:本研究旨在提出并探讨使用科尔莫哥洛夫-阿诺德网络(KAN)进行CEST MRI数据分析(CEST-KAN)的可行性。方法:在3T下从十二名健康志愿者中获取CEST MRI数据。十位受试者的数据用于训练,而其余两位用于测试。评估和比较了具有相同网络设置的多层感知器(MLP)和KAN模型与传统的多池洛伦兹拟合(MPLF)方法在生成水和多个CEST对比,包括酰胺、中继核Overhauser效应(rNOE)和磁化转移(MT)方面的性能。结果:MLP和KAN生成的水和CEST地图在视觉上与MPLF结果相似。然而,KAN模型在外推CEST拟合度量方面表现出更高的准确性,这表现在训练期间更小的验证损失和测试期间更小的绝对误差。逐体素相关分析显示,KAN生成的四个CEST拟合度量始终比MLP结果具有更高的皮尔逊系数,表明其性能更优越。此外,尽管训练时间较长,KAN模型在不同隐藏层数量下始终优于MLP模型。结论:本研究首次证明了利用KAN进行CEST MRI数据分析的可行性,并强调了其在此任务中优于MLP的优越性。研究结果表明,CEST-KAN在临床环境中有潜力成为CEST MRI的强大可靠的后续分析工具。

更新时间: 2024-06-25 04:28:09

领域: physics.med-ph,cs.LG,eess.IV

下载: http://arxiv.org/abs/2406.16026v2

AG-LSEC: Audio Grounded Lexical Speaker Error Correction

Speaker Diarization (SD) systems are typically audio-based and operate independently of the ASR system in traditional speech transcription pipelines, and they can introduce speaker errors during SD and/or ASR reconciliation, especially around speaker turns and regions of speech overlap. To reduce these errors, a Lexical Speaker Error Correction (LSEC) system, in which an external language model provides lexical information to correct speaker errors, was recently proposed. Though the approach achieves good Word Diarization Error Rate (WDER) improvements, it does not use any additional acoustic information and is prone to miscorrections. In this paper, we propose to enhance and acoustically ground the LSEC system with speaker scores directly derived from the existing SD pipeline. This approach achieves significant relative WDER reductions in the range of 25-40% over the audio-based SD, ASR system and beats the LSEC system by 15-25% relative on the RT03-CTS, Callhome American English, and Fisher datasets.

Updated: 2024-06-25 04:20:49

标题: AG-LSEC:基于音频的词汇说话人错误纠正

摘要: 说话人分离(SD)系统通常基于音频,并独立于传统语音转录流水线中的ASR系统运行,可能由于SD和/或ASR协调而出现说话人错误,特别是在说话人转换和语音重叠的区域。为了减少这些错误,最近提出了一种词汇说话人错误纠正(LSEC)方法,其中外部语言模型提供词汇信息来纠正说话人错误。尽管这种方法在词汇分离错误率(WDER)方面取得了良好的改进,但它并未使用任何额外的声学信息,容易发生错误纠正。在本文中,我们提出利用直接从现有SD流水线导出的说话人分数来增强LSEC系统并为其提供声学基础。这种方法相对于基于音频的SD、ASR系统实现了25-40%的显著相对WDER降低,并在RT03-CTS、Callhome American English和Fisher数据集上相对于LSEC系统实现了15-25%的相对改进。

更新时间: 2024-06-25 04:20:49

领域: eess.AS,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.17266v1

Image-Guided Outdoor LiDAR Perception Quality Assessment for Autonomous Driving

LiDAR is one of the most crucial sensors for autonomous vehicle perception. However, current LiDAR-based point cloud perception algorithms lack comprehensive and rigorous LiDAR quality assessment methods, leading to uncertainty in detection performance. Additionally, existing point cloud quality assessment algorithms are predominantly designed for indoor environments or single-object scenarios. In this paper, we introduce a novel image-guided point cloud quality assessment algorithm for outdoor autonomous driving environments, named the Image-Guided Outdoor Point Cloud Quality Assessment (IGO-PQA) algorithm. Our proposed algorithm comprises two main components. The first component is the IGO-PQA generation algorithm, which leverages point cloud data, corresponding RGB surrounding view images, and agent objects' ground truth annotations to generate an overall quality score for a single-frame LiDAR-based point cloud. The second component is a transformer-based IGO-PQA regression algorithm for no-reference outdoor point cloud quality assessment. This regression algorithm allows for the direct prediction of IGO-PQA scores in an online manner, without requiring image data and object ground truth annotations. We evaluate our proposed algorithm using the nuScenes and Waymo open datasets. The IGO-PQA generation algorithm provides consistent and reasonable perception quality indices. Furthermore, our proposed IGO-PQA regression algorithm achieves a Pearson Linear Correlation Coefficient (PLCC) of 0.86 on the nuScenes dataset and 0.97 on the Waymo dataset.
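
The PLCC figures reported above are straightforward to reproduce in evaluation code; a minimal sketch using scipy on stand-in score arrays:

# Minimal sketch of the Pearson Linear Correlation Coefficient (PLCC) used to
# evaluate the IGO-PQA regressor; the arrays are dummy stand-ins for predicted
# and ground-truth quality scores.
import numpy as np
from scipy.stats import pearsonr

predicted = np.array([0.71, 0.83, 0.40, 0.92, 0.55])
reference = np.array([0.68, 0.85, 0.38, 0.95, 0.60])

plcc, p_value = pearsonr(predicted, reference)
print(f"PLCC = {plcc:.3f} (p = {p_value:.3g})")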

Updated: 2024-06-25 04:16:14

标题: 图像引导的自动驾驶户外激光雷达感知质量评估

摘要: 激光雷达是自动驾驶车辆感知中最关键的传感器之一。然而,目前基于激光雷达的点云感知算法缺乏全面严格的激光雷达质量评估方法,导致检测性能存在不确定性。此外,现有的点云质量评估算法主要设计用于室内环境或单个对象场景。本文介绍了一种新颖的图像引导的室外点云质量评估算法,名为图像引导的室外点云质量评估(IGO-PQA)算法。我们提出的算法包括两个主要组件。第一个组件是IGO-PQA生成算法,利用点云数据、对应的RGB环境图像和代理对象的真实标注,为单帧基于激光雷达的点云生成一个总体质量得分。第二个组件是基于转换器的IGO-PQA回归算法,用于无参考的室外点云质量评估。这个回归算法允许在线直接预测IGO-PQA得分,无需图像数据和对象真实标注。我们使用nuScenes和Waymo开放数据集评估了我们提出的算法。IGO-PQA生成算法提供了一致合理的感知质量指数。此外,我们提出的IGO-PQA回归算法在nuScenes数据集上实现了0.86的皮尔逊线性相关系数(PLCC),在Waymo数据集上实现了0.97。

更新时间: 2024-06-25 04:16:14

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17265v1

Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows

In this paper, we study efficient approximate sampling for probability distributions known up to normalization constants. We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications. The computational challenges we address with the proposed methodology are: (i) the need for repeated evaluations of expensive forward models; (ii) the potential existence of multiple modes; and (iii) the fact that gradient of, or adjoint solver for, the forward model might not be feasible. While existing Bayesian inference methods meet some of these challenges individually, we propose a framework that tackles all three systematically. Our approach builds upon the Fisher-Rao gradient flow in probability space, yielding a dynamical system for probability densities that converges towards the target distribution at a uniform exponential rate. This rapid convergence is advantageous for the computational burden outlined in (i). We apply Gaussian mixture approximations with operator splitting techniques to simulate the flow numerically; the resulting approximation can capture multiple modes thus addressing (ii). Furthermore, we employ the Kalman methodology to facilitate a derivative-free update of these Gaussian components and their respective weights, addressing the issue in (iii). The proposed methodology results in an efficient derivative-free sampler flexible enough to handle multi-modal distributions: Gaussian Mixture Kalman Inversion (GMKI). The effectiveness of GMKI is demonstrated both theoretically and numerically in several experiments with multimodal target distributions, including proof-of-concept and two-dimensional examples, as well as a large-scale application: recovering the Navier-Stokes initial condition from solution data at positive times.
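
As a rough illustration of a derivative-free Kalman-style update of the kind GMKI applies to its Gaussian components, here is one step of plain ensemble Kalman inversion on a toy forward map; this is a generic sketch under simplifying assumptions, not the paper's full Gaussian-mixture scheme:

import numpy as np

def kalman_step(U, g, y, Gamma, rng):
    """One derivative-free Kalman update of an ensemble U (J x d)."""
    G = np.array([g(u) for u in U])              # forward evaluations (J x m)
    du, dg = U - U.mean(0), G - G.mean(0)
    C_ug = du.T @ dg / len(U)                    # parameter-output covariance
    C_gg = dg.T @ dg / len(U)                    # output covariance
    K = C_ug @ np.linalg.inv(C_gg + Gamma)       # Kalman gain (d x m)
    noise = rng.multivariate_normal(np.zeros(len(y)), Gamma, size=len(U))
    return U + (y + noise - G) @ K.T

rng = np.random.default_rng(0)
g = lambda u: np.array([u[0] ** 2 + u[1]])       # toy nonlinear forward model
y, Gamma = np.array([2.0]), 0.01 * np.eye(1)
U = rng.normal(size=(50, 2))                     # initial ensemble
for _ in range(10):
    U = kalman_step(U, g, y, Gamma, rng)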

Updated: 2024-06-25 04:07:22

标题: 高效、多模态和免导数的贝叶斯推断:具有Fisher-Rao梯度流的方法

摘要: 在本文中,我们研究了已知标准化常数的概率分布的高效近似抽样。我们特别关注贝叶斯推断中出现的一个问题类,即在科学和工程应用中的大规模逆问题。我们提出的方法所解决的计算挑战包括:(i) 需要重复评估昂贵的前向模型;(ii) 可能存在多个模式;以及 (iii) 前向模型的梯度或伴随求解器可能不可行。 虽然现有的贝叶斯推断方法分别满足了一些这些挑战,但我们提出了一个系统地解决所有三个挑战的框架。我们的方法建立在概率空间中的Fisher-Rao梯度流之上,产生了一个概率密度的动力系统,以均匀指数速率收敛到目标分布。这种快速收敛对于(i)中概述的计算负担是有利的。我们应用高斯混合逼近和操作分裂技术来数值模拟流动;由此产生的逼近可以捕捉到多个模式,从而解决(ii)中的问题。此外,我们采用卡尔曼方法来促进这些高斯分量及其相应权重的无导数更新,以解决(iii)中的问题。 我们提出的方法得到了一个高效的无导数采样器,足够灵活以处理多模态分布:高斯混合卡尔曼反演(GMKI)。GMKI的有效性在几个实验中得到了理论上和数值上的验证,包括概念验证和二维示例,以及一个大规模应用:从正时间的解数据中恢复Navier-Stokes初始条件。

更新时间: 2024-06-25 04:07:22

领域: cs.LG,cs.NA,math.DS,math.NA

下载: http://arxiv.org/abs/2406.17263v1

Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup

The deployment of Large Multimodal Models (LMMs) within AntGroup has significantly advanced multimodal tasks in payment, security, and advertising, notably enhancing advertisement audition tasks in Alipay. However, the deployment of such sizable models introduces challenges, particularly in increased latency and carbon emissions, which are antithetical to the ideals of Green AI. This paper introduces a novel multi-stage compression strategy for our proprietary LLM, AntGMM. Our methodology pivots on three main aspects: employing small training sample sizes, addressing multi-level redundancy through multi-stage pruning, and introducing an advanced distillation loss design. In our research, we constructed a dataset, the Multimodal Advertisement Audition Dataset (MAAD), from real-world scenarios within Alipay, and conducted experiments to validate the reliability of our proposed strategy. Furthermore, the effectiveness of our strategy is evident in its operational success in Alipay's real-world multimodal advertisement audition for three months from September 2023. Notably, our approach achieved a substantial reduction in latency, decreasing it from 700ms to 90ms, while maintaining online performance with only a slight performance decrease. Moreover, our compressed model is estimated to reduce electricity consumption by approximately 75 million kWh annually compared to the direct deployment of AntGMM, demonstrating our commitment to green AI initiatives. We will publicly release our code and the MAAD dataset after review (https://github.com/MorinW/AntGMM_Pruning).
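
Two generic building blocks named above, magnitude pruning and a distillation loss, can be sketched as follows; the sparsity level, temperature, and tensor shapes are illustrative, and this is not AntGMM's actual multi-stage pipeline:

import torch
import torch.nn.functional as F

def magnitude_prune_(linear, sparsity=0.5):
    """Zero out the smallest-magnitude weights of a linear layer in place."""
    w = linear.weight.data
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values
    w[w.abs() <= threshold] = 0.0

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

layer = torch.nn.Linear(64, 64)
magnitude_prune_(layer, sparsity=0.5)                             # pruning stage
loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10))  # distillation stage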

Updated: 2024-06-25 03:53:28

标题: 蚂蚁集团通过高效剪枝和蒸馏实现大型多模态模型压缩

摘要: 在蚂蚁集团内部部署大型多模态模型(LMMs)显著推进了支付、安全和广告等多模态任务,尤其在支付宝的广告审查任务中显著增强。然而,部署如此庞大的模型引入了挑战,尤其是在增加延迟和碳排放方面,这与绿色人工智能的理念相矛盾。本文介绍了一种新颖的多阶段压缩策略,用于我们专有的LLM,AntGMM。我们的方法围绕三个主要方面展开:使用较小的训练样本大小,通过多阶段修剪来解决多级冗余,并引入先进的蒸馏损失设计。在我们的研究中,我们构建了一个数据集,即多模态广告审查数据集(MAAD),从支付宝的实际场景中进行了实验,以验证我们提出的策略的可靠性。此外,我们的策略的有效性在其在2023年9月起为期三个月的支付宝实际多模态广告审查中的运营成功中显而易见。值得注意的是,我们的方法实现了延迟的大幅减少,从700毫秒降至90毫秒,同时保持在线性能只有轻微的性能下降。此外,与直接部署AntGMM相比,我们压缩的模型预计每年可减少约7500万千瓦时的电力消耗,表明我们致力于绿色人工智能倡议。我们将在一些审查后公开发布我们的代码和MAAD数据集。

更新时间: 2024-06-25 03:53:28

领域: cs.AI

下载: http://arxiv.org/abs/2312.05795v2

LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation

In clinical practice, medical image segmentation provides useful information on the contours and dimensions of target organs or tissues, facilitating improved diagnosis, analysis, and treatment. In the past few years, convolutional neural networks (CNNs) and Transformers have dominated this area, but they still suffer from either limited receptive fields or costly long-range modeling. Mamba, a State Space Sequence Model (SSM), recently emerged as a promising paradigm for long-range dependency modeling with linear complexity. In this paper, we introduce a Large Kernel Vision Mamba U-shape Network, or LKM-UNet, for medical image segmentation. A distinguishing feature of our LKM-UNet is its utilization of large Mamba kernels, excelling in locally spatial modeling compared to small kernel-based CNNs and Transformers, while maintaining superior efficiency in global modeling compared to self-attention with quadratic complexity. Additionally, we design a novel hierarchical and bidirectional Mamba block to further enhance Mamba's global and neighborhood spatial modeling capability for vision inputs. Comprehensive experiments demonstrate the feasibility and the effectiveness of using large-size Mamba kernels to achieve large receptive fields. Codes are available at https://github.com/wjh892521292/LKM-UNet.

Updated: 2024-06-25 03:37:26

标题: LKM-UNet:用于医学图像分割的大内核视觉曼巴UNet

摘要: 在临床实践中,医学图像分割为目标器官或组织的轮廓和尺寸提供了有用的信息,有助于改善诊断、分析和治疗。在过去几年中,卷积神经网络(CNNs)和Transformer在这一领域占据主导地位,但它们仍然存在有限的感知域或昂贵的远程建模问题。最近出现了一种具有线性复杂性的长程依赖建模的状态空间序列模型(SSM)Mamba,被认为是一种有前途的范式。本文介绍了一种用于医学图像分割的大内核视觉Mamba U型网络,即LKM-UNet。我们LKM-UNet的一个显著特点是其利用大型Mamba内核,在局部空间建模方面优于基于小内核的CNN和Transformer,同时与具有二次复杂性的自注意力相比,在全局建模方面保持卓越的效率。此外,我们设计了一个新颖的分层和双向Mamba块,进一步增强了Mamba对视觉输入的全局和邻域空间建模能力。全面的实验证明了使用大尺寸Mamba内核实现大感知域的可行性和有效性。代码可在https://github.com/wjh892521292/LKM-UNet上找到。

更新时间: 2024-06-25 03:37:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.07332v2

TopoGCL: Topological Graph Contrastive Learning

Graph contrastive learning (GCL) has recently emerged as a new concept which allows for capitalizing on the strengths of graph neural networks (GNNs) to learn rich representations in a wide variety of applications which involve abundant unlabeled information. However, existing GCL approaches largely tend to overlook the important latent information on higher-order graph substructures. We address this limitation by introducing the concepts of topological invariance and extended persistence on graphs to GCL. In particular, we propose a new contrastive mode which targets topological representations of the two augmented views from the same graph, yielded by extracting latent shape properties of the graph at multiple resolutions. Along with the extended topological layer, we introduce a new extended persistence summary, namely, extended persistence landscapes (EPL) and derive its theoretical stability guarantees. Our extensive numerical results on biological, chemical, and social interaction graphs show that the new Topological Graph Contrastive Learning (TopoGCL) model delivers significant performance gains in unsupervised graph classification for 11 out of 12 considered datasets and also exhibits robustness under noisy scenarios.
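
The contrastive mode over two augmented views of the same graph typically instantiates a standard NT-Xent objective; a minimal sketch on stand-in topological embeddings (batch size, dimension, and temperature are illustrative):

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """Standard NT-Xent loss between two batches of view embeddings (N x d)."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N x d
    sim = z @ z.T / tau                                  # cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 32), torch.randn(8, 32)   # stand-in view embeddings
loss = nt_xent(z1, z2)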

Updated: 2024-06-25 03:35:20

标题: TopoGCL:拓扑图对比学习

摘要: 图对比学习(GCL)最近出现作为一个新概念,允许利用图神经网络(GNNs)的优势,在涉及大量未标记信息的各种应用中学习丰富的表示。然而,现有的GCL方法往往忽视了高阶图结构上的重要潜在信息。我们通过在图上引入拓扑不变性和扩展持久性的概念来解决这一限制。特别是,我们提出了一个新的对比模式,该模式针对从同一图中提取的多个分辨率的潜在形状属性,从而针对两个增强视图的拓扑表示。除了扩展拓扑层,我们还引入了一种新的扩展持久性摘要,即扩展持久性景观(EPL),并推导了其理论稳定性保证。我们在生物、化学和社交互动图上的大量数值结果表明,新的拓扑图对比学习(TopoGCL)模型在12个考虑的数据集中有11个数据集的无监督图分类中取得了显著的性能提升,并且在噪声环境下表现出了稳健性。

更新时间: 2024-06-25 03:35:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17251v1

Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing

Current trends in audio anti-spoofing detection research strive to improve models' ability to generalize across unseen attacks by learning to identify a variety of spoofing artifacts. This emphasis has primarily focused on the spoof class. Recently, several studies have noted that the distribution of silence differs between the two classes, which can serve as a shortcut. In this paper, we extend class-wise interpretations beyond silence. We employ loss analysis and asymmetric methodologies to move away from traditional attack-focused and result-oriented evaluations towards a deeper examination of model behaviors. Our investigations highlight the significant differences in training dynamics between the two classes, emphasizing the need for future research to focus on robust modeling of the bonafide class.

Updated: 2024-06-25 03:24:12

标题: 超越沉默:通过损失和非对称方法在音频反欺骗中进行偏见分析

摘要: 当前音频反欺骗检测研究的趋势是努力改进模型在学习识别各种欺骗痕迹时跨未知攻击的泛化能力。这一重点主要集中在欺骗类别上。最近,几项研究指出沉默分布在两个类别之间存在差异,可以作为一种捷径。在本文中,我们将类别解释扩展到沉默之外。我们采用损失分析和非对称方法,摆脱传统攻击集中和结果导向的评估,转向对模型行为的更深入研究。我们的调查突显了两个类别之间训练动态的显著差异,强调未来研究需要重点关注对真实类别的稳健建模。

更新时间: 2024-06-25 03:24:12

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.17246v1

Unlocking Continual Learning Abilities in Language Models

Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, limiting the applicability of current CL approaches to LMs. To address this limitation, we introduce $\textbf{MIGU}$ ($\textbf{M}$agn$\textbf{I}$tude-based $\textbf{G}$radient $\textbf{U}$pdating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large output magnitudes in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the output in LMs' linear layers differs when the LMs deal with different task data. By imposing this simple constraint on the gradient update process, we can leverage the inherent behaviors of LMs, thereby unlocking their innate CL abilities. Our experiments demonstrate that MIGU is universally applicable to all three LM architectures (T5, RoBERTa, and Llama2), delivering state-of-the-art or on-par performance across continual finetuning and continual pre-training settings on four CL benchmarks. For example, MIGU brings a 15.2% average accuracy improvement over conventional parameter-efficient finetuning baselines in a 15-task CL benchmark. MIGU can also seamlessly integrate with all three existing CL types to further enhance performance. Code is available at https://github.com/wenyudu/MIGU.
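
A minimal sketch of the stated mechanism, masking gradients of linear-layer output units whose L1-normalized output magnitudes are small; the ratio threshold used here is an illustrative assumption, not the paper's setting:

import torch

class MaskedLinear(torch.nn.Linear):
    """Linear layer sketching the MIGU idea: gradients of output units with
    small L1-normalized output magnitude are zeroed before the optimizer step."""

    def forward(self, x):
        out = super().forward(x)
        mag = out.detach().abs().mean(dim=0)                 # per-output-unit magnitude
        self._keep = (mag / mag.sum()) >= 0.5 / mag.numel()  # illustrative threshold
        return out

    def mask_grads(self):
        if self.weight.grad is not None:
            self.weight.grad[~self._keep] = 0.0              # rows = output units
            self.bias.grad[~self._keep] = 0.0

layer = MaskedLinear(16, 8)
layer(torch.randn(4, 16)).pow(2).mean().backward()
layer.mask_grads()                                           # call before optimizer.step()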

Updated: 2024-06-25 03:24:06

标题: 解锁语言模型中的持续学习能力

摘要: 语言模型(LMs)展现出令人印象深刻的性能和泛化能力。然而,LMs在持续学习(CL)中面临着灾难性遗忘的持久挑战,这削弱了它们在长期可持续性上的表现。现有方法通常通过将旧任务数据或任务特定的归纳偏差纳入LMs中来解决这个问题。然而,旧数据和准确的任务信息通常难以获取或成本高昂,这妨碍了当前LMs的持续学习方法的可用性。为了解决这一限制,我们引入了$\textbf{MIGU}$($\textbf{M}$基于幅度的$\textbf{G}$radient $\textbf{U}$pdating for continual learning),这是一种无需重复练习和无需任务标签的方法,仅通过更新LMs线性层中输出幅度较大的模型参数。MIGU基于我们的观察,即LMs线性层输出的L1归一化幅度分布在LM模型处理不同任务数据时是不同的。通过在梯度更新过程中施加这一简单约束,我们可以利用LMs的固有行为,从而释放其内在的CL能力。我们的实验表明,MIGU通用适用于所有三种LM架构(T5、RoBERTa和Llama2),在四个CL基准测试中在持续微调和持续预训练设置下提供最新技术或与之相媲美的性能。例如,在一个15个任务的CL基准测试中,MIGU相比传统的参数高效微调基线带来了15.2%的平均准确度提升。MIGU还可以无缝集成到所有三种现有的CL类型中,以进一步提升性能。代码可在\href{https://github.com/wenyudu/MIGU}{此链接}获取。

更新时间: 2024-06-25 03:24:06

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.17245v1

When Large Language Models Meet Optical Networks: Paving the Way for Automation

Since the advent of GPT, large language models (LLMs) have brought about revolutionary advancements in all walks of life. As a superior natural language processing (NLP) technology, LLMs have consistently achieved state-of-the-art performance in numerous areas. However, LLMs are considered general-purpose models for NLP tasks, which may encounter challenges when applied to complex tasks in specialized fields such as optical networks. In this study, we propose a framework of LLM-empowered optical networks, facilitating intelligent control of the physical layer and efficient interaction with the application layer through an LLM-driven agent (AI-Agent) deployed in the control layer. The AI-Agent can leverage external tools and extract domain knowledge from a comprehensive resource library specifically established for optical networks. This is achieved through user input and well-crafted prompts, enabling the generation of control instructions and result representations for autonomous operation and maintenance in optical networks. To improve the LLM's capability in professional fields and stimulate its potential on complex tasks, this study details how to perform prompt engineering, establish the domain knowledge library, and implement complex tasks. Moreover, the proposed framework is verified on two typical tasks: network alarm analysis and network performance optimization. The good response accuracy and semantic similarity across 2,400 test situations demonstrate the great potential of LLMs in optical networks.

Updated: 2024-06-25 03:23:00

标题: 当大型语言模型遇上光网络:为自动化铺平道路

摘要: 自从GPT问世以来,大型语言模型(LLMs)已经在各个领域带来了革命性的进步。作为一种优越的自然语言处理(NLP)技术,LLMs在许多领域始终取得了最先进的性能。然而,LLMs被认为是用于NLP任务的通用模型,当应用于专业领域如光网络中的复杂任务时可能会遇到挑战。在本研究中,我们提出了一个基于LLM的光网络框架,通过在控制层部署一个LLM驱动的智能代理(AI-Agent),实现对物理层的智能控制和与应用层的高效交互。AI-Agent可以利用外部工具,并从专门为光网络建立的全面资源库中提取领域知识。通过用户输入和精心设计的提示,实现了生成控制指令和结果表示,用于光网络的自主运行和维护。为了提高LLM在专业领域的能力,并激发其在复杂任务中的潜力,本研究详细说明了执行提示工程、建立领域知识库和实现复杂任务的细节。此外,提出的框架在两个典型任务上进行了验证:网络告警分析和网络性能优化。2,400个测试情况的良好响应准确性和语义相似性展示了LLM在光网络中的巨大潜力。

更新时间: 2024-06-25 03:23:00

领域: cs.NI,cs.AI,cs.CL,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.17441v2

Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis

Health monitoring systems have revolutionized modern healthcare by enabling the continuous capture of physiological and behavioral data, essential for preventive measures and early health intervention. While integrating this data with Large Language Models (LLMs) has shown promise in delivering interactive health advice, traditional methods like Retrieval-Augmented Generation (RAG) and fine-tuning often fail to fully utilize the complex, multi-dimensional, and temporally relevant data from wearable devices. These conventional approaches typically provide limited actionable and personalized health insights due to their inadequate capacity to dynamically integrate and interpret diverse health data streams. In response, this paper introduces a graph-augmented LLM framework designed to significantly enhance the personalization and clarity of health insights. Utilizing a hierarchical graph structure, the framework captures inter and intra-patient relationships, enriching LLM prompts with dynamic feature importance scores derived from a Random Forest Model. The effectiveness of this approach is demonstrated through a sleep analysis case study involving 20 college students during the COVID-19 lockdown, highlighting the potential of our model to generate actionable and personalized health insights efficiently. We leverage another LLM to evaluate the insights for relevance, comprehensiveness, actionability, and personalization, addressing the critical need for models that process and interpret complex health data effectively. Our findings show that augmenting prompts with our framework yields significant improvements in all 4 criteria. Through our framework, we can elicit well-crafted, more thoughtful responses tailored to a specific patient.
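
A minimal sketch of the prompt-enrichment step described above, deriving feature-importance scores from a Random Forest and injecting them into an LLM prompt; the feature names and the prompt template are illustrative assumptions:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

features = ["sleep_duration", "bedtime_variability", "steps", "screen_time"]
X, y = np.random.rand(200, 4), np.random.rand(200)   # stand-in wearable data

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1])

prompt = "Patient sleep summary. Most influential factors: " + ", ".join(
    f"{name} ({weight:.2f})" for name, weight in ranked
)
print(prompt)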

Updated: 2024-06-25 03:17:40

标题: 图形增强LLM用于个性化健康洞见:睡眠分析案例研究

摘要: 健康监测系统通过实现生理和行为数据的持续捕获,为预防措施和早期健康干预提供了基础,从而彻底改变了现代医疗保健。虽然将这些数据与大型语言模型(LLMs)集成在一起在提供互动健康建议方面显示出了潜力,但传统方法如检索增强生成(RAG)和微调通常未能充分利用可穿戴设备的复杂、多维和与时间相关的数据。这些传统方法通常由于不足以动态整合和解释各种健康数据流而提供有限的可操作和个性化健康见解。因此,本文介绍了一种图增强的LLM框架,旨在显著提升个性化和清晰度的健康见解。该框架利用层次图结构捕捉患者间和患者内部关系,通过来自随机森林模型的动态特征重要性得分丰富LLM提示。通过涉及20名大学生在COVID-19封锁期间的睡眠分析案例研究,展示了这种方法的有效性,突显了我们的模型高效生成可操作和个性化的健康见解的潜力。我们利用另一个LLM评估见解的相关性、全面性、可操作性和个性化程度,解决了处理和解释复杂健康数据的模型的重要需求。我们的研究结果表明,通过我们的框架增强提示在所有4个标准上都取得了显著改进。通过我们的框架,我们可以引出经过精心设计、更加深思熟虑的针对特定患者的回应。

更新时间: 2024-06-25 03:17:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.16252v2

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or to be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) mean sensitivity and 99.2% (95% CI 99.1%-99.3%) mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reducing the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortening the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests.

Updated: 2024-06-25 03:17:22

标题: 使用非对比CT快速准确诊断急性主动脉综合征:一项大规模、回顾性、多中心和基于人工智能的研究

摘要: 胸痛症状在急诊室(ED)中高度普遍,其中急性主动脉综合征(AAS)是一种具有高致死率的心血管紧急情况,特别是当未及时和准确地治疗时。然而,目前急诊室的分诊实践可能导致大约一半的AAS患者最初被漏诊或误诊为其他急性胸痛症状。随后,这些AAS患者将接受临床不准确或次优的鉴别诊断。幸运的是,即使在这些次优的协议下,几乎所有这些患者在鉴别诊断的早期阶段都接受了覆盖主动脉解剖的非对比CT检查。在这项研究中,我们开发了一个使用非对比CT的人工智能模型(DeepAAS),该模型在识别AAS方面非常准确,并提供可解释的结果以协助临床决策。性能在两个主要阶段进行了评估:一个多中心回顾性研究(n = 20,750)和一个在现实世界急诊场景中的探索(n = 137,525)。在多中心队列中,DeepAAS实现了平均接收者操作特征曲线下面积为0.958(95% CI 0.950-0.967)。在现实世界队列中,DeepAAS检测到109名最初怀疑错误的AAS患者,平均敏感性为92.6%(95% CI 76.2%-97.5%),平均特异性为99.2%(95% CI 99.1%-99.3%)。我们的AI模型在非对比CT上表现良好,能够在鉴别诊断工作流程的所有适用早期阶段有效降低总体漏诊和误诊率,将最初怀疑错误患者的诊断时间从平均681.8(74-11,820)分钟缩短到68.5(23-195)分钟。DeepAAS能够有效地填补当前临床工作流程中的空白,无需额外检查。

更新时间: 2024-06-25 03:17:22

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.15222v2

Nakamoto Consensus under Bounded Processing Capacity

For Nakamoto's longest-chain consensus protocol, whose proof-of-work (PoW) and proof-of-stake (PoS) variants power major blockchains such as Bitcoin and Cardano, we revisit the classic problem of the security-performance tradeoff: Given a network of nodes with finite communication- and computation-resources, against what fraction of adversary power is Nakamoto consensus (NC) secure for a given block production rate? State-of-the-art analyses of NC fail to answer this question, because their bounded-delay model does not capture the rate limits to nodes' processing of blocks, which cause congestion when blocks are released in quick succession. We develop a new analysis technique to prove a refined security-performance tradeoff for PoW NC in a bounded-capacity model. In this model, we show that, in contrast to the classic bounded-delay model, Nakamoto's private attack is no longer the worst attack, and a new attack we call the teasing strategy, that exploits congestion, is strictly worse. In PoS, equivocating blocks can exacerbate congestion, making traditional PoS NC insecure except at very low block production rates. To counter such equivocation spamming, we present a variant of PoS NC we call Blanking NC (BlaNC), which achieves the same resilience as PoW NC.

Updated: 2024-06-25 03:16:28

标题: 有界处理能力下的中本聪共识

摘要: 对于Nakamoto的最长链共识协议,其工作量证明(PoW)和权益证明(PoS)变体支持主要的区块链,如比特币和卡尔达诺,我们重新审视了安全性和性能之间的经典问题:在具有有限通信和计算资源的节点网络中,对于给定的区块生产速率,Nakamoto共识(NC)对多少比例的对手能力是安全的?最先进的NC分析未能回答这个问题,因为它们的有界延迟模型未能捕捉节点处理区块的速率限制,这会在区块快速释放时引起拥塞。我们开发了一种新的分析技术,在有界容量模型中证明了PoW NC的一个优化的安全性和性能权衡。在这个模型中,我们展示,与经典的有界延迟模型相反,Nakamoto的私有攻击不再是最糟糕的攻击,而一种我们称为挑逗策略的新攻击,利用拥塞,是严格更糟糕的。在PoS中,模棱两可的区块可能加剧拥塞,使传统的PoS NC在非常低的区块生产速率下变得不安全。为了对抗这种模棱两可的垃圾信息,我们提出了一种我们称为Blanking NC(BlaNC)的PoS NC变体,它实现了与PoW NC相同的弹性。

更新时间: 2024-06-25 03:16:28

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2303.09113v4

GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

State-of-the-art language models can exhibit impressive reasoning refinement capabilities on math, science or coding tasks. However, recent work demonstrates that even the best models struggle to identify \textit{when and where to refine} without access to external feedback. Outcome-based Reward Models (\textbf{ORMs}), trained to predict correctness of the final answer indicating when to refine, offer one convenient solution for deciding when to refine. Process Based Reward Models (\textbf{PRMs}), trained to predict correctness of intermediate steps, can then be used to indicate where to refine. But they are expensive to train, requiring extensive human annotations. In this paper, we propose Stepwise ORMs (\textbf{SORMs}) which are trained, only on synthetic data, to approximate the expected future reward of the optimal policy or $V^{\star}$. More specifically, SORMs are trained to predict the correctness of the final answer when sampling the current policy many times (rather than only once as in the case of ORMs). Our experiments show that SORMs can more accurately detect incorrect reasoning steps compared to ORMs, thus improving downstream accuracy when doing refinements. We then train \textit{global} refinement models, which take only the question and a draft solution as input and predict a corrected solution, and \textit{local} refinement models which also take as input a critique indicating the location of the first reasoning error. We generate training data for both models synthetically by reusing data used to train the SORM. We find combining global and local refinements, using the ORM as a reranker, significantly outperforms either one individually, as well as a best of three sample baseline. With this strategy we can improve the accuracy of a LLaMA-2 13B model (already fine-tuned with RL) on GSM8K from 53\% to 65\% when greedily sampled.
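
A minimal sketch of how SORM-style training labels can be synthesized, estimating the probability that the current policy completes a partial solution correctly by sampling it many times; the stub policy and answer checker stand in for an actual LLM:

import random

def sorm_label(prefix, sample_completion, is_correct, k=16):
    """Monte-Carlo estimate of the value of the current policy at `prefix`:
    the fraction of k sampled completions whose final answer is correct."""
    wins = sum(is_correct(sample_completion(prefix)) for _ in range(k))
    return wins / k

# Stand-ins for an LLM policy and an answer checker (illustrative only).
sample_completion = lambda prefix: prefix + random.choice([" 42", " 41"])
is_correct = lambda solution: solution.endswith("42")

label = sorm_label("The answer is", sample_completion, is_correct, k=32)
print(f"step label (estimated correctness) = {label:.2f}")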

Updated: 2024-06-25 03:14:10

标题: GLoRe:何时、何地以及如何通过全局和局部细化改进LLM推理

摘要: 最先进的语言模型在数学、科学或编码任务上展现出令人印象深刻的推理精细化能力。然而,最近的研究表明,即使是最好的模型也很难在没有外部反馈的情况下确定何时以及在哪里进行精细化。基于结果的奖励模型(ORMs),训练以预测最终答案的正确性,指示何时进行精细化,提供了一种方便的解决方案来决定何时进行精细化。基于过程的奖励模型(PRMs),训练以预测中间步骤的正确性,然后可以用来指示在哪里进行精细化。但它们的训练成本高昂,需要大量的人工标注。在本文中,我们提出了逐步ORMs(SORMs),它们仅在合成数据上训练,以逼近最优策略或$V^{\star}$的预期未来奖励。更具体地说,SORMs被训练以在多次采样当前策略时预测最终答案的正确性(而不是像ORMs那样只采样一次)。我们的实验表明,与ORMs相比,SORMs可以更准确地检测出错误的推理步骤,从而在进行精细化时提高下游准确性。然后,我们训练了全局精细化模型,它仅接受问题和草稿解决方案作为输入,并预测一个校正后的解决方案,以及本地精细化模型,它还接受指示第一个推理错误位置的批评作为输入。我们通过重新使用用于训练SORM的数据来合成这两个模型的训练数据。我们发现,将全局和局部精细化组合,在使用ORM作为重新排序器时,明显优于任何一个单独的模型,以及三次采样基准的最佳表现。通过这种策略,我们可以将已经使用RL进行微调的LLaMA-2 13B模型在GSM8K上的准确性从53%提高到65%。

更新时间: 2024-06-25 03:14:10

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.10963v2

Expansive Synthesis: Generating Large-Scale Datasets from Minimal Samples

The challenge of limited training data arises in many machine learning applications, and its impact on performance and generalization is serious. Traditional data augmentation methods aim to enhance training with a moderately sufficient data set. Generative models like Generative Adversarial Networks (GANs) often face convergence problems when generating large and diverse sets of data samples. Diffusion models, though effective, still struggle with high computational cost and long training times. This paper introduces an innovative Expansive Synthesis model that generates large-scale, high-fidelity datasets from minimal samples. The proposed approach exploits expander graph mappings and feature interpolation to synthesize expanded datasets while preserving the intrinsic data distribution and feature structural relationships. The rationale of the model is rooted in the non-linear structure of neural networks' latent space, which a Koopman operator captures as a linear feature space, facilitating the construction of larger, enriched, and consistent datasets starting from a much smaller one. This process is optimized by an autoencoder architecture enhanced with self-attention layers and further refined for distributional consistency by optimal transport. We validate our Expansive Synthesis by training classifiers on the generated datasets and comparing their performance to classifiers trained on larger, original datasets. Experimental results demonstrate that classifiers trained on synthesized data achieve performance metrics on par with those trained on full-scale datasets, showcasing the model's potential to effectively augment training data. This work represents a significant advancement in data generation, offering a robust solution to data scarcity and paving the way for enhanced data availability in machine learning applications.

Updated: 2024-06-25 02:59:02

标题: 扩张合成:从最少样本生成大规模数据集

摘要: 机器学习中训练数据有限的挑战在许多应用中出现,对性能和泛化的影响是严重的。传统的数据增强方法旨在通过一个适度充足的数据集增强训练。生成模型如生成对抗网络(GANs)在生成大量和多样化数据样本时常面临收敛困难。虽然扩散模型是有效的,但仍面临高计算成本和长训练时间的困扰。本文介绍了一种创新的扩张合成模型,可以从最少的样本中生成大规模、高保真度的数据集。该方法利用扩展图映射和特征插值来合成扩展数据集,同时保持内在数据分布和特征结构关系。该模型的基本原理根植于神经网络潜在空间的非线性特性,并通过 Koopman 操作符捕获,以产生线性特征空间,以便利用较小的数据集构建更大、更丰富的一致数据集。这一过程通过自注意力层增强的自编码器结构进行优化,并通过最优输运进一步优化分布的一致性。我们通过在生成的数据集上训练分类器并将其性能与在较大的原始数据集上训练的分类器进行比较来验证我们的扩张合成模型。实验结果表明,在合成数据上训练的分类器的性能指标与在全尺度数据集上训练的分类器相当,展示了该模型有效增强训练数据的潜力。这项工作代表了数据生成领域的重大进展,为解决数据稀缺问题提供了稳健的解决方案,并为机器学习应用中数据的增强可用性铺平了道路。

更新时间: 2024-06-25 02:59:02

领域: cs.LG,cs.CV,eess.IV

下载: http://arxiv.org/abs/2406.17238v1

Multi-class Temporal Logic Neural Networks

Time-series data can represent the behaviors of autonomous systems, such as drones and self-driving cars. The task of binary and multi-class classification for time-series data has become a prominent area of research. Neural networks represent a popular approach to classifying data; however, they lack interpretability, which poses a significant challenge in extracting meaningful information from them. Signal Temporal Logic (STL) is a formalism that describes the properties of timed behaviors. We propose a method that combines all of the above: neural networks that represent STL specifications for multi-class classification of time-series data. We offer two key contributions: 1) we introduce a notion of margin for multi-class classification, and 2) we introduce STL-based attributes for enhancing the interpretability of the results. We evaluate our method on two datasets and compare it with state-of-the-art baselines.

Updated: 2024-06-25 02:58:06

标题: 多类时间逻辑神经网络

摘要: 时间序列数据可以代表自主系统的行为,例如 无人机和自动驾驶汽车。对于时间序列数据的二元和多类分类任务已经成为研究的一个突出领域。神经网络代表了一种常用的分类数据的方法;然而,它们缺乏可解释性,这在从中提取有意义信息方面构成了一个重要挑战。信号时序逻辑(STL)是一种描述定时行为属性的形式化。我们提出了一种结合以上所有方法的方法:利用代表多类分类时间序列数据的STL规范的神经网络。我们提出了两个关键贡献:1)我们引入了一个多类分类的边界的概念,2)我们引入了基于STL的属性来增强结果的可解释性。我们在两个数据集上评估了我们的方法,并将其与最先进的基线进行了比较。

更新时间: 2024-06-25 02:58:06

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.12397v2

Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots

We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with the order of play's permutations. To solve the problem, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine for B&P, we employ and extend sequential trajectory planning, i.e., a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P to coordinate air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines, and computes the socially optimal equilibrium.
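
The outer problem, searching over permutations of the order of play, can be illustrated with the brute-force baseline that B&P is designed to prune; the social-cost oracle below stands in for solving the Stackelberg game under a fixed order:

import itertools

def best_order_of_play(agents, social_cost):
    """Exhaustively search orders of play; `social_cost` stands in for
    computing the equilibrium cost of the game under a fixed order."""
    return min(itertools.permutations(agents), key=social_cost)

# Toy cost that rewards higher-indexed agents committing earlier.
agents = [0, 1, 2, 3]
social_cost = lambda order: sum(i * pos for pos, i in enumerate(order))
print(best_order_of_play(agents, social_cost))   # -> (3, 2, 1, 0)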

Updated: 2024-06-25 02:55:44

标题: 谁先行动?优化带有多个机器人的斯达克尔贝格博弈中的行动顺序

摘要: 我们考虑多智能体空间导航问题,即计算社会最优游戏顺序,即智能体做出决策的顺序,并在N个玩家Stackelberg轨迹游戏中关联的均衡。我们将这个问题建模为一个混合整数优化问题,涉及所有可能与游戏顺序排列相关的Stackelberg游戏空间。为了解决这个问题,我们引入了Branch and Play(B&P),这是一个高效且精确的算法,可以收敛到社会最优游戏顺序及其Stackelberg均衡。作为B&P的子程序,我们利用并扩展了顺序轨迹规划,即一种流行的多智能体控制方法,以可扩展地计算任何给定游戏顺序的有效局部Stackelberg均衡。我们展示了B&P在协调空中交通管制、群体形成和交付车队方面的实际效用。我们发现B&P始终优于各种基准线,并计算出社会最优均衡。

更新时间: 2024-06-25 02:55:44

领域: cs.RO,cs.AI,cs.SY,eess.SY,math.OC

下载: http://arxiv.org/abs/2402.09246v4

Task-Agnostic Federated Learning

In the realm of medical imaging, leveraging large-scale datasets from various institutions is crucial for developing precise deep learning models, yet privacy concerns frequently impede data sharing. Federated learning (FL) emerges as a prominent solution for preserving privacy while facilitating collaborative learning. However, its application in real-world scenarios faces several obstacles, such as task and data heterogeneity, label scarcity, non-identically distributed (non-IID) data, computational variation, etc. In practice, medical institutions may not want to disclose their tasks to the FL server, and out-of-network institutions with unseen tasks face a generalization challenge when joining an ongoing federated system. This study addresses the task-agnostic setting and the generalization problem on unseen tasks by adapting a self-supervised FL framework. Utilizing a Vision Transformer (ViT) as a consensus feature encoder for self-supervised pre-training, with no initial labels required, the framework enables effective representation learning across diverse datasets and tasks. Our extensive evaluations, using various real-world non-IID medical imaging datasets, validate our approach's efficacy, retaining 90% of the F1 accuracy with only 5% of the training data typically required for centralized approaches and exhibiting superior adaptability to out-of-distribution tasks. The results indicate that the federated learning architecture can be a potential approach toward multi-task foundation modeling.
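
A minimal sketch of the server-side aggregation in one round of such a federated system, using the generic FedAvg weighted average (the paper's exact aggregation rule may differ):

import copy
import torch

def fedavg(client_states, client_sizes):
    """Size-weighted average of client model state_dicts (generic FedAvg)."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(s[key] * (n / total) for s, n in zip(client_states, client_sizes))
    return avg

clients = [torch.nn.Linear(8, 4).state_dict() for _ in range(3)]  # stand-in encoders
global_state = fedavg(clients, client_sizes=[120, 80, 200])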

Updated: 2024-06-25 02:53:37

标题: 任务不可知的联邦学习

摘要: 在医学成像领域,利用来自各个机构的大规模数据集对开发精确的深度学习模型至关重要,然而隐私问题经常阻碍数据共享。联邦学习(FL)作为一个突出的解决方案出现,既保护隐私又促进协作学习。然而,在现实场景中,其应用面临几个障碍,如任务和数据的异质性、标签稀缺性、非独立分布(non-IID)数据、计算变化等。在现实世界中,医疗机构可能不愿向FL服务器披露其任务,并且想要加入正在进行的联邦系统的未见任务的跨网络机构面临泛化挑战。本研究通过适应自监督FL框架解决了未见任务的任务无关性和泛化问题。利用Vision Transformer(ViT)作为自监督预训练的共识特征编码器,无需初始标签,该框架实现了跨多样数据集和任务的有效表示学习。我们使用各种现实世界非独立分布的医学成像数据集进行广泛评估,验证了我们方法的有效性,仅需通常集中式方法所需训练数据的5%即可保留90%的F1准确率,并展现了对未见任务的超强适应能力。结果表明,联邦学习架构可以成为多任务基础建模的潜在方法。

更新时间: 2024-06-25 02:53:37

领域: cs.CV,cs.AI,cs.DC

下载: http://arxiv.org/abs/2406.17235v1

Revisiting Active Learning in the Era of Vision Foundation Models

Foundation vision or vision-language models are trained on large unlabeled or noisy data and learn robust representations that can achieve impressive zero- or few-shot performance on diverse tasks. Given these properties, they are a natural fit for active learning (AL), which aims to maximize labeling efficiency. However, the full potential of foundation models has not been explored in the context of AL, specifically in the low-budget regime. In this work, we evaluate how foundation models influence three critical components of effective AL, namely, 1) initial labeled pool selection, 2) ensuring diverse sampling, and 3) the trade-off between representative and uncertainty sampling. We systematically study how the robust representations of foundation models (DINOv2, OpenCLIP) challenge existing findings in active learning. Our observations inform the principled construction of a new simple and elegant AL strategy that balances uncertainty estimated via dropout with sample diversity. We extensively test our strategy on many challenging image classification benchmarks, including natural images as well as out-of-domain biomedical images that are relatively understudied in the AL literature. We also provide a highly performant and efficient implementation of modern AL strategies (including our method) at https://github.com/sanketx/AL-foundation-models.
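
A minimal sketch of one way to balance uncertainty with diversity over frozen foundation-model features, clustering the features and picking the most uncertain sample per cluster; this is an illustrative strategy in the spirit described, not the paper's exact selection rule:

import numpy as np
from sklearn.cluster import KMeans

def select_batch(feats, uncertainty, budget):
    """Pick one high-uncertainty sample per k-means cluster of the features."""
    clusters = KMeans(n_clusters=budget, n_init=10, random_state=0).fit_predict(feats)
    picks = []
    for c in range(budget):
        idx = np.where(clusters == c)[0]
        if len(idx):
            picks.append(idx[np.argmax(uncertainty[idx])])
    return np.array(picks)

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))   # stand-in for frozen DINOv2/OpenCLIP features
uncertainty = rng.random(500)        # stand-in for dropout-based uncertainty
print(select_batch(feats, uncertainty, budget=10))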

Updated: 2024-06-25 02:43:06

标题: 在视觉基础模型时代重新审视主动学习

摘要: 基于基础视觉或视觉语言模型在大规模未标记或嘈杂数据上训练,学习出能在各种任务上取得惊人的零标记或少标记性能的稳健表示。考虑到这些特性,它们自然适用于主动学习(AL),旨在最大化标记效率。然而,在低预算范围内,基础模型的全部潜力尚未在AL的背景下得到探索。在这项工作中,我们评估了基础模型如何影响有效AL的三个关键组成部分,即1)初始标记池选择,2)确保样本多样性,以及3)代表性和不确定性采样之间的权衡。我们系统地研究了基础模型(DINOv2、OpenCLIP)的稳健表示如何挑战现有的主动学习发现。我们的观察为基于丢失估计不确定性和样本多样性平衡的新简单而优雅的AL策略的原则构建提供了信息。我们在许多具有挑战性的图像分类基准测试中广泛测试我们的策略,包括自然图像以及在AL文献中相对不常研究的跨领域生物医学图像。我们还在https://github.com/sanketx/AL-foundation-models上提供了现代AL策略(包括我们的方法)的高性能和高效实现。

更新时间: 2024-06-25 02:43:06

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2401.14555v2

Self-Supervised Embeddings for Detecting Individual Symptoms of Depression

Depression, a prevalent mental health disorder impacting millions globally, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies individual symptoms of depression while also predicting its severity using speech input. We leverage self-supervised learning (SSL)-based speech models to better utilize the small-sized datasets that are frequently encountered in this task. Our study demonstrates notable performance improvements by utilizing SSL embeddings compared to conventional speech features. We compare various types of SSL pretrained models to elucidate the type of speech information (semantic, speaker, or prosodic) that contributes the most in identifying different symptoms. Additionally, we evaluate the impact of combining multiple SSL embeddings on performance. Furthermore, we show the significance of multi-task learning for identifying depressive symptoms effectively.

Updated: 2024-06-25 02:35:37

标题: 自监督嵌入用于检测抑郁症的个体症状

摘要: 抑郁症是一种普遍存在的精神健康障碍,影响全球数百万人,需要可靠的评估系统。与以往仅专注于检测抑郁症或预测其严重程度的研究不同,我们的工作通过语音输入识别抑郁症的各种症状,并预测其严重程度。我们利用基于自监督学习(SSL)的语音模型更好地利用在这一任务中经常遇到的小型数据集。我们的研究表明,与传统的语音特征相比,利用SSL嵌入能够取得显著的性能改进。我们比较了各种类型的SSL预训练模型,以阐明哪种类型的语音信息(语义、说话者或韵律)对识别不同症状的贡献最大。此外,我们评估了结合多个SSL嵌入对性能的影响。此外,我们展示了多任务学习对有效识别抑郁症症状的重要性。

更新时间: 2024-06-25 02:35:37

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2406.17229v1

Greedy equivalence search for nonparametric graphical models

One of the hallmark achievements of the theory of graphical models and Bayesian model selection is the celebrated greedy equivalence search (GES) algorithm due to Chickering and Meek. GES is known to consistently estimate the structure of directed acyclic graph (DAG) models in various special cases including Gaussian and discrete models, which are in particular curved exponential families. A general theory that covers general nonparametric DAG models, however, is missing. Here, we establish the consistency of greedy equivalence search for general families of DAG models that satisfy smoothness conditions on the Markov factorization, and hence may not be curved exponential families, or even parametric. The proof leverages recent advances in nonparametric Bayes to construct a test for comparing misspecified DAG models that avoids arguments based on the Laplace approximation. Nonetheless, when the Laplace approximation is valid and a consistent scoring function exists, we recover the classical result. As a result, we obtain a general consistency theorem for GES applied to general DAG models.

Updated: 2024-06-25 02:31:32

标题: 贪婪等效搜索用于非参数图模型

摘要: 图形模型和贝叶斯模型选择理论的一个标志性成就是由Chickering和Meek提出的著名贪婪等价搜索(GES)算法。GES被认为能够在各种特殊情况下一致估计有向无环图(DAG)模型的结构,包括高斯和离散模型,特别是曲线指数族。然而,目前缺乏涵盖一般非参数DAG模型的一般理论。在这里,我们建立了对满足马尔可夫因子化上平滑条件的一般DAG模型族的贪婪等价搜索的一致性,因此可能不是曲线指数族,甚至不是参数化的。证明利用了最近在非参数Bayes中的进展,构建了一个用于比较错配DAG模型的检验,避免了基于拉普拉斯近似的论证。然而,当拉普拉斯近似有效且存在一致的评分函数时,我们可以恢复经典结果。因此,我们得到了一个适用于一般DAG模型的GES的一般一致性定理。

更新时间: 2024-06-25 02:31:32

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2406.17228v1

Minimax Optimal Estimation of Stability Under Distribution Shift

The performance of decision policies and prediction models often deteriorates when applied to environments different from the ones seen during training. To ensure reliable operation, we analyze the stability of a system under distribution shift, which is defined as the smallest change in the underlying environment that causes the system's performance to deteriorate beyond a permissible threshold. In contrast to standard tail risk measures and distributionally robust losses that require the specification of a plausible magnitude of distribution shift, the stability measure is defined in terms of a more intuitive quantity: the level of acceptable performance degradation. We develop a minimax optimal estimator of stability and analyze its convergence rate, which exhibits a fundamental phase shift behavior. Our characterization of the minimax convergence rate shows that evaluating stability against large performance degradation incurs a statistical cost. Empirically, we demonstrate the practical utility of our stability framework by using it to compare system designs on problems where robustness to distribution shift is critical.

Updated: 2024-06-25 02:21:54

标题: 分布转移下稳定性的极小极大最优估计

摘要: 决策策略和预测模型的性能在应用于训练时未见过的环境时通常会恶化。为了确保可靠运行,我们分析系统在分布转移下的稳定性,即定义为导致系统性能恶化超过允许阈值的潜在环境变化的最小值。与标准尾风险度量和分布鲁棒损失相比,这种稳定性度量是以更直观的数量——可接受性能降级水平——来定义的,而不需要规定可能的分布转移的幅度。我们开发了稳定性的极小极大估计器,并分析了其收敛速度,表现出基本的相位移行为。我们对极小极大收敛速度的表征显示,评估稳定性对大量性能降级会产生统计成本。在实证上,我们通过使用稳定性框架来比较对分布转移具有重要性的问题上的系统设计,展示了我们稳定性框架的实际效用。

更新时间: 2024-06-25 02:21:54

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2212.06338v2

OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser

Recent research has shown that combining Mamba with Transformer architecture, which has selective state space and quadratic self-attention mechanism, outperforms using Mamba or Transformer architecture alone in language modeling tasks. The quadratic self-attention mechanism effectively alleviates the shortcomings of selective state space in handling long-term dependencies of any element in the sequence. We propose a position information injection method that connects the selective state space model with the quadratic attention, and integrates these two architectures with hybrid experts with cross-sharing domains, so that we can enjoy the advantages of both. We design a new architecture with a more biomimetic idea: Observer-Thinker-Conceiver-Expresser (OTCE), which can compete with well-known medium-scale open-source language models on a small scale in language modeling tasks.

Updated: 2024-06-25 02:20:14

标题: OTCE:混合SSM和注意力与跨领域专家混合构建观察者-思考者-构想者-表达者

摘要: 最近的研究表明,将Mamba与具有选择性状态空间和二次自注意机制的Transformer架构结合起来,在语言建模任务中胜过仅使用Mamba或Transformer架构。二次自注意机制有效地缓解了选择性状态空间在处理序列中任何元素的长期依赖性方面的缺点。我们提出了一种位置信息注入方法,将选择性状态空间模型与二次注意力连接起来,并将这两种架构与具有交叉共享领域的混合专家集成在一起,以便我们可以享受两者的优势。我们设计了一个更具生物启发性思想的新架构:Observer-Thinker-Conceiver-Expresser(OTCE),可以在语言建模任务的小规模中与众所周知的中等规模开源语言模型竞争。

更新时间: 2024-06-25 02:20:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.16495v2

Accurately Classifying Out-Of-Distribution Data in Facial Recognition

Standard classification theory assumes that the distribution of images in the test and training sets are identical. Unfortunately, real-life scenarios typically feature unseen data ("out-of-distribution data") which is different from data in the training distribution("in-distribution"). This issue is most prevalent in social justice problems where data from under-represented groups may appear in the test data without representing an equal proportion of the training data. This may result in a model returning confidently wrong decisions and predictions. We are interested in the following question: Can the performance of a neural network improve on facial images of out-of-distribution data when it is trained simultaneously on multiple datasets of in-distribution data? We approach this problem by incorporating the Outlier Exposure model and investigate how the model's performance changes when other datasets of facial images were implemented. We observe that the accuracy and other metrics of the model can be increased by applying Outlier Exposure, incorporating a trainable weight parameter to increase the machine's emphasis on outlier images, and by re-weighting the importance of different class labels. We also experimented with whether sorting the images and determining outliers via image features would have more of an effect on the metrics than sorting by average pixel value. Our goal was to make models not only more accurate but also more fair by scanning a more expanded range of images. We also tested the datasets in reverse order to see whether a more fair dataset with balanced features has an effect on the model's accuracy.
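
A minimal sketch of the standard Outlier Exposure objective referenced above: cross-entropy on in-distribution samples plus a term pushing outlier predictions toward the uniform distribution; `lam` is a fixed illustrative weight, whereas the study above makes this weighting trainable:

import torch
import torch.nn.functional as F

def outlier_exposure_loss(logits_in, targets_in, logits_out, lam=0.5):
    """Cross-entropy on in-distribution data plus a uniform-distribution
    penalty on outlier predictions (standard Outlier Exposure form)."""
    ce = F.cross_entropy(logits_in, targets_in)
    uniform_ce = -F.log_softmax(logits_out, dim=-1).mean()
    return ce + lam * uniform_ce

logits_in, targets_in = torch.randn(8, 5), torch.randint(0, 5, (8,))
logits_out = torch.randn(8, 5)     # stand-in logits for out-of-distribution faces
loss = outlier_exposure_loss(logits_in, targets_in, logits_out)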

Updated: 2024-06-25 02:20:06

标题: 准确分类面部识别中的分布外数据

摘要: 标准分类理论假设测试集和训练集中的图像分布相同。不幸的是,现实生活中通常会出现未见过的数据(“离群数据”),这些数据与训练分布中的数据不同(“在分布内”)。这个问题在社会公正问题中最为突出,因为来自少数群体的数据可能会出现在测试数据中,而在训练数据中并不代表相等比例。这可能导致模型返回自信但错误的决策和预测。我们感兴趣的问题是:当神经网络同时在多个在分布数据集上进行训练时,能否改善对离群数据中的面部图像的表现?我们通过引入Outlier Exposure模型来解决这个问题,并研究当其他面部图像数据集被应用时,模型的表现如何改变。我们观察到通过应用Outlier Exposure,引入可训练的权重参数以增加机器对离群图像的重视,并重新调整不同类别标签的重要性,可以提高模型的准确性和其他指标。我们还尝试通过对图像进行排序并通过图像特征确定离群值,看看这是否比通过平均像素值排序对指标产生更大影响。我们的目标是使模型不仅更准确,而且更公平,通过扫描更广泛的图像范围。我们还测试了数据集的逆序,以查看一个具有平衡特征的更公平数据集是否对模型的准确性产生影响。

更新时间: 2024-06-25 02:20:06

领域: cs.CV,cs.CY,cs.LG

下载: http://arxiv.org/abs/2404.03876v3

Large Language Models are Interpretable Learners

The trade-off between expressiveness and interpretability remains a core challenge when building human-centric predictive models for classification and decision-making. While symbolic rules offer interpretability, they often lack expressiveness, whereas neural networks excel in performance but are known for being black boxes. In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge this gap. In the proposed LLM-based Symbolic Programs (LSPs), the pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts. Symbolic programs then integrate these modules into an interpretable decision rule. To train LSPs, we develop a divide-and-conquer approach to incrementally build the program from scratch, where the learning process of each step is guided by LLMs. To evaluate the effectiveness of LSPs in extracting interpretable and accurate knowledge from data, we introduce IL-Bench, a collection of diverse tasks, including both synthetic and real-world scenarios across different modalities. Empirical results demonstrate LSP's superior performance compared to traditional neurosymbolic programs and vanilla automatic prompt tuning methods. Moreover, as the knowledge learned by LSP is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable), and other LLMs, and generalizes well to out-of-distribution samples.

Updated: 2024-06-25 02:18:15

标题: 大型语言模型是可解释的学习者

摘要: 在构建面向人类的分类和决策预测模型时,表达能力和可解释性之间的权衡仍然是一个核心挑战。尽管符号规则提供可解释性,但它们通常缺乏表达能力,而神经网络在性能上表现出色,但被认为是黑匣子。在本文中,我们展示了大型语言模型(LLMs)和符号程序的结合可以弥合这一差距。在提出的基于LLM的符号程序(LSPs)中,预训练的LLM通过自然语言提示提供了一个大量的可解释模块,可以将原始输入转化为自然语言概念。然后,符号程序将这些模块整合到一个可解释的决策规则中。为了训练LSPs,我们开发了一个分而治之的方法,逐步从头开始构建程序,其中每个步骤的学习过程都受到LLMs的指导。为了评估LSPs从数据中提取可解释和准确知识的效果,我们引入了IL-Bench,一个包含各种任务的集合,包括不同模态的合成和真实场景。实证结果表明,与传统的神经符号程序和普通的自动提示调整方法相比,LSP表现出更优越的性能。此外,由于LSP学习的知识是自然语言描述和符号规则的结合,因此易于传输给人类(可解释)和其他LLMs,并且对分布外样本具有良好的泛化能力。

更新时间: 2024-06-25 02:18:15

领域: cs.AI,cs.CL,cs.CV,cs.LG,cs.SC,68T05

下载: http://arxiv.org/abs/2406.17224v1

I Don't Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors

Deep neural networks (DNNs) have revolutionized the field of computer vision like object detection with their unparalleled performance. However, existing research has shown that DNNs are vulnerable to adversarial attacks. In the physical world, an adversary could exploit adversarial patches to implement a Hiding Attack (HA), which patches the target object to make it disappear from the detector, and an Appearing Attack (AA), which fools the detector into misclassifying the patch as a specific object. Recently, many defense methods for detectors have been proposed to mitigate the potential threats of adversarial patches. However, such methods still have limitations in generalization, robustness and efficiency. Most defenses are only effective against the HA, leaving the detector vulnerable to the AA. In this paper, we propose NutNet, an innovative model for detecting adversarial patches, with high generalization, robustness and efficiency. With experiments for six detectors including YOLOv2-v4, SSD, Faster RCNN and DETR on both digital and physical domains, the results show that our proposed method can effectively defend against both the HA and AA, with only a 0.4% sacrifice of the clean performance. We compare NutNet with four baseline defense methods for detectors, and our method exhibits an average defense performance that is over 2.4 times and 4.7 times higher than existing approaches for HA and AA, respectively. In addition, NutNet only increases the inference time by 8%, which can meet the real-time requirements of the detection systems. Demos of NutNet are available at https://sites.google.com/view/nutnet.

Updated: 2024-06-25 02:11:46

标题: 我不认识你,但我可以抓到你:针对物体检测器的多样敌对贴片的实时防御

摘要: 深度神经网络(DNNs)已经彻底改变了计算机视觉领域,如目标检测,在性能上表现出色。然而,现有研究显示DNNs容易受到对抗性攻击的影响。在现实世界中,对手可以利用对抗性贴纸实施隐藏攻击(HA),将贴纸覆盖在目标物体上,使其从检测器中消失,还可以执行出现攻击(AA),欺骗检测器将贴纸误分类为特定物体。最近,许多用于检测器的防御方法已被提出,以减轻对抗性贴纸的潜在威胁。然而,这些方法仍然存在在泛化、稳健性和效率方面的局限性。大多数防御方法只对HA有效,使检测器容易受到AA攻击。 在本文中,我们提出了一种名为NutNet的创新模型,用于检测对抗性贴纸,具有高泛化性、稳健性和效率性。通过对包括YOLOv2-v4、SSD、Faster RCNN和DETR在数字和物理领域上的六个检测器进行实验,结果显示我们提出的方法可以有效抵御HA和AA,仅损失0.4%的干净性能。我们将NutNet与四种基准检测器防御方法进行比较,我们的方法在HA和AA方面的平均防御性能分别比现有方法高出2.4倍和4.7倍。此外,NutNet仅将推理时间增加了8%,可以满足检测系统的实时要求。NutNet的演示可在以下网址找到:\url{https://sites.google.com/view/nutnet}。

更新时间: 2024-06-25 02:11:46

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.10285v2

Machine Unlearning Fails to Remove Data Poisoning Attacks

We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of training on poisoned data. We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of evaluation settings (e.g., alleviating membership inference attacks), they fail to remove the effects of data poisoning, across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly-introduced Gaussian poisoning attack) and models (image classifiers and LLMs); even when granted a relatively large compute budget. In order to precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of evaluations, is required to avoid a false sense of confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, while unlearning methods show some signs of being useful to efficiently remove poisoned datapoints without having to retrain, our work suggests that these methods are not yet "ready for prime time", and currently provide limited benefit over retraining.

Updated: 2024-06-25 02:05:29

标题: 机器遗忘无法消除数据中毒攻击

摘要: 我们重新审视了几种适用于大规模深度学习的近似机器遗忘的实用方法的有效性。除了遵守数据删除请求外,被引用最多的一种机器遗忘方法的潜在应用是消除对有毒数据的训练效果。我们通过实验证明,虽然现有的遗忘方法在许多评估设置中已被证明是有效的(例如,缓解成员推理攻击),但它们无法消除各种类型的数据污染攻击(包括不加选择的、有针对性的和新引入的高斯数据污染攻击)和模型(图像分类器和LLM)对数据的影响;即使获得了相对较大的计算预算。为了准确描述遗忘方法的有效性,我们引入了基于数据污染的新评估指标。我们的结果表明,需要更广泛的评估视角,以避免对深度学习机器遗忘程序产生虚假的信心,而不提供可证明的保证。此外,虽然遗忘方法显示出一些迹象表明它们可以有效地消除受污染的数据点而无需重新训练,但我们的工作表明,这些方法尚未"准备就绪",目前与重新训练相比提供的好处有限。

更新时间: 2024-06-25 02:05:29

领域: cs.LG,cs.AI,cs.CR,cs.CY

下载: http://arxiv.org/abs/2406.17216v1

BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establish several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by including taxonomic labels, raw nucleotide barcode sequences, assigned barcode index numbers, and geographical information. We propose three benchmark experiments to demonstrate the impact of the multi-modal data types on the classification and clustering accuracy. First, we pretrain a masked language model on the DNA barcode sequences of the BIOSCAN-5M dataset, and demonstrate the impact of using this large reference library on species- and genus-level classification performance. Second, we propose a zero-shot transfer learning task applied to images and DNA barcodes to cluster feature embeddings obtained from self-supervised learning, to investigate whether meaningful clusters can be derived from these representation embeddings. Third, we benchmark multi-modality by performing contrastive learning on DNA barcodes, image data, and taxonomic information. This yields a general shared embedding space enabling taxonomic classification using multiple types of information and modalities. The code repository of the BIOSCAN-5M Insect dataset is available at https://github.com/zahrag/BIOSCAN-5M.

Updated: 2024-06-25 02:00:48

标题: BIOSCAN-5M:一种昆虫生物多样性的多模态数据集

摘要: 作为全球持续努力理解和监测昆虫生物多样性的一部分,本文将BIOSCAN-5M昆虫数据集呈现给机器学习社区,并建立几个基准任务。BIOSCAN-5M是一个包含超过500万昆虫标本的综合数据集,通过包含分类标签、原始核苷酸条形码序列、分配的条形码索引号和地理信息,它显著扩展了现有的基于图像的生物数据集。我们提出了三个基准实验,以展示多模态数据类型对分类和聚类准确性的影响。首先,我们在BIOSCAN-5M数据集的DNA条形码序列上预训练了一个蒙面语言模型,并展示了使用这个大型参考库对物种和属级分类性能的影响。其次,我们提出了一个应用于图像和DNA条形码的零样本迁移学习任务,用于对通过自监督学习获得的特征嵌入进行聚类,以探讨是否可以从这些表示嵌入中得到有意义的聚类。第三,我们通过在DNA条形码、图像数据和分类信息上执行对比学习来对多模态进行基准测试。这产生了一个通用的共享嵌入空间,可以使用多种类型的信息和模态进行分类。BIOSCAN-5M昆虫数据集的代码存储库可在https://github.com/zahrag/BIOSCAN-5M找到。

更新时间: 2024-06-25 02:00:48

领域: cs.LG

下载: http://arxiv.org/abs/2406.12723v3

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To alleviate the computation cost of running $k$ inference passes, we propose Superposed Decoding, a new decoding algorithm that generates $k$ drafts at the computation cost of one autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model. At every inference step we combine the $k$ drafts with the top-$k$ tokens to get $k^2$ new drafts and cache the $k$ most likely options, using an n-gram interpolation with minimal compute overhead to filter out incoherent generations. Our experiments show that $k$ drafts from Superposed Decoding are at least as coherent and factual as Nucleus Sampling and Greedy Decoding respectively, while being at least $2.44\times$ faster for $k\ge3$. In a compute-normalized setting, user evaluations demonstrably favor text generated by Superposed Decoding over Nucleus Sampling. Code and more examples open-sourced at https://github.com/RAIVNLab/SuperposedDecoding.
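
A minimal sketch of the core superposition step, feeding a weighted combination of the k drafts' most recent token embeddings as a single input to the next decoding step; the vocabulary size, dimension, tokens, and weights are illustrative:

import torch

def superposed_input(embedding, draft_tokens, weights):
    """Weighted superposition of the k drafts' latest token embeddings."""
    vecs = embedding(torch.tensor(draft_tokens))     # k x d
    w = torch.tensor(weights).unsqueeze(1)           # k x 1
    return (w * vecs).sum(dim=0)                     # d, one input vector

emb = torch.nn.Embedding(50_000, 768)                # stand-in for the LM's table
x = superposed_input(emb, draft_tokens=[101, 2023, 7592], weights=[0.5, 0.3, 0.2])
print(x.shape)                                       # torch.Size([768])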

Updated: 2024-06-25 01:49:45

Domain: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.18400v3

VR-based Blockchain-enabled Data Visualization Framework For Manufacturing Industry

This research proposes a blockchain-based data visualization framework integrated with VR to derive manufacturing insights. This framework is implemented on the testbed of the Future Factories Lab at the University of South Carolina. The proposed system aims to enhance understanding, analysis, and decision-making by immersing users in a VR environment where complex manufacturing data stored on the blockchain is translated into intuitive and interactive representations. The project focuses on two main components: blockchain and VR. Hyperledger Fabric is employed to establish a blockchain network that records data from the Future Factories testbed. This system captures information from various sources, such as potentiometers on robot grippers to measure grip positioning, load cells to gauge pressure, emergency stop buttons, and temperature, speed, and vibration sensors on the conveyors. Whenever predefined conditions are met, pertinent data, including sensor ID, timestamp, value, cause, and importance, is securely recorded in the blockchain, signaling the occurrence of a defect within the cell. Data is retrieved from the blockchain system through 'GET' API requests. A VR application is developed using the cross-platform Unity game engine to visualize the data retrieved from the blockchain database, with Meta Quest 3 as the targeted head-mounted VR device. The VR application has two C# scripts: one queries blockchain data using 'GET' API calls, and the other converts the returned JSON objects to text for visualization in the VR system. The proposed system leverages blockchain technology and VR visualization to deliver immersive, actionable insights over secure data transmission. By embracing the proposed framework, manufacturers can unlock new potential for efficiency, sustainability, and resilience in today's increasingly complex and interconnected manufacturing workplace.
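
The actual implementation uses C# scripts in Unity; the fragment below is a language-agnostic Python sketch of the same retrieval-and-flattening logic, where the gateway URL and JSON field names are illustrative assumptions rather than the project's real API.

```python
# Hypothetical sketch of the data-retrieval step: issue a 'GET' request to a
# REST gateway in front of the blockchain and flatten each JSON record into
# display text. Endpoint URL and field names are assumptions.
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080/api/sensor-events"  # assumed endpoint

def fetch_events(url: str = GATEWAY_URL) -> list[dict]:
    """Issue a 'GET' request and parse the JSON payload."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

def to_display_text(event: dict) -> str:
    """Convert one JSON record into the text shown in the VR overlay."""
    return (f"sensor={event['sensorId']} time={event['timestamp']} "
            f"value={event['value']} cause={event['cause']} "
            f"importance={event['importance']}")

if __name__ == "__main__":
    for ev in fetch_events():
        print(to_display_text(ev))
```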

Updated: 2024-06-25 01:25:09

Domain: cs.CR

Download: http://arxiv.org/abs/2406.17207v1

Contrastive General Graph Matching with Adaptive Augmentation Sampling

Graph matching has important applications in pattern recognition and beyond. Current approaches predominantly adopt supervised learning, demanding extensive labeled data that can be limited or costly. Meanwhile, self-supervised learning methods for graph matching often require additional side information, such as extra categorical information and input features, limiting their application to the general case. Moreover, designing graph augmentations that ensure robustness and efficacy for self-supervised graph matching presents another challenge. To address these issues, we introduce a novel Graph-centric Contrastive framework for Graph Matching (GCGM), capitalizing on a vast pool of graph augmentations for contrastive learning without needing any side information. Given the variety of augmentation choices, we further introduce a Boosting-inspired Adaptive Augmentation Sampler (BiAS), which adaptively selects more challenging augmentations tailored for graph matching. In experiments across multiple datasets, GCGM surpasses state-of-the-art self-supervised methods, marking a significant step toward more effective, efficient, and general graph matching.
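
A minimal sketch of a boosting-inspired adaptive sampler in the spirit of BiAS appears below: augmentations that yield higher training loss (harder views) are sampled more often. The multiplicative update rule and the temperature `eta` are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical boosting-inspired augmentation sampler: weights grow for
# augmentations that produce harder (higher-loss) contrastive examples.
import math
import random

class AdaptiveAugmentationSampler:
    def __init__(self, augmentations, eta=0.5):
        self.augmentations = augmentations
        self.weights = [1.0] * len(augmentations)
        self.eta = eta  # assumed temperature controlling the update strength

    def sample(self):
        """Draw an augmentation index with probability proportional to its weight."""
        idx = random.choices(range(len(self.augmentations)), weights=self.weights)[0]
        return idx, self.augmentations[idx]

    def update(self, idx, loss):
        """Upweight augmentations that produced a high training loss."""
        self.weights[idx] *= math.exp(self.eta * loss)
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]  # keep normalized
```

In a training loop, one would sample an augmentation per step, compute the contrastive matching loss on the resulting graph view, and feed that loss back through `update`.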

Updated: 2024-06-25 01:08:03

Domain: cs.LG

Download: http://arxiv.org/abs/2406.17199v1

ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed, such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, built on unrealistic assumptions, or effective only against specific poison types, limiting their universal applicability. In this research, we propose a more universally effective, practical, and robust defense scheme called ECLIPSE. We first investigate the impact of Gaussian noise on the poisons and theoretically prove that any kind of poison will be largely assimilated when sufficient random noise is imposed. In light of this, we assume the victim has access to an extremely limited number of clean images (a more practical scenario) and subsequently enlarge this sparse set for training a denoising probabilistic model (a universal denoising tool). We then introduce Gaussian noise to absorb the poisons and apply the model for denoising, resulting in a roughly purified dataset. Finally, because different poisons vary in how readily Gaussian noise assimilates them, we propose a lightweight corruption compensation module to effectively eliminate residual poisons, providing a more universal defense approach. Extensive experiments demonstrate that our defense approach outperforms 10 state-of-the-art defenses. We also propose an adaptive attack against ECLIPSE and verify the robustness of our defense scheme. Our code is available at https://github.com/CGCL-codes/ECLIPSE.
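
The core noise-then-denoise purification step could look roughly like the sketch below, where the noise level `sigma` and the `denoiser` interface are assumptions; the paper trains the denoising probabilistic model from a small clean set and adds a compensation module not shown here.

```python
# Hypothetical sketch of the purification idea: inject Gaussian noise strong
# enough to assimilate the poison perturbations, then run a learned denoiser.
import torch

def purify(images: torch.Tensor, denoiser, sigma: float = 0.25) -> torch.Tensor:
    """images: (N, C, H, W) in [0, 1]; returns a roughly purified batch."""
    noisy = images + sigma * torch.randn_like(images)   # absorb the poisons
    noisy = noisy.clamp(0.0, 1.0)
    with torch.no_grad():
        cleaned = denoiser(noisy)                       # learned denoising step
    return cleaned.clamp(0.0, 1.0)
```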

Updated: 2024-06-25 01:07:15

Domain: cs.CR,cs.CV,eess.IV

Download: http://arxiv.org/abs/2406.15093v2

Sound Tagging in Infant-centric Home Soundscapes

Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies have focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment, or collect data from only a single family, where noise from a fixed sound source can be moderate at the infant's position, or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have unobtrusively collected and labeled noises in home soundscapes from 22 families, with the data collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (the Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as on publicly available home environmental datasets. Utilizing different training strategies, such as resampling, using public datasets, mixing public and infant-centric training sets, and data augmentation with noise and masking, we evaluate the performance of the model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets), and Cohen's kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets), compared to training with only public or only collected datasets, respectively.
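
For reference, the two reported metrics can be computed as in the short sketch below; the sound-event labels here are invented placeholders, not classes from the study.

```python
# Computing the evaluation metrics reported above (F1 and Cohen's kappa).
# The label values are illustrative placeholders.
from sklearn.metrics import cohen_kappa_score, f1_score

y_true = ["tv", "speech", "vacuum", "speech", "music", "tv"]
y_pred = ["tv", "speech", "speech", "speech", "music", "vacuum"]

print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
```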

Updated: 2024-06-25 00:15:54

Domain: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.17190v1

Geometric Median (GM) Matching for Robust Data Pruning

Data pruning, the combinatorial task of selecting a small and informative subset from a large dataset, is crucial for mitigating the enormous computational costs associated with training data-hungry modern deep learning models at scale. Since large-scale data collections are invariably noisy, developing data pruning strategies that remain robust even in the presence of corruption is critical in practice. Unfortunately, the existing heuristics for (robust) data pruning lack theoretical coherence and rely on heroic assumptions that are often unattainable by the very nature of the problem setting. Moreover, these strategies often yield sub-optimal neural scaling laws even compared to random sampling, especially in scenarios involving strong corruption and aggressive pruning rates, making provably robust data pruning an open challenge. In response, in this work, we propose Geometric Median (GM) Matching, a herding-style greedy algorithm (Welling, 2009) that yields a $k$-subset such that the mean of the subset approximates the geometric median of the (potentially) noisy dataset. Theoretically, we show that GM Matching enjoys an improved $\mathcal{O}(1/k)$ scaling over the $\mathcal{O}(1/\sqrt{k})$ scaling of uniform sampling, while achieving the optimal breakdown point of 1/2 even under arbitrary corruption. Extensive experiments across popular deep learning benchmarks indicate that GM Matching consistently outperforms the prior state of the art; the gains become more pronounced at high corruption rates and aggressive pruning rates, making GM Matching a strong baseline for future research in robust data pruning.
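
A minimal sketch of the GM Matching idea follows: estimate the geometric median with Weiszfeld iterations, then greedily pick a $k$-subset whose running mean tracks it, herding-style. Iteration counts, tolerances, and the synthetic data are illustrative assumptions.

```python
# Hypothetical sketch of GM Matching: Weiszfeld's algorithm for the geometric
# median, followed by a herding-style greedy selection of a k-subset.
import numpy as np

def geometric_median(X: np.ndarray, iters: int = 100, eps: float = 1e-8) -> np.ndarray:
    """Weiszfeld's algorithm on the rows of X (n points, d dims)."""
    mu = X.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(X - mu, axis=1)
        w = 1.0 / np.maximum(d, eps)           # robust inverse-distance weights
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
    return mu

def gm_matching(X: np.ndarray, k: int) -> list[int]:
    """Greedily select k indices whose mean approximates the geometric median.
    For brevity, this sketch does not exclude already-selected points."""
    target = geometric_median(X)
    selected, running_sum = [], np.zeros(X.shape[1])
    for t in range(1, k + 1):
        # Pick the point that moves the subset mean closest to the target.
        gaps = np.linalg.norm((running_sum + X) / t - target, axis=1)
        i = int(np.argmin(gaps))
        selected.append(i)
        running_sum += X[i]
    return selected

X = np.random.default_rng(0).normal(size=(500, 16))
X[:50] += 10.0                                  # simulate corrupted points
print(gm_matching(X, k=20)[:5])
```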

Updated: 2024-06-25 00:02:01

Domain: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.17188v1

By Xinhai (Sean) Zou.