Loss Shaping Constraints for Long-Term Time Series Forecasting
Several applications in time series forecasting require predicting multiple steps ahead. Despite the vast amount of literature on the topic, both classical and recent deep learning based approaches have mostly focused on minimising the loss averaged over the predicted window. We observe that this can lead to disparate distributions of errors across forecasting steps, especially for recent transformer architectures trained on popular forecasting benchmarks. That is, optimising average performance can lead to undesirably large errors at specific time-steps. In this work, we present a Constrained Learning approach for long-term time series forecasting that aims to find the best model in terms of average performance while respecting a user-defined upper bound on the loss at each time-step. We call our approach loss shaping constraints because it imposes constraints on the loss at each time step, and we leverage recent duality results to show that, despite its non-convexity, the resulting problem has a bounded duality gap. We propose a practical Primal-Dual algorithm to tackle it, and demonstrate that the proposed approach exhibits competitive average performance in time series forecasting benchmarks, while shaping the distribution of errors across the predicted window.
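To make the constrained formulation concrete, here is a minimal PyTorch sketch of one primal-dual step, assuming a per-step squared error, a user-chosen bound `epsilon`, and a plain gradient-ascent dual update; the names and update rule are illustrative, not the authors' exact algorithm.

```python
import torch

def primal_dual_step(model, opt, x, y, lam, epsilon, dual_lr=0.01):
    """One primal-dual update for loss-shaping constraints (sketch).

    Constrained problem: minimize the mean step loss subject to
    loss_t <= epsilon for every forecast step t; `lam` holds one
    non-negative multiplier per step, e.g. torch.zeros(horizon).
    """
    pred = model(x)                                # (batch, horizon)
    step_loss = ((pred - y) ** 2).mean(dim=0)      # per-step loss, (horizon,)
    # Lagrangian: average loss + sum_t lam_t * (loss_t - epsilon)
    lagrangian = step_loss.mean() + (lam * (step_loss - epsilon)).sum()
    opt.zero_grad()
    lagrangian.backward()
    opt.step()                                     # primal descent on model weights
    with torch.no_grad():                          # dual ascent on constraint slack
        lam += dual_lr * (step_loss - epsilon)
        lam.clamp_(min=0.0)                        # multipliers stay non-negative
    return step_loss.detach()
```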
Updated: 2024-07-11 23:43:18
Categories: cs.LG, stat.ML
Characterizing Prompt Compression Methods for Long Context Inference
Long context inference presents challenges at the system level with increased compute and memory requirements, as well as from an accuracy perspective in being able to reason over long contexts. Recently, several methods have been proposed to compress the prompt to reduce the context length. However, there has been little work on comparing the different proposed methods across different tasks through a standardized analysis. This has led to conflicting results. To address this, here we perform a comprehensive characterization and evaluation of different prompt compression methods. In particular, we analyze extractive compression, summarization-based abstractive compression, and token pruning methods. Surprisingly, we find that extractive compression often outperforms all the other approaches, and enables up to 10x compression with minimal accuracy degradation. Interestingly, we also find that despite several recent claims, token pruning methods often lag behind extractive compression. We only found marginal improvements on summarization tasks.
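As a toy illustration of the extractive style of compression that the study finds strongest, one can rank context sentences by embedding similarity to the query and keep a top fraction; the embedding model and scoring rule below are placeholders, not the specific systems evaluated.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def extractive_compress(query, sentences, keep_ratio=0.1):
    """Keep the sentences most similar to the query (10x compression at 0.1)."""
    model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder embedder
    embs = model.encode([query] + sentences, normalize_embeddings=True)
    scores = embs[1:] @ embs[0]                       # cosine similarity to query
    k = max(1, int(len(sentences) * keep_ratio))
    keep = sorted(np.argsort(scores)[::-1][:k])       # top-k, kept in original order
    return " ".join(sentences[i] for i in keep)
```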
Updated: 2024-07-11 23:34:32
Categories: cs.CL, cs.LG
DeepCodeProbe: Towards Understanding What Models Trained on Code Learn
Machine learning models trained on code and related artifacts offer valuable support for software maintenance but suffer from interpretability issues due to their complex internal variables. These concerns are particularly significant in safety-critical applications where the models' decision-making processes must be reliable. The specific features and representations learned by these models remain unclear, adding to the hesitancy in adopting them widely. To address these challenges, we introduce DeepCodeProbe, a probing approach that examines the syntax and representation learning abilities of ML models designed for software maintenance tasks. Our study applies DeepCodeProbe to state-of-the-art models for code clone detection, code summarization, and comment generation. Findings reveal that while small models capture abstract syntactic representations, their ability to fully grasp programming language syntax is limited. Increasing model capacity improves syntax learning but introduces trade-offs such as increased training time and overfitting. DeepCodeProbe also identifies specific code patterns the models learn from their training data. Additionally, we provide best practices for training models on code to enhance performance and interpretability, supported by an open-source replication package for broader application of DeepCodeProbe in interpreting other code-related models.
Updated: 2024-07-11 23:16:44
Categories: cs.SE, cs.AI, cs.LG, I.2.5; D.2.3
Uncovering Semantics and Topics Utilized by Threat Actors to Deliver Malicious Attachments and URLs
Recent threat reports highlight that email remains the top vector for delivering malware to endpoints. Despite these statistics, detection of malicious email attachments and URLs often neglects semantic cues, linguistic features, and contextual clues. Our study employs BERTopic unsupervised topic modeling to identify common semantics and themes embedded in email used to deliver malicious attachments and call-to-action URLs. We preprocess emails by extracting and sanitizing content and employ multilingual embedding models like BGE-M3 for dense representations, which clustering algorithms (HDBSCAN and OPTICS) use to group emails by semantic similarity. Phi3-Mini-4K-Instruct facilitates semantic analysis, and hLDA aids thematic analysis, to understand threat actor patterns. Our research will evaluate and compare different clustering algorithms on topic quantity, coherence, and diversity metrics, concluding with insights into the semantics and topics commonly used by threat actors to deliver malicious attachments and URLs, a significant contribution to the field of threat detection.
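A condensed sketch of the embedding-and-clustering stage described above, assuming BGE-M3 is loaded through sentence-transformers and HDBSCAN is used for grouping; parameters are illustrative.

```python
import hdbscan
from sentence_transformers import SentenceTransformer

def cluster_emails(bodies):
    """Group sanitized email bodies by semantic similarity (sketch)."""
    embedder = SentenceTransformer("BAAI/bge-m3")      # multilingual dense embeddings
    embeddings = embedder.encode(bodies, normalize_embeddings=True)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=10, metric="euclidean")
    labels = clusterer.fit_predict(embeddings)         # label -1 marks noise/outliers
    return labels
```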
Updated: 2024-07-11 23:04:16
Categories: cs.LG
On the Foundations of Shortcut Learning
Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on \emph{predictivity} -- how reliably a feature indicates training-set labels -- but also on \emph{availability} -- how easily the feature can be extracted from inputs. The literature on shortcut learning has noted examples in which models privilege one feature over another, for example texture over shape and image backgrounds over foreground objects. Here, we test hypotheses about which input properties are more available to a model, and systematically study how predictivity and availability interact to shape models' feature use. We construct a minimal, explicit generative framework for synthesizing classification datasets with two latent features that vary in predictivity and in factors we hypothesize to relate to availability, and we quantify a model's shortcut bias -- its over-reliance on the shortcut (more available, less predictive) feature at the expense of the core (less available, more predictive) feature. We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias. Our empirical findings are consistent with a theoretical account based on Neural Tangent Kernels. Finally, we study how models used in practice trade off predictivity and availability in naturalistic datasets, discovering availability manipulations which increase models' degree of shortcut bias. Taken together, these findings suggest that the propensity to learn shortcut features is a fundamental characteristic of deep nonlinear architectures warranting systematic study given its role in shaping how models solve tasks.
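A minimal sketch of the kind of generative framework described, with two binary latent features that differ in predictivity and, via input scale, in one plausible availability factor; the rendering choices are ours, not the paper's exact construction.

```python
import numpy as np

def make_dataset(n=10_000, p_core=0.95, p_shortcut=0.80, shortcut_scale=5.0, seed=0):
    """Core feature: more predictive, less available. Shortcut feature: less
    predictive, more available (here: written into the input at a larger scale)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, n) * 2 - 1                     # labels in {-1, +1}
    core = np.where(rng.random(n) < p_core, y, -y)        # agrees with y w.p. p_core
    shortcut = np.where(rng.random(n) < p_shortcut, y, -y)
    x = np.stack([core + 0.5 * rng.standard_normal(n),
                  shortcut_scale * (shortcut + 0.5 * rng.standard_normal(n))], axis=1)
    return x, y
```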
Updated: 2024-07-11 23:03:09
Categories: cs.LG, cs.CV
Multi-step Inference over Unstructured Data
The advent of Large Language Models (LLMs) and Generative AI has revolutionized natural language applications across various domains. However, high-stakes decision-making tasks in fields such as medical, legal and finance require a level of precision, comprehensiveness, and logical consistency that pure LLM or Retrieval-Augmented-Generation (RAG) approaches often fail to deliver. At Elemental Cognition (EC), we have developed a neuro-symbolic AI platform to tackle these problems. The platform integrates fine-tuned LLMs for knowledge extraction and alignment with a robust symbolic reasoning engine for logical inference, planning and interactive constraint solving. We describe Cora, a Collaborative Research Assistant built on this platform, that is designed to perform complex research and discovery tasks in high-stakes domains. This paper discusses the multi-step inference challenges inherent in such domains, critiques the limitations of existing LLM-based methods, and demonstrates how Cora's neuro-symbolic approach effectively addresses these issues. We provide an overview of the system architecture, key algorithms for knowledge extraction and formal reasoning, and present preliminary evaluation results that highlight Cora's superior performance compared to well-known LLM and RAG baselines.
Updated: 2024-07-11 22:55:40
Categories: cs.CL, cs.AI
Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models
Transformer-based language models have shown state-of-the-art performance on a variety of natural language understanding tasks. To achieve this performance, these models are first pre-trained on general corpus and then fine-tuned on downstream tasks. Previous work studied the effect of pruning the training set of the downstream tasks on the performance of the model on its evaluation set. In this work, we propose an automatic dataset pruning method for the training set of fine-tuning tasks. Our method is based on the model's success rate in correctly classifying each training data point. Unlike previous work which relies on user feedback to determine subset size, our method automatically extracts training subsets that are adapted for each pair of model and fine-tuning task. Our method provides multiple subsets for use in dataset pruning that navigate the trade-off between subset size and evaluation accuracy. Our largest subset, which we also refer to as the winning ticket subset, is on average $3 \times$ smaller than the original training set of the fine-tuning task. Our experiments on 5 downstream tasks and 2 language models show that, on average, fine-tuning on the winning ticket subsets results in a $0.1 \%$ increase in the evaluation performance of the model.
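A sketch of the success-rate bookkeeping behind such pruning: record whether each training point is classified correctly over training, then extract candidate subsets at different success-rate thresholds; the specific rule for which points to keep is our assumption, not the paper's exact criterion.

```python
import numpy as np

def subsets_by_success_rate(correct_history, thresholds=(0.25, 0.5, 0.75)):
    """correct_history: (epochs, n_examples) boolean matrix of whether the
    model classified each training point correctly at each epoch. Returns
    one candidate subset per threshold, dropping points the model already
    masters almost always (illustrative rule)."""
    success = correct_history.mean(axis=0)           # per-example success rate
    return {t: np.where(success <= t)[0] for t in thresholds}
```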
Updated: 2024-07-11 22:46:18
Categories: cs.CL, cs.LG
Semi-Supervised Multi-Task Learning Based Framework for Power System Security Assessment
This paper develops a novel machine learning-based framework using Semi-Supervised Multi-Task Learning (SS-MTL) for power system dynamic security assessment that is accurate, reliable, and aware of topological changes. The learning algorithm underlying the proposed framework integrates conditional masked encoders and employs multi-task learning for classification-aware feature representation, which improves the accuracy and scalability to larger systems. Additionally, this framework incorporates a confidence measure for its predictions, enhancing its reliability and interpretability. A topological similarity index has also been incorporated to add topological awareness to the framework. Various experiments on the IEEE 68-bus system were conducted to validate the proposed method, employing two distinct database generation techniques to generate the required data to train the machine learning algorithm. The results demonstrate that our algorithm outperforms existing state-of-the-art machine learning based techniques for security assessment in terms of accuracy and robustness. Finally, our work underscores the value of employing auto-encoders for security assessment, highlighting improvements in accuracy, reliability, and robustness. All datasets and codes used have been made publicly available to ensure reproducibility and transparency.
Updated: 2024-07-11 22:42:53
Categories: cs.LG, cs.SY, eess.SY
Fairness in Ranking under Disparate Uncertainty
Ranking is a ubiquitous method for focusing the attention of human evaluators on a manageable subset of options. Its use as part of human decision-making processes ranges from surfacing potentially relevant products on an e-commerce site to prioritizing college applications for human review. While ranking can make human evaluation more effective by focusing attention on the most promising options, we argue that it can introduce unfairness if the uncertainty of the underlying relevance model differs between groups of options. Unfortunately, such disparity in uncertainty appears widespread, often to the detriment of minority groups for which relevance estimates can have higher uncertainty due to a lack of data or appropriate features. To address this fairness issue, we propose Equal-Opportunity Ranking (EOR) as a new fairness criterion for ranking and show that it corresponds to a group-wise fair lottery among the relevant options even in the presence of disparate uncertainty. EOR optimizes for an even cost burden on all groups, unlike the conventional Probability Ranking Principle, and is fundamentally different from existing notions of fairness in rankings, such as demographic parity and proportional Rooney rule constraints that are motivated by proportional representation relative to group size. To make EOR ranking practical, we present an efficient algorithm for computing it in time $O(n \log(n))$ and prove its close approximation guarantee to the globally optimal solution. In a comprehensive empirical evaluation on synthetic data, a US Census dataset, and a real-world audit of Amazon search queries, we find that the algorithm reliably guarantees EOR fairness while providing effective rankings.
Updated: 2024-07-11 21:35:37
Categories: cs.LG, cs.CY, cs.IR
SALT: Introducing a Framework for Hierarchical Segmentations in Medical Imaging using Softmax for Arbitrary Label Trees
Traditional segmentation networks approach anatomical structures as standalone elements, overlooking the intrinsic hierarchical connections among them. This study introduces Softmax for Arbitrary Label Trees (SALT), a novel approach designed to leverage the hierarchical relationships between labels, improving the efficiency and interpretability of the segmentations. SALT is a segmentation technique for CT imaging that leverages conditional probabilities to map the hierarchical structure of anatomical landmarks, such as the spine's division into lumbar, thoracic, and cervical regions and further into individual vertebrae. The model was developed using the SAROS dataset from The Cancer Imaging Archive (TCIA), comprising 900 body region segmentations from 883 patients. The dataset was further enhanced by generating additional segmentations with the TotalSegmentator, for a total of 113 labels. The model was trained on 600 scans, while validation and testing were conducted on 150 CT scans. Performance was assessed using the Dice score across various datasets, including SAROS, CT-ORG, FLARE22, LCTSC, LUNA16, and WORD. Among the evaluated datasets, SALT achieved its best results on the LUNA16 and SAROS datasets, with Dice scores of 0.93 and 0.929 respectively. The model demonstrated reliable accuracy across other datasets, scoring 0.891 on CT-ORG and 0.849 on FLARE22. The LCTSC dataset showed a score of 0.908 and the WORD dataset also showed good performance with a score of 0.844. SALT used the hierarchical structures inherent in the human body to achieve whole-body segmentations with an average of 35 seconds for 100 slices. This rapid processing underscores its potential for integration into clinical workflows, facilitating the automatic and efficient computation of full-body segmentations with each CT scan, thus enhancing diagnostic processes and patient care.
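A sketch of the conditional-probability idea behind SALT-style label trees: each node is normalized with a softmax among its siblings, and a leaf's probability is the product of conditionals along the root-to-leaf path; the tree encoding below is our illustrative assumption.

```python
import torch
import torch.nn.functional as F

def leaf_probability(logits, path):
    """logits: (num_nodes,) one raw score per node of the label tree.
    path: list of (sibling_ids, chosen) pairs from root to leaf, where
    sibling_ids indexes the children competing at a level and chosen is
    the position of the on-path node among them."""
    prob = logits.new_tensor(1.0)
    for sibling_ids, chosen in path:
        cond = F.softmax(logits[sibling_ids], dim=0)   # softmax among siblings
        prob = prob * cond[chosen]                     # chain conditional probabilities
    return prob

# e.g. a vertebra's probability chains P(spine) * P(thoracic | spine) * P(T7 | thoracic)
```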
Updated: 2024-07-11 21:33:08
Categories: eess.IV, cs.AI, cs.CV
Generalizable Physics-informed Learning for Stochastic Safety-critical Systems
Accurate estimation of long-term risk is critical for safe decision-making, but sampling from rare risk events and long-term trajectories can be prohibitively costly. Risk gradients are needed by many first-order techniques for learning and control, but gradient estimates are difficult to obtain using Monte Carlo (MC) methods because the infinitesimal divisor may significantly amplify sampling noise. Motivated by this gap, we propose an efficient method to evaluate long-term risk probabilities and their gradients using short-term samples without sufficient risk events. We first derive that four types of long-term risk probability are solutions of certain partial differential equations (PDEs). Then, we propose a physics-informed learning technique that integrates data and physics information (the aforementioned PDEs). The physics information helps propagate information beyond available data and obtain provable generalization beyond available data, which in turn enables long-term risk to be estimated using short-term samples of safe events. Finally, we demonstrate in simulation that the proposed technique has improved sample efficiency, generalizes well to unseen regions, and adapts to changing system parameters.
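For intuition on the PDE characterization (the paper derives four such equations; the generic backward-Kolmogorov form below is illustrative, not necessarily the authors' exact statement): for dynamics $dX_s = f(X_s)\,ds + \sigma(X_s)\,dW_s$, the probability $F(x,t)$ of remaining in a safe set $C$ over $[0,t]$ starting from $x$ satisfies

```latex
\[
\frac{\partial F}{\partial t}
  = f(x)^\top \nabla_x F
  + \tfrac{1}{2}\,\mathrm{tr}\!\left(\sigma(x)\sigma(x)^\top \nabla_x^2 F\right),
\qquad
F(x,0) = \mathbf{1}\{x \in C\},
\qquad
F(x,t) = 0 \ \text{for } x \in \partial C,
\]
% where F(x, t) = P( X_s \in C for all s in [0, t] | X_0 = x ),
% with absorption on the unsafe boundary.
```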
Updated: 2024-07-11 21:10:03
Categories: cs.LG, cs.SY, eess.SY
What Do People Think about Sentient AI?
With rapid advances in machine learning, many people in the field have been discussing the rise of digital minds and the possibility of artificial sentience. Future developments in AI capabilities and safety will depend on public opinion and human-AI interaction. To begin to fill this research gap, we present the first nationally representative survey data on the topic of sentient AI: initial results from the Artificial Intelligence, Morality, and Sentience (AIMS) survey, a preregistered and longitudinal study of U.S. public opinion that began in 2021. Across one wave of data collection in 2021 and two in 2023 (total \textit{N} = 3,500), we found mind perception and moral concern for AI well-being in 2021 were higher than predicted and significantly increased in 2023: for example, 71\% agree sentient AI deserve to be treated with respect, and 38\% support legal rights. People have become more threatened by AI, and there is widespread opposition to new technologies: 63\% support a ban on smarter-than-human AI, and 69\% support a ban on sentient AI. Expected timelines are surprisingly short and shortening with a median forecast of sentient AI in only five years and artificial general intelligence in only two years. We argue that, whether or not AIs become sentient, the discussion itself may overhaul human-computer interaction and shape the future trajectory of AI technologies, including existential risks and opportunities.
Updated: 2024-07-11 21:04:39
Categories: cs.AI, cs.CY, cs.ET, cs.HC
A Hybrid Spiking-Convolutional Neural Network Approach for Advancing Machine Learning Models
In this article, we propose a novel standalone hybrid Spiking-Convolutional Neural Network (SC-NN) model and test it on image inpainting tasks. Our approach uses the unique capabilities of SNNs, such as event-based computation and temporal processing, along with the strong representation learning abilities of CNNs, to generate high-quality inpainted images. The model is trained on a custom dataset specifically designed for image inpainting, where missing regions are created using masks. The hybrid model consists of SNNConv2d layers and traditional CNN layers. The SNNConv2d layers implement the leaky integrate-and-fire (LIF) neuron model, capturing spiking behavior, while the CNN layers capture spatial features. Training used a mean squared error (MSE) loss: a training loss of 0.015 indicates accurate performance on the training set, and the model achieved a validation loss as low as 0.0017 on the testing set. Furthermore, extensive experimental results demonstrate state-of-the-art performance, showcasing the potential of integrating temporal dynamics and feature extraction in a single network for image inpainting.
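A minimal PyTorch sketch of what an "SNNConv2d" layer with LIF dynamics could look like; the time constant, reset rule, and the absence of a surrogate gradient are simplifications of ours, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LIFConv2d(nn.Module):
    """Convolution followed by leaky integrate-and-fire dynamics (sketch)."""
    def __init__(self, in_ch, out_ch, tau=2.0, v_th=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.tau, self.v_th = tau, v_th

    def forward(self, x_seq):                       # x_seq: (time, batch, C, H, W)
        v, spikes = 0.0, []
        for x in x_seq:
            v = v + (self.conv(x) - v) / self.tau   # leaky integration of input current
            s = (v >= self.v_th).float()            # fire where threshold is crossed
            v = v * (1.0 - s)                       # hard reset of fired neurons
            spikes.append(s)                        # training would need surrogate grads
        return torch.stack(spikes)
```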
Updated: 2024-07-11 20:50:33
Categories: cs.NE, cs.AI
Enhancing Transformer RNNs with Multiple Temporal Perspectives
We introduce the concept of multiple temporal perspectives, a novel approach applicable to Recurrent Neural Network (RNN) architectures for enhancing their understanding of sequential data. This method involves maintaining diverse temporal views of previously encountered text, significantly enriching the language models' capacity to interpret context. To show the efficacy of this approach, we incorporate it into the Receptance Weighted Key Value (RWKV) architecture, addressing its inherent challenge of retaining all historical information within a single hidden state. Notably, this improvement is achieved with a minimal increase in the number of parameters --even as little as $0.04\%$ of the original number of parameters. Further, the additional parameters necessary for the multiple temporal perspectives are fine-tuned with minimal computational overhead, avoiding the need for a full pre-training. The resulting model maintains linear computational complexity during prompt inference, ensuring consistent efficiency across various sequence lengths. The empirical results and ablation studies included in our research validate the effectiveness of our approach, showcasing improved performance across multiple benchmarks. The code, model weights and datasets are open-sourced at: https://github.com/RazvanDu/TemporalRNNs.
Updated: 2024-07-11 20:43:59
Categories: cs.LG, cs.AI, cs.CL, I.2.0; I.2.7
UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset
Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using this dataset to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven experienced designers. We carried out an in-depth analysis to characterize the dataset's features. We then applied this dataset to achieve a 55% performance gain in LLM-generated UI feedback via various few-shot and visual prompting techniques. We also discuss future applications of this dataset, including training a reward model for generative UI techniques, and fine-tuning a tool-agnostic multi-modal LLM that automates UI evaluation.
Updated: 2024-07-11 20:18:19
Categories: cs.HC, cs.AI
RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models
The escalating challenge of misinformation, particularly in political discourse, requires advanced fact-checking solutions; this is even clearer in the more complex scenario of multimodal claims. We tackle this issue using a multimodal large language model in conjunction with retrieval-augmented generation (RAG), and introduce two novel reasoning techniques: Chain of RAG (CoRAG) and Tree of RAG (ToRAG). They fact-check multimodal claims by extracting both textual and image content, retrieving external information, and reasoning subsequent questions to be answered based on prior evidence. We achieve a weighted F1-score of 0.85, surpassing a baseline reasoning technique by 0.14 points. Human evaluation confirms that the vast majority of our generated fact-check explanations contain all information from gold standard data.
Updated: 2024-07-11 20:16:09
Categories: cs.CL, cs.AI, cs.CY, cs.ET, cs.MA
Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities
The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs type-based heterogeneity across institutions on the model performance, which widens the perspective to the notion of data heterogeneity in multi-modal FL literature.
Updated: 2024-07-11 20:12:22
Categories: cs.LG, cs.AI
Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
This study introduces bifurcated attention, a method designed to enhance language model inference in shared-context batch decoding scenarios. Our approach addresses the challenge of redundant memory IO costs, a critical factor contributing to latency in high batch sizes and extended context lengths. Bifurcated attention achieves this by strategically dividing the attention mechanism during incremental decoding into two separate GEMM operations: one focusing on the KV cache from prefill, and another on the decoding process itself. While maintaining the computational load (FLOPs) of standard attention mechanisms, bifurcated attention ensures precise computation with significantly reduced memory IO. Our empirical results show over 2.1$\times$ speedup when sampling 16 output sequences and more than 6.2$\times$ speedup when sampling 32 sequences at context lengths exceeding 8k tokens on a 7B model that uses multi-head attention. The efficiency gains from bifurcated attention translate into lower latency, making it particularly suitable for real-time applications. For instance, it enables massively parallel answer generation without substantially increasing latency, thus enhancing performance when integrated with post-processing techniques such as re-ranking.
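A single-head numpy sketch of the two-GEMM split: attention scores against the shared prefix KV cache are computed as one matrix product for the whole batch, while scores against each sequence's own decoded KV are computed per sequence; shapes and names are ours.

```python
import numpy as np

def bifurcated_attention_head(q, k_prefix, v_prefix, k_dec, v_dec):
    """q: (n, d) current-step queries for n sequences sharing one prompt.
    k_prefix, v_prefix: (Lp, d) shared prefix cache, stored once.
    k_dec, v_dec: (n, Ld, d) per-sequence decoded cache."""
    d = q.shape[-1]
    s_pre = (q @ k_prefix.T) / np.sqrt(d)                    # GEMM 1: shared prefix
    s_dec = np.einsum("nd,nld->nl", q, k_dec) / np.sqrt(d)   # GEMM 2: per-sequence part
    s = np.concatenate([s_pre, s_dec], axis=1)
    w = np.exp(s - s.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                        # softmax over full context
    Lp = k_prefix.shape[0]
    return w[:, :Lp] @ v_prefix + np.einsum("nl,nld->nd", w[:, Lp:], v_dec)
```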
Updated: 2024-07-11 20:07:30
Categories: cs.LG, cs.AI
Interventions Against Machine-Assisted Statistical Discrimination
I study statistical discrimination driven by verifiable beliefs, such as those generated by machine learning, rather than by humans. When beliefs are verifiable, interventions against statistical discrimination can move beyond simple, belief-free designs like affirmative action, to more sophisticated ones, that constrain decision makers based on what they are thinking. Such mind reading interventions can perform well where affirmative action does not, even when the minds being read are biased. My theory of belief-contingent intervention design sheds light on influential methods of regulating machine learning, and yields novel interventions robust to covariate shift and incorrect, biased beliefs.
Updated: 2024-07-11 20:01:41
Categories: econ.TH, cs.LG
Inflationary Flows: Calibrated Bayesian Inference with Diffusion-Based Models
Beyond estimating parameters of interest from data, one of the key goals of statistical inference is to properly quantify uncertainty in these estimates. In Bayesian inference, this uncertainty is provided by the posterior distribution, the computation of which typically involves an intractable high-dimensional integral. Among available approximation methods, sampling-based approaches come with strong theoretical guarantees but scale poorly to large problems, while variational approaches scale well but offer few theoretical guarantees. In particular, variational methods are known to produce overconfident estimates of posterior uncertainty and are typically non-identifiable, with many latent variable configurations generating equivalent predictions. Here, we address these challenges by showing how diffusion-based models (DBMs), which have recently produced state-of-the-art performance in generative modeling tasks, can be repurposed for performing calibrated, identifiable Bayesian inference. By exploiting a previously established connection between the stochastic and probability flow ordinary differential equations (pfODEs) underlying DBMs, we derive a class of models, inflationary flows, that uniquely and deterministically map high-dimensional data to a lower-dimensional Gaussian distribution via ODE integration. This map is both invertible and neighborhood-preserving, with controllable numerical error, with the result that uncertainties in the data are correctly propagated to the latent space. We demonstrate how such maps can be learned via standard DBM training using a novel noise schedule and are effective at both preserving and reducing intrinsic data dimensionality. The result is a class of highly expressive generative models, uniquely defined on a low-dimensional latent space, that afford principled Bayesian inference.
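For reference, the probability flow ODE from the diffusion literature that this construction builds on (the generic form, not the inflationary-flow variant itself): for a forward SDE $dx = f(x,t)\,dt + g(t)\,dw$, the deterministic flow sharing the same marginals $p_t(x)$ is

```latex
\[
\frac{dx}{dt} \;=\; f(x, t) \;-\; \tfrac{1}{2}\, g(t)^{2}\, \nabla_x \log p_t(x),
\]
% so integrating the ODE forward or backward gives an invertible, deterministic
% map between the data distribution and the latent distribution.
```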
Updated: 2024-07-11 19:58:19
Categories: cs.LG, stat.ML, 68T99 (Primary) 62M45 (Secondary), G.3; I.6.5; I.2
Data-driven Model Reduction for Soft Robots via Lagrangian Operator Inference
Data-driven model reduction methods provide a nonintrusive way of constructing computationally efficient surrogates of high-fidelity models for real-time control of soft robots. This work leverages the Lagrangian nature of the model equations to derive structure-preserving linear reduced-order models via Lagrangian Operator Inference and compares their performance with prominent linear model reduction techniques through an anguilliform swimming soft robot model example with 231,336 degrees of freedom. The case studies demonstrate that preserving the underlying Lagrangian structure leads to learned models with higher predictive accuracy and robustness to unseen inputs.
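As a hedged illustration of the structure being preserved: Lagrangian operator inference seeks a reduced second-order model of the form below, with the reduced operators fit to snapshot data by least squares under symmetry and definiteness constraints (the exact parametrization in the paper may differ).

```latex
\[
\hat{M}\,\ddot{\hat{q}}(t) + \hat{C}\,\dot{\hat{q}}(t) + \hat{K}\,\hat{q}(t) = \hat{B}\,u(t),
\qquad
\hat{M} = \hat{M}^{\top} \succ 0, \quad \hat{K} = \hat{K}^{\top} \succeq 0,
\]
% where \hat{q}(t) \in \mathbb{R}^{r} are reduced coordinates with r much smaller
% than the 231,336 degrees of freedom of the full soft-robot model.
```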
Updated: 2024-07-11 19:55:21
Categories: cs.RO, cs.LG, cs.NA, math.NA
A Survey on the Application of Generative Adversarial Networks in Cybersecurity: Prospective, Direction and Open Research Scopes
With the proliferation of Artificial Intelligence, there has been a massive increase in the amount of data required to be accumulated and disseminated digitally. As the data are available online in digital landscapes with complex and sophisticated infrastructures, it is crucial to implement various defense mechanisms based on cybersecurity. Generative Adversarial Networks (GANs), which are deep learning models, have emerged as powerful solutions for addressing the constantly changing security issues. This survey studies the significance of the deep learning model, precisely on GANs, in strengthening cybersecurity defenses. Our survey aims to explore the various works completed in GANs, such as Intrusion Detection Systems (IDS), Mobile and Network Trespass, BotNet Detection, and Malware Detection. The focus is to examine how GANs can be influential tools to strengthen cybersecurity defenses in these domains. Further, the paper discusses the challenges and constraints of using GANs in these areas and suggests future research directions. Overall, the paper highlights the potential of GANs in enhancing cybersecurity measures and addresses the need for further exploration in this field.
Updated: 2024-07-11 19:51:48
Categories: cs.CR, cs.AI, cs.CV, cs.LG
Deep Learning for Network Anomaly Detection under Data Contamination: Evaluating Robustness and Mitigating Performance Degradation
Deep learning (DL) has emerged as a crucial tool in network anomaly detection (NAD) for cybersecurity. While DL models for anomaly detection excel at extracting features and learning patterns from data, they are vulnerable to data contamination -- the inadvertent inclusion of attack-related data in training sets presumed benign. This study evaluates the robustness of six unsupervised DL algorithms against data contamination using our proposed evaluation protocol. Results demonstrate significant performance degradation in state-of-the-art anomaly detection algorithms when exposed to contaminated data, highlighting the critical need for self-protection mechanisms in DL-based NAD models. To mitigate this vulnerability, we propose an enhanced auto-encoder with a constrained latent representation, allowing normal data to cluster more densely around a learnable center in the latent space. Our evaluation reveals that this approach exhibits improved resistance to data contamination compared to existing methods, offering a promising direction for more robust NAD systems.
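A sketch of the constrained-latent idea: add a penalty that pulls encodings toward a learnable center, so normal data clusters densely and contaminated points are easier to isolate; the quadratic penalty form is our assumption, not the paper's exact formulation.

```python
import torch

def contamination_robust_loss(x, encoder, decoder, center, lam=0.1):
    """Reconstruction plus latent compactness around a learnable center (sketch).
    `center` would be an nn.Parameter with the latent dimensionality."""
    z = encoder(x)
    recon = torch.mean((decoder(z) - x) ** 2)        # usual auto-encoder objective
    compact = torch.mean((z - center) ** 2)          # density around the center
    return recon + lam * compact

# At test time, distance to `center` (and/or reconstruction error) scores anomalies.
```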
Updated: 2024-07-11 19:47:37
Categories: cs.LG, cs.CR, cs.NI
Fault Diagnosis in Power Grids with Large Language Model
Power grid fault diagnosis is a critical task for ensuring the reliability and stability of electrical infrastructure. Traditional diagnostic systems often struggle with the complexity and variability of power grid data. This paper proposes a novel approach that leverages Large Language Models (LLMs), specifically ChatGPT and GPT-4, combined with advanced prompt engineering to enhance fault diagnosis accuracy and explainability. We designed comprehensive, context-aware prompts to guide the LLMs in interpreting complex data and providing detailed, actionable insights. Our method was evaluated against baseline techniques, including standard prompting, Chain-of-Thought (CoT), and Tree-of-Thought (ToT) methods, using a newly constructed dataset comprising real-time sensor data, historical fault records, and component descriptions. Experimental results demonstrate significant improvements in diagnostic accuracy, explainability quality, response coherence, and contextual understanding, underscoring the effectiveness of our approach. These findings suggest that prompt-engineered LLMs offer a promising solution for robust and reliable power grid fault diagnosis.
Updated: 2024-07-11 19:44:18
Categories: cs.CL, cs.AI
Unraveling overoptimism and publication bias in ML-driven science
Machine Learning (ML) is increasingly used across many disciplines with impressive reported results. However, recent studies suggest published performance of ML models are often overoptimistic. Validity concerns are underscored by findings of an inverse relationship between sample size and reported accuracy in published ML models, contrasting with the theory of learning curves where accuracy should improve or remain stable with increasing sample size. This paper investigates factors contributing to overoptimism in ML-driven science, focusing on overfitting and publication bias. We introduce a novel stochastic model for observed accuracy, integrating parametric learning curves and the aforementioned biases. We construct an estimator that corrects for these biases in observed data. Theoretical and empirical results show that our framework can estimate the underlying learning curve, providing realistic performance assessments from published results. Applying the model to meta-analyses of classifications of neurological conditions, we estimate the inherent limits of ML-based prediction in each domain.
Updated: 2024-07-11 19:40:20
Categories: cs.LG, cs.AI, cs.CY
Neural Networks Meet Elliptic Curve Cryptography: A Novel Approach to Secure Communication
In recent years, neural networks have been used to implement symmetric cryptographic functions for secure communications. Extending this domain, the proposed approach explores the application of asymmetric cryptography within a neural network framework to safeguard the exchange between two communicating entities, i.e., Alice and Bob, from an adversarial eavesdropper, i.e., Eve. It employs a set of five distinct cryptographic keys to examine the efficacy and robustness of communication security against eavesdropping attempts using the principles of elliptic curve cryptography. The experimental setup reveals that Alice and Bob achieve secure communication with negligible variation in security effectiveness across different curves. The setup is also designed to evaluate cryptographic resilience. Specifically, the loss metrics for Bob oscillate between 0 and 1 during encryption-decryption processes, indicating successful message comprehension post-encryption by Alice. A potential vulnerability arises when Eve's decryption accuracy exceeds 60\%, which occurs when Eve receives enhanced adversarial training with twice the training iterations per batch compared to Alice and Bob.
Updated: 2024-07-11 19:34:16
Categories: cs.CR, cs.AI
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
Human cognition exhibits systematic compositionality, the algebraic ability to generate infinite novel combinations from finite learned components, which is the key to understanding and reasoning about complex logic. In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset \textsc{MathTrap} by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8k. Since problems with logical flaws are quite rare in the real world, these represent ``unseen'' cases to LLMs. Solving these requires the models to systematically compose (1) the mathematical knowledge involved in the original problems with (2) knowledge related to the introduced traps. Our experiments show that while LLMs possess both components of requisite knowledge, they do not \textbf{spontaneously} combine them to handle these novel cases. We explore several methods to mitigate this deficiency, such as natural language prompts, few-shot demonstrations, and fine-tuning. We find that LLMs' performance can be \textbf{passively} improved through the above external intervention. Overall, systematic compositionality remains an open challenge for large language models.
Updated: 2024-07-11 19:26:18
Categories: cs.CL, cs.AI
Proving that Cryptic Crossword Clue Answers are Correct
Cryptic crossword clues are challenging cognitive tasks, for which new test sets are released on a daily basis by multiple international newspapers. Each cryptic clue contains both the definition of the answer to be placed in the crossword grid (in common with regular crosswords), and `wordplay' that proves that the answer is correct (i.e. a human solver can be confident that an answer is correct without needing crossing words to confirm it). Using an existing cryptic wordplay proving framework (operating on Python proofs created by an LLM), we show that it is possible to distinguish between correct answers and almost-correct ones based upon whether the wordplay `works'.
Updated: 2024-07-11 19:13:16
Categories: cs.CL, cs.AI, cs.LG
FedMedICL: Towards Holistic Evaluation of Distribution Shifts in Federated Medical Imaging
For medical imaging AI models to be clinically impactful, they must generalize. However, this goal is hindered by (i) diverse types of distribution shifts, such as temporal, demographic, and label shifts, and (ii) limited diversity in datasets that are siloed within single medical institutions. While these limitations have spurred interest in federated learning, current evaluation benchmarks fail to evaluate different shifts simultaneously. However, in real healthcare settings, multiple types of shifts co-exist, yet their impact on medical imaging performance remains unstudied. In response, we introduce FedMedICL, a unified framework and benchmark to holistically evaluate federated medical imaging challenges, simultaneously capturing label, demographic, and temporal distribution shifts. We comprehensively evaluate several popular methods on six diverse medical imaging datasets (totaling 550 GPU hours). Furthermore, we use FedMedICL to simulate COVID-19 propagation across hospitals and evaluate whether methods can adapt to pandemic changes in disease prevalence. We find that a simple batch balancing technique surpasses advanced methods in average performance across FedMedICL experiments. This finding questions the applicability of results from previous, narrow benchmarks in real-world medical settings.
Updated: 2024-07-11 19:12:23
Categories: eess.IV, cs.AI, cs.CV
SoK: What don't we know? Understanding Security Vulnerabilities in SNARKs
Zero-knowledge proofs (ZKPs) have evolved from being a theoretical concept providing privacy and verifiability to having practical, real-world implementations, with SNARKs (Succinct Non-Interactive Argument of Knowledge) emerging as one of the most significant innovations. Prior work has mainly focused on designing more efficient SNARK systems and providing security proofs for them. Many think of SNARKs as "just math," implying that what is proven to be correct and secure is correct in practice. In contrast, this paper focuses on assessing end-to-end security properties of real-life SNARK implementations. We start by building foundations with a system model and by establishing threat models and defining adversarial roles for systems that use SNARKs. Our study encompasses an extensive analysis of 141 actual vulnerabilities in SNARK implementations, providing a detailed taxonomy to aid developers and security researchers in understanding the security threats in systems employing SNARKs. Finally, we evaluate existing defense mechanisms and offer recommendations for enhancing the security of SNARK-based systems, paving the way for more robust and reliable implementations in the future.
Updated: 2024-07-11 19:11:14
Categories: cs.CR
FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., different retinal imaging modalities) for patient diagnosis. This paper presents FairDomain, a pioneering systemic study into algorithmic fairness under domain shifts, employing state-of-the-art domain adaptation (DA) and generalization (DG) algorithms for both medical segmentation and classification tasks to understand how biases are transferred between different domains. We also introduce a novel plug-and-play fair identity attention (FIA) module that adapts to various DA and DG algorithms to improve fairness by using self-attention to adjust feature importance based on demographic attributes. Additionally, we curate the first fairness-focused dataset with two paired imaging modalities for the same patient cohort on medical segmentation and classification tasks, to rigorously assess fairness in domain-shift scenarios. Excluding the confounding impact of demographic distribution variation between source and target domains will allow clearer quantification of the performance of domain transfer models. Our extensive evaluations reveal that the proposed FIA significantly enhances both model performance accounted for fairness across all domain shift settings (i.e., DA and DG) with respect to different demographics, which outperforms existing methods on both segmentation and classification. The code and data can be accessed at https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k.
Updated: 2024-07-11 18:52:32
Categories: eess.IV,cs.AI,cs.CV
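As a concrete reading of the FIA idea, the following minimal PyTorch sketch re-weights backbone features with a self-attention query derived from a demographic attribute. The module name, shapes, and the residual gating are illustrative assumptions, not the authors' implementation.

```python
# A hypothetical sketch of a plug-and-play fair identity attention (FIA)
# module: self-attention whose query comes from a demographic attribute,
# producing per-token importance that re-weights backbone features.
import torch
import torch.nn as nn

class FairIdentityAttention(nn.Module):
    def __init__(self, dim: int, num_groups: int):
        super().__init__()
        self.group_embed = nn.Embedding(num_groups, dim)  # attribute -> query
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, feats: torch.Tensor, group: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, dim) backbone features; group: (B,) attribute indices
        q = self.group_embed(group).unsqueeze(1)               # (B, 1, dim)
        k, v = self.key(feats), self.value(feats)              # (B, N, dim)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        gate = attn.transpose(1, 2)                            # (B, N, 1)
        return feats + gate * v                                # re-weighted features

fia = FairIdentityAttention(dim=64, num_groups=3)
out = fia(torch.randn(2, 10, 64), torch.tensor([0, 2]))       # (2, 10, 64)
```

Because the module only adds a residual gating term, it can in principle be dropped between the backbone and the head of an existing DA or DG pipeline, which matches the plug-and-play framing above.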
HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks
Gradient-based attacks are a primary tool to evaluate the robustness of machine-learning models. However, many attacks tend to provide overly optimistic evaluations as they use fixed loss functions, optimizers, step-size schedulers, and default hyperparameters. In this work, we tackle these limitations by proposing a parametric variation of the well-known fast minimum-norm attack algorithm, whose loss, optimizer, step-size scheduler, and hyperparameters can be dynamically adjusted. We re-evaluate 12 robust models, showing that our attack finds smaller adversarial perturbations without requiring any additional tuning. This also enables reporting adversarial robustness as a function of the perturbation budget, providing a more complete evaluation than that offered by fixed-budget attacks, while remaining efficient. We release our open-source code at https://github.com/pralab/HO-FMN.
Updated: 2024-07-11 18:30:01
Categories: cs.LG
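To illustrate the hyperparameter-optimization idea at toy scale, the sketch below treats the attack's loss function and step-size schedule as searchable hyperparameters and keeps whichever configuration yields the smallest median adversarial perturbation. The linear model and the plain gradient attack are stand-ins under stated assumptions; this is not the FMN algorithm or the repository's API.

```python
# A self-contained toy of attack hyperparameter optimization: search over
# (loss, learning rate, decay) and score each configuration by the median
# perturbation norm of successful adversarial examples. All components are
# illustrative assumptions, not the authors' implementation.
import itertools
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)                                # toy classifier
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))

def margin(logits, y):                                  # > 0 while still correct
    onehot = nn.functional.one_hot(y, 2).bool()
    return logits[onehot] - logits[~onehot]

def smooth_margin(logits, y):
    return nn.functional.softplus(margin(logits, y))

def attack(loss_fn, lr, decay, steps=100):
    delta = torch.zeros_like(x, requires_grad=True)
    for t in range(steps):
        loss = loss_fn(model(x + delta), y).sum()
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= lr * decay**t * grad               # descend the margin
    with torch.no_grad():
        fooled = model(x + delta).argmax(1) != y
        return (delta[fooled].norm(dim=1).median()
                if fooled.any() else torch.tensor(float("inf")))

grid = itertools.product([margin, smooth_margin], [0.05, 0.2], [0.97, 1.0])
best = min(grid, key=lambda cfg: attack(*cfg))
print("best config:", best[0].__name__, best[1:])
```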
Mitigating Group Bias in Federated Learning for Heterogeneous Devices
Federated Learning is emerging as a privacy-preserving model training approach in distributed edge applications. In practice, most edge deployments are heterogeneous, i.e., their sensing capabilities and environments vary across deployments. This edge heterogeneity violates the independent and identically distributed (IID) property of local data across clients and produces biased global models, i.e., models that contribute to unfair decision-making and discrimination against a particular community or group. Existing bias mitigation techniques only focus on bias generated from label heterogeneity in non-IID data, without accounting for domain variations due to feature heterogeneity, and do not address the global group-fairness property. Our work proposes a group-fair FL framework that minimizes group bias while preserving privacy and without resource utilization overhead. Our main idea is to leverage average conditional probabilities to compute cross-domain group importance weights derived from heterogeneous training data, which are used to optimize the performance of the worst-performing group via a modified multiplicative weights update method. Additionally, we propose regularization techniques to minimize the difference between the worst- and best-performing groups, while using a thresholding mechanism to strike a balance between bias reduction and group performance degradation. Our evaluation on human emotion recognition and image classification benchmarks assesses the fair decision-making of our framework in real-world heterogeneous settings.
Updated: 2024-07-11 18:25:51
Categories: cs.LG,cs.AI,cs.CV
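A minimal sketch of the worst-group optimization loop follows, assuming a multiplicative weights update over per-group losses with a cap playing the role of the thresholding mechanism. The learning rate, cap, and toy losses are assumptions; the paper derives its weights from average conditional probabilities across heterogeneous clients.

```python
# Sketch: per-group importance weights grow multiplicatively with group loss,
# so the weighted training objective emphasizes the worst-performing group.
import numpy as np

def update_group_weights(w, group_losses, eta=0.5, cap=5.0):
    w = w * np.exp(eta * group_losses)        # multiplicative weights step
    w = np.minimum(w, cap * w.mean())         # threshold to limit degradation
    return w / w.sum()                        # renormalize to a distribution

w = np.ones(3) / 3                            # three demographic groups
for rnd in range(5):                          # simulated federated rounds
    group_losses = np.array([0.2, 0.9, 0.4])  # per-group losses this round
    w = update_group_weights(w, group_losses)
    print(rnd, np.round(w, 3))                # weight shifts toward group 1
```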
PID Accelerated Temporal Difference Algorithms
Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting in which only samples from the environment are available. We give theoretical analysis of their convergence and acceleration compared to their traditional counterparts. We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
Updated: 2024-07-11 18:23:46
Categories: cs.LG,cs.AI,cs.SY,eess.SY,math.OC,stat.ML
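The following tabular sketch shows what a PID-style TD(0) update could look like, following the abstract's description: the TD error serves as the proportional term, a decayed running sum of TD errors as the integral term, and the change in the value estimate as the derivative term. The toy chain, gains, and decay are assumptions, not the paper's tuned values or adaptive-gain scheme.

```python
# Hypothetical PID TD(0) on a toy cyclic chain: the value update combines
# proportional (TD error), integral (decayed sum of TD errors), and
# derivative (change in the estimate) terms.
import numpy as np

n_states, gamma, alpha = 5, 0.99, 0.1
kp, ki, kd, beta = 1.0, 0.05, 0.2, 0.95   # PID gains, integral decay (assumed)

V = np.zeros(n_states)
V_prev = np.zeros(n_states)               # previous estimate, for the D term
z = np.zeros(n_states)                    # integral of TD errors

rng = np.random.default_rng(0)
s = 0
for t in range(10_000):
    s_next = (s + rng.integers(1, 3)) % n_states   # toy chain dynamics
    r = 1.0 if s_next == 0 else 0.0
    delta = r + gamma * V[s_next] - V[s]           # proportional: TD error
    z[s] = beta * z[s] + delta                     # integral of residuals
    d = V[s] - V_prev[s]                           # derivative of estimates
    V_prev[s] = V[s]
    V[s] += alpha * (kp * delta + ki * z[s] + kd * d)
    s = s_next

print(np.round(V, 3))
```

With kp = 1 and ki = kd = 0 the update reduces to ordinary TD(0), which makes the PID view a strict generalization of the classical algorithm.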
Challenges in Mechanistically Interpreting Model Representations
Mechanistic interpretability (MI) aims to understand AI models by reverse-engineering the exact algorithms neural networks learn. Most works in MI so far have studied behaviors and capabilities that are trivial and token-aligned. However, most capabilities important for safety and trust are not that trivial, which advocates for studying the hidden representations inside these networks as the unit of analysis. We formalize representations for features and behaviors, highlight their importance and evaluation, and perform an exploratory study of dishonesty representations in Mistral-7B-Instruct-v0.1. We argue that studying representations is an important and under-studied field, and highlight several challenges that arise while attempting to do so through currently established methods in MI, showing their insufficiency and advocating work on new frameworks for the same.
Updated: 2024-07-11 18:21:59
Categories: cs.LG,cs.AI
Local Clustering for Lung Cancer Image Classification via Sparse Solution Technique
In this work, we propose to use a local clustering approach based on the sparse solution technique to study medical images, especially the lung cancer image classification task. We view images as the vertices in a weighted graph and the similarity between a pair of images as the edges in the graph. The vertices within the same cluster can be assumed to share similar features and properties, thus making graph clustering techniques very useful for image classification. Recently, approaches based on the sparse solutions of linear systems for graph clustering have been found to identify clusters more efficiently than traditional clustering methods such as spectral clustering. We propose to use two newly developed local clustering methods based on sparse solutions of linear systems for image classification. In addition, we employ a box-spline-based tight-wavelet-framelet method to clean these images and help build a better adjacency matrix before clustering. The performance of our methods is shown to be very effective in classifying images. Our approach is significantly more efficient and either favorable or equally effective compared with other state-of-the-art approaches. Finally, we remark on two image deformation methods that can generate additional artificial image data to increase the number of labeled images.
Updated: 2024-07-11 18:18:32
Categories: cs.CV,cs.LG
Deep Inverse Design for High-Level Synthesis
High-level synthesis (HLS) has significantly advanced the automation of digital circuit design, yet pragma tuning still demands substantial expertise and time. Existing solutions for design space exploration (DSE) adopt either heuristic methods, which lack the information needed to pursue further optimization potential, or predictive models, which lack sufficient generalization due to the time-consuming nature of HLS and the exponential growth of the design space. To address these challenges, we propose Deep Inverse Design for HLS (DID4HLS), a novel approach that integrates graph neural networks and generative models. DID4HLS iteratively optimizes hardware designs aimed at compute-intensive algorithms by learning conditional distributions of design features from post-HLS data. Compared with four state-of-the-art DSE baselines, our method improved the average distance to reference set (ADRS) by 42.5% on average over the best-performing baseline across six benchmarks, while demonstrating high robustness and efficiency.
Updated: 2024-07-11 18:13:38
Categories: cs.AR,cs.LG
OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
The recent success of CLIP has demonstrated promising results in zero-shot semantic segmentation by transferring multimodal knowledge to pixel-level classification. However, leveraging pre-trained CLIP knowledge to closely align text embeddings with pixel embeddings still has limitations in existing approaches. To address this issue, we propose OTSeg, a novel multimodal attention mechanism aimed at enhancing the potential of multiple text prompts for matching associated pixel embeddings. We first propose Multi-Prompts Sinkhorn (MPS) based on the Optimal Transport (OT) algorithm, which guides multiple text prompts to selectively focus on various semantic features within image pixels. Moreover, inspired by the success of Sinkformers in unimodal settings, we introduce an extension of MPS, called Multi-Prompts Sinkhorn Attention (MPSA), which effectively replaces cross-attention mechanisms within the Transformer framework in multimodal settings. Through extensive experiments, we demonstrate that OTSeg achieves state-of-the-art (SOTA) performance with significant gains on Zero-Shot Semantic Segmentation (ZS3) tasks across three benchmark datasets.
Updated: 2024-07-11 18:09:48
Categories: cs.CV,cs.AI,cs.LG,stat.ML
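To make the Multi-Prompts Sinkhorn idea concrete, here is a minimal sketch that treats prompt-to-pixel similarity as an optimal-transport cost and runs Sinkhorn-Knopp scaling so each prompt attends to a distinct slice of the pixels. The entropic temperature, iteration count, and uniform marginals are assumptions, not the paper's settings.

```python
# Sketch: entropically regularized OT between P prompt embeddings and
# N pixel embeddings via Sinkhorn-Knopp; the transport plan can serve as
# attention weights in place of plain cross-attention.
import torch

def multi_prompt_sinkhorn(prompts, pixels, eps=0.05, iters=50):
    # prompts: (P, d), pixels: (N, d), both L2-normalized
    cost = 1.0 - prompts @ pixels.T                  # (P, N) cosine cost
    K = torch.exp(-cost / eps)
    u = torch.ones(cost.shape[0]) / cost.shape[0]    # uniform prompt marginal
    v = torch.ones(cost.shape[1]) / cost.shape[1]    # uniform pixel marginal
    a, b = torch.ones_like(u), torch.ones_like(v)
    for _ in range(iters):                           # Sinkhorn-Knopp scaling
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]               # transport plan (P, N)

P, N, d = 4, 1024, 64
prompts = torch.nn.functional.normalize(torch.randn(P, d), dim=-1)
pixels = torch.nn.functional.normalize(torch.randn(N, d), dim=-1)
plan = multi_prompt_sinkhorn(prompts, pixels)
print(plan.sum(), plan.shape)                        # total mass ~1, (4, 1024)
```

Unlike a row-wise softmax, the Sinkhorn plan also constrains the column marginals, which is what discourages all prompts from collapsing onto the same pixels.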
ProxyGPT: Enabling Anonymous Queries in AI Chatbots with (Un)Trustworthy Browser Proxies
AI-powered chatbots (ChatGPT, Claude, etc.) require users to create an account using their email and phone number, thereby linking their personally identifiable information to their conversational data and usage patterns. As these chatbots are increasingly being used for tasks involving sensitive information, privacy concerns have been raised about how chatbot providers handle user data. To address these concerns, we present ProxyGPT, a privacy-enhancing system that enables anonymous queries in popular chatbot platforms. ProxyGPT leverages volunteer proxies to submit user queries on their behalf, thus providing network-level anonymity for chatbot users. The system is designed to support key security properties such as content integrity via TLS-backed data provenance, end-to-end encryption, and anonymous payment, while also ensuring usability and sustainability. We provide a thorough analysis of the privacy, security, and integrity of our system and identify various future research directions, particularly in the area of private chatbot query synthesis. Our human evaluation shows that ProxyGPT offers users a greater sense of privacy compared to traditional AI chatbots, especially in scenarios where users are hesitant to share their identity with chatbot providers. Although our proof-of-concept has higher latency than popular chatbots, our human interview participants consider this to be an acceptable trade-off for anonymity. To the best of our knowledge, ProxyGPT is the first comprehensive proxy-based solution for privacy-preserving AI chatbots. Our codebase is available at https://github.com/dzungvpham/proxygpt.
Updated: 2024-07-11 18:08:04
Categories: cs.CR
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Large language models (LLMs) have made significant advancements in natural language understanding. However, given the rich semantic representations an LLM has learned, can it understand images as well? This work investigates this question. To enable the LLM to process images, we convert them into a representation given by Scalable Vector Graphics (SVG). To study what the LLM can do with this XML-based textual description of images, we test the LLM on three broad computer vision tasks: (i) visual reasoning and question answering, (ii) image classification under distribution shift and few-shot learning, and (iii) generating new images using visual prompting. Even though we do not naturally associate LLMs with any visual understanding capabilities, our results indicate that the LLM can often do a decent job in many of these tasks, potentially opening new avenues for research into LLMs' ability to understand image data. Our code, data, and models can be found at https://github.com/mu-cai/svg-llm.
Updated: 2024-07-11 17:59:53
Categories: cs.CV,cs.AI,cs.CL,cs.LG
Anatomy-aware and acquisition-agnostic joint registration with SynthMorph
Affine image registration is a cornerstone of medical image analysis. While classical algorithms can achieve excellent accuracy, they solve a time-consuming optimization for every image pair. Deep-learning (DL) methods learn a function that maps an image pair to an output transform. Evaluating the function is fast, but capturing large transforms can be challenging, and networks tend to struggle if a test-image characteristic, such as resolution, shifts from the training domain. Most affine methods are agnostic to the anatomy the user wishes to align, meaning the registration will be inaccurate if algorithms consider all structures in the image. We address these shortcomings with SynthMorph, a fast, symmetric, diffeomorphic, and easy-to-use DL tool for joint affine-deformable registration of any brain image without preprocessing. First, we leverage a strategy that trains networks with widely varying images synthesized from label maps, yielding robust performance for image types unseen at training. Second, we optimize the spatial overlap of select anatomical labels. This enables networks to distinguish anatomy of interest from irrelevant structures, removing the need for preprocessing that excludes content that may impinge on anatomy-specific registration. Third, we combine the affine model with a deformable hypernetwork that lets users choose the optimal deformation-field regularity for their specific data, at registration time, in a fraction of the time required by classical methods. We analyze how competing architectures learn affine transforms and compare state-of-the-art registration tools across an extremely diverse set of neuroimaging data, aiming to truly capture the behavior of methods in the real world. SynthMorph demonstrates high accuracy and is available at https://w3id.org/synthmorph, as a single complete end-to-end solution for registration of brain MRI.
Updated: 2024-07-11 17:59:50
Categories: eess.IV,cs.AI,cs.CV,cs.LG
Video Diffusion Alignment via Reward Gradients
We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt these models to specific downstream tasks. Adapting these models via supervised fine-tuning requires collecting target datasets of videos, which is challenging and tedious. In this work, we utilize pre-trained reward models that are learned via preferences on top of powerful vision discriminative models to adapt video diffusion models. These models contain dense gradient information with respect to generated RGB pixels, which is critical to efficient learning in complex search spaces, such as videos. We show that backpropagating gradients from these reward models to a video diffusion model can allow for compute- and sample-efficient alignment of the video diffusion model. We show results across a variety of reward models and video diffusion models, demonstrating that our approach can learn much more efficiently in terms of reward queries and computation than prior gradient-free approaches. Our code, model weights, and more visualizations are available at https://vader-vid.github.io.
Updated: 2024-07-11 17:59:45
Categories: cs.CV,cs.AI,cs.LG,cs.RO
Real-Time Anomaly Detection and Reactive Planning with Large Language Models
Foundation models, e.g., large language models (LLMs), trained on internet-scale data possess zero-shot generalization capabilities that make them a promising technology towards detecting and mitigating out-of-distribution failure modes of robotic systems. Fully realizing this promise, however, poses two challenges: (i) mitigating the considerable computational expense of these models such that they may be applied online, and (ii) incorporating their judgement regarding potential anomalies into a safe control framework. In this work, we present a two-stage reasoning framework: First is a fast binary anomaly classifier that analyzes observations in an LLM embedding space, which may then trigger a slower fallback selection stage that utilizes the reasoning capabilities of generative LLMs. These stages correspond to branch points in a model predictive control strategy that maintains the joint feasibility of continuing along various fallback plans to account for the slow reasoner's latency as soon as an anomaly is detected, thus ensuring safety. We show that our fast anomaly classifier outperforms autoregressive reasoning with state-of-the-art GPT models, even when instantiated with relatively small language models. This enables our runtime monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles, under resource and time constraints. Videos illustrating our approach in both simulation and real-world experiments are available on this project page: https://sites.google.com/view/aesop-llm.
Updated: 2024-07-11 17:59:22
Categories: cs.RO,cs.AI,cs.SY,eess.SY
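A minimal sketch of the two-stage monitor described above follows: a fast linear probe over observation embeddings flags anomalies, and only then is a slow generative reasoner asked to choose among fallback plans. The embeddings, the probe's training data, and the reasoner stub are all placeholders, not the paper's models.

```python
# Sketch: stage 1 is a cheap binary classifier in an embedding space;
# stage 2 (invoked only on anomalies) stands in for a generative-LLM call.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb_nominal = rng.normal(0, 1, (200, 32))        # stand-in LLM embeddings
emb_anomaly = rng.normal(2, 1, (40, 32))
X = np.vstack([emb_nominal, emb_anomaly])
y = np.array([0] * 200 + [1] * 40)
probe = LogisticRegression(max_iter=1000).fit(X, y)   # fast anomaly classifier

def slow_fallback_selection(observation_text, plans):
    # In the real system this is a slow generative-LLM reasoning call;
    # here we simply pick the most conservative plan as a stand-in.
    return plans[-1]

def monitor_step(embedding, observation_text, plans):
    if probe.predict_proba(embedding[None])[0, 1] > 0.5:      # stage 1: fast
        return slow_fallback_selection(observation_text, plans)  # stage 2: slow
    return "continue-nominal-plan"

print(monitor_step(rng.normal(2, 1, 32), "obstacle ahead",
                   ["slow down", "land now"]))
```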
Transformer Circuit Faithfulness Metrics are not Robust
Mechanistic interpretability work attempts to reverse engineer the learned algorithms present inside neural networks. One focus of this work has been to discover 'circuits' -- subgraphs of the full model that explain behaviour on specific tasks. But how do we measure the performance of such circuits? Prior work has attempted to measure circuit 'faithfulness' -- the degree to which the circuit replicates the performance of the full model. In this work, we survey many considerations for designing experiments that measure circuit faithfulness by ablating portions of the model's computation. Concerningly, we find existing methods are highly sensitive to seemingly insignificant changes in the ablation methodology. We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit - the task a circuit is required to perform depends on the ablation used to test it. The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits. We open source a library at https://github.com/UFO-101/auto-circuit that includes highly efficient implementations of a wide range of ablation methodologies and circuit discovery algorithms.
Updated: 2024-07-11 17:59:00
Categories: cs.LG,cs.AI,cs.CL
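The sensitivity the abstract warns about is easy to reproduce in miniature: the same "circuit" can score very differently under zero-ablation versus mean-ablation of the excluded components. The additive toy model and the normalized error-based faithfulness formula below are illustrative assumptions, not the paper's metric.

```python
# Toy illustration: faithfulness of a fixed circuit depends on how the
# excluded components are ablated (zeros vs. dataset means).
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(1.0, 0.5, (256, 8))            # activations of 8 components
w = rng.normal(size=8)
full = acts @ w                                  # full model output

circuit = [0, 1, 2]                              # hypothesized circuit
rest = [i for i in range(8) if i not in circuit]

def circuit_output(fill):
    patched = acts.copy()
    patched[:, rest] = fill                      # ablate excluded components
    return patched @ w

for name, fill in [("zero-ablation", 0.0),
                   ("mean-ablation", acts[:, rest].mean(axis=0))]:
    out = circuit_output(fill)
    score = 1.0 - np.abs(out - full).mean() / np.abs(full).mean()
    print(f"{name}: faithfulness = {score:.3f}")  # the two scores disagree
```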
BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration
The goal of this paper is to address the problem of global point cloud registration (PCR), i.e., finding the optimal alignment between point clouds irrespective of the initial poses of the scans. This problem is notoriously challenging for classical optimization methods due to computational constraints. First, we show that state-of-the-art deep learning methods suffer from huge performance degradation when the point clouds are arbitrarily placed in space. We propose that equivariant deep learning should be utilized for solving this task, and we characterize the specific type of bi-equivariance of PCR. Then, we design BiEquiformer, a novel and scalable bi-equivariant pipeline, i.e., one that is equivariant to independent transformations of the input point clouds. While a naive approach would process the point clouds independently, we design expressive bi-equivariant layers that fuse the information from both point clouds. This allows us to extract high-quality superpoint correspondences and, in turn, robust point-cloud registration. Extensive comparisons against state-of-the-art methods show that our method achieves comparable performance in the canonical setting and superior performance in the robust setting on both the 3DMatch and the challenging low-overlap 3DLoMatch datasets.
Updated: 2024-07-11 17:58:10
Categories: cs.CV,cs.LG
MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces
Public urban spaces like streetscapes and plazas serve residents and accommodate social life in all its vibrant variations. Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans. Food delivery bots and electric wheelchairs have started sharing sidewalks with pedestrians, while diverse robot dogs and humanoids have recently emerged in the street. Ensuring the generalizability and safety of these forthcoming mobile machines is crucial when navigating through the bustling streets in urban spaces. In this work, we present MetaUrban, a compositional simulation platform for Embodied AI research in urban spaces. MetaUrban can construct an infinite number of interactive urban scenes from compositional elements, covering a vast array of ground plans, object placements, pedestrians, vulnerable road users, and other mobile agents' appearances and dynamics. We design point navigation and social navigation tasks as the pilot study using MetaUrban for embodied AI research and establish various baselines of Reinforcement Learning and Imitation Learning. Experiments demonstrate that the compositional nature of the simulated environments can substantially improve the generalizability and safety of the trained mobile agents. MetaUrban will be made publicly available to provide more research opportunities and foster safe and trustworthy embodied AI in urban spaces.
Updated: 2024-07-11 17:56:49
Categories: cs.CV,cs.AI,cs.RO
Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
We present a novel set of rigorous and computationally efficient topology-based complexity notions that exhibit a strong correlation with the generalization gap in modern deep neural networks (DNNs). DNNs show remarkable generalization properties, yet the source of these capabilities remains elusive, defying the established statistical learning theory. Recent studies have revealed that properties of training trajectories can be indicative of generalization. Building on this insight, state-of-the-art methods have leveraged the topology of these trajectories, particularly their fractal dimension, to quantify generalization. Most existing works compute this quantity by assuming continuous- or infinite-time training dynamics, complicating the development of practical estimators capable of accurately predicting generalization without access to test data. In this paper, we respect the discrete-time nature of training trajectories and investigate the underlying topological quantities that can be amenable to topological data analysis tools. This leads to a new family of reliable topological complexity measures that provably bound the generalization error, eliminating the need for restrictive geometric assumptions. These measures are computationally friendly, enabling us to propose simple yet effective algorithms for computing generalization indices. Moreover, our flexible framework can be extended to different domains, tasks, and architectures. Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standards architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers, highlighting the practical relevance and effectiveness of our complexity measures.
Updated: 2024-07-11 17:56:03
Categories: cs.LG,math.AT
High-Precision, Fair University Course Scheduling During a Pandemic
Scheduling university courses is especially challenging when classroom capacities are reduced because of social distancing requirements implemented in response to a pandemic such as COVID-19. In this work, we propose an expanded taxonomy of course delivery modes, present an integer program, and develop a course scheduling algorithm to enable all course sections, even the largest, to have a significant classroom learning component during a pandemic. Our approach is fair by ensuring that a certain fraction of the instruction in every course section occurs in the classroom. Unlike previous studies, we do not allow rotating attendance and instead require simultaneous attendance in which all students in a section meet in 1-5 rooms at the same time but less often than in a normal semester. These mass meetings, which create opportunities for in-person midterm exams and group activities, are scheduled at high precision across all days of the semester rather than a single, repeating week. A fast heuristic algorithm makes the schedule in an hour. Results: We consider the 1834 in-person course sections, 172 classrooms, and 96 days in the fall 2022 semester at [UniversityXYZ]. If average classroom capacity is reduced by 75% due to a pandemic, our approach still allows at least 25% of the instruction in every section, and more than 49% of all instruction across the entire campus, to be in the classroom. Our method also produces excellent results for regular classroom assignment. Managerial implications: An algorithm based on the principles of fairness and simultaneous attendance can significantly improve university course schedules during a pandemic and in normal times. High-precision schedules that prepare a campus for various pandemic possibilities can be created with minimal administrative effort and activated at a moment's notice before or during a semester if an outbreak occurs.
Updated: 2024-07-11 17:56:00
Categories: math.OC,cs.AI,cs.CY
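A toy integer program can capture the fairness idea: every section must hold at least a fraction f of its meetings in person, subject to a campus-wide budget of reduced-capacity room-days. The data and the single aggregate capacity constraint below are drastic simplifications of the paper's full model, sketched here with the PuLP modeling library.

```python
# Sketch: maximize total in-person meetings while guaranteeing each section
# a fair share f of classroom instruction, under a room-day budget.
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpStatus

meetings = {"CS101": 28, "BIO200": 28, "MATH50": 14}   # meetings per section
rooms_needed = {"CS101": 4, "BIO200": 2, "MATH50": 1}  # rooms per mass meeting
room_days_budget = 120                                 # reduced-capacity supply
f = 0.25                                               # fair in-person fraction

prob = LpProblem("fair_course_scheduling", LpMaximize)
x = {s: LpVariable(f"inperson_{s}", 0, meetings[s], cat="Integer")
     for s in meetings}

prob += lpSum(x.values())                              # maximize classroom time
for s in meetings:                                     # fairness: every section
    prob += x[s] >= f * meetings[s]                    # gets its share
prob += lpSum(rooms_needed[s] * x[s] for s in meetings) <= room_days_budget

prob.solve()
print(LpStatus[prob.status], {s: int(x[s].value()) for s in meetings})
```

The full model additionally assigns specific rooms and days at high precision; the sketch only shows how the per-section fairness constraint interacts with a capacity budget.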
Unifying 3D Representation and Control of Diverse Robots with a Single Camera
Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an open challenge to model and control bio-inspired robots that are often multi-material or soft, lack sensing capabilities, and may change their material properties with use. Here, we introduce Neural Jacobian Fields, an architecture that autonomously learns to model and control robots from vision alone. Our approach makes no assumptions about the robot's materials, actuation, or sensing, requires only a single camera for control, and learns to control the robot without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators, varying in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. By enabling robot control with a generic camera as the only sensor, we anticipate our work will dramatically broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.
Updated: 2024-07-11 17:55:49
标题: 使用单个摄像头统一表示和控制多样化机器人的3D表示
摘要: 模仿自然生物的复杂结构和多样功能是机器人领域长期以来的挑战。现代制造技术已大大扩展了可行的硬件,但部署这些系统需要控制软件将所需运动转化为执行器命令。虽然传统机器人可以轻松地建模为通过关节连接的刚性链接,但对于常常是多材料或软体、缺乏感知能力并且可能随使用而改变材料特性的仿生机器人进行建模和控制仍然是一个挑战。在这里,我们介绍了神经雅可比场,这是一种能够自主学习仅通过视觉来建模和控制机器人的架构。我们的方法不假设机器人的材料、驱动或感知,仅需要一个摄像头进行控制,并通过观察执行随机命令来学习控制机器人,无需专家干预。我们在各种不同的机器人操纵器上展示了我们的方法,这些机器人在驱动、材料、制造和成本上各不相同。我们的方法实现了准确的闭环控制,并恢复了每个机器人的因果动态结构。通过仅使用通用摄像头作为唯一传感器来实现机器人控制,我们预计我们的工作将极大地拓宽机器人系统的设计空间,并作为降低机器人自动化门槛的起点。
更新时间: 2024-07-11 17:55:49
领域: cs.RO,cs.CV,cs.LG
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
Large Language Models (LLMs) have demonstrated great potential as generalist assistants, showcasing powerful task understanding and problem-solving capabilities. To deploy LLMs as AI assistants, it is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. Current methods for detoxification or preventing jailbreaking usually involve Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), which requires finetuning billions of parameters through gradient descent with substantial computation cost. Furthermore, models modified through SFT and RLHF may deviate from the pretrained models, potentially leading to a degradation in foundational LLM capabilities. In this paper, we observe that surprisingly, directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs, such as detoxification and resistance to jailbreaking. Specifically, for a behavior that we aim to avoid, we employ a linear classifier, which we term the behavior probe, to classify binary behavior labels within the hidden state space of the LLM. Using this probe, we introduce an algorithm to identify a critical subset of LLM parameters that significantly influence this targeted behavior. Then we directly edit these selected parameters by shifting them towards the behavior probe. Such a direct parameter editing method necessitates only inference-level computational resources. Experiments demonstrate that in the representative detoxification task, our approach achieves reductions of up to 90.0\% in toxicity on the RealToxicityPrompts dataset and 49.2\% on ToxiGen, while maintaining the LLM's general capabilities in areas such as common sense, question answering, and mathematics. Our code is available at https://github.com/lucywang720/model-surgery.
Updated: 2024-07-11 17:52:03
Categories: cs.AI,68T50 (Primary) 68T07, 62M45 (Secondary),I.2.7
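One plausible reading of the recipe is sketched below: fit a linear behavior probe on hidden states, rank weight-matrix rows by how strongly they write along the probed direction, and nudge the top rows toward the probe. The toy weight matrix, the selection rule, and the step size are assumptions about an abstract-level description, not the authors' algorithm.

```python
# Hypothetical sketch: (1) train a linear behavior probe on hidden states,
# (2) select parameter rows most aligned with the probe direction,
# (3) edit only those rows by shifting them along the probe.
import torch

torch.manual_seed(0)
d_hidden = 64
H = torch.randn(500, d_hidden)                      # collected hidden states
labels = (H[:, 0] + 0.5 * H[:, 1] > 0).float()      # binary behavior labels

probe = torch.zeros(d_hidden, requires_grad=True)   # linear behavior probe
opt = torch.optim.Adam([probe], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(H @ probe, labels)
    loss.backward()
    opt.step()

W = torch.randn(d_hidden, d_hidden)                 # a toy projection matrix
with torch.no_grad():
    direction = probe / probe.norm()
    influence = (W @ direction).abs()               # per-row write strength
    top = influence.topk(k=8).indices               # critical parameter subset
    W[top] += 0.1 * direction                       # shift toward the probe
```

Note that only inference-level resources are involved: the probe is tiny, and the edit touches a handful of rows rather than fine-tuning the full model.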
WhisperNetV2: SlowFast Siamese Network For Lip-Based Biometrics
Lip-based biometric authentication (LBBA) has attracted many researchers during the last decade. The lip is of particular interest to biometric researchers because it is a twin biometric with the potential to function as both a physiological and a behavioral trait. Although much valuable research has been conducted on LBBA, none of it considered the client's different emotions during the video acquisition step of LBBA, which can potentially affect the client's facial expressions and speech tempo. We propose a novel network structure called WhisperNetV2, which extends our previously proposed network called WhisperNet. Our proposed network leverages a deep Siamese structure trained with a triplet loss, using three identical SlowFast networks as embedding networks. The SlowFast network is an excellent candidate for our task since its fast pathway extracts motion-related features (behavioral lip movements) with a high frame rate and low channel capacity, while its slow pathway extracts visual features (physiological lip appearance) with a low frame rate and high channel capacity. Using an open-set protocol, we trained our network on the CREMA-D dataset and achieved an Equal Error Rate (EER) of 0.005 on the test set. Considering that the achieved EER is lower than that of most similar LBBA methods, our method can be considered state-of-the-art.
Updated: 2024-07-11 17:51:49
Categories: cs.CV,cs.AI
Sensor-Aware Classifiers for Energy-Efficient Time Series Applications on IoT Devices
Time-series data processing is an important component of many real-world applications, such as health monitoring, environmental monitoring, and digital agriculture. These applications collect distinct windows of sensor data (e.g., a few seconds) and process them to assess the environment. Machine learning (ML) models are being employed in time-series applications due to their generalization abilities for classification. State-of-the-art time-series applications wait for the entire sensor data window to become available before processing the data using ML algorithms, resulting in high sensor energy consumption. However, not all situations require processing the full sensor window to make an accurate inference. For instance, in activity recognition, sitting and standing activities can be inferred from partial windows. Using this insight, we propose to employ early-exit classifiers with partial sensor windows to minimize energy consumption while maintaining accuracy. Specifically, we first utilize multiple early exits with successively increasing amounts of data as they become available in a window. If an early exit provides an inference with high confidence, we return the label and enter a low-power mode for the sensors. The proposed approach has the potential to enable significant energy savings in time-series applications. We utilize neural networks and random forest classifiers to evaluate our approach. Our evaluations with six datasets show that the proposed approach enables up to 50-60% energy savings on average without any impact on accuracy. These energy savings can enable time-series applications in remote locations with limited energy availability.
Updated: 2024-07-11 17:50:31
Categories: cs.LG
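The early-exit mechanism is simple to sketch: train one classifier per window prefix and return as soon as a prediction clears a confidence threshold, so the sensor can power down early. The synthetic data, prefix grid, and threshold below are illustrative assumptions.

```python
# Sketch: confidence-gated early exit over partial sensor windows using
# one random-forest classifier per prefix length.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
T, n = 100, 400                                   # window length, samples
X = rng.normal(size=(n, T)) + np.linspace(0, 1, T) * rng.integers(0, 2, (n, 1))
y = (X[:, -1] > X[:, 0]).astype(int)              # toy label

prefixes = [25, 50, 100]                          # early-exit points
exits = [RandomForestClassifier(n_estimators=50, random_state=0)
         .fit(X[:, :p], y) for p in prefixes]

def classify_early(x, threshold=0.9):
    """Consume the window incrementally; exit once confident."""
    for p, clf in zip(prefixes, exits):
        proba = clf.predict_proba(x[:p][None])[0]
        if proba.max() >= threshold or p == prefixes[-1]:
            return int(proba.argmax()), p         # label, samples consumed

print(classify_early(X[0]))                       # e.g. exits after 25 samples
```

The energy saving comes from the second return value: whenever the classifier exits at an early prefix, the remaining samples never need to be sensed at all.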
GTA: A Benchmark for General Tool Agents
Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, failing to reveal the agents' real-world problem-solving abilities effectively. To address this, we propose GTA, a benchmark for General Tool Agents, featuring three main aspects: (i) Real user queries: human-written queries with simple real-world objectives but implicit tool-use, requiring the LLM to reason the suitable tools and plan the solution steps. (ii) Real deployed tools: an evaluation platform equipped with tools across perception, operation, logic, and creativity categories to evaluate the agents' actual task execution performance. (iii) Real multimodal inputs: authentic image files, such as spatial scenes, web page screenshots, tables, code snippets, and printed/handwritten materials, used as the query contexts to align with real-world scenarios closely. We design 229 real-world tasks and executable tool chains to evaluate mainstream LLMs. Our findings show that real-world user queries are challenging for existing LLMs, with GPT-4 completing less than 50% of the tasks and most LLMs achieving below 25%. This evaluation reveals the bottlenecks in the tool-use capabilities of current LLMs in real-world scenarios, which provides future direction for advancing general-purpose tool agents. The code and dataset are available at https://github.com/open-compass/GTA.
Updated: 2024-07-11 17:50:09
Categories: cs.CL,cs.AI
eyeballvul: a future-proof benchmark for vulnerability detection in the wild
Long contexts of recent LLMs have enabled a new use case: asking models to find security vulnerabilities in entire codebases. To evaluate model performance on this task, we introduce eyeballvul: a benchmark designed to test the vulnerability detection capabilities of language models at scale, that is sourced and updated weekly from the stream of published vulnerabilities in open-source repositories. The benchmark consists of a list of revisions in different repositories, each associated with the list of known vulnerabilities present at that revision. An LLM-based scorer is used to compare the list of possible vulnerabilities returned by a model to the list of known vulnerabilities for each revision. As of July 2024, eyeballvul contains 24,000+ vulnerabilities across 6,000+ revisions and 5,000+ repositories, and is around 55GB in size.
Updated: 2024-07-11 17:46:21
Categories: cs.CR,cs.AI,cs.LG
Extracting Training Data from Document-Based VQA Models
Vision-Language Models (VLMs) have made remarkable progress in document-based Visual Question Answering (i.e., responding to queries about the contents of an input document provided as an image). In this work, we show these models can memorize responses for training samples and regurgitate them even when the relevant visual information has been removed. This includes Personal Identifiable Information (PII) repeated once in the training set, indicating these models could divulge memorised sensitive information and therefore pose a privacy risk. We quantitatively measure the extractability of information in controlled experiments and differentiate between cases where it arises from generalization capabilities or from memorization. We further investigate the factors that influence memorization across multiple state-of-the-art models and propose an effective heuristic countermeasure that empirically prevents the extractability of PII.
Updated: 2024-07-11 17:44:41
Categories: cs.CV,cs.LG,I.2.7; I.2.10; K.4.1
Towards Efficient Deployment of Hybrid SNNs on Neuromorphic and Edge AI Hardware
This paper explores the synergistic potential of neuromorphic and edge computing to create a versatile machine learning (ML) system tailored for processing data captured by dynamic vision sensors. We construct and train hybrid models, blending spiking neural networks (SNNs) and artificial neural networks (ANNs) using PyTorch and Lava frameworks. Our hybrid architecture integrates an SNN for temporal feature extraction and an ANN for classification. We delve into the challenges of deploying such hybrid structures on hardware. Specifically, we deploy individual components on Intel's Neuromorphic Processor Loihi (for SNN) and Jetson Nano (for ANN). We also propose an accumulator circuit to transfer data from the spiking to the non-spiking domain. Furthermore, we conduct comprehensive performance analyses of hybrid SNN-ANN models on a heterogeneous system of neuromorphic and edge AI hardware, evaluating accuracy, latency, power, and energy consumption. Our findings demonstrate that the hybrid spiking networks surpass the baseline ANN model across all metrics and outperform the baseline SNN model in accuracy and latency.
Updated: 2024-07-11 17:40:39
Categories: cs.NE,cs.AI,cs.AR,cs.CV,cs.LG
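The spiking-to-non-spiking hand-off can be illustrated with a tiny rate-coding accumulator: it integrates the SNN's output spikes over the time window into firing rates, which the ANN head then consumes. Spike trains and layer sizes below are toy assumptions, and rate averaging is only one possible accumulator design.

```python
# Sketch: an accumulator bridging the spiking domain (time-major binary
# spike trains) to the non-spiking domain (rate codes fed to an ANN head).
import torch
import torch.nn as nn

T, batch, snn_out = 25, 8, 128                          # timesteps, batch, channels
spikes = (torch.rand(T, batch, snn_out) < 0.1).float()  # binary spike trains

class Accumulator(nn.Module):
    def forward(self, spike_train):                # (T, B, C) -> (B, C)
        return spike_train.mean(dim=0)             # firing rate per channel

head = nn.Sequential(Accumulator(), nn.Linear(snn_out, 10))  # ANN classifier
logits = head(spikes)
print(logits.shape)                                # (8, 10)
```

In a heterogeneous deployment such as the one above, the accumulator sits at the boundary between the two devices, so only the compact rate vector needs to cross from the neuromorphic chip to the edge accelerator.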
Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture
Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators, utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML accelerators, such as graphics processing units (GPUs), as they are designed specifically to perform the multiply-accumulate (MAC) operations required by the matrix-matrix and matrix-vector multiplies that appear extensively throughout the execution of deep neural networks (DNNs). Such improvements include maximizing data reuse and minimizing data transfer by leveraging the temporal dataflow paradigms provided by the systolic array architecture. While this design provides a significant performance benefit, the current implementations are restricted to a single dataflow consisting of either input, output, or weight stationary architectures. This can limit the achievable performance of DNN inference and reduce the utilization of compute units. Therefore, the work herein consists of developing a reconfigurable-dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow per layer during run-time. Our experiments thoroughly test the viability of the Flex-TPU, comparing it to conventional TPU designs across multiple well-known ML workloads. The results show that our Flex-TPU design achieves a significant performance increase of up to 2.75x compared to conventional TPUs, with only minor area and power overheads.
Updated: 2024-07-11 17:33:38
Categories: cs.AR,cs.AI,cs.DC,cs.LG,cs.PF
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging
As open-weight large language models (LLMs) achieve ever more impressive performances across a wide range of tasks in English, practitioners aim to adapt these models to different languages. However, such language adaptation is often accompanied by catastrophic forgetting of the base model's capabilities, severely limiting the usefulness of the resulting model. We address this issue by proposing Branch-and-Merge (BaM), a new adaptation method based on iteratively merging multiple models, fine-tuned on a subset of the available training data. BaM is based on the insight that this yields lower magnitude but higher quality weight changes, reducing forgetting of the source domain while maintaining learning on the target domain. We demonstrate in an extensive empirical study on Bulgarian and German that BaM can significantly reduce forgetting while matching or even improving target domain performance compared to both standard continued pretraining and instruction finetuning across different model architectures.
Updated: 2024-07-11 17:32:40
Categories: cs.LG
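The iterative merging loop is easy to sketch: in each outer iteration, fine-tune several branches of the current model on disjoint shards of the target-language data, then merge them by weight averaging. The toy regression model, shard count, and plain uniform averaging are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of Branch-and-Merge: branch, fine-tune on shards, merge by
# averaging parameters, and repeat; the merged step is a lower-magnitude
# but higher-quality weight change than a single long fine-tune.
import copy
import torch
import torch.nn as nn

def finetune(model, shard, steps=50, lr=1e-2):
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    X, y = shard
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(X), y).backward()
        opt.step()
    return model

def merge(models):
    merged = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, p in merged.named_parameters():
            p.copy_(torch.stack([dict(m.named_parameters())[name]
                                 for m in models]).mean(0))
    return merged

torch.manual_seed(0)
model = nn.Linear(8, 1)                                  # stand-in base model
shards = [(torch.randn(64, 8), torch.randn(64, 1)) for _ in range(3)]

for _ in range(4):                                       # BaM outer iterations
    branches = [finetune(model, shard) for shard in shards]
    model = merge(branches)
```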
Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight
Runtime failure and performance degradation is commonplace in modern cloud systems. For cloud providers, automatically determining the root cause of incidents is paramount to ensuring high reliability and availability as prompt fault localization can enable faster diagnosis and triage for timely resolution. A compelling solution explored in recent work is causal reasoning using causal graphs to capture relationships between varied cloud system performance metrics. To be effective, however, systems developers must correctly define the causal graph of their system, which is a time-consuming, brittle, and challenging task that increases in difficulty for large and dynamic systems and requires domain expertise. Alternatively, automated data-driven approaches have limited efficacy for cloud systems due to the inherent rarity of incidents. In this work, we present Atlas, a novel approach to automatically synthesizing causal graphs for cloud systems. Atlas leverages large language models (LLMs) to generate causal graphs using system documentation, telemetry, and deployment feedback. Atlas is complementary to data-driven causal discovery techniques, and we further enhance Atlas with a data-driven validation step. We evaluate Atlas across a range of fault localization scenarios and demonstrate that Atlas is capable of generating causal graphs in a scalable and generalizable manner, with performance that far surpasses that of data-driven algorithms and is commensurate to the ground-truth baseline.
Updated: 2024-07-11 17:31:12
Categories: cs.DC,cs.AI,cs.LG
Robotic Control via Embodied Chain-of-Thought Reasoning
A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities of large vision-language models in other domains is their ability to reason iteratively through complex problems. Can that same capability be brought into robotics to allow policies to improve performance by reasoning about a given task before acting? Naive use of "chain-of-thought" (CoT) style prompting is significantly less effective with standard VLAs because of the relatively simple training examples that are available to them. Additionally, purely semantic reasoning about sub-tasks, as is common in regular CoT, is insufficient for robot policies that need to ground their reasoning in sensory observations and the robot state. To this end, we introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features like object bounding boxes and end effector positions, before predicting the robot action. We design a scalable pipeline for generating synthetic training data for ECoT on large robot datasets. We demonstrate, that ECoT increases the absolute success rate of OpenVLA, the current strongest open-source VLA policy, by 28% across challenging generalization tasks, without any additional robot training data. Additionally, ECoT makes it easier for humans to interpret a policy's failures and correct its behavior using natural language.
Updated: 2024-07-11 17:31:01
Categories: cs.RO,cs.LG
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
Transformers have rapidly overtaken CNN-based architectures as the new standard in audio classification. Transformer-based models, such as the Audio Spectrogram Transformer (AST), also inherit the fixed-size input paradigm from CNNs. However, this leads to performance degradation for ASTs at inference when input lengths differ from those used in training. This paper introduces an approach that enables the use of variable-length audio inputs with AST models during both training and inference. By employing sequence packing, our method, ElasticAST, accommodates any audio length during training, thereby offering flexibility across all lengths and resolutions at inference. This flexibility allows ElasticAST to maintain evaluation capabilities at various lengths or resolutions and achieve similar performance to standard ASTs trained at specific lengths or resolutions. Moreover, experiments demonstrate ElasticAST's better performance when trained and evaluated on native-length audio datasets.
Updated: 2024-07-11 17:29:56
Domains: cs.SD,cs.AI,eess.AS
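A minimal sketch of the sequence-packing idea, assuming a greedy packing rule: variable-length spectrogram patch sequences are laid back-to-back into fixed-size buffers, and a boolean mask marks valid positions so attention can ignore padding. This is not the paper's exact implementation.

```python
import torch

def pack_sequences(seqs, max_len):
    """Pack variable-length (L_i, D) patch sequences into (max_len, D) buffers.
    Assumes every sequence satisfies L_i <= max_len."""
    dim = seqs[0].shape[1]
    buffers, masks = [], []
    buf = torch.zeros(max_len, dim)
    mask = torch.zeros(max_len, dtype=torch.bool)
    used = 0
    for s in sorted(seqs, key=len, reverse=True):
        if used + len(s) > max_len:   # current buffer is full: start a new one
            buffers.append(buf); masks.append(mask)
            buf = torch.zeros(max_len, dim)
            mask = torch.zeros(max_len, dtype=torch.bool)
            used = 0
        buf[used:used + len(s)] = s
        mask[used:used + len(s)] = True
        used += len(s)
    buffers.append(buf); masks.append(mask)
    return torch.stack(buffers), torch.stack(masks)  # (B, max_len, D), (B, max_len)
```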
Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers
As Artificial Intelligence (AI) tools are increasingly employed in diverse real-world applications, there has been significant interest in regulating these tools. To this end, several regulatory frameworks have been introduced by different countries worldwide. For example, the European Union recently passed the AI Act, the White House issued an Executive Order on safe, secure, and trustworthy AI, and the White House Office of Science and Technology Policy issued the Blueprint for an AI Bill of Rights (AI BoR). Many of these frameworks emphasize the need for auditing and improving the trustworthiness of AI tools, underscoring the importance of safety, privacy, explainability, fairness, and human fallback options. Although these regulatory frameworks highlight the necessity of enforcement, practitioners often lack detailed guidance on implementing them. Furthermore, the extensive research on operationalizing each of these aspects is frequently buried in technical papers that are difficult for practitioners to parse. In this write-up, we address this shortcoming by providing an accessible overview of existing literature related to operationalizing regulatory principles. We provide easy-to-understand summaries of state-of-the-art literature and highlight various gaps that exist between regulatory guidelines and existing AI research, including the trade-offs that emerge during operationalization. We hope that this work not only serves as a starting point for practitioners interested in learning more about operationalizing the regulatory guidelines outlined in the Blueprint for an AI BoR but also provides researchers with a list of critical open problems and gaps between regulations and state-of-the-art AI research. Finally, we note that this is a working paper and we invite feedback in line with the purpose of this document as described in the introduction.
Updated: 2024-07-11 17:28:07
Domains: cs.AI,cs.CY,cs.LG
SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic
SLEDGE is the first generative simulator for vehicle motion planning trained on real-world driving logs. Its core component is a learned model that is able to generate agent bounding boxes and lane graphs. The model's outputs serve as an initial state for rule-based traffic simulation. The unique properties of the entities to be generated for SLEDGE, such as their connectivity and variable count per scene, render the naive application of most modern generative models to this task non-trivial. Therefore, together with a systematic study of existing lane graph representations, we introduce a novel raster-to-vector autoencoder. It encodes agents and the lane graph into distinct channels in a rasterized latent map. This facilitates both lane-conditioned agent generation and combined generation of lanes and agents with a Diffusion Transformer. Using generated entities in SLEDGE enables greater control over the simulation, e.g. upsampling turns or increasing traffic density. Further, SLEDGE can support 500m long routes, a capability not found in existing data-driven simulators like nuPlan. It presents new challenges for planning algorithms, evidenced by failure rates of over 40% for PDM, the winner of the 2023 nuPlan challenge, when tested on hard routes and dense traffic generated by our model. Compared to nuPlan, SLEDGE requires 500$\times$ less storage to set up (<4 GB), making it a more accessible option and helping with democratizing future research in this field.
Updated: 2024-07-11 17:27:49
Domains: cs.RO,cs.AI,cs.CV,cs.LG
Generative Inverse Design of Metamaterials with Functional Responses by Interpretable Learning
Metamaterials with functional responses, such as wave-based responses or deformation-induced property variation under external stimuli, can exhibit varying properties or functionalities under different conditions. Herein, we aim at rapid inverse design of these metamaterials to meet target qualitative functional behaviors. This inverse problem is challenging due to its intractability and the existence of non-unique solutions. Past works mainly focus on deep-learning-based methods that are data-demanding, require time-consuming training and hyperparameter tuning, and are non-interpretable. To overcome these limitations, we propose the Random-forest-based Interpretable Generative Inverse Design (RIGID), an iteration-free, single-shot inverse design method to achieve the fast generation of metamaterial designs with on-demand functional behaviors. Unlike most existing methods, by exploiting the interpretability of the random forest, we eliminate the need to train an inverse model mapping responses to designs. Based on the likelihood of target satisfaction derived from the trained forward model, one can sample design solutions using Markov chain Monte Carlo methods. The RIGID method therefore functions as a generative model that captures the conditional distribution of satisfying solutions given a design target. We demonstrate the effectiveness and efficiency of RIGID on both acoustic and optical metamaterial design problems where only small datasets (less than 250 training samples) are available. Synthetic design problems are created to further illustrate and validate the mechanism of likelihood estimation in RIGID. This work offers a new perspective on solving on-demand inverse design problems, showcasing the potential for incorporating interpretable machine learning into generative design and eliminating its large data requirement.
Updated: 2024-07-11 17:27:35
Domains: physics.optics,cs.LG
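The iteration-free sampling step can be sketched as follows: a random forest trained as the forward model supplies the likelihood that a design satisfies the target behavior, and a random-walk Metropolis sampler draws designs from that unnormalized distribution. Treating predict_proba as the satisfaction likelihood is an illustrative assumption, not the paper's exact estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def satisfaction_likelihood(forest: RandomForestClassifier, x, target_idx):
    """P(target behavior | design x), averaged over the forest's trees."""
    return forest.predict_proba(x.reshape(1, -1))[0, target_idx]

def sample_designs(forest, target_idx, x0, n_steps=1000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    p = satisfaction_likelihood(forest, x, target_idx)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.normal(0.0, step, size=x.shape)   # symmetric random-walk proposal
        p_new = satisfaction_likelihood(forest, x_new, target_idx)
        if rng.uniform() < p_new / max(p, 1e-12):         # Metropolis acceptance
            x, p = x_new, p_new
        samples.append(x.copy())
    return np.array(samples)
```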
Hardware Neural Control of CartPole and F1TENTH Race Car
Nonlinear model predictive control (NMPC) has proven to be an effective control method, but it is expensive to compute. This work demonstrates the use of hardware FPGA neural network controllers trained to imitate NMPC with supervised learning. We use these Neural Controllers (NCs) implemented on inexpensive embedded FPGA hardware for high-frequency control on a physical cartpole and an F1TENTH race car. Our results show that the NCs match the control performance of the NMPCs in simulation and outperform them in reality, due to the faster control rate that is afforded by the quick FPGA NC inference. We demonstrate kHz control rates for a physical cartpole and offloading control to the FPGA hardware on the F1TENTH car. Code and hardware implementation for this paper are available at https://github.com/SensorsINI/Neural-Control-Tools.
Updated: 2024-07-11 17:14:19
Domains: cs.RO,cs.LG,cs.SY,eess.SY
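The supervised-imitation step reduces to fitting a small network on (state, NMPC action) pairs collected offline; a minimal PyTorch sketch under that assumption is below. The network size and training loop are illustrative choices aimed at cheap, high-rate inference, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class NeuralController(nn.Module):
    """Small MLP sized for fast inference (e.g., on an embedded FPGA softcore)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x):
        return self.net(x)

def train_imitation(states, nmpc_actions, epochs=200, lr=1e-3):
    """states: (N, state_dim); nmpc_actions: (N, action_dim) from an offline NMPC solver."""
    model = NeuralController(states.shape[1], nmpc_actions.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(states), nmpc_actions)
        loss.backward()
        opt.step()
    return model
```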
How to beat a Bayesian adversary
Deep neural networks and other modern machine learning models are often susceptible to adversarial attacks. Indeed, an adversary may often be able to change a model's prediction through a small, directed perturbation of the model's input - an issue in safety-critical applications. Adversarially robust machine learning is usually based on a minmax optimisation problem that minimises the machine learning loss under maximisation-based adversarial attacks. In this work, we study adversaries that determine their attack using a Bayesian statistical approach rather than maximisation. The resulting Bayesian adversarial robustness problem is a relaxation of the usual minmax problem. To solve this problem, we propose Abram - a continuous-time particle system that approximates the gradient flow corresponding to the underlying learning problem. We show that Abram approximates a McKean-Vlasov process and justify the use of Abram by giving assumptions under which the McKean-Vlasov process finds the minimiser of the Bayesian adversarial robustness problem. We discuss two ways to discretise Abram and show its suitability in benchmark adversarial deep learning experiments.
Updated: 2024-07-11 17:12:42
Domains: cs.LG,math.OC,stat.CO,stat.ML,90C15, 65C35, 68T07
Large-Scale Dataset Pruning in Adversarial Training through Data Importance Extrapolation
The vulnerability of deep learning models to small, imperceptible attacks limits their adoption in real-world systems. Adversarial training has proven to be one of the most promising strategies against these attacks, at the expense of a substantial increase in training time. With the ongoing trend of integrating large-scale synthetic data, this is only expected to increase even further. Thus, the need for data-centric approaches that reduce the number of training samples while maintaining accuracy and robustness arises. While data pruning and active learning are prominent research topics in deep learning, they are as of now largely unexplored in the adversarial training literature. We address this gap and propose a new data pruning strategy based on extrapolating data importance scores from a small set of data to a larger set. In an empirical evaluation, we demonstrate that extrapolation-based pruning can efficiently reduce dataset size while maintaining robustness.
Updated: 2024-07-11 17:10:24
Domains: cs.LG
Korean Aspect-Based Sentiment Analysis via Implicit-Feature Alignment with Corpus Filtering
Investigations into Aspect-Based Sentiment Analysis (ABSA) for Korean restaurant reviews are notably lacking in the existing literature. Our research proposes an intuitive and effective framework for ABSA in low-resource languages such as Korean. It optimizes prediction labels by integrating translated benchmark and unlabeled Korean data. Using a model fine-tuned on translated data, we pseudo-labeled the actual Korean NLI set. Subsequently, we applied LaBSE and MSP-based filtering to this pseudo-NLI set as an implicit feature, enhancing Aspect Category Detection and Polarity determination through additional training. Incorporating dual filtering, this model bridged dataset gaps, achieving positive results in Korean ABSA with minimal resources. Through additional data injection pipelines, our approach aims to utilize high-resource data and construct effective models within communities, whether corporate or individual, in low-resource language countries. Compared to English ABSA, our framework showed an approximately 3% difference in F1 scores and accuracy. We release the dataset and our code for Korean ABSA at this link.
Updated: 2024-07-11 17:08:36
Domains: cs.CL,cs.AI
CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs
Text-to-image generative models have increasingly been used to assist designers during concept generation in various creative domains, such as graphic design, user interface design, and fashion design. However, their applications in engineering design remain limited due to the models' challenges in generating images of feasible design concepts. To address this issue, this paper introduces a method that improves the design feasibility by prompting the generation with feasible CAD images. In this work, the usefulness of this method is investigated through a case study with a bike design task using an off-the-shelf text-to-image model, Stable Diffusion 2.1. A diverse set of bike designs is produced in seven different generation settings with varying CAD image prompting weights, and these designs are evaluated on their perceived feasibility and novelty. Results demonstrate that the CAD image prompting successfully helps text-to-image models like Stable Diffusion 2.1 create visibly more feasible design images. While a general tradeoff is observed between feasibility and novelty, when the prompting weight is kept low around 0.35, the design feasibility is significantly improved while its novelty remains on par with those generated by text prompts alone. The insights from this case study offer some guidelines for selecting the appropriate CAD image prompting weight for different stages of the engineering design process. When utilized effectively, our CAD image prompting method opens doors to a wider range of applications of text-to-image models in engineering design.
Updated: 2024-07-11 17:07:32
Domains: cs.AI,I.2
Estimation of spatio-temporal extremes via generative neural networks
Recent methods in modeling spatial extreme events have focused on utilizing parametric max-stable processes and their underlying dependence structure. In this work, we provide a unified approach for analyzing spatial extremes with little available data by estimating the distribution of model parameters or the spatial dependence directly. By employing recent developments in generative neural networks we predict a full sample-based distribution, allowing for direct assessment of uncertainty regarding model parameters or other parameter dependent functionals. We validate our method by fitting several simulated max-stable processes, showing a high accuracy of the approach, regarding parameter estimation, as well as uncertainty quantification. Additional robustness checks highlight the generalization and extrapolation capabilities of the model, while an application to precipitation extremes across Western Germany demonstrates the usability of our approach in real-world scenarios.
Updated: 2024-07-11 16:57:17
Domains: stat.ML,cs.LG
Mon CHÈRI <3 Adapting Capability Hardware Enhanced RISC with Conditional Capabilities
Up to 10% of memory-safety vulnerabilities in languages like C and C++ stem from uninitialized variables. This work addresses the prevalence and lack of adequate software mitigations for uninitialized memory issues, proposing architectural protections in hardware. Capability-based addressing, such as the University of Cambridge's CHERI, mitigates many memory defects, including spatial and temporal safety violations at an architectural level. However, current CHERI designs do not handle undefined behavior from uninitialized variables. We extend the CHERI capability model to include "conditional capabilities", enabling memory-access policies based on prior operations. This allows enforcement of policies that satisfy memory safety objectives such as "no reads to memory without at least one prior write" (Write-before-Read). We present our architecture extension, compiler support, and a detailed evaluation of our approach using the QEMU full-system simulator and our modified FPGA-based CHERI-RISCV softcore. Our evaluation shows Write-before-Read conditional capabilities are practical, with high detection accuracy while adding a small (~3.5%) overhead to the existing CHERI architecture.
Updated: 2024-07-11 16:51:36
Domains: cs.CR
Uncertainty Estimation of Large Language Models in Medical Question Answering
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. Deploying LLMs for medical question answering necessitates reliable uncertainty estimation (UE) methods to detect hallucinations. In this work, we benchmark popular UE methods with different model sizes on medical question-answering datasets. Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications. We also observe that larger models tend to yield better results, suggesting a correlation between model size and the reliability of UE. To address these challenges, we propose Two-phase Verification, a probability-free Uncertainty Estimation approach. First, an LLM generates a step-by-step explanation alongside its initial answer, followed by formulating verification questions to check the factual claims in the explanation. The model then answers these questions twice: first independently, and then referencing the explanation. Inconsistencies between the two sets of answers measure the uncertainty in the original response. We evaluate our approach on three biomedical question-answering datasets using Llama 2 Chat models and compare it against the benchmarked baseline methods. The results show that our Two-phase Verification method achieves the best overall accuracy and stability across various datasets and model sizes, and its performance scales as the model size increases.
Updated: 2024-07-11 16:51:33
Domains: cs.CL,cs.AI
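The Two-phase Verification procedure can be sketched in a few lines, with llm standing in for a hypothetical text-completion helper; the prompt wording and the way inconsistencies are counted are illustrative assumptions rather than the authors' exact protocol.

```python
def two_phase_verification(question, llm, n_questions=3):
    # Phase 0: answer with a step-by-step explanation.
    answer = llm(f"Question: {question}\nExplain step by step, then give your final answer.")
    # Phase 1: formulate questions that probe the factual claims in the explanation.
    vqs = llm(f"Write {n_questions} short questions checking the factual claims in:\n{answer}")
    vqs = vqs.splitlines()[:n_questions]
    # Phase 2: answer each verification question twice, independently and with the explanation.
    inconsistencies = 0
    for vq in vqs:
        independent = llm(f"Question: {vq}\nAnswer briefly.")
        referenced = llm(f"Context: {answer}\nQuestion: {vq}\nAnswer briefly using the context.")
        verdict = llm(f"Do these two answers agree? Answer yes or no.\nA: {independent}\nB: {referenced}")
        inconsistencies += int(verdict.strip().lower().startswith("no"))
    uncertainty = inconsistencies / max(len(vqs), 1)   # higher = less trustworthy answer
    return answer, uncertainty
```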
BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay
Imitation learning learns a policy from demonstrations without requiring hand-designed reward functions. In many robotic tasks, such as autonomous racing, imitated policies must model complex environment dynamics and human decision-making. Sequence modeling is highly effective in capturing intricate patterns of motion sequences but struggles to adapt to new environments or distribution shifts that are common in real-world robotics tasks. In contrast, Adversarial Imitation Learning (AIL) can mitigate this effect, but struggles with sample inefficiency and handling complex motion patterns. Thus, we propose BeTAIL: Behavior Transformer Adversarial Imitation Learning, which combines a Behavior Transformer (BeT) policy from human demonstrations with online AIL. BeTAIL adds an AIL residual policy to the BeT policy to model the sequential decision-making process of human experts and correct for out-of-distribution states or shifts in environment dynamics. We test BeTAIL on three challenges with expert-level demonstrations of real human gameplay in Gran Turismo Sport. Our proposed residual BeTAIL reduces environment interactions and improves racing performance and stability, even when the BeT is pretrained on different tracks than downstream learning. Videos and code available at: https://sites.google.com/berkeley.edu/BeTAIL/home.
Updated: 2024-07-11 16:50:08
Domains: cs.LG,cs.RO
Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity. Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density, which is based on the nearest-neighbor information from real samples. Our approach offers three distinct techniques to adjust the fidelity and diversity of deep generative models: 1) Per-sample perturbation, enabling precise adjustments for individual samples towards either more common or more unique characteristics; 2) Importance sampling during model inference to enhance either fidelity or diversity in the generated data; 3) Fine-tuning with importance sampling, which guides the generative model to learn an adjusted distribution, thus controlling fidelity and diversity. Furthermore, our fine-tuning method demonstrates the ability to improve the Frechet Inception Distance (FID) for pre-trained generative models with minimal iterations.
Updated: 2024-07-11 16:46:04
Domains: cs.LG,cs.CV
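The abstract does not spell out the formula, so the sketch below shows one plausible instantiation of a per-sample pseudo density from nearest-neighbor information: the inverse of a sample's mean distance to its k nearest real samples, so samples in dense regions of the real data score high. Treat the exact form as an assumption.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def pseudo_density(real_samples, query_samples, k=5):
    """real_samples: (N, D) real data; query_samples: (M, D) points to score."""
    nn = NearestNeighbors(n_neighbors=k).fit(real_samples)
    dists, _ = nn.kneighbors(query_samples)        # (M, k) distances to real data
    return 1.0 / (dists.mean(axis=1) + 1e-12)      # higher = more "typical" sample
```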
A Distributed ADMM-based Deep Learning Approach for Thermal Control in Multi-Zone Buildings under Demand Response Events
The increasing electricity use and reliance on intermittent renewable energy sources challenge power grid management during peak demand, making Demand Response programs and energy conservation measures essential. This research combines distributed optimization using ADMM with deep learning models to plan indoor temperature setpoints effectively. A two-layer hierarchical structure is used, with a central building coordinator at the upper layer and local controllers at the thermal zone layer. The coordinator must limit the building's maximum power by translating the building's total power to local power targets for each zone. Local controllers can modify the temperature setpoints to meet the local power targets. While most algorithms are either centralized or require prior knowledge about the building's structure, our approach is distributed and fully data-driven. The proposed algorithm, called Distributed Planning Networks, is designed to be both adaptable and scalable to many types of buildings, tackling two of the main challenges in the development of such systems. The proposed approach is tested on an 18-zone building modeled in EnergyPlus. The algorithm successfully manages Demand Response peak events.
Updated: 2024-07-11 16:43:38
Domains: math.OC,cs.LG
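The two-layer coordination can be pictured with a simplified ADMM-style loop: the coordinator splits the building-level power cap into per-zone targets, each zone's controller reports the power its setpoint plan achieves, and dual variables accumulate the mismatch. Here zone_response is a hypothetical stand-in for the zone-level deep learning controller, and the updates are a generic schematic, not the paper's exact algorithm.

```python
import numpy as np

def coordinate(zone_response, n_zones, power_cap, iters=50):
    z = np.full(n_zones, power_cap / n_zones)   # coordinator's per-zone power targets
    u = np.zeros(n_zones)                       # scaled dual variables
    for _ in range(iters):
        # Local step: each zone plans temperature setpoints for its penalized target
        # and reports the power it can actually achieve.
        x = np.array([zone_response(i, z[i] - u[i]) for i in range(n_zones)])
        # Coordinator step: project x + u onto the cap constraint sum(z) <= power_cap
        # (Euclidean projection onto a halfspace subtracts the uniform excess).
        v = x + u
        z = v - max(v.sum() - power_cap, 0.0) / n_zones
        # Dual step: accumulate the residual between achieved and target powers.
        u += x - z
    return z
```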
SPOCKMIP: Segmentation of Vessels in MRAs with Enhanced Continuity using Maximum Intensity Projection as Loss
Identification of vessel structures of different sizes in biomedical images is crucial in the diagnosis of many neurodegenerative diseases. However, the sparsity of good-quality annotations of such images makes the task of vessel segmentation challenging. Deep learning offers an efficient way to segment vessels of different sizes by learning their high-level feature representations and the spatial continuity of such features across dimensions. Semi-supervised patch-based approaches have been effective in identifying small vessels of one to two voxels in diameter. This study focuses on improving the segmentation quality by considering the spatial correlation of the features using the Maximum Intensity Projection (MIP) as an additional loss criterion. Two methods are proposed with the incorporation of MIPs of label segmentation on the single (z-axis) and multiple perceivable axes of the 3D volume. The proposed MIP-based methods produce segmentations with improved vessel continuity, which is evident in visual examinations of ROIs. Patch-based training is improved by introducing an additional loss term, MIP loss, to penalise the predicted discontinuity of vessels. A training set of 14 volumes is selected from the StudyForrest dataset comprising 18 7-Tesla 3D Time-of-Flight (ToF) Magnetic Resonance Angiography (MRA) images. The generalisation performance of the method is evaluated using the other unseen volumes in the dataset. It is observed that the proposed method with multi-axes MIP loss produces better quality segmentations with a median Dice of $80.245 \pm 0.129$. Also, the method with single-axis MIP loss produces segmentations with a median Dice of $79.749 \pm 0.109$. Furthermore, a visual comparison of the ROIs in the predicted segmentation reveals a significant improvement in the continuity of the vessels when MIP loss is incorporated into training.
Updated: 2024-07-11 16:39:24
Domains: eess.IV,cs.AI,cs.LG,physics.med-ph
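A minimal PyTorch sketch of the MIP loss term, assuming probability volumes and a binary cross-entropy comparison (both assumptions): project prediction and label along one or more axes with a max, and penalize disagreement of the projections alongside the usual voxel-wise loss. axes=(2,) gives the single-axis (z) variant; axes=(2, 3, 4) the multi-axis one.

```python
import torch
import torch.nn.functional as F

def mip_loss(pred, label, axes=(2, 3, 4)):
    """pred, label: (B, 1, D, H, W) probability volumes in [0, 1]."""
    loss = 0.0
    for ax in axes:
        pred_mip = pred.max(dim=ax).values    # maximum intensity projection of prediction
        label_mip = label.max(dim=ax).values  # same projection of the ground truth
        loss = loss + F.binary_cross_entropy(pred_mip, label_mip)
    return loss / len(axes)

def total_loss(pred, label, w=0.5):
    """Voxel-wise loss plus the MIP continuity penalty; the weight w is an assumption."""
    return F.binary_cross_entropy(pred, label) + w * mip_loss(pred, label)
```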
Have We Reached AGI? Comparing ChatGPT, Claude, and Gemini to Human Literacy and Education Benchmarks
Recent advancements in AI, particularly in large language models (LLMs) like ChatGPT, Claude, and Gemini, have prompted questions about their proximity to Artificial General Intelligence (AGI). This study compares LLM performance on educational benchmarks with Americans' average educational attainment and literacy levels, using data from the U.S. Census Bureau and technical reports. Results show that LLMs significantly outperform human benchmarks in tasks such as undergraduate knowledge and advanced reading comprehension, indicating substantial progress toward AGI. However, true AGI requires broader cognitive assessments. The study highlights the implications for AI development, education, and societal impact, emphasizing the need for ongoing research and ethical considerations.
Updated: 2024-07-11 16:38:40
Domains: cs.AI,cs.LG
Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?
Large language models, particularly multilingual ones, are designed, claimed, and expected to cater to native speakers of varied languages. We hypothesise that the current practices of fine-tuning and evaluating these models may not perfectly align with this objective owing to a heavy reliance on translation, which can introduce translation artefacts and defects. It remains unknown whether the nature of the instruction data has an impact on the model output; conversely, it is questionable whether translated test sets can capture such nuances. Due to the often coupled practices of using translated data in both stages, such imperfections could have been overlooked. This work investigates these issues using controlled native or translated data during instruction tuning and evaluation stages. Experiments on eight base models and eight different benchmarks show that native or generation benchmarks reveal a notable difference between native and translated instruction data especially when model performance is high, whereas other types of test sets cannot. The comparison between round-trip and single-pass translations reflects the importance of knowledge from language-native resources. Finally, we demonstrate that regularization is beneficial to bridging this gap on structured but not generative tasks.
Updated: 2024-07-11 16:37:24
Domains: cs.CL,cs.AI
Adaptive Smooth Non-Stationary Bandits
We study a $K$-armed non-stationary bandit model where rewards change smoothly, as captured by Hölder class assumptions on rewards as functions of time. Such smooth changes are parametrized by a Hölder exponent $\beta$ and coefficient $\lambda$. While various sub-cases of this general model have been studied in isolation, we first establish the minimax dynamic regret rate generally for all $K,\beta,\lambda$. Next, we show this optimal dynamic regret can be attained adaptively, without knowledge of $\beta,\lambda$. To contrast, even with parameter knowledge, upper bounds were only previously known for limited regimes $\beta\leq 1$ and $\beta=2$ (Slivkins, 2014; Krishnamurthy and Gopalan, 2021; Manegueu et al., 2021; Jia et al., 2023). Thus, our work resolves open questions raised by these disparate threads of the literature. We also study the problem of attaining faster gap-dependent regret rates in non-stationary bandits. While such rates are long known to be impossible in general (Garivier and Moulines, 2011), we show that environments admitting a safe arm (Suk and Kpotufe, 2022) allow for much faster rates than the worst-case scaling with $\sqrt{T}$. While previous works in this direction focused on attaining the usual logarithmic regret bounds, as summed over stationary periods, our new gap-dependent rates reveal new optimistic regimes of non-stationarity where even the logarithmic bounds are pessimistic. We show our new gap-dependent rate is tight and that its achievability (i.e., as made possible by a safe arm) has a surprisingly simple and clean characterization within the smooth Hölder class model.
Updated: 2024-07-11 16:37:15
Domains: stat.ML,cs.LG,math.ST,stat.TH
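For readers unfamiliar with the smoothness assumption, the Hölder condition on the mean rewards can be written as below for the case $\beta \leq 1$ (for $\beta > 1$ the condition is imposed on derivatives of the reward function instead); smaller $\beta$ allows rougher reward paths and larger $\lambda$ allows faster drift.

```latex
\[
  \lvert \mu_k(t) - \mu_k(t') \rvert \;\le\; \lambda \, \lvert t - t' \rvert^{\beta}
  \qquad \text{for all arms } k \text{ and times } t, t'.
\]
```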
Large Pre-trained time series models for cross-domain Time series analysis tasks
Large pre-trained models have been vital in recent advancements in domains like language and vision, making model training for individual downstream tasks more efficient and providing superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch, leveraging training data and domain expertise specific to the task. We tackle a significant challenge for pre-training a foundational time-series model from multi-domain time-series datasets: extracting semantically useful tokenized inputs to the model across heterogeneous time-series from different domains. We propose Large Pre-trained Time-series Models (LPTM), which introduces a novel method of adaptive segmentation that automatically identifies the optimal dataset-specific segmentation strategy during pre-training. This enables LPTM to perform similarly to or better than domain-specific state-of-the-art models when fine-tuned to different downstream time-series analysis tasks and under zero-shot settings. LPTM achieves superior forecasting and time-series classification results while taking up to 40% less data and 50% less training time compared to state-of-the-art baselines.
Updated: 2024-07-11 16:32:12
Domains: cs.LG
Confidence-based Estimators for Predictive Performance in Model Monitoring
After a machine learning model has been deployed into production, its predictive performance needs to be monitored. Ideally, such monitoring can be carried out by comparing the model's predictions against ground truth labels. For this to be possible, the ground truth labels must be available relatively soon after inference. However, there are many use cases where ground truth labels are available only after a significant delay, or in the worst case, not at all. In such cases, directly monitoring the model's predictive performance is impossible. Recently, novel methods for estimating the predictive performance of a model when ground truth is unavailable have been developed. Many of these methods leverage model confidence or other uncertainty estimates and are experimentally compared against a naive baseline method, namely Average Confidence (AC), which estimates model accuracy as the average of confidence scores for a given set of predictions. However, until now the theoretical properties of the AC method have not been properly explored. In this paper, we try to fill this gap by reviewing the AC method and show that under certain general assumptions, it is an unbiased and consistent estimator of model accuracy with many desirable properties. We also compare this baseline estimator against some more complex estimators empirically and show that in many cases the AC method is able to beat the others, although the comparative quality of the different estimators is heavily case-dependent.
Updated: 2024-07-11 16:28:31
Domains: cs.LG,cs.AI
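The AC baseline analyzed above is simple enough to state in two lines: with no ground truth available, batch accuracy is estimated as the mean of the model's per-prediction confidence scores.

```python
import numpy as np

def average_confidence(confidences: np.ndarray) -> float:
    """confidences: per-prediction confidence scores in [0, 1]."""
    return float(np.mean(confidences))

# Example: average_confidence(np.array([0.9, 0.7, 0.95])) == 0.85
```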
BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark
We introduce BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To capture the real-world performance accurately, we provide human-collected demonstrations for each task, reflecting the diverse modalities found in real-world robot trajectories. BiGym supports a variety of observations, including proprioceptive data and visual inputs such as RGB, and depth from 3 camera views. To validate the usability of BiGym, we thoroughly benchmark the state-of-the-art imitation learning algorithms and demo-driven reinforcement learning algorithms within the environment and discuss the future opportunities.
Updated: 2024-07-11 16:26:09
Domains: cs.RO,cs.AI,cs.CV,cs.LG
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
We present MOFA-Video, an advanced controllable image animation method that generates video from a given image using various additional controllable signals (such as human landmark references, manual trajectories, and even another provided video) or their combinations. This differs from previous methods, which can only work in a specific motion domain or show weak control abilities with a diffusion prior. To achieve our goal, we design several domain-aware motion field adapters (i.e., MOFA-Adapters) to control the generated motions in the video generation pipeline. For MOFA-Adapters, we consider the temporal motion consistency of the video and first generate dense motion flow from the given sparse control conditions; the multi-scale features of the given image are then wrapped as a guided feature for stable video diffusion generation. We naively train two motion adapters for the manual trajectories and the human landmarks individually, since they both contain sparse information about the control. After training, the MOFA-Adapters in different domains can also work together for more controllable video generation. Project Page: https://myniuuu.github.io/MOFA_Video/
Updated: 2024-07-11 16:26:03
Domains: cs.CV,cs.AI
From Real to Cloned Singer Identification
Cloned voices of popular singers sound increasingly realistic and have gained popularity over the past few years. They however pose a threat to the industry due to personality rights concerns. As such, methods to identify the original singer in synthetic voices are needed. In this paper, we investigate how singer identification methods could be used for such a task. We present three embedding models that are trained using a singer-level contrastive learning scheme, where positive pairs consist of segments with vocals from the same singers. These segments can be mixtures for the first model, vocals for the second, and both for the third. We demonstrate that all three models are highly capable of identifying real singers. However, their performance deteriorates when classifying cloned versions of singers in our evaluation set. This is especially true for models that use mixtures as an input. These findings highlight the need to understand the biases that exist within singer identification systems, and how they can influence the identification of voice deepfakes in music.
Updated: 2024-07-11 16:25:21
Domains: cs.SD,cs.IR,cs.LG,eess.AS
BenthicNet: A global compilation of seafloor images for deep learning applications
Advances in underwater imaging enable the collection of extensive seafloor image datasets that are necessary for monitoring important benthic ecosystems. The ability to collect seafloor imagery has outpaced our capacity to analyze it, hindering expedient mobilization of this crucial environmental information. Recent machine learning approaches provide opportunities to increase the efficiency with which seafloor image datasets are analyzed, yet the large and consistent datasets necessary to support the development of such approaches are scarce. Here we present BenthicNet: a global compilation of seafloor imagery designed to support the training and evaluation of large-scale image recognition models. An initial set of over 11.4 million images was collected and curated into a representative subset of 1.3 million images spanning a diversity of seafloor environments. These are accompanied by 2.6 million annotations translated to the CATAMI scheme, which span 190,000 of the images. A large deep learning model was trained on this compilation, and preliminary results suggest it has utility for automating large- and small-scale image analysis tasks. The compilation and model are made openly available for use by the scientific community at https://doi.org/10.20383/103.0614.
Updated: 2024-07-11 16:24:52
Domains: cs.CV,cs.LG
How more data can hurt: Instability and regularization in next-generation reservoir computing
It has been found recently that more data can, counter-intuitively, hurt the performance of deep neural networks. Here, we show that a more extreme version of the phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC) -- a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned "integrator" and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing regularization strength in tandem with data size, or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.
Updated: 2024-07-11 16:22:13
Domains: cs.LG,cs.NE,math.DS,nlin.AO
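To make the role of regularization concrete, here is a schematic NGRC fit: features are a few delayed states plus their quadratic monomials, and a ridge regression maps features to the one-step increment. The feature set and delay count are common NGRC defaults rather than the paper's exact configuration; alpha is the regularization strength that, per the findings above, should grow with the training set.

```python
import numpy as np
from sklearn.linear_model import Ridge

def ngrc_features(x, delays=2):
    """x: (T, d) trajectory -> [current and delayed states, their quadratic monomials]."""
    T, d = x.shape
    lin = np.hstack([x[delays - k - 1 : T - k - 1] for k in range(delays)])
    n = lin.shape[1]
    quad = np.stack([np.outer(f, f)[np.triu_indices(n)] for f in lin])
    return np.hstack([lin, quad])

def fit_ngrc(x, delays=2, alpha=1e-4):
    feats = ngrc_features(x, delays)
    targets = x[delays:] - x[delays - 1 : -1]      # learn the one-step increment
    return Ridge(alpha=alpha).fit(feats, targets)  # alpha = ridge regularization strength
```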
Clap: a Semantic-Preserving Optimizing eDSL for Plonkish Proof Systems
Plonkish is a popular circuit format for developing zero-knowledge proof systems that powers a number of major projects in the blockchain space, responsible for holding billions of dollars and processing millions of transactions per day. These projects, including zero-knowledge rollups, rely on highly hand-optimized circuits whose correctness comes at the cost of time-consuming testing and auditing. In this paper, we present Clap, the first Rust eDSL with a proof system agnostic circuit format, facilitating extensibility, automatic optimizations, and formal assurances for the resultant constraint system. Clap casts the problem of producing Plonkish constraint systems and their witness generators as a semantic-preserving compilation problem. Soundness and completeness of the transformation guarantee the absence of subtle bugs caused by under- or over-constraining. Our experimental evaluation shows that its automatic optimizations achieve better performance compared to manual circuit optimization. The optimizer can also be used to automatically derive custom gates from circuit descriptions.
Updated: 2024-07-11 16:21:25
Domains: cs.CR
$β$-DPO: Direct Preference Optimization with Dynamic $β$
Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $\beta$, as well as to the quality of the preference data. We analyze the impact of $\beta$ and data quality on DPO, uncovering that optimal $\beta$ values vary with the informativeness of pairwise data. Addressing the limitations of static $\beta$ values, we introduce a novel framework that dynamically calibrates $\beta$ at the batch level, informed by data quality considerations. Additionally, our method incorporates $\beta$-guided data filtering to safeguard against the influence of outliers. Through empirical evaluation, we demonstrate that our dynamic $\beta$ adjustment technique significantly improves DPO's performance across a range of models and datasets, offering a more robust and adaptable training paradigm for aligning LLMs with human feedback. The code is available at https://github.com/junkangwu/beta-DPO.
Updated: 2024-07-11 16:21:18
Domains: cs.AI,cs.LG
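A sketch of a DPO loss with batch-level dynamic $\beta$, in the spirit of the paper: $\beta$ is recalibrated each batch from the implicit reward margins. The tanh-based calibration rule below is an illustrative assumption; the paper's exact calibration and its $\beta$-guided data filtering are not reproduced here.

```python
import torch
import torch.nn.functional as F

def dpo_loss_dynamic_beta(logp_w, logp_l, ref_logp_w, ref_logp_l, beta0=0.1, alpha=0.5):
    """logp_*: summed log-probs of chosen (w) / rejected (l) responses under the
    policy and the frozen reference model, each of shape (batch,)."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)   # implicit reward margins
    # Batch-level calibration: shrink beta for low-margin (hard, informative) batches
    # and grow it for high-margin ones; one plausible rule, stated as an assumption.
    beta = beta0 * (1.0 + alpha * torch.tanh(margin.mean().detach()))
    return -F.logsigmoid(beta * margin).mean(), beta
```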
Toto: Time Series Optimized Transformer for Observability
This technical report describes the Time Series Optimized Transformer for Observability (Toto), a new state-of-the-art foundation model for time series forecasting developed by Datadog. In addition to advancing the state of the art on generalized time series benchmarks in domains such as electricity and weather, this model is the first general-purpose time series forecasting foundation model to be specifically tuned for observability metrics. Toto was trained on a dataset of one trillion time series data points, the largest among all currently published time series foundation models. Alongside publicly available time series datasets, 75% of the data used to train Toto consists of fully anonymous numerical metric data points from the Datadog platform. In our experiments, Toto outperforms existing time series foundation models on observability data. It does this while also excelling at general-purpose forecasting tasks, achieving state-of-the-art zero-shot performance on multiple open benchmark datasets.
Updated: 2024-07-11 16:18:40
Domains: cs.LG,cs.AI
A Novel Framework for Automated Warehouse Layout Generation
Optimizing warehouse layouts is crucial due to its significant impact on efficiency and productivity. We present an AI-driven framework for automated warehouse layout generation. This framework employs constrained beam search to derive optimal layouts within given spatial parameters, adhering to all functional requirements. The feasibility of the generated layouts is verified based on criteria such as item accessibility, required minimum clearances, and aisle connectivity. A scoring function is then used to evaluate the feasible layouts considering the number of storage locations, access points, and accessibility costs. We demonstrate our method's ability to produce feasible, optimal layouts for a variety of warehouse dimensions and shapes, diverse door placements, and interconnections. This approach, currently being prepared for deployment, will enable human designers to rapidly explore and confirm options, facilitating the selection of the most appropriate layout for their use-case.
Updated: 2024-07-11 16:15:09
Domains: cs.AI
Generalization Error Matters in Decentralized Learning Under Byzantine Attacks
Recently, decentralized learning has emerged as a popular peer-to-peer signal and information processing paradigm that enables model training across geographically distributed agents in a scalable manner, without the presence of any central server. When some of the agents are malicious (also termed as Byzantine), resilient decentralized learning algorithms are able to limit the impact of these Byzantine agents without knowing their number and identities, and come with guaranteed optimization error bounds. However, analysis of the generalization errors, which are critical to implementations of the trained models, is still lacking. In this paper, we provide the first analysis of the generalization errors for a class of popular Byzantine-resilient decentralized stochastic gradient descent (DSGD) algorithms. Our theoretical results reveal that the generalization errors cannot be entirely eliminated because of the presence of the Byzantine agents, even if the number of training samples is infinitely large. Numerical experiments are conducted to confirm our theoretical results.
Updated: 2024-07-11 16:12:53
Domains: cs.LG
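For context on the algorithm class analyzed above, here is one standard Byzantine-resilient aggregation rule, coordinate-wise trimmed mean, shown as a generic example rather than the paper's specific aggregator.

```python
import numpy as np

def trimmed_mean(neighbor_models: np.ndarray, b: int) -> np.ndarray:
    """neighbor_models: (n_agents, dim) parameter vectors; requires n_agents > 2*b.
    Dropping the b largest and b smallest values per coordinate before averaging
    tolerates up to b Byzantine agents without knowing which ones they are."""
    sorted_vals = np.sort(neighbor_models, axis=0)
    return sorted_vals[b : neighbor_models.shape[0] - b].mean(axis=0)
```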
RoboMorph: Evolving Robot Morphology using Large Language Models
We introduce RoboMorph, an automated approach for generating and optimizing modular robot designs using large language models (LLMs) and evolutionary algorithms. In this framework, we represent each robot design as a grammar and leverage the capabilities of LLMs to navigate the extensive robot design space, which is traditionally time-consuming and computationally demanding. By integrating automatic prompt design and a reinforcement learning based control algorithm, RoboMorph iteratively improves robot designs through feedback loops. Our experimental results demonstrate that RoboMorph can successfully generate nontrivial robots that are optimized for a single terrain while showcasing improvements in morphology over successive evolutions. Our approach demonstrates the potential of using LLMs for data-driven and modular robot design, providing a promising methodology that can be extended to other domains with similar design frameworks.
Updated: 2024-07-11 16:05:56
领域: cs.LG,cs.RO
RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Adversarial Data Manipulation
Federated learning has emerged as a promising privacy-preserving solution for machine learning domains that rely on user interactions, particularly recommender systems and online learning to rank. While there has been substantial research on the privacy of traditional federated learning, little attention has been paid to the privacy properties of these interaction-based settings. In this work, we show that users face an elevated risk of having their private interactions reconstructed by the central server when the server can control the training features of the items that users interact with. We introduce RAIFLE, a novel optimization-based attack framework where the server actively manipulates the features of the items presented to users to increase the success rate of reconstruction. Our experiments with federated recommendation and online learning-to-rank scenarios demonstrate that RAIFLE is significantly more powerful than existing reconstruction attacks like gradient inversion, achieving high performance consistently in most settings. We discuss the pros and cons of several possible countermeasures to defend against RAIFLE in the context of interaction-based federated learning. Our code is open-sourced at https://github.com/dzungvpham/raifle.
Updated: 2024-07-11 16:04:27
标题: RAIFLE:基于交互的联邦学习中的重构攻击与对抗数据操纵
摘要: 联邦学习已经成为一种有前途的保护隐私的解决方案,适用于依赖用户交互的机器学习领域,特别是推荐系统和在线学习排名。虽然传统联邦学习的隐私性已经得到了大量研究,但很少有人关注这些基于交互的设置的隐私属性。在这项工作中,我们展示了当服务器可以控制用户与之交互的项的训练特征时,用户面临着私人交互被中央服务器重构的风险增加。我们介绍了RAIFLE,这是一个基于优化的新型攻击框架,其中服务器积极操纵呈现给用户的项目的特征,以增加重构的成功率。我们的实验涉及联邦推荐和在线学习排名场景,表明RAIFLE比现有的重构攻击如梯度反转更加强大,在大多数设置中始终表现出高性能。我们讨论了几种可能的对抗措施的优缺点,以抵御基于交互的联邦学习中的RAIFLE。我们的代码在https://github.com/dzungvpham/raifle 上开源。
更新时间: 2024-07-11 16:04:27
领域: cs.CR,cs.LG
Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric (DIEM)
The advancement in computational power and hardware efficiency has enabled the tackling of increasingly complex and high-dimensional problems. While artificial intelligence (AI) has achieved remarkable results in various scientific and technological fields, the interpretability of these high-dimensional solutions remains challenging. A critical issue in this context is the comparison of multidimensional quantities, which is essential in techniques like Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and k-means clustering. Common metrics such as cosine similarity, Euclidean distance, and Manhattan distance are often used for such comparisons - for example in muscular synergies of the human motor control system. However, their applicability and interpretability diminish as dimensionality increases. This paper provides a comprehensive analysis of the effects of dimensionality on these three widely used metrics. Our results reveal significant limitations of cosine similarity, particularly its dependency on the dimensionality of the vectors, leading to biased and less interpretable outcomes. To address this, we introduce the Dimension Insensitive Euclidean Metric (DIEM), derived from the Euclidean distance, which demonstrates superior robustness and generalizability across varying dimensions. DIEM maintains consistent variability and eliminates the biases observed in traditional metrics, making it a more reliable tool for high-dimensional comparisons. This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method to analyze multidimensional data in fields ranging from neuromotor control to machine learning and deep learning.
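The dimensionality dependence that motivates DIEM is easy to reproduce empirically: cosine similarities between random vectors concentrate around zero as the dimension grows, shrinking the metric's usable dynamic range. The simulation below illustrates only this motivating effect; DIEM's exact formula is defined in the paper and is not reproduced here.

```python
# Cosine similarities of random pairs concentrate as dimension grows,
# which is the bias/interpretability problem the abstract describes.
import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    a = rng.normal(size=(5000, dim))
    b = rng.normal(size=(5000, dim))
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    print(f"dim={dim:5d}  mean={cos.mean():+.3f}  std={cos.std():.3f}")
    # std decays roughly like 1/sqrt(dim), so in high dimensions almost
    # all pairs look "equally dissimilar" under cosine similarity.
```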
Updated: 2024-07-11 16:00:22
标题: 超越余弦相似度进行多维比较:维度无关的欧几里德度量(DIEM)
摘要: 计算能力和硬件效率的进步使得处理日益复杂和高维问题成为可能。虽然人工智能在各种科学和技术领域取得了显著成就,但这些高维解决方案的可解释性仍然具有挑战性。在这种情况下,一个关键问题是多维量的比较,在主成分分析(PCA)、奇异值分解(SVD)和k均值聚类等技术中至关重要。通常用于这种比较的共同指标如余弦相似度、欧几里得距离和曼哈顿距离,例如在人类运动控制系统的肌肉协同中经常使用。然而,随着维度的增加,它们的适用性和可解释性会减弱。本文对这三个广泛使用的指标在维度上的影响进行了全面分析。我们的结果显示了余弦相似度存在显著限制,特别是其对向量维度的依赖性,导致偏倚和不太可解释的结果。为了解决这个问题,我们引入了派生自欧几里得距离的“维度不敏感欧几里得度量”(DIEM),展示了在不同维度下的卓越稳健性和普适性。DIEM保持了一致的变异性,并消除了传统指标中观察到的偏见,使其成为高维比较更可靠的工具。这种新颖的指标有潜力取代余弦相似度,提供更准确和深入的分析多维数据的方法,涵盖从神经运动控制到机器学习和深度学习等领域。
更新时间: 2024-07-11 16:00:22
领域: cs.LG,eess.SP
Particle Swarm Optimization based on Novelty Search
In this paper we propose a Particle Swarm Optimization algorithm combined with Novelty Search. Novelty Search identifies unexplored regions of the search domain, and Particle Swarm Optimization then searches each such region intensively for the global optimum. Because it is guided by Novelty Search, which is objective-free, the method does not become trapped in local optima. It succeeds even on functions with many local optima where the second-best optimum lies far from the true optimum, and it does not terminate until the entire search space has been explored. A series of experimental trials demonstrates the robustness and effectiveness of the algorithm on complex optimization test functions.
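A minimal sketch of such a coupling is shown below: an objective-free novelty step selects an unexplored region, and a short PSO burst then exploits it. The archive-based novelty measure, constants, and test function are illustrative assumptions, not the paper's algorithm.

```python
# Toy Novelty Search + PSO loop; all constants and the coupling scheme
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
lo, hi, dim = -5.0, 5.0, 2
f = lambda x: np.sum(x**2 + 10.0 * np.cos(2 * np.pi * x), axis=-1)  # multimodal

archive = []                                    # points searched so far

def novelty(p):                                 # objective-free: mean distance
    if not archive:                             # to previously searched points
        return 1.0
    return float(np.mean([np.linalg.norm(p - a) for a in archive]))

best_x, best_f = None, np.inf
for _round in range(20):
    probes = rng.uniform(lo, hi, size=(50, dim))
    centre = max(probes, key=novelty)           # most novel region wins
    # short PSO burst around the novel centre
    pos = np.clip(centre + rng.normal(scale=0.5, size=(15, dim)), lo, hi)
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), f(pos)
    for _ in range(30):
        g = pbest[np.argmin(pbest_f)]           # swarm-best position
        vel = (0.7 * vel
               + 1.5 * rng.random((15, 1)) * (pbest - pos)
               + 1.5 * rng.random((15, 1)) * (g - pos))
        pos = np.clip(pos + vel, lo, hi)
        fx = f(pos)
        better = fx < pbest_f
        pbest[better], pbest_f[better] = pos[better], fx[better]
    archive.extend(pbest)                       # record where we searched
    if pbest_f.min() < best_f:
        best_f, best_x = float(pbest_f.min()), pbest[np.argmin(pbest_f)]

print("best value found:", round(best_f, 4), "at", np.round(best_x, 3))
```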
Updated: 2024-07-11 16:00:01
标题: 基于新颖性搜索的粒子群优化
摘要: 在本文中,我们提出了一种将粒子群优化算法与新颖性搜索结合的方法。新颖性搜索在搜索领域中找到新颖的位置进行搜索,然后粒子群优化算法在该区域严格搜索全局最优解。这种方法永远不会被局部最优解阻碍,因为它由无目标的新颖性搜索控制。对于那些存在许多局部最优解且第二全局最优解远离真正最优解的函数,本方法可以成功运行。该算法永远不会停止,直到搜索完整个搜索区域。一系列实验试验证明了本算法在复杂优化测试函数上的稳健性和有效性。
更新时间: 2024-07-11 16:00:01
领域: cs.NE,cs.AI
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies
Large Language Models (LLMs) have revolutionized the field of Natural Language Processing thanks to their ability to reuse knowledge acquired on massive text corpora on a wide variety of downstream tasks, with minimal (if any) tuning steps. At the same time, it has been repeatedly shown that LLMs lack systematic generalization, the ability to extrapolate the learned statistical regularities outside the training distribution. In this work, we offer a systematic benchmarking of GPT-4, one of the most advanced LLMs available, on three algorithmic tasks characterized by the possibility to control the problem difficulty with two parameters. We compare the performance of GPT-4 with that of its predecessor (GPT-3.5) and with a variant of the Transformer-Encoder architecture recently introduced to solve similar tasks, the Neural Data Router. We find that the deployment of advanced prompting techniques allows GPT-4 to reach superior accuracy on all tasks, demonstrating that state-of-the-art LLMs constitute a very strong baseline also in challenging tasks that require systematic generalization.
Updated: 2024-07-11 15:54:45
标题: 在算法问题上对GPT-4进行基准测试:提示策略的系统评估
摘要: 大型语言模型(LLMs)通过其在大量文本语料库上获取的知识重复使用能力,在自然语言处理领域引起了革命,可以在各种下游任务中以最小(如果有的话)的调整步骤。与此同时,已经多次显示LLMs缺乏系统化泛化能力,无法将学习到的统计规律推广到训练分布之外。在这项工作中,我们对目前最先进的LLMs之一GPT-4进行了系统化基准测试,测试了三个算法任务,这些任务可以通过两个参数控制问题的难度。我们将GPT-4的性能与其前身(GPT-3.5)以及最近引入的解决类似任务的Transformer-Encoder架构变体Neural Data Router进行了比较。我们发现,采用先进的提示技术使GPT-4在所有任务上达到了更高的准确性,证明了目前最先进的LLMs在需要系统化泛化的挑战性任务中也构成了一个非常强大的基准。
更新时间: 2024-07-11 15:54:45
领域: cs.CL,cs.AI,cs.NE
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of strong law of large numbers and a commonly used V4 Lyapunov drift condition and trivially holds if the Markov chain is finite and irreducible.
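In a simplified generic form (a textbook statement; the paper's precise conditions on the Markovian noise differ), the setting can be written as:

```latex
% Stochastic approximation driven by a Markov chain (Y_k) with stationary
% distribution mu, and the mean ODE obtained by averaging over mu.
\begin{align*}
  x_{k+1} &= x_k + \alpha_k\, f(x_k, Y_{k+1}),
  \qquad \sum_{k} \alpha_k = \infty, \quad \sum_{k} \alpha_k^2 < \infty, \\
  \dot{x}(t) &= \bar{f}\bigl(x(t)\bigr),
  \qquad \bar{f}(x) \doteq \mathbb{E}_{y \sim \mu}\bigl[f(x, y)\bigr].
\end{align*}
% Stability here means \sup_k \|x_k\| < \infty almost surely; the extended
% Borkar--Meyn theorem establishes this under the Markovian noise setting.
```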
Updated: 2024-07-11 15:51:39
标题: ODE方法用于具有马尔可夫噪声的随机逼近和强化学习
摘要: 随机逼近是一类算法,通过迭代、增量和随机地更新向量,包括随机梯度下降和时间差学习等。分析随机逼近算法的一个基本挑战是建立其稳定性,即证明随机向量迭代几乎必定有界。在本文中,我们将著名的Borkar-Meyn定理从鞅差噪声设置扩展到马尔可夫噪声设置,这大大提高了其在强化学习中的适用性,特别是在具有线性函数逼近和资格痕迹的离策略强化学习算法中。我们分析的核心是一些函数的渐近变化速率递减,这是由强大数定律的一种形式和常用的V4 Lyapunov漂移条件所隐含的,如果马尔可夫链是有限和不可约的,则显然成立。
更新时间: 2024-07-11 15:51:39
领域: cs.LG,cs.AI
Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports
Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called JANUS, that adapts the scene-learning capabilities of vision transformers to capture subtle visual and textual patterns that manifest on app UI screens - which is key to differentiating between similar screens for accurate duplicate report detection. JANUS also makes use of a video alignment technique capable of adaptive weighting of video frames to account for typical bug manifestation patterns. In a comprehensive evaluation on a benchmark containing 7,290 duplicate detection tasks derived from 270 video-based bug reports from 90 Android app bugs, the best configuration of our approach achieves an overall mRR/mAP of 89.8%/84.7%, and for the large majority of duplicate detection tasks, outperforms prior work by around 9% to a statistically significant degree. Finally, we qualitatively illustrate how the scene-learning capabilities provided by Janus benefits its performance.
Updated: 2024-07-11 15:48:36
标题: 语义GUI场景学习和视频对齐用于检测重复的基于视频的错误报告
摘要: 基于视频的故障报告越来越多地被用来记录围绕图形用户界面(GUI)的程序的故障。然而,开发自动化技术来管理基于视频的报告是具有挑战性的,因为它需要识别和理解捕捉关于报告的故障的关键信息的通常微妙的视觉模式。在本文中,我们旨在通过推进视频报告的重复检测的故障报告管理任务来克服这些挑战。为此,我们引入了一种新方法,名为JANUS,它借鉴了视觉变换器的场景学习能力,以捕捉在应用UI屏幕上显现的微妙的视觉和文本模式 - 这对于准确检测重复报告之间的类似屏幕至关重要。JANUS还利用了一种视频对齐技术,能够自适应地加权视频帧,以考虑典型的故障表现模式。在一个包含来自90个Android应用程序故障的270个基于视频的故障报告衍生的7,290个重复检测任务的基准测试中,我们方法的最佳配置实现了89.8%/84.7%的总体mRR/mAP,并且对于大多数重复检测任务,优于以前的工作约9%至统计学上显著的程度。最后,我们定性地说明了Janus提供的场景学习能力如何提高其性能。
更新时间: 2024-07-11 15:48:36
领域: cs.SE,cs.LG
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications. FlashAttention introduced an approach to speed up attention on GPUs through minimizing memory reads/writes. However, it has yet to take advantage of new capabilities present in recent hardware, with FlashAttention-2 achieving only 35% utilization on the H100 GPU. We develop three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) block quantization and incoherent processing that leverages hardware support for FP8 low-precision. We demonstrate that our method, FlashAttention-3, achieves speedup on H100 GPUs by 1.5-2.0$\times$ with FP16 reaching up to 740 TFLOPs/s (75% utilization), and with FP8 reaching close to 1.2 PFLOPs/s. We validate that FP8 FlashAttention-3 achieves 2.6$\times$ lower numerical error than a baseline FP8 attention.
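The Hopper-specific techniques (warp specialization, TMA, FP8 block quantization) cannot be reproduced outside a CUDA kernel, but the algorithmic core shared by all FlashAttention versions, blockwise attention with an online softmax that never materializes the full score matrix, can be sketched in NumPy:

```python
# Reference implementation of blockwise online-softmax attention; the
# paper's asynchrony and low-precision machinery are not modelled here.
import numpy as np

def flash_attention(Q, K, V, block=64):
    """Stream tiles of K/V, rescaling the running max, denominator, and
    accumulator so the full n-by-n score matrix is never formed."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(0, n, block):
        q = Q[i:i + block] / np.sqrt(d)
        m = np.full(len(q), -np.inf)            # running row max
        l = np.zeros(len(q))                    # running softmax denominator
        acc = np.zeros((len(q), d))             # running weighted sum of V
        for j in range(0, n, block):
            s = q @ K[j:j + block].T            # scores for one tile
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])      # unnormalized tile probs
            corr = np.exp(m - m_new)            # rescale earlier partial sums
            l = l * corr + p.sum(axis=1)
            acc = acc * corr[:, None] + p @ V[j:j + block]
            m = m_new
        out[i:i + block] = acc / l[:, None]
    return out

def dense_attention(Q, K, V):
    s = Q @ K.T / np.sqrt(Q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 256, 32))
assert np.allclose(flash_attention(Q, K, V), dense_attention(Q, K, V), atol=1e-8)
print("blockwise online-softmax attention matches dense attention")
```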
Updated: 2024-07-11 15:44:48
标题: FlashAttention-3:具有异步性和低精度的快速准确的注意力
摘要: 注意力作为无处不在的Transformer架构的核心层,是大型语言模型和长上下文应用的瓶颈。FlashAttention通过最小化内存读写来加速GPU上的注意力,但尚未充分利用最新硬件的新功能,FlashAttention-2仅在H100 GPU上实现了35%的利用率。我们开发了三种主要技术来加速Hopper GPU上的注意力:利用Tensor Cores和TMA的异步性来(1)通过warp-specialization重叠整体计算和数据移动,(2)交错块状矩阵乘法和softmax操作,以及(3)块量化和不连贯处理,利用硬件支持FP8低精度。我们证明了我们的方法FlashAttention-3在H100 GPU上实现了1.5-2.0$\times$的加速,FP16达到了高达740 TFLOPs/s(利用率75%),而FP8接近1.2 PFLOPs/s。我们验证了FP8 FlashAttention-3比基线FP8注意力的数值误差降低了2.6$\times$。
更新时间: 2024-07-11 15:44:48
领域: cs.LG,cs.AI
Simulacra as Conscious Exotica
The advent of conversational agents with increasingly human-like behaviour throws old philosophical questions into new light. Does it, or could it, ever make sense to speak of AI agents built out of generative language models in terms of consciousness, given that they are "mere" simulacra of human behaviour, and that what they do can be seen as "merely" role play? Drawing on the later writings of Wittgenstein, this paper attempts to tackle this question while avoiding the pitfalls of dualistic thinking.
Updated: 2024-07-11 15:42:47
标题: Simulacra作为有意识的异国情调
摘要: 随着具有越来越类似人类行为的对话代理的出现,古老的哲学问题被重新审视。在讨论基于生成语言模型构建的人工智能代理是否具有意识时,是否有必要或可能将其视为“仅仅”是人类行为的模拟,以及他们所做的是否仅仅是“仅仅”是角色扮演?本文借鉴维特根斯坦的后期著作,试图探讨这个问题,同时避免二元思维的陷阱。
更新时间: 2024-07-11 15:42:47
领域: cs.AI
Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games
Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise two probabilistic finite-state agents interacting in a shared continuous-state environment. Each agent observes the environment using a neural perception mechanism, which converts inputs such as images into symbolic percepts, and makes decisions symbolically. We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards under piecewise-constant restrictions on the components of this class of models. To compute values and synthesise strategies, we present, for the first time, practical value iteration (VI) and policy iteration (PI) algorithms to solve this new subclass of continuous-state CSGs. These require a finite decomposition of the environment induced by the neural perception mechanisms of the agents and rely on finite abstract representations of value functions and strategies closed under VI or PI. First, we introduce a Borel measurable piecewise-constant (B-PWC) representation of value functions, extend minimax backups to this representation and propose a value iteration algorithm called B-PWC VI. Second, we introduce two novel representations for the value functions and strategies, constant-piecewise-linear (CON-PWL) and constant-piecewise-constant (CON-PWC) respectively, and propose Minimax-action-free PI by extending a recent PI method based on alternating player choices for finite state spaces to Borel state spaces, which does not require normal-form games to be solved.
Updated: 2024-07-11 15:40:13
标题: 零和神经符号并发随机博弈的策略合成
摘要: 神经符号人工智能方法将神经网络与经典符号技术结合起来,日益受到关注,这需要形式化方法来推理它们的正确性。我们提出了一种新颖的建模形式称为神经符号并发随机博弈(NS-CSGs),其中包括两个概率有限状态的代理在共享的连续状态环境中相互作用。每个代理使用神经感知机制观察环境,将输入(如图像)转换为符号感知,并进行符号决策。我们专注于具有伯雷尔状态空间的NS-CSGs类,并证明了在对这类模型的组件施加分段常数限制下,零和折扣累积奖励的价值函数的存在性和可测性。为了计算价值和综合策略,我们首次提出了实用的值迭代(VI)和策略迭代(PI)算法来解决这个新的连续状态CSGs子类。这些算法需要由代理的神经感知机制引起的环境的有限分解,并依赖于价值函数和策略的有限抽象表示,这些表示在VI或PI下是封闭的。首先,我们引入了价值函数的伯雷尔可测分段常数(B-PWC)表示,将极小化备份扩展到这种表示,并提出了一个称为B-PWC VI的值迭代算法。其次,我们引入了两种新颖的价值函数和策略表示,分别为常数分段线性(CON-PWL)和常数分段常数(CON-PWC),并提出了将最小最大动作自由PI扩展到伯雷尔状态空间的最小最大动作自由PI,这不需要解决正态形式的游戏。
更新时间: 2024-07-11 15:40:13
领域: cs.AI,cs.GT,cs.LO
Inference-Time Rule Eraser: Fair Recognition via Distilling and Removing Biased Rules
Machine learning models often make predictions based on biased features such as gender, race, and other social attributes, posing significant fairness risks, especially in societal applications, such as hiring, banking, and criminal justice. Traditional approaches to addressing this issue involve retraining or fine-tuning neural networks with fairness-aware optimization objectives. However, these methods can be impractical due to significant computational resources, complex industrial tests, and the associated CO2 footprint. Additionally, regular users often fail to fine-tune models because they lack access to model parameters. In this paper, we introduce the Inference-Time Rule Eraser (Eraser), a novel method designed to address fairness concerns by removing biased decision-making rules from deployed models during inference without altering model weights. We begin by establishing a theoretical foundation for modifying model outputs to eliminate biased rules through Bayesian analysis. Next, we present a specific implementation of Eraser that involves two stages: (1) distilling the biased rules from the deployed model into an additional patch model, and (2) removing these biased rules from the output of the deployed model during inference. Extensive experiments validate the effectiveness of our approach, showcasing its superior performance in addressing fairness concerns in AI systems.
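As a rough caricature of the two stages (not the paper's algorithm), one can distill the biased rule into a small model of p(y | a) for a protected attribute a, then subtract its log-probabilities from the deployed model's logits at inference, a Bayes-style correction that leaves the deployed weights untouched. All models and data below are synthetic stand-ins.

```python
# Synthetic two-stage sketch: distil a biased rule, then erase it from
# the output logits at inference time.
import numpy as np

rng = np.random.default_rng(0)
n = 20000
a = rng.integers(0, 2, n)                          # protected attribute
y = (rng.random(n) < 0.3 + 0.4 * a).astype(int)    # label correlated with a

def deployed_logits(attr):
    """Stand-in for the deployed model's log-probabilities, whose bias
    shows up as a direct dependence on the protected attribute."""
    p1 = 0.3 + 0.4 * attr
    return np.stack([np.log(1.0 - p1), np.log(p1)], axis=1)

# Stage 1: distil the biased rule p(y | a) into a tiny "patch" model.
patch = np.array([[np.mean(y[a == g] == c) for c in (0, 1)] for g in (0, 1)])

# Stage 2: inference-time removal, subtracting log p(y | a) from the logits.
logits = deployed_logits(a)
corrected = logits - np.log(patch[a])

gap = lambda L: abs(L[a == 1, 1].mean() - L[a == 0, 1].mean())
print(f"group gap in positive logit: before={gap(logits):.3f}, "
      f"after={gap(corrected):.3f}")
```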
Updated: 2024-07-11 15:33:35
标题: 推理时间规则擦除器:通过提炼和移除偏见规则实现公平识别
摘要: 机器学习模型经常基于偏见特征(如性别、种族和其他社会属性)进行预测,这在社会应用中(如招聘、银行业和刑事司法)尤其存在重大的公平风险。传统方法解决这一问题通常涉及重新训练或微调具有公平感知优化目标的神经网络。然而,由于需要大量的计算资源、复杂的工业测试以及相关的CO2排放,这些方法可能并不实际。此外,普通用户通常无法微调模型,因为他们缺乏对模型参数的访问权限。在本文中,我们介绍了推理时间规则擦除器(Eraser),这是一种旨在通过在推理过程中从部署的模型中删除偏见决策规则而不改变模型权重的新方法,以解决公平性问题。我们首先通过贝叶斯分析建立了修改模型输出以消除偏见规则的理论基础。接下来,我们提出了Eraser的具体实现,包括两个阶段:(1)将部署模型中的偏见规则提炼到一个额外的补丁模型中,以及(2)在推理过程中从部署模型的输出中删除这些偏见规则。大量实验证实了我们方法的有效性,展示了其在解决AI系统公平性问题方面的优越性能。
更新时间: 2024-07-11 15:33:35
领域: cs.LG,cs.AI,cs.CY
Vision language models are blind
Large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering countless image-text applications and scoring high on many vision-understanding benchmarks. We propose BlindTest, a suite of 7 visual tasks absurdly easy for humans, such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting the number of circles in an Olympic-like logo. Surprisingly, four state-of-the-art VLMs are, on average, only 56.20% accurate on our benchmark, with Sonnet-3.5 being the best (73.77% accuracy). On BlindTest, VLMs struggle with tasks that require precise spatial information and counting (from 0 to 10), sometimes giving the impression of a person with myopia who sees fine details as blurry and makes educated guesses. Code is available at: https://vlmsareblind.github.io/
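The benchmark tasks are simple enough that items can be generated programmatically with exact ground truth. The sketch below generates one item for task (a); the rendering details and prompt wording are assumptions, not the paper's generator.

```python
# BlindTest-style item generator for the two-circle overlap task;
# rendering and prompt text are illustrative assumptions.
from PIL import Image, ImageDraw
import random

def two_circle_item(size=512, r=60, seed=None):
    rnd = random.Random(seed)
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    centres = [(rnd.randint(r, size - r), rnd.randint(r, size - r))
               for _ in range(2)]
    for (cx, cy), colour in zip(centres, ("red", "blue")):
        draw.ellipse([cx - r, cy - r, cx + r, cy + r],
                     outline=colour, width=4)
    (x1, y1), (x2, y2) = centres
    dist = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    overlap = dist < 2 * r                 # exact geometric ground truth
    prompt = "Do the two circles overlap? Answer yes or no."
    return img, prompt, ("yes" if overlap else "no")

img, prompt, answer = two_circle_item(seed=7)
img.save("blindtest_circles.png")
print(prompt, "->", answer)
```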
Updated: 2024-07-11 15:33:10
标题: 视觉语言模型是盲目的
摘要: 大型具有视觉能力的语言模型(VLMs),例如GPT-4o和Gemini 1.5 Pro,正在为无数的图像文本应用程序提供动力,并在许多视觉理解基准测试中得分很高。我们提出了BlindTest,这是一个包含7个对人类来说荒谬简单的视觉任务的套件,例如识别(a)两个圆是否重叠;(b)两条线是否相交;(c)一个单词中哪个字母被圈出;以及(d)在一个类似奥林匹克标志的徽标中计算圆圈的数量。令人惊讶的是,四个最先进的VLMs在我们的基准测试中平均只有56.20%的准确率,\newsonnet是最好的(73.77%的准确率)。在BlindTest中,VLMs在需要精确的空间信息和计数(从0到10)的任务上遇到困难,有时会给出一个近视者看到细节模糊并做出合理猜测的印象。代码可在以下网址获取:https://vlmsareblind.github.io/
更新时间: 2024-07-11 15:33:10
领域: cs.AI,cs.CV
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training. We develop a novel error-bounded lossy compression algorithm, informed by an in-depth analysis of embedding data features, to achieve high compression ratios. Moreover, we introduce a dual-level adaptive strategy for error-bound adjustment, spanning both table-wise and iteration-wise aspects, to balance the compression benefits with the potential impacts on accuracy. We further optimize our compressor for PyTorch tensors on GPUs, minimizing compression overhead. Evaluation shows that our method achieves a 1.38$\times$ training speedup with a minimal accuracy impact.
Updated: 2024-07-11 15:31:53
标题: 使用双层自适应损失压缩加速深度学习推荐模型训练中的通信
摘要: DLRM是一种最先进的推荐系统模型,在各种行业应用中得到了广泛采用。然而,DLRM模型的巨大规模需要使用多个设备/GPUs进行高效训练。在这个过程中的一个重要瓶颈是需要耗时的全对全通信来收集来自所有设备的嵌入数据。为了缓解这个问题,我们引入了一种方法,利用误差有界的有损压缩来减小通信数据规模并加速DLRM训练。我们开发了一种新颖的误差有界的有损压缩算法,通过深入分析嵌入数据特征来实现高压缩比。此外,我们引入了一个双级自适应策略来调整误差上界,涵盖了表格级和迭代级两个方面,以平衡压缩带来的好处与对准确性的潜在影响。我们进一步优化了我们在GPU上为PyTorch张量设计的压缩器,最小化了压缩开销。评估结果显示,我们的方法实现了1.38倍的训练加速,对准确性的影响最小。
更新时间: 2024-07-11 15:31:53
领域: cs.LG,cs.DC
Learning Program Behavioral Models from Synthesized Input-Output Pairs
We introduce Modelizer - a novel framework that, given a black-box program, learns a _model from its input/output behavior_ using _neural machine translation_. The resulting model _mocks_ the original program: Given an input, the model predicts the output that would have been produced by the program. However, the model is also _reversible_ - that is, the model can predict the input that would have produced a given output. Finally, the model is _differentiable_ and can be efficiently restricted to predict only a certain aspect of the program behavior. Modelizer uses _grammars_ to synthesize inputs and to parse the resulting outputs, allowing it to learn sequence-to-sequence associations between token streams. Other than input and output grammars, Modelizer only requires the ability to execute the program. The resulting models are _small_, requiring fewer than 6.3 million parameters for languages such as Markdown or HTML; and they are _accurate_, achieving up to 95.4% accuracy and a BLEU score of 0.98 with standard error 0.04 in mocking real-world applications. We foresee several _applications_ of these models, especially as the output of the program can be any aspect of program behavior. Besides mocking and predicting program behavior, the model can also synthesize inputs that are likely to produce a particular behavior, such as failures or coverage.
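The grammar-driven synthesis step can be sketched in a few lines: randomly expand an input grammar, run the black-box program, and collect (input, output) pairs for sequence-to-sequence training. The toy grammar and the stand-in "program" below are illustrative only.

```python
# Toy grammar-based input synthesis in the spirit of Modelizer's data
# generation; grammar and target program are illustrative stand-ins.
import random

GRAMMAR = {                                 # tiny Markdown-ish grammar
    "<doc>":  [["<line>"], ["<line>", "\n", "<doc>"]],
    "<line>": [["# ", "<word>"], ["*", "<word>", "*"], ["<word>"]],
    "<word>": [["hello"], ["world"], ["modelizer"]],
}

def expand(symbol, rnd):
    if symbol not in GRAMMAR:               # terminal token
        return symbol
    return "".join(expand(s, rnd) for s in rnd.choice(GRAMMAR[symbol]))

def program(markdown):                      # black-box program to mock
    return markdown.replace("# ", "<h1>").replace("*", "<em>")

rnd = random.Random(3)
pairs = [(inp, program(inp))
         for inp in (expand("<doc>", rnd) for _ in range(5))]
for inp, out in pairs:                      # training pairs for the model
    print(repr(inp), "->", repr(out))
```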
Updated: 2024-07-11 15:25:02
标题: 从合成输入输出对中学习程序行为模型
摘要: 我们引入了Modelizer - 一个新颖的框架,给定一个黑盒程序,使用神经机器翻译从其输入/输出行为中学习模型。生成的模型模拟了原始程序:给定一个输入,模型预测程序会产生的输出。然而,该模型也是可逆的 - 即模型可以预测生成给定输出所需的输入。最后,该模型是可微分的,并且可以有效地限制只预测程序行为的某个方面。 Modelizer使用语法来合成输入和解析生成的输出,从而使其能够学习标记流之间的序列-序列关联。除了输入和输出语法外,Modelizer只需要执行程序的能力。 生成的模型规模小,对于诸如Markdown或HTML等语言,只需要不到630万个参数;它们准确性高,可以在模拟真实应用程序中达到高达95.4%的准确性和0.98的BLEU分数,标准误差为0.04。我们预见到这些模型的几个应用,特别是因为程序的输出可以是程序行为的任何方面。除了模拟和预测程序行为外,模型还可以合成可能产生特定行为的输入,例如故障或覆盖。
更新时间: 2024-07-11 15:25:02
领域: cs.SE,cs.LG,68T07 (Primary), 68N30 (Secondary), 68Q42,D.2.5; D.2.7; I.2.6; F.1.1; F.4.3
NeuroIDBench: An Open-Source Benchmark Framework for the Standardization of Methodology in Brainwave-based Authentication Research
Biometric systems based on brain activity have been proposed as an alternative to passwords or to complement current authentication techniques. By leveraging the unique brainwave patterns of individuals, these systems offer the possibility of creating authentication solutions that are resistant to theft, hands-free, accessible, and potentially even revocable. However, despite the growing stream of research in this area, faster progress is hindered by reproducibility problems. Issues such as the lack of standard reporting schemes for performance results and system configuration, or the absence of common evaluation benchmarks, make comparability and proper assessment of different biometric solutions challenging. Further, barriers to future work arise when, as is so often the case, source code is not published open access. To bridge this gap, we introduce NeuroIDBench, a flexible open source tool to benchmark brainwave-based authentication models. It incorporates nine diverse datasets, implements a comprehensive set of pre-processing parameters and machine learning algorithms, enables testing under two common adversary models (known vs unknown attacker), and allows researchers to generate full performance reports and visualizations. We use NeuroIDBench to investigate the shallow classifiers and deep learning-based approaches proposed in the literature, and to test robustness across multiple sessions. We observe a 37.6% reduction in Equal Error Rate (EER) for unknown attacker scenarios (typically not tested in the literature), and we highlight the importance of session variability to brainwave authentication. All in all, our results demonstrate the viability and relevance of NeuroIDBench in streamlining fair comparisons of algorithms, thereby furthering the advancement of brainwave-based authentication through robust methodological practices.
Updated: 2024-07-11 15:18:17
标题: NeuroIDBench:一种用于大脑波认证研究方法标准化的开源基准框架
摘要: 基于脑活动的生物特征系统已被提议作为密码的替代方案或用来补充当前的身份验证技术。通过利用个体独特的脑波模式,这些系统提供了创建具有防盗、免持续、易访问甚至可撤销的身份验证解决方案的可能性。然而,尽管这一领域的研究日益增多,但快速进展受到可复制性问题的阻碍。诸如缺乏性能结果和系统配置的标准报告方案,或者缺乏共同的评估基准等问题,使得不同生物特征解决方案的可比性和正确评估变得具有挑战性。此外,当源代码未以开放方式发布时,也会对未来工作设置障碍。为了弥合这一差距,我们引入了NeuroIDBench,这是一个灵活的开源工具,用于评估基于脑波的身份验证模型。它包含九个不同的数据集,实施了一套全面的预处理参数和机器学习算法,使得在两种常见的对抗模型(已知攻击者与未知攻击者)下进行测试,并允许研究人员生成完整的性能报告和可视化。我们使用NeuroIDBench来研究文献中提出的浅层分类器和基于深度学习的方法,并测试跨多个会话的稳健性。我们观察到在未知攻击者场景下(通常不在文献中进行测试)等误差率(EER)减少了37.6%,并强调了会话变异对脑波身份验证的重要性。总的来说,我们的结果证明了NeuroIDBench在简化算法的公平比较方面的可行性和相关性,从而通过健壮的方法论实践推动基于脑波的身份验证的进一步发展。
更新时间: 2024-07-11 15:18:17
领域: cs.CR
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in a few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across various sets of standardized benchmarks showing high scores for such models. We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models trained at the largest available scales which claim strong function, using a simple, short, conventional common sense problem (AIW problem) formulated in concise natural language, easily solvable by humans. The breakdown is dramatic, as models show strong fluctuations across even slight problem variations that should not affect problem solving, while also expressing strong overconfidence in their wrong solutions, often backed up by plausible-sounding, explanation-like confabulations. Various standard interventions in an attempt to get the right solution, like various types of enhanced prompting, or urging the models to reconsider the wrong solutions again by multi-step re-evaluation, fail. We take these initial observations to the scientific and technological community to stimulate urgent re-assessment of the claimed capabilities of the current generation of LLMs. Such re-assessment also requires common action to create standardized benchmarks that would allow proper detection of such basic reasoning deficits that obviously manage to remain undiscovered by current state-of-the-art evaluation procedures and benchmarks. Code for reproducing experiments in the paper and raw experiments data can be found at https://github.com/LAION-AI/AIW
Updated: 2024-07-11 15:17:36
标题: 《爱丽丝梦游仙境:最新大型语言模型中展示完全推理崩溃的简单任务》
摘要: 大型语言模型(LLMs)通常被描述为基础模型的实例 - 即,在少数或零次展示方式下,在各种任务和条件之间强烈传递的模型,同时展示出预训练规模增加时预测功能改进的扩展规律。这些在不同功能和任务中表现出色的声明依赖于在各种标准基准集上进行的测量,显示出这些模型的高分。我们在这里展示了训练在目前最大规模的最先进模型的功能和推理能力的惊人崩溃,这些模型声称具有强大的功能,使用简单、简短、常识问题(AIW问题)用简洁的自然语言表述,人类容易解决。这种崩溃是惊人的,因为模型在即使微小的问题变化中也表现出强烈的波动,这些变化不应该影响问题的解决,同时还表现出对错误解决方案的强烈过度自信,通常会有听起来像解释的合理解释。尝试获取正确解决方案的各种标准干预措施,如各种类型的增强提示,或者通过多步重新评估要求模型重新考虑错误解决方案,都失败了。我们将这些初步观察结果呈交给科学和技术界,以刺激对当前一代LLMs声称的能力进行紧急重新评估。这种重新评估还需要共同行动,创建标准化基准,以便正确检测这种基本推理缺陷,这些缺陷显然无法被当前最先进的评估程序和基准发现。可在https://github.com/LAION-AI/AIW找到用于重现本文实验的代码和原始实验数据。
更新时间: 2024-07-11 15:17:36
领域: cs.LG,cs.AI,cs.CL
A Review of Nine Physics Engines for Reinforcement Learning Research
We present a review of popular simulation engines and frameworks used in reinforcement learning (RL) research, aiming to guide researchers in selecting tools for creating simulated physical environments for RL and training setups. It evaluates nine frameworks (Brax, Chrono, Gazebo, MuJoCo, ODE, PhysX, PyBullet, Webots, and Unity) based on their popularity, feature range, quality, usability, and RL capabilities. We highlight the challenges in selecting and utilizing physics engines for RL research, including the need for detailed comparisons and an understanding of each framework's capabilities. Key findings indicate MuJoCo as the leading framework due to its performance and flexibility, despite usability challenges. Unity is noted for its ease of use but lacks scalability and simulation fidelity. The study calls for further development to improve simulation engines' usability and performance and stresses the importance of transparency and reproducibility in RL research. This review contributes to the RL community by offering insights into the selection process for simulation engines, facilitating informed decision-making.
Updated: 2024-07-11 15:13:28
标题: 一个关于强化学习研究中九种物理引擎的综述
摘要: 我们提供了一个关于在强化学习(RL)研究中使用的流行仿真引擎和框架的综述,旨在指导研究人员选择用于创建RL和训练设置的模拟物理环境的工具。它基于它们的流行程度、功能范围、质量、可用性和RL能力评估了九个框架(Brax、Chrono、Gazebo、MuJoCo、ODE、PhysX、PyBullet、Webots和Unity)。我们强调了选择和利用物理引擎进行RL研究的挑战,包括需要详细比较和了解每个框架的能力。主要发现表明,由于其性能和灵活性,MuJoCo是领先的框架,尽管存在可用性挑战。Unity以其易用性而闻名,但缺乏可扩展性和模拟保真度。该研究呼吁进一步发展以改善仿真引擎的可用性和性能,并强调RL研究中透明性和可重复性的重要性。这篇综述通过提供有关仿真引擎选择过程的见解,促进了知情决策,为RL社区做出了贡献。
更新时间: 2024-07-11 15:13:28
领域: cs.AI,cs.LG,cs.MA,I.2.0
HACMan++: Spatially-Grounded Motion Primitives for Manipulation
Although end-to-end robot learning has shown some success for robot manipulation, the learned policies are often not sufficiently robust to variations in object pose or geometry. To improve the policy generalization, we introduce spatially-grounded parameterized motion primitives in our method HACMan++. Specifically, we propose an action representation consisting of three components: what primitive type (such as grasp or push) to execute, where the primitive will be grounded (e.g. where the gripper will make contact with the world), and how the primitive motion is executed, such as parameters specifying the push direction or grasp orientation. These three components define a novel discrete-continuous action space for reinforcement learning. Our framework enables robot agents to learn to chain diverse motion primitives together and select appropriate primitive parameters to complete long-horizon manipulation tasks. By grounding the primitives on a spatial location in the environment, our method is able to effectively generalize across object shape and pose variations. Our approach significantly outperforms existing methods, particularly in complex scenarios demanding both high-level sequential reasoning and object generalization. With zero-shot sim-to-real transfer, our policy succeeds in challenging real-world manipulation tasks, with generalization to unseen objects. Videos can be found on the project website: https://sgmp-rss2024.github.io.
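Written out as a data structure, the three-component action described above looks roughly like the sketch below; the primitive set, field names, and parameter dimensions are illustrative, not the paper's definitions.

```python
# The abstract's "what / where / how" action, as a data structure.
# Names, primitive set, and parameter dimensions are assumptions.
from dataclasses import dataclass
from enum import Enum, auto

class PrimitiveType(Enum):                # "what": discrete primitive choice
    GRASP = auto()
    PUSH = auto()
    PLACE = auto()

@dataclass
class SpatiallyGroundedAction:
    primitive: PrimitiveType              # what to execute
    location: tuple[float, float, float]  # where: contact point in the scene
    params: tuple[float, ...]             # how: e.g. push direction, grasp yaw

action = SpatiallyGroundedAction(
    primitive=PrimitiveType.PUSH,
    location=(0.42, -0.10, 0.03),         # grounded on an observed scene point
    params=(0.0, 1.0, 0.0),               # hypothetical push direction
)
print(action)
```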
Updated: 2024-07-11 15:10:14
标题: HACMan++: 用于操作的空间定位运动基元
摘要: 尽管端到端机器人学习在机器人操作方面取得了一些成功,但所学习的策略通常对物体姿态或几何形状的变化不够稳健。为了改进策略的泛化能力,我们在我们的方法HACMan++中引入了空间基础参数化运动原语。具体来说,我们提出了一个由三个组成部分组成的动作表示:要执行什么原语类型(例如抓取或推动),原语将被基于哪里(例如夹持器将与世界接触的地方),以及如何执行原语运动,例如指定推动方向或抓取方向的参数。这三个组件定义了一个新颖的离散连续动作空间,用于强化学习。我们的框架使机器人代理能够学习将不同的运动原语链接在一起,并选择适当的原语参数来完成长时间跨度的操作任务。通过将原语基于环境中的空间位置进行基础,我们的方法能够有效地在物体形状和姿态变化之间进行泛化。我们的方法在现有方法中表现显著优越,特别是在需要高级顺序推理和物体泛化的复杂场景中。通过零-shot模拟到现实的转移,我们的策略成功地完成了具有泛化性的挑战性现实世界操纵任务,包括对未见过的物体的泛化。视频可以在项目网站上找到:https://sgmp-rss2024.github.io。
更新时间: 2024-07-11 15:10:14
领域: cs.RO,cs.AI,cs.LG
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
The rapid development of large language models (LLMs) has been witnessed in recent years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from text to a broader spectrum of domains, attracting widespread attention due to the broader range of application scenarios. As LLMs and MLLMs rely on vast amounts of model parameters and data to achieve emergent capabilities, the importance of data is receiving increasingly widespread attention and recognition. Tracing and analyzing recent data-oriented works for MLLMs, we find that the development of models and data is not two separate paths but rather interconnected. On the one hand, vaster and higher-quality data contribute to better performance of MLLMs, on the other hand, MLLMs can facilitate the development of data. The co-development of multi-modal data and MLLMs requires a clear view of 1) at which development stage of MLLMs can specific data-centric approaches be employed to enhance which capabilities, and 2) by utilizing which capabilities and acting as which roles can models contribute to multi-modal data. To promote the data-model co-development for MLLM community, we systematically review existing works related to MLLMs from the data-model co-development perspective. A regularly maintained project associated with this survey is accessible at https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md.
Updated: 2024-07-11 15:08:11
标题: 数据和多模式大型语言模型之间的协同作用:来自共同发展视角的调查
摘要: 近年来,大型语言模型(LLMs)的快速发展已经引起了人们的关注。基于强大的LLMs,多模态LLMs(MLLMs)将模态从文本扩展到更广泛的领域,由于更广泛的应用场景,吸引了广泛关注。由于LLMs和MLLMs依赖于大量的模型参数和数据来实现新兴能力,数据的重要性越来越受到广泛关注和认可。追踪和分析最近针对MLLMs的数据导向作品,我们发现模型和数据的发展并不是两条独立的道路,而是相互连接的。一方面,更广泛和更高质量的数据有助于提高MLLMs的性能,另一方面,MLLMs可以促进数据的发展。多模态数据和MLLMs的共同发展需要清晰地了解:1)在MLLMs的哪个发展阶段可以采用特定的数据中心方法来增强哪些能力,以及2)通过利用哪些能力和扮演哪些角色,模型可以为多模态数据做出贡献。为了促进MLLM社区的数据-模型共同发展,我们从数据-模型共同发展的角度系统地审查与MLLMs相关的现有作品。与此调查相关的一个定期维护的项目可在https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md上访问。
更新时间: 2024-07-11 15:08:11
领域: cs.AI,cs.CV,cs.LG
Neural Bipartite Matching
Graph neural networks (GNNs) have found application for learning in the space of algorithms. However, the algorithms chosen by existing research (sorting, Breadth-First search, shortest path finding, etc.) usually align perfectly with a standard GNN architecture. This report describes how neural execution is applied to a complex algorithm, such as finding maximum bipartite matching by reducing it to a flow problem and using Ford-Fulkerson to find the maximum flow. This is achieved via neural execution based only on features generated from a single GNN. The evaluation shows strongly generalising results with the network achieving optimal matching almost 100% of the time.
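For reference, the classical target computation the network learns to execute, maximum bipartite matching via augmenting paths (the bipartite specialization of Ford-Fulkerson), fits in a few lines:

```python
# Textbook augmenting-path algorithm for maximum bipartite matching,
# i.e. Ford-Fulkerson on the standard flow reduction; this is the
# classical algorithm, not the GNN that imitates it.
def max_bipartite_matching(adj, n_left, n_right):
    match_r = [-1] * n_right            # right vertex -> matched left vertex

    def augment(u, seen):
        for v in adj[u]:                # try every right neighbour of u
            if not seen[v]:
                seen[v] = True
                if match_r[v] == -1 or augment(match_r[v], seen):
                    match_r[v] = u      # (re)assign v along augmenting path
                    return True
        return False

    return sum(augment(u, [False] * n_right) for u in range(n_left))

adj = {0: [0, 1], 1: [0], 2: [1, 2]}    # left vertex -> right neighbours
print("maximum matching size:", max_bipartite_matching(adj, 3, 3))
```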
Updated: 2024-07-11 15:06:35
标题: 神经元二分匹配
摘要: 图神经网络(GNNs)已经被应用于算法学习的领域。然而,现有研究选择的算法(排序、广度优先搜索、最短路径查找等)通常与标准GNN架构完全契合。本报告描述了如何将神经执行应用于复杂算法,例如通过将最大二部匹配问题简化为流问题,并使用Ford-Fulkerson算法找到最大流。这是通过仅基于单个GNN生成的特征进行神经执行来实现的。评估结果显示,网络几乎100%的时间实现了最佳匹配,具有很强的泛化能力。
更新时间: 2024-07-11 15:06:35
领域: cs.LG,stat.ML
Three-layer deep learning network random trees for fault detection in chemical production process
With the development of technology, the chemical production process is becoming increasingly complex and large-scale, making fault detection particularly important. However, current detection methods struggle to address the complexities of large-scale production processes. In this paper, we integrate the strengths of deep learning and machine learning technologies, combining the advantages of bidirectional long short-term memory neural networks, fully connected neural networks, and the extra trees algorithm to propose a novel fault detection model named three-layer deep learning network random trees (TDLN-trees). First, the deep learning component extracts temporal features from industrial data, combining and transforming them into a higher-level data representation. Second, the machine learning component processes and classifies the features extracted in the first step. An experimental analysis based on the Tennessee Eastman process verifies the superiority of the proposed method.
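The two-stage pattern maps onto a short pipeline: a bidirectional LSTM encodes each sensor window into a feature vector, and an extra-trees classifier labels it. Layer sizes, data shapes, and the untrained encoder below are placeholder assumptions, not the paper's configuration (the paper trains the deep stage first).

```python
# Two-stage sketch: BiLSTM feature extraction + extra-trees classification.
# Sizes, data, and training details are placeholder assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import ExtraTreesClassifier

class TemporalEncoder(nn.Module):
    def __init__(self, n_sensors=52, hidden=64):   # TE process has 52 variables
        super().__init__()
        self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 32)         # fully connected head

    def forward(self, x):                           # x: (batch, time, sensors)
        h, _ = self.lstm(x)
        return torch.relu(self.fc(h[:, -1]))        # last-step representation

# Toy stand-in for process data: 2 classes, 52 sensors, 20-step windows.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20, 52)).astype(np.float32)
y = rng.integers(0, 2, size=400)
X[y == 1, :, :5] += 1.0                             # injected "fault" signature

encoder = TemporalEncoder().eval()                  # untrained here for brevity
with torch.no_grad():
    feats = encoder(torch.from_numpy(X)).numpy()

clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
clf.fit(feats[:300], y[:300])
print("held-out accuracy:", clf.score(feats[300:], y[300:]))
```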
Updated: 2024-07-11 15:03:49
标题: 化工生产过程中用于故障检测的三层深度学习网络随机树
摘要: 随着技术的发展,化学生产过程变得越来越复杂和大规模化,因此故障检测变得尤为重要。然而,当前的检测方法难以解决大规模生产过程的复杂性。本文中,我们整合了深度学习和机器学习技术的优势,结合双向长短期记忆神经网络、全连接神经网络和额外树算法的优点,提出了一种名为三层深度学习网络随机树(TDLN-trees)的新型故障检测模型。首先,深度学习组件从工业数据中提取时间特征,将其结合和转换为更高级别的数据表示。其次,机器学习组件处理和分类第一步中提取的特征。基于田纳西伊斯曼过程的实验分析验证了所提出方法的优越性。
更新时间: 2024-07-11 15:03:49
领域: cs.LG
Multi-Group Proportional Representation
Image search and retrieval tasks can perpetuate harmful stereotypes, erase cultural identities, and amplify social disparities. Current approaches to mitigate these representational harms balance the number of retrieved items across population groups defined by a small number of (often binary) attributes. However, most existing methods overlook intersectional groups determined by combinations of group attributes, such as gender, race, and ethnicity. We introduce Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups. We develop practical methods for estimating MPR, provide theoretical guarantees, and propose optimization algorithms to ensure MPR in retrieval. We demonstrate that existing methods optimizing for equal and proportional representation metrics may fail to promote MPR. Crucially, our work shows that optimizing MPR yields more proportional representation across multiple intersectional groups specified by a rich function class, often with minimal compromise in retrieval accuracy.
Updated: 2024-07-11 14:59:17
标题: 多组比例代表制
摘要: 图像搜索和检索任务可能会持续传播有害的刻板印象,抹去文化身份,并放大社会差距。目前缓解这些表征性伤害的方法是在由少数(通常是二进制)属性定义的人群群体之间平衡检索到的项目数量。然而,大多数现有方法忽视了由群体属性的组合确定的交叉群体,如性别、种族和族裔。我们引入了多群体比例代表(MPR),这是一种衡量跨交叉群体的代表性的新型度量。我们开发了估计MPR的实用方法,提供了理论保证,并提出了优化算法来确保检索中的MPR。我们证明了优化平等和比例代表性指标的现有方法可能无法促进MPR。关键是,我们的工作表明,优化MPR会在由丰富的功能类别指定的多个交叉群体之间产生更加比例的代表性,通常在检索准确性上几乎没有妥协的情况下。
更新时间: 2024-07-11 14:59:17
领域: cs.AI,cs.IR,cs.IT,cs.LG,math.IT,stat.ML
Adaptive Parametric Activation
The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de-facto activation in balanced classification tasks, however, in imbalanced classification, it proves inappropriate due to bias towards frequent classes. In this work, we delve deeper in this phenomenon by performing a comprehensive statistical analysis in the classification and intermediate layers of both balanced and imbalanced networks and we empirically show that aligning the activation function with the data distribution, enhances the performance in both balanced and imbalanced tasks. To this end, we propose the Adaptive Parametric Activation (APA) function, a novel and versatile activation function that unifies most common activation functions under a single formula. APA can be applied in both intermediate layers and attention layers, significantly outperforming the state-of-the-art on several imbalanced benchmarks such as ImageNet-LT, iNaturalist2018, Places-LT, CIFAR100-LT and LVIS and balanced benchmarks such as ImageNet1K, COCO and V3DET. The code is available at https://github.com/kostas1515/AGLU.
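A sketch of a parametric activation in the spirit of this abstract is given below: a generalized-logistic (Richards-type) gate with learnable parameters that interpolates between common activations (lambda = 1 recovers the Sigmoid gate, so the GLU form reduces to SiLU). Treat the exact formula as an assumption; the paper's definition is in the linked repository.

```python
# Hypothetical parametric activation with learnable shape and slope;
# the paper's exact APA formula may differ (see the AGLU repository).
import torch
import torch.nn as nn

class ParametricActivation(nn.Module):
    def __init__(self, lam=1.0, kappa=1.0):
        super().__init__()
        self.lam = nn.Parameter(torch.tensor(lam))      # shape parameter
        self.kappa = nn.Parameter(torch.tensor(kappa))  # slope parameter

    def gate(self, z):
        lam = torch.clamp(self.lam, min=1e-4)           # keep the power defined
        return torch.pow(lam * torch.exp(-self.kappa * z) + 1.0, -1.0 / lam)

    def forward(self, z):                               # GLU-style: z * gate(z)
        return z * self.gate(z)

act = ParametricActivation()
x = torch.linspace(-3, 3, 7)
print(act(x))   # with lam = kappa = 1 this equals x * sigmoid(x) (SiLU)
```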
Updated: 2024-07-11 14:57:27
标题: 自适应参数激活
摘要: 激活函数在模型优化中起着至关重要的作用,然而最佳选择仍不清楚。例如,在平衡分类任务中,Sigmoid激活是事实上的激活函数,然而在不平衡分类中,由于偏向频繁类别,它被证明不合适。在这项工作中,我们通过在平衡和不平衡网络的分类和中间层进行全面的统计分析,深入研究了这一现象,并从经验上表明,将激活函数与数据分布对齐,可以提高平衡和不平衡任务的性能。为此,我们提出了自适应参数化激活(APA)函数,这是一种新颖而多功能的激活函数,将大多数常见激活函数统一在一个公式下。APA可以应用于中间层和注意力层,显著优于几个不平衡基准,如ImageNet-LT、iNaturalist2018、Places-LT、CIFAR100-LT和LVIS,以及平衡基准,如ImageNet1K、COCO和V3DET。代码可在https://github.com/kostas1515/AGLU 上找到。
更新时间: 2024-07-11 14:57:27
领域: cs.CV,cs.LG
The Career Interests of Large Language Models
Recent advancements in Large Language Models (LLMs) have significantly extended their capabilities, evolving from basic text generation to complex, human-like interactions. In light of the possibility that LLMs could assume significant workplace responsibilities, it becomes urgently necessary to explore LLMs' capacities as professional assistants. This study focuses on the aspect of career interests by applying the Occupation Network's Interest Profiler short form to LLMs as if they were human participants and investigates their hypothetical career interests and competence, examining how these vary with language changes and model advancements. We analyzed the answers using a general linear mixed model approach and found distinct career interest inclinations among LLMs, particularly towards the social and artistic domains. Interestingly, these preferences did not align with the occupations where LLMs exhibited higher competence. This novel approach of using psychometric instruments and sophisticated statistical tools on LLMs unveils fresh perspectives on their integration into professional environments, highlighting human-like tendencies and promoting a reevaluation of LLMs' self-perception and competency alignment in the workforce.
Updated: 2024-07-11 14:54:46
标题: 大型语言模型的职业兴趣
摘要: 最近大型语言模型(LLMs)的进展显著扩展了它们的能力,从基本文本生成发展到复杂的、类似人类的互动。考虑到LLMs可能承担重要的工作责任的可能性,迫切需要探索LLMs作为专业助手的能力。本研究通过将职业网络兴趣剖析器(Interest Profiler)的简短形式应用于LLMs,将它们视为人类参与者,并调查它们的假设职业兴趣和能力,研究这些兴趣如何随着语言变化和模型进步而变化。我们采用一般线性混合模型方法分析了答案,并发现LLMs之间存在明显的职业兴趣倾向,特别是对社会和艺术领域。有趣的是,这些偏好与LLMs表现出较高能力的职业并不一致。这种在LLMs上使用心理测量工具和复杂统计工具的新颖方法揭示了它们融入专业环境的新视角,突出了类人倾向,并促进了对LLMs在职场中自我认知和能力匹配的重新评估。
更新时间: 2024-07-11 14:54:46
领域: cs.AI
Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion
The recent development of large language models (LLMs) has spurred discussions about whether LLM-generated "synthetic samples" could complement or replace traditional surveys, considering their training data potentially reflects attitudes and behaviors prevalent in the population. A number of mostly US-based studies have prompted LLMs to mimic survey respondents, with some of them finding that the responses closely match the survey data. However, several contextual factors related to the relationship between the respective target population and LLM training data might affect the generalizability of such findings. In this study, we investigate the extent to which LLMs can estimate public opinion in Germany, using the example of vote choice. We generate a synthetic sample of personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents. We ask the LLM GPT-3.5 to predict each respondent's vote choice and compare these predictions to the survey-based estimates on the aggregate and subgroup levels. We find that GPT-3.5 does not predict citizens' vote choice accurately, exhibiting a bias towards the Green and Left parties. While the LLM captures the tendencies of "typical" voter subgroups, such as partisans, it misses the multifaceted factors swaying individual voter choices. By examining the LLM-based prediction of voting behavior in a new context, our study contributes to the growing body of research about the conditions under which LLMs can be leveraged for studying public opinion. The findings point to disparities in opinion representation in LLMs and underscore the limitations in applying them for public opinion estimation.
Updated: 2024-07-11 14:52:18
标题: 大众之声,AI之声?使用语言模型估计德国公众舆论
摘要: 最近大型语言模型(LLMs)的发展引发了关于LLM生成的“合成样本”是否能够补充或取代传统调查的讨论,考虑到它们的训练数据可能反映了人口中普遍存在的态度和行为。一些主要基于美国的研究促使LLMs模仿调查受访者,其中一些发现回答与调查数据非常接近。然而,与LLM训练数据之间的关系相关的一些情境因素可能会影响这些发现的普适性。在本研究中,我们调查LLMs在德国能够估计公众意见的程度,以投票选择为例。我们生成一个与2017年德国纵向选举研究受访者的个人特征匹配的合成样本。我们请LLM GPT-3.5预测每个受访者的投票选择,并将这些预测与基于调查的整体和亚组水平的估计进行比较。我们发现GPT-3.5并不能准确预测公民的投票选择,存在偏向绿党和左翼党派的偏见。虽然这个LLM捕捉到了“典型”选民亚组的倾向,比如党派支持者,但它错过了影响个体选民选择的多方面因素。通过在新环境中检验LLM对选举行为的预测,我们的研究为关于在哪些条件下可以利用LLMs研究公众意见的研究不断增加。研究结果指出LLMs中意见表达的差异,并强调了将其应用于公众意见估计时的限制。
更新时间: 2024-07-11 14:52:18
领域: cs.AI,cs.CY,stat.AP
Causal inference through multi-stage learning and doubly robust deep neural networks
Deep neural networks (DNNs) have demonstrated remarkable empirical performance in large-scale supervised learning problems, particularly in scenarios where both the sample size $n$ and the dimension of covariates $p$ are large. This study delves into the application of DNNs across a wide spectrum of intricate causal inference tasks, where direct estimation falls short and necessitates multi-stage learning. Examples include estimating the conditional average treatment effect and dynamic treatment effect. In this framework, DNNs are constructed sequentially, with subsequent stages building upon preceding ones. To mitigate the impact of estimation errors from early stages on subsequent ones, we integrate DNNs in a doubly robust manner. In contrast to previous research, our study offers theoretical assurances regarding the effectiveness of DNNs in settings where the dimensionality $p$ expands with the sample size. These findings are significant independently and extend to degenerate single-stage learning problems.
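A canonical template for such multi-stage, doubly robust constructions is the AIPW estimator of the average treatment effect, with DNNs supplying the nuisance estimates (generic notation, not necessarily the paper's):

```latex
% Doubly robust (AIPW) estimator: \widehat{\mu}_a are DNN outcome models
% and \widehat{e} is a DNN propensity model; the estimator stays
% consistent if either nuisance estimate is consistent.
\begin{equation*}
  \widehat{\tau}_{\mathrm{DR}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \widehat{\mu}_1(X_i) - \widehat{\mu}_0(X_i)
      + \frac{A_i \bigl(Y_i - \widehat{\mu}_1(X_i)\bigr)}{\widehat{e}(X_i)}
      - \frac{(1 - A_i) \bigl(Y_i - \widehat{\mu}_0(X_i)\bigr)}{1 - \widehat{e}(X_i)}
    \right].
\end{equation*}
```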
Updated: 2024-07-11 14:47:44
标题: 多阶段学习和双重稳健深度神经网络的因果推断
摘要: 深度神经网络(DNNs)在大规模监督学习问题中表现出显著的经验性能,特别是在样本大小$n$和协变量维度$p$都很大的情况下。本研究探讨了DNNs在广泛复杂因果推断任务中的应用,其中直接估计不足以满足要求,需要多阶段学习。示例包括估计条件平均处理效应和动态处理效应。在这一框架中,DNNs被顺序构建,后续阶段建立在前一阶段之上。为了减轻早期阶段的估计误差对后续阶段的影响,我们以双重稳健的方式整合了DNNs。与先前的研究相比,我们的研究在样本大小扩展的情况下提供了关于DNNs有效性的理论保证。这些发现在独立情况下具有重要意义,并可扩展到退化的单阶段学习问题。
更新时间: 2024-07-11 14:47:44
领域: stat.ML,cs.LG,math.ST,stat.ME,stat.TH
Optimal Sharding for Scalable Blockchains with Deconstructed SMR
Sharding is proposed to enhance blockchain scalability. However, a size-security dilemma where every shard must be large enough to ensure its security constrains the efficacy of individual shards and the degree of sharding itself. Most existing sharding solutions therefore rely on either weakening the adversary or making stronger assumptions on network links. This paper presents Arete, an optimally scalable blockchain sharding protocol designed to resolve the dilemma based on an observation that if individual shards can tolerate a higher fraction of (Byzantine) faults, we can securely create smaller shards in a larger quantity. The key idea of Arete, therefore, is to improve the security resilience/threshold of shards by dividing the blockchain's State Machine Replication (SMR) process itself. Similar to modern blockchains, Arete first decouples SMR in three steps: transaction dissemination, ordering, and execution. However, unlike other blockchains, for Arete, a single ordering shard performs the ordering task while multiple processing shards perform the dissemination and execution of blocks. As processing shards do not run consensus, each of those can tolerate up to half compromised nodes. Moreover, the SMR process in the ordering shard is lightweight as it only operates on the block digests. Second, Arete considers safety and liveness against Byzantine failures separately to improve the safety threshold further while tolerating temporary liveness violations in a controlled manner. Apart from the creation of more optimal-size shards, such a deconstructed SMR scheme also empowers us to devise a novel certify-order-execute architecture to fully parallelize transaction handling, thereby improving the performance of sharded blockchain systems. We implement Arete and evaluate it on a geo-distributed AWS environment, showing that Arete outperforms the state-of-the-art sharding protocol.
Updated: 2024-07-11 14:46:43
标题: 可伸缩区块链的最佳分片方案:基于拆解的共识机制
摘要: Sharding被提出来增强区块链的可扩展性。然而,存在一个大小-安全性的两难境地,每个碎片必须足够大以确保其安全性,这限制了单个碎片和碎片化本身的有效性。因此,大多数现有的碎片化解决方案要么依赖于削弱对手,要么在网络链接上做出更强的假设。 本文提出了Arete,这是一个设计得最优可扩展的区块链碎片协议,旨在解决这一困境。该协议基于一项观察,即如果单个碎片能够容忍更高比例的(拜占庭)故障,我们可以安全地创建更多更小的碎片。因此,Arete的关键思想是通过将区块链的状态机复制(SMR)过程本身进行划分,提高碎片的安全弹性/阈值。与现代区块链类似,Arete首先将SMR拆分为三个步骤:交易传播、排序和执行。然而,与其他区块链不同的是,对于Arete,一个单一的排序碎片负责排序任务,而多个处理碎片负责传播和执行区块。由于处理碎片不运行共识,因此每个碎片可以容忍最多的一半被损坏的节点。此外,排序碎片中的SMR过程是轻量级的,因为它仅在区块摘要上运行。其次,Arete将安全性和活性与拜占庭故障分开考虑,以进一步提高安全阈值,同时以受控方式容忍暂时的活性违规。除了创建更优化尺寸的碎片外,这种解构的SMR方案还使我们能够设计一种新颖的认证-排序-执行架构,完全并行化交易处理,从而提高碎片化区块链系统的性能。我们实施了Arete并在一个地理分布的AWS环境中进行评估,结果显示Arete优于最先进的碎片化协议。
更新时间: 2024-07-11 14:46:43
领域: cs.CR
ST-Mamba: Spatial-Temporal Mamba for Traffic Flow Estimation Recovery using Limited Data
Traffic flow estimation (TFE) is crucial for urban intelligent traffic systems. While traditional on-road detectors are hindered by limited coverage and high costs, cloud computing and data mining of vehicular network data, such as driving speeds and GPS coordinates, present a promising and cost-effective alternative. Furthermore, minimizing data collection can significantly reduce overhead. However, limited data can lead to inaccuracies and instability in TFE. To address this, we introduce the spatial-temporal Mamba (ST-Mamba), a deep learning model combining a convolutional neural network (CNN) with a Mamba framework. ST-Mamba is designed to enhance TFE accuracy and stability by effectively capturing the spatial-temporal patterns within traffic flow. Our model aims to achieve results comparable to those from extensive data sets while only utilizing minimal data. Simulations using real-world datasets have validated our model's ability to deliver precise and stable TFE across an urban landscape based on limited data, establishing a cost-efficient solution for TFE.
Updated: 2024-07-11 14:43:03
标题: ST-Mamba: 使用有限数据恢复交通流估计的时空Mamba
摘要: 交通流量估计(TFE)对于城市智能交通系统至关重要。传统的路侦测器受限于覆盖范围有限和高成本,而云计算和对车辆网络数据进行数据挖掘,如行驶速度和GPS坐标,提供了一种有前景且具有成本效益的替代方案。此外,最小化数据收集可以显著减少开销。然而,有限的数据可能导致TFE的不准确性和不稳定性。为了解决这个问题,我们引入了时空Mamba(ST-Mamba),这是一个深度学习模型,结合了卷积神经网络(CNN)和Mamba框架。ST-Mamba旨在通过有效捕捉交通流量中的时空模式来提高TFE的准确性和稳定性。我们的模型旨在利用最少的数据实现与大量数据集相媲美的结果。使用真实世界数据集进行的模拟验证了我们的模型能够基于有限数据在城市景观中提供精确和稳定的TFE,为TFE提供了一种成本效益的解决方案。
更新时间: 2024-07-11 14:43:03
领域: cs.AI
Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models
A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.
Updated: 2024-07-11 14:37:08
标题: 建立严谨且成本效益高的人工智能模型临床试验
摘要: 医学领域人工智能(AI)与临床实践之间存在着深刻的鸿沟,主要是由于缺乏严格和经济有效的评估方法。目前最先进和实践中的AI模型评估仅限于医学数据集上的实验室研究或直接临床试验,其中没有或仅有以患者为中心的对照。此外,临床医生在与AI合作中的关键作用,对于确定其对临床实践的影响是经常被忽视的。我们首次强调了AI模型在临床实践中需要严格和经济有效的评估方法的关键必要性,采用以患者/临床医生为中心(双中心)的AI随机对照试验(DC-AI RCTs)和基于虚拟临床医生的计算机辅助试验(VC-MedAI)作为DC-AI RCTs的有效代理。利用来自14家医疗中心、125名临床医生的两阶段首次DC-AI RCTs的7500个诊断记录,我们的结果表明DC-AI RCTs的必要性和VC-MedAI的有效性。值得注意的是,VC-MedAI的表现与人类临床医生相当,复制了前瞻性DC-AI RCTs的见解和结论。我们设想DC-AI RCTs和VC-MedAI是重要的进展,提供了创新和变革性的评估方法,为AI模型在临床实践中提供了类似临床前的环境,以经济有效和快速迭代的方式重塑了发展范式。中国临床试验注册号:ChiCTR2400086816。
更新时间: 2024-07-11 14:37:08
领域: cs.AI,cs.HC
Spectral State Space Models
This paper studies sequence modeling for prediction tasks with long range dependencies. We propose a new formulation for state space models (SSMs) based on learning linear dynamical systems with the spectral filtering algorithm (Hazan et al. (2017)). This gives rise to a novel sequence prediction architecture we call a spectral state space model. Spectral state space models have two primary advantages. First, they have provable robustness properties as their performance depends on neither the spectrum of the underlying dynamics nor the dimensionality of the problem. Second, these models are constructed with fixed convolutional filters that do not require learning while still outperforming SSMs in both theory and practice. The resulting models are evaluated on synthetic dynamical systems and long-range prediction tasks of various modalities. These evaluations support the theoretical benefits of spectral filtering for tasks requiring very long range memory.
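Concretely, in the spectral filtering construction of Hazan et al. (2017), the fixed filters are the top eigenvectors of a specific Hankel matrix, and features are convolutions of the input sequence with those filters. The sketch below follows that published construction; the prediction head of the proposed architecture is omitted.

```python
# Fixed spectral filters per Hazan et al. (2017): top eigenvectors of the
# Hankel matrix Z[i, j] = 2 / ((i+j)^3 - (i+j)) (1-indexed), with features
# given by causal convolutions of the input against each filter.
import numpy as np

def spectral_filters(T, k):
    i = np.arange(1, T + 1)
    s = i[:, None] + i[None, :]
    Z = 2.0 / (s**3 - s)                          # PSD Hankel matrix
    eigvals, eigvecs = np.linalg.eigh(Z)          # ascending order
    return eigvals[-k:][::-1], eigvecs[:, -k:][:, ::-1]

T, k = 128, 8
sigma, phi = spectral_filters(T, k)               # fixed, never learned

u = np.random.default_rng(0).normal(size=T)       # toy input sequence
# Feature m at time t: inner product of the history with filter m,
# scaled by eigenvalue^(1/4) as in the spectral filtering construction.
feats = np.stack([
    (sigma[m] ** 0.25) * np.convolve(u, phi[:, m])[:T] for m in range(k)
])
print("top eigenvalues:", np.round(sigma[:4], 6))
print("feature tensor shape:", feats.shape)       # (k, T)
```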
Updated: 2024-07-11 14:35:23
标题: 频谱状态空间模型
摘要: 这篇论文研究了用于具有长程依赖性的预测任务的序列建模。我们提出了一种基于学习线性动态系统的谱滤波算法(Hazan等人(2017))的状态空间模型(SSMs)的新形式。这产生了一种我们称之为谱状态空间模型的新颖序列预测架构。 谱状态空间模型具有两个主要优势。首先,它们具有可证明的鲁棒性特性,因为它们的性能既不依赖于基础动态的频谱,也不依赖于问题的维度。其次,这些模型是用固定的卷积滤波器构建的,不需要学习,但仍然在理论和实践中优于SSMs。 生成的模型在合成动态系统和各种模态的长程预测任务上进行评估。这些评估支持谱滤波对需要非常长程记忆的任务的理论优势。
更新时间: 2024-07-11 14:35:23
领域: cs.LG
Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility
This paper introduces a novel approach to integrating large language model (LLM) agents into automated production systems, aimed at enhancing task automation and flexibility. We organize production operations within a hierarchical framework based on the automation pyramid. Atomic operation functionalities are modeled as microservices, which are executed through interface invocation within a dedicated digital twin system. This allows for a scalable and flexible foundation for orchestrating production processes. In this digital twin system, low-level, hardware-specific data is semantically enriched and made interpretable for LLMs for production planning and control tasks. Large language model agents are systematically prompted to interpret these production-specific data and knowledge. Upon receiving a user request or identifying a triggering event, the LLM agents generate a process plan. This plan is then decomposed into a series of atomic operations, executed as microservices within the real-world automation system. We implement this overall approach on an automated modular production facility at our laboratory, demonstrating how the LLMs can handle production planning and control tasks through a concrete case study. This results in an intuitive production facility with higher levels of task automation and flexibility. Finally, we reveal several limitations in realizing the full potential of the large language models in autonomous systems and point out promising benefits. Demos of this ongoing research series can be accessed at: https://github.com/YuchenXia/GPT4IndustrialAutomation
Updated: 2024-07-11 14:34:43
标题: 将大型语言模型整合到生产系统中,以增强任务自动化和灵活性
摘要: 本文介绍了一种将大型语言模型(LLM)代理集成到自动化生产系统中的新方法,旨在增强任务自动化和灵活性。我们基于自动化金字塔将生产操作组织在一个分层框架内。原子操作功能被建模为微服务,并通过专用数字双系统内的接口调用来执行。这为编排生产过程提供了可扩展和灵活的基础。在这个数字双系统中,低级别、硬件特定的数据被语义丰富化,并且对于生产规划和控制任务而言是可解释的。大型语言模型代理被系统地促使解释这些生产特定数据和知识。在接收到用户请求或识别到触发事件后,LLM代理生成一个流程计划。然后该计划被分解成一系列原子操作,作为微服务在现实世界的自动化系统中执行。我们在实验室的自动化模块化生产设施上对这一整体方法进行了实施,演示了LLM如何通过一个具体案例研究处理生产规划和控制任务。这导致了一个直观的生产设施,具有更高水平的任务自动化和灵活性。最后,我们揭示了在实现大型语言模型在自主系统中的全部潜力方面的几个限制,并指出了有前途的好处。这一系列正在进行的研究系列的演示可在以下网址访问:https://github.com/YuchenXia/GPT4IndustrialAutomation.
更新时间: 2024-07-11 14:34:43
领域: cs.AI,cs.ET,cs.MA,cs.RO,cs.SY,eess.SY
GPT4Rec: Graph Prompt Tuning for Streaming Recommendation
In the realm of personalized recommender systems, the challenge of adapting to evolving user preferences and the continuous influx of new users and items is paramount. Conventional models, typically reliant on a static training-test approach, struggle to keep pace with these dynamic demands. Streaming recommendation, particularly through continual graph learning, has emerged as a novel solution. However, existing methods in this area either rely on historical data replay, which is increasingly impractical due to stringent data privacy regulations; or are unable to effectively address the over-stability issue; or depend on model-isolation and expansion strategies. To tackle these difficulties, we present GPT4Rec, a Graph Prompt Tuning method for streaming Recommendation. Given the evolving user-item interaction graph, GPT4Rec first disentangles the graph patterns into multiple views. After isolating specific interaction patterns and relationships in different views, GPT4Rec utilizes lightweight graph prompts to efficiently guide the model across varying interaction patterns within the user-item graph. Firstly, node-level prompts are employed to instruct the model to adapt to changes in the attributes or properties of individual nodes within the graph. Secondly, structure-level prompts guide the model in adapting to broader patterns of connectivity and relationships within the graph. Finally, view-level prompts are innovatively designed to facilitate the aggregation of information from multiple disentangled views. These prompt designs allow GPT4Rec to synthesize a comprehensive understanding of the graph, ensuring that all vital aspects of the user-item interactions are considered and effectively integrated. Experiments on four diverse real-world datasets demonstrate the effectiveness and efficiency of our proposal.
Updated: 2024-07-11 14:33:23
标题: GPT4Rec:用于流式推荐的图提示调整
摘要: 在个性化推荐系统领域,适应不断变化的用户偏好和持续涌入的新用户和物品的挑战至关重要。传统模型通常依赖静态训练-测试方法,难以跟上这些动态需求。流推荐,特别是通过持续图学习,已经成为一种新颖的解决方案。然而,这一领域现有的方法要么依赖于历史数据重放,由于严格的数据隐私法规而越来越不切实际;要么无法有效解决过于稳定的问题;要么依赖于模型隔离和扩展策略。为了解决这些困难,我们提出了GPT4Rec,一种用于流推荐的图提示调整方法。给定不断变化的用户-物品交互图,GPT4Rec首先将图模式解开成多个视图。在不同视图中孤立特定的交互模式和关系后,GPT4Rec利用轻量级图提示有效地指导模型跨越用户-物品图中不同的交互模式。首先,节点级提示被用来指导模型适应图中个体节点的属性或属性变化。其次,结构级提示指导模型适应图中更广泛的连接模式和关系。最后,视图级提示创新地设计用于促进从多个解开视图中聚合信息。这些提示设计使得GPT4Rec能够综合理解图,确保考虑并有效整合用户-物品交互的所有重要方面。对四个不同的真实世界数据集的实验证明了我们提议的有效性和效率。
更新时间: 2024-07-11 14:33:23
领域: cs.IR,cs.LG,H.3.3
Quantitative Evaluation of the Saliency Map for Alzheimer's Disease Classifier with Anatomical Segmentation
Saliency maps have been widely used to interpret deep learning classifiers for Alzheimer's disease (AD). However, since AD is heterogeneous and has multiple subtypes, the pathological mechanism of AD remains not fully understood and may vary from patient to patient. Due to the lack of such understanding, it is difficult to comprehensively and effectively assess the saliency map of an AD classifier. In this paper, we utilize anatomical segmentation to allocate saliency values into different brain regions. By plotting the distributions of saliency maps corresponding to AD and NC (Normal Control), we can gain a comprehensive view of the model's decision process. In order to leverage the fact that brain volume shrinkage happens in AD patients during disease progression, we define a new evaluation metric, the brain volume change score (VCS), by computing the average Pearson correlation between the brain volume changes and the saliency values of a model in different brain regions for each patient. Thus, the VCS metric can help us gain some knowledge of how saliency maps resulting from different models relate to the changes of the volumes across different regions in the whole brain. We trained candidate models on the ADNI dataset and tested on three different datasets. Our results indicate: (i) models with higher VCSs tend to demonstrate saliency maps with more details relevant to the AD pathology, (ii) using gradient-based adversarial training strategies such as FGSM and stochastic masking can improve the VCSs of the models.
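The VCS definition above translates directly into code. A minimal sketch, assuming volume changes and saliency values have already been aggregated per brain region via the anatomical segmentation:

```python
import numpy as np
from scipy.stats import pearsonr

def volume_change_score(volume_changes: np.ndarray, saliency: np.ndarray) -> float:
    """VCS as described in the abstract: the average, over patients, of the
    Pearson correlation between per-region brain volume changes and per-region
    saliency values. Both arrays have shape (num_patients, num_regions)."""
    corrs = [pearsonr(v, s)[0] for v, s in zip(volume_changes, saliency)]
    return float(np.mean(corrs))
```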
Updated: 2024-07-11 14:30:49
标题: 阿尔茨海默病分类器的解剖分割显著性图的定量评估
摘要: 显著性地图被广泛应用于解释深度学习分类器对阿尔茨海默病(AD)的病例。然而,由于AD是异质性的并且有多个亚型,AD的病理机制仍未完全理解,可能因患者而异。由于缺乏这种理解,难以全面有效地评估AD分类器的显著性地图。本文利用解剖分割将显著性值分配到不同的脑区域中。通过绘制与AD和NC(正常对照组)相对应的显著性地图的分布,我们可以全面了解模型的决策过程。为了利用AD患者在疾病进展过程中脑容积缩小的事实,我们通过计算脑容积变化和模型在不同脑区域的显著性值之间的平均皮尔逊相关性,定义了一个新的评估指标,即脑容积变化得分(VCS)。因此,VCS指标可以帮助我们了解不同模型产生的显著性地图与整个脑部不同区域容积变化的关系。我们在ADNI数据集上训练候选模型,并在三个不同数据集上进行测试。我们的结果表明:(i)具有较高VCS的模型倾向于展示与AD病理相关的更多细节的显著性地图,(ii)使用基于梯度的对抗训练策略,如FGSM和随机遮蔽,可以改善模型的VCS。
更新时间: 2024-07-11 14:30:49
领域: cs.CV,cs.LG,q-bio.QM
BriDe Arbitrager: Enhancing Arbitrage in Ethereum 2.0 via Bribery-enabled Delayed Block Production
The advent of Ethereum 2.0 has introduced significant changes, particularly the shift to Proof-of-Stake consensus. This change presents new opportunities and challenges for arbitrage. Amidst these changes, we introduce BriDe Arbitrager, a novel tool designed for Ethereum 2.0 that leverages Bribery-driven attacks to Delay block production and increase arbitrage gains. The main idea is to allow malicious proposers to delay block production by bribing validators/proposers, thereby gaining more time to identify arbitrage opportunities. Through analysing the bribery process, we design an adaptive bribery strategy. Additionally, we propose a Delayed Transaction Ordering Algorithm to leverage the delayed time to amplify arbitrage profits for malicious proposers. To ensure fairness and automate the bribery process, we design and implement a bribery smart contract and a bribery client. As a result, BriDe Arbitrager enables adversaries controlling a limited (< 1/4) fraction of the voting powers to delay block production via bribery and capture more arbitrage profit. Extensive experimental results based on Ethereum historical transactions demonstrate that BriDe Arbitrager yields an average of 8.66 ETH (16,442.23 USD) in daily profits. Furthermore, our approach does not trigger any slashing mechanisms and remains effective even under Proposer Builder Separation and other potential mechanisms that may be adopted by Ethereum.
Updated: 2024-07-11 14:26:31
标题: BriDe仲裁者:通过贿赂启用的延迟区块生产增强Ethereum 2.0中的套利
摘要: Ethereum 2.0的出现引入了重大变化,特别是转向权益证明共识机制。这一变化为套利提供了新的机遇和挑战。在这些变化中,我们介绍了BriDe Arbitrager,这是一个为Ethereum 2.0设计的新型工具,利用贿赂驱动攻击来延迟区块生产并增加套利收益。其主要思想是允许恶意提议者通过贿赂验证者/提议者来延迟区块生产,从而有更多时间识别套利机会。通过分析贿赂过程,我们设计了一种自适应贿赂策略。此外,我们提出了一种延迟交易排序算法,利用延迟时间来增加恶意提议者的套利利润。为了确保公平性并自动化贿赂过程,我们设计并实现了一个贿赂智能合约和一个贿赂客户端。因此,BriDe Arbitrager使得控制少于1/4的投票权的对手能够通过贿赂延迟区块生产并获得更多利润。基于以太坊的历史交易的大量实验结果表明,BriDe Arbitrager每日平均获利8.66 ETH(16,442.23美元)。此外,我们的方法不会触发任何减持机制,即使在提议者构建分离和以太坊采用其他潜在机制的情况下仍然有效。
更新时间: 2024-07-11 14:26:31
领域: cs.NI,cs.CR
Point Intervention: Improving ACVP Test Vector Generation Through Human Assisted Fuzzing
Automated Cryptographic Validation Protocol (ACVP) is an existing protocol that is used to validate a software or hardware cryptographic module automatically. In this work, we present a system providing the method and tools to produce well-covering tests in ACVP format for cryptographic libraries. The system achieves better coverage than existing fuzzing methods by using a hybrid approach to fuzzing cryptographic primitives. In addition, the system offers a framework that allows testing modules for cryptographic libraries to be created easily and securely. The work demonstrates how this system has been used to improve automated testing of NSS (Network Security Services), a popular cryptographic library, to detect its vulnerabilities, and suggests ways to improve and further develop the ACVP test format.
Updated: 2024-07-11 14:21:48
标题: 关键干预:通过人类辅助模糊测试改进ACVP测试向量生成
摘要: Automated Cryptographic Validation Protocol (ACVP) 是一种现有的协议,用于自动验证软件或硬件加密模块。在这项工作中,我们提出了一个系统,提供了一种方法和工具,以 ACVP 格式生成加密库的全面测试。该系统通过使用混合方法对加密原语进行模糊测试,实现了比现有模糊测试方法更好的覆盖率。此外,该系统提供了一个框架,允许轻松且安全地创建用于加密库的测试模块。该工作展示了如何利用该系统改进 NSS(网络安全服务)的自动化测试,检测其漏洞并建议改进和进一步发展 ACVP 测试格式。
更新时间: 2024-07-11 14:21:48
领域: cs.CR,D.2.5; D.2.4
Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models
Nowadays, the open-source software (OSS) ecosystem suffers from security threats of software supply chain (SSC) attacks. Interpreted OSS malware plays a vital role in SSC attacks, as criminals have an arsenal of attack vectors to deceive users into installing malware and executing malicious activities. In this paper, we introduce tactics, techniques, and procedures (TTPs) proposed by MITRE ATT&CK into the interpreted malware analysis to characterize different phases of an attack lifecycle. Specifically, we propose GENTTP, a zero-shot approach to extracting a TTP of an interpreted malware package. GENTTP leverages large language models (LLMs) to automatically generate a TTP, where the input is a malicious package, and the output is a deceptive tactic and an execution tactic of attack vectors. To validate the effectiveness of GENTTP, we collect two datasets for evaluation: a dataset with ground truth labels and a large dataset in the wild. Experimental results show that GENTTP can generate TTPs with high accuracy and efficiency. To demonstrate GENTTP's benefits, we build an LLM-based Chatbot from 3,700+ PyPI malware's TTPs. We further conduct a quantitative analysis of malware's TTPs at a large scale. Our main findings include: (1) many OSS malicious packages share a relatively stable TTP, even with the increasing emergence of malware and attack campaigns, (2) a TTP reflects characteristics of a malware-based attack, and (3) an attacker's intent behind the malware is linked to a TTP.
Updated: 2024-07-11 14:18:41
标题: 解释型恶意软件中的战术、技术和程序(TTPs):使用大型语言模型的零样本生成
摘要: 如今,开源软件(OSS)生态系统面临软件供应链(SSC)攻击的安全威胁。解释型OSS恶意软件在SSC攻击中发挥着重要作用,因为犯罪分子拥有一系列攻击向量来欺骗用户安装恶意软件并执行恶意活动。本文将MITRE ATT&CK提出的战术、技术和程序(TTPs)引入到解释型恶意软件分析中,以描述攻击生命周期的不同阶段。具体来说,我们提出了GENTTP,一种零样本方法,用于提取解释型恶意软件包的TTP。GENTTP利用大型语言模型(LLMs)自动生成TTP,输入为恶意软件包,输出为攻击向量的欺骗战术和执行战术。为验证GENTTP的有效性,我们收集了两个数据集进行评估:一个带有真实标签的数据集和一个野外大规模数据集。实验结果显示,GENTTP能够高精度高效地生成TTPs。为展示GENTTP的好处,我们从3700多个PyPI恶意软件的TTP构建了基于LLM的聊天机器人。我们进一步对恶意软件的TTP进行了大规模的定量分析。我们的主要发现包括:(1)许多OSS恶意软件包共享相对稳定的TTP,即使恶意软件和攻击活动不断增加,(2)TTP反映了基于恶意软件的攻击的特征,(3)攻击者的恶意软件背后的意图与TTP相关联。
更新时间: 2024-07-11 14:18:41
领域: cs.CR,cs.SE
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
The rise of powerful large language models (LLMs) has spurred a new trend in building LLM-based autonomous agents for solving complex tasks, especially multi-agent systems. Despite the remarkable progress, we notice that existing works are heavily dependent on human-designed frameworks, which greatly limits the functional scope and scalability of agent systems. How to automatically extend the specialized agent to multi-agent systems to improve task-solving capability still remains a significant challenge. In this paper, we introduce EvoAgent, a generic method to automatically extend expert agents to multi-agent systems via the evolutionary algorithm, thereby improving the effectiveness of LLM-based agents in solving tasks. Specifically, we consider the existing agent frameworks as the initial individual and then apply a series of evolutionary operators (e.g., mutation, crossover, selection, etc.) to generate multiple agents with diverse agent settings. EvoAgent can be generalized to any LLM-based agent framework, and can automatically extend the existing agent framework to multi-agent systems without any extra human designs. Experimental results across various tasks have shown that EvoAgent can automatically generate multiple expert agents and significantly enhance the task-solving capabilities of LLM-based agents.
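A minimal sketch of the evolutionary loop described above, representing an agent as a dictionary of settings (e.g. role, skills, prompt); fitness and mutate_setting are hypothetical callables standing in for the task evaluator and the (in the paper, LLM-driven) mutation operator.

```python
import copy
import random

def evolve_agents(initial_agent: dict, fitness, mutate_setting,
                  generations: int = 10, population_size: int = 8,
                  mutation_rate: float = 0.3) -> list:
    """Evolutionary search over agent configurations (sketch): crossover mixes
    the settings of two parents, mutation perturbs one setting, and selection
    keeps the fittest configurations."""
    population = [copy.deepcopy(initial_agent) for _ in range(population_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(population_size):
            a, b = random.sample(population, 2)
            child = {k: random.choice([a[k], b[k]]) for k in a}  # crossover
            if random.random() < mutation_rate:
                key = random.choice(list(child))
                child[key] = mutate_setting(key, child[key])     # mutation
            offspring.append(child)
        # selection: keep the best population_size individuals
        population = sorted(population + offspring, key=fitness, reverse=True)[:population_size]
    return population
```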
Updated: 2024-07-11 14:18:35
标题: EvoAgent: 通过进化算法实现自动多智能体生成
摘要: 强大的大型语言模型(LLMs)的崛起促使了建立基于LLM的自主代理来解决复杂任务,特别是多代理系统的新趋势。尽管取得了显著进展,但我们注意到现有作品严重依赖人为设计的框架,这严重限制了代理系统的功能范围和可扩展性。如何自动将专业代理扩展到多代理系统以提高任务解决能力仍然是一个重要挑战。在本文中,我们介绍了EvoAgent,这是一种通过进化算法自动将专家代理扩展到多代理系统的通用方法,从而提高LLM-based代理在解决任务中的有效性。具体来说,我们将现有的代理框架视为初始个体,然后应用一系列进化算子(如变异、交叉、选择等)生成多个具有不同代理设置的代理。EvoAgent可以推广到任何基于LLM的代理框架,并且可以在不需要额外人为设计的情况下自动将现有代理框架扩展到多代理系统。各种任务的实验结果表明,EvoAgent能够自动生成多个专家代理,并显著增强基于LLM的代理的任务解决能力。
更新时间: 2024-07-11 14:18:35
领域: cs.AI
Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks
Spatiotemporal federated learning has recently attracted intensive study due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of gradient inversion attacks in spatiotemporal federated learning. In this paper, we explore the gradient attack problem in spatiotemporal federated learning from attack and defense perspectives. To understand privacy risks in spatiotemporal federated learning, we first propose Spatiotemporal Gradient Inversion Attack (ST-GIA), a gradient attack algorithm tailored to spatiotemporal data that successfully reconstructs the original location from gradients. Furthermore, we design an adaptive defense strategy to mitigate gradient inversion attacks in spatiotemporal federated learning. By dynamically adjusting the perturbation levels, we can offer tailored protection for varying rounds of training data, thereby achieving a better trade-off between privacy and utility than current state-of-the-art methods. Through intensive experimental analysis on three real-world datasets, we reveal that the proposed defense strategy can well preserve the utility of spatiotemporal federated learning with effective security protection.
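ST-GIA itself is tailored to spatiotemporal data, but the gradient-matching core common to such attacks can be sketched generically: optimize a dummy input until its gradients reproduce the shared ones. The shapes, optimizer, and regression-style target below are illustrative assumptions.

```python
import torch

def invert_location(model, loss_fn, target_grads, steps: int = 200, lr: float = 0.1):
    """Gradient-matching reconstruction (sketch): recover a private location
    from the gradients a client shared in federated training."""
    dummy_x = torch.randn(1, 2, requires_grad=True)  # e.g. (lat, lon); shape is illustrative
    dummy_y = torch.randn(1, 2, requires_grad=True)
    opt = torch.optim.Adam([dummy_x, dummy_y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        grads = torch.autograd.grad(loss_fn(model(dummy_x), dummy_y),
                                    model.parameters(), create_graph=True)
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        match.backward()  # gradients flow back to the dummy input
        opt.step()
    return dummy_x.detach()
```

The adaptive defense then amounts to calibrating the perturbation added to target_grads round by round, trading reconstruction error against model utility.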
Updated: 2024-07-11 14:17:02
标题: 增强面向梯度反转攻击的时空联合学习隐私保护
摘要: 最近,由于其能够仅通过共享梯度在各种基于位置的服务中训练有价值的模型,时空联合学习引起了密集的研究。另一方面,最近的研究表明,共享梯度可能会受到图像或文本的梯度反转攻击(GIA)的影响。然而,到目前为止,在时空联合学习中还没有任何系统性研究梯度反转攻击。本文从攻击和防御的角度探讨了时空联合学习中的梯度攻击问题。为了了解时空联合学习中的隐私风险,我们首先提出了时空梯度反转攻击(ST-GIA),这是一种专为时空数据量身定制的梯度攻击算法,成功地从梯度中重建了原始位置。此外,我们设计了一种自适应的防御策略,以减轻时空联合学习中的梯度反转攻击。通过动态调整扰动水平,我们可以为不同轮次的训练数据提供量身定制的保护,从而实现比当前最先进方法更好的隐私与效用之间的权衡。通过对三个真实世界数据集的深入实验分析,我们揭示了所提出的防御策略能够有效保护时空联合学习的效用,并提供有效的安全保护。
更新时间: 2024-07-11 14:17:02
领域: cs.CR
Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection
Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts onto normal samples to mitigate problems related to unbalanced training data. These techniques often produce out-of-distribution images, resulting in systems that learn what is not a normal sample but cannot accurately identify what a defect looks like. In this work, we introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation. Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model through text descriptions and region localization of the possible anomalies. This strategic shift enhances the interpretability of results and fosters a more robust human feedback loop, facilitating iterative improvements of the generated outputs. Remarkably, our approach operates in a zero-shot manner, avoiding time-consuming fine-tuning procedures while achieving superior performance. We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset, with an improvement in AP of approximately 18% when positive samples are available and 28% when they are missing. The source code is available at https://github.com/intelligolabs/DIAG.
Updated: 2024-07-11 14:14:22
标题: 利用潜在扩散模型进行无需训练的表面缺陷检测中的分布数据增强
摘要: 缺陷检测是识别生产样本中缺陷的任务。通常,缺陷检测分类器是在由正常样本(负数据)和带有缺陷的样本(正数据)组成的地面真实数据上进行训练的,后者通常比正常样本少。最先进的数据增强程序通过在正常样本上叠加人工制品来添加合成缺陷数据,以缓解与不平衡训练数据相关的问题。这些技术通常会产生超出分布范围的图像,导致系统学习了什么不是正常样本,但无法准确识别缺陷是什么样子。在这项工作中,我们介绍了DIAG,一种基于扩散的无需训练的内部分布异常生成管道,用于数据增强。与传统的图像生成技术不同,我们实现了一个人为参与的管道,在这个管道中,领域专家通过文本描述和可能异常的区域定位向模型提供多模式指导。这种战略转变增强了结果的可解释性,并促进了更强大的人类反馈循环,促进了生成输出的迭代改进。值得注意的是,我们的方法以零样本方式运行,避免了耗时的微调过程,同时实现了优越性能。我们在具有挑战性的KSDD2数据集上展示了DIAG相对于最先进的数据增强方法的功效和多样性,在正样本可用时改进了约18%的AP,当正样本缺失时提高了28%。源代码可在https://github.com/intelligolabs/DIAG获取。
更新时间: 2024-07-11 14:14:22
领域: cs.CV,cs.LG
Differentiated Federated Reinforcement Learning Based Traffic Offloading on Space-Air-Ground Integrated Networks
The Space-Air-Ground Integrated Network (SAGIN) plays a pivotal role as a comprehensive foundational network communication infrastructure, presenting opportunities for highly efficient global data transmission. Nonetheless, given SAGIN's unique characteristics as a dynamically heterogeneous network, conventional network optimization methodologies encounter challenges in satisfying the stringent requirements for network latency and stability inherent to data transmission within this network environment. Therefore, this paper proposes the use of differentiated federated reinforcement learning (DFRL) to solve the traffic offloading problem in SAGIN, i.e., using multiple agents to generate differentiated traffic offloading policies. Considering the differentiated characteristics of each region of SAGIN, DFRL models the traffic offloading policy optimization process as the process of solving the Decentralized Partially Observable Markov Decision Process (DEC-POMDP) problem. The paper proposes a novel Differentiated Federated Soft Actor-Critic (DFSAC) algorithm to solve the problem. The DFSAC algorithm takes the network packet delay as the joint reward value and introduces the global trend model as the joint target action-value function of each agent to guide the update of each agent's policy. The simulation results demonstrate that the traffic offloading policy based on the DFSAC algorithm achieves better performance in terms of network throughput, packet loss rate, and packet delay compared to the traditional federated reinforcement learning approach and other baseline approaches.
Updated: 2024-07-11 14:11:23
标题: 基于差异化联邦强化学习的空天地一体化网络流量卸载
摘要: 空-天-地一体化网络(SAGIN)在作为全面基础网络通信基础设施方面发挥关键作用,为高效全球数据传输提供机遇。然而,鉴于SAGIN作为动态异构网络的独特特征,传统网络优化方法在满足网络延迟和稳定性方面的严格要求方面遇到挑战,这些要求是数据传输在这种网络环境中本质上具有的。因此,本文提出了利用差异化联邦强化学习(DFRL)来解决SAGIN中的流量卸载问题,即使用多个代理生成差异化的流量卸载策略。考虑到SAGIN每个区域的差异化特征,DFRL将流量卸载策略优化过程建模为解决分散式部分可观察马尔可夫决策过程(DEC-POMDP)问题的过程。本文提出了一种新颖的差异化联邦软演员-评论家(DFSAC)算法来解决这个问题。DFSAC算法将网络数据包延迟作为联合奖励值,并引入全局趋势模型作为每个代理的联合目标动作值函数,以指导每个代理策略的更新。模拟结果表明,基于DFSAC算法的流量卸载策略在网络吞吐量、数据包丢失率和数据包延迟方面相比传统的联邦强化学习方法和其他基准方法取得了更好的性能。
更新时间: 2024-07-11 14:11:23
领域: cs.NI,cs.LG
Predictive representations: building blocks of intelligence
Adaptive behavior often requires predicting future events. The theory of reinforcement learning prescribes what kinds of predictive representations are useful and how to compute them. This paper integrates these theoretical ideas with work on cognition and neuroscience. We pay special attention to the successor representation (SR) and its generalizations, which have been widely applied both as engineering tools and models of brain function. This convergence suggests that particular kinds of predictive representations may function as versatile building blocks of intelligence.
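For readers who want the mechanics: the successor representation has a one-line temporal-difference update in the tabular case. A minimal sketch:

```python
import numpy as np

def sr_td_update(M: np.ndarray, s: int, s_next: int,
                 alpha: float = 0.1, gamma: float = 0.95) -> None:
    """One TD update of the successor representation: M[s, j] estimates the
    discounted expected future occupancy of state j when starting from s."""
    one_hot = np.zeros(M.shape[1])
    one_hot[s] = 1.0
    M[s] += alpha * (one_hot + gamma * M[s_next] - M[s])
```

Given any reward vector r over states, values then follow as V = M @ r, which is what makes the SR a reusable building block: the same predictive map serves many reward functions.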
Updated: 2024-07-11 14:02:37
标题: 预测性表征:智能的基本组成部分
摘要: 适应性行为通常需要预测未来事件。强化学习理论规定了哪种预测表征是有用的,以及如何计算它们。本文将这些理论观点与认知和神经科学的研究相结合。我们特别关注继承者表征(SR)及其广义化,这些已被广泛应用作为工程工具和大脑功能模型。这种融合表明特定类型的预测表征可能作为智能的多功能构建模块。
更新时间: 2024-07-11 14:02:37
领域: cs.AI,cs.LG
Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents
This article explores the convergence of connectionist and symbolic artificial intelligence (AI), from historical debates to contemporary advancements. Traditionally considered distinct paradigms, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic. Recent advancements in large language models (LLMs), exemplified by ChatGPT and GPT-4, highlight the potential of connectionist architectures in handling human language as a form of symbols. The study argues that LLM-empowered Autonomous Agents (LAAs) embody this paradigm convergence. By utilizing LLMs for text-based knowledge modeling and representation, LAAs integrate neuro-symbolic AI principles, showcasing enhanced reasoning and decision-making capabilities. Comparing LAAs with Knowledge Graphs within the neuro-symbolic AI theme highlights the unique strengths of LAAs in mimicking human-like reasoning processes, scaling effectively with large datasets, and leveraging in-context samples without explicit re-training. The research underscores promising avenues in neuro-vector-symbolic integration, instructional encoding, and implicit reasoning, aimed at further enhancing LAA capabilities. By exploring the progression of neuro-symbolic AI and proposing future research trajectories, this work advances the understanding and development of AI technologies.
Updated: 2024-07-11 14:00:53
标题: 《趋同范式:在LLM增强自主代理中符号和联结主义人工智能的协同作用》
摘要: 本文探讨了连接主义和符号人工智能(AI)的融合,从历史上的辩论到当代的进展。传统上被认为是不同范式,连接主义AI侧重于神经网络,而符号AI强调符号表示和逻辑。最近在大型语言模型(LLMs)方面取得的进展,如ChatGPT和GPT-4,突显了连接主义架构在处理人类语言作为符号形式的潜力。研究认为,由LLM赋能的自主代理人(LAAs)体现了这种范式的融合。通过利用LLMs进行基于文本的知识建模和表示,LAAs整合了神经符号AI原则,展示了增强的推理和决策能力。将LAAs与神经符号AI主题中的知识图进行比较,突显了LAAs在模拟类人推理过程、有效扩展到大型数据集以及利用上下文样本而无需明确重新训练的独特优势。研究强调了神经-向量-符号集成、指示性编码和隐式推理等领域中有前景的发展方向,旨在进一步增强LAA的能力。通过探索神经符号AI的发展,并提出未来的研究轨迹,这项工作推动了对AI技术的理解和发展。
更新时间: 2024-07-11 14:00:53
领域: cs.AI
15M Multimodal Facial Image-Text Dataset
Currently, image-text-driven multi-modal deep learning models have demonstrated their outstanding potential in many fields. In practice, tasks centered around facial images have broad application prospects. This paper presents FaceCaption-15M, a large-scale, diverse, and high-quality dataset of facial images accompanied by their natural language descriptions (facial image-to-text). This dataset aims to facilitate a study on face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and their corresponding natural language descriptions of facial features, making it the largest facial image-caption dataset to date. We conducted a comprehensive analysis of image quality, text naturalness, text complexity, and text-image relevance to demonstrate the superiority of FaceCaption-15M. To validate the effectiveness of FaceCaption-15M, we first trained a facial language-image pre-training model (FLIP, similar to CLIP) to align facial images with their corresponding captions in feature space. Subsequently, using both image and text encoders and fine-tuning only the linear layer, our FLIP-based models achieved state-of-the-art results on two challenging face-centered tasks. The purpose is to promote research in the field of face-related tasks through the availability of the proposed FaceCaption-15M dataset. All data, codes, and models are publicly available. https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M
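The abstract describes FLIP as CLIP-like; the standard symmetric contrastive objective such a model would optimize is sketched below (that FLIP uses exactly this loss is an assumption on our part).

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning facial images with their captions:
    matched image-caption pairs sit on the diagonal of the similarity matrix."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature            # (batch, batch)
    labels = torch.arange(logits.shape[0], device=logits.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```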
Updated: 2024-07-11 14:00:14
标题: 15M多模态面部图像-文本数据集
摘要: 目前,基于图像文本的多模态深度学习模型已经在许多领域展现出了出色的潜力。在实践中,以面部图像为中心的任务具有广泛的应用前景。本文介绍了一个名为\textbf{FaceCaption-15M}的大规模、多样化和高质量的面部图像数据集,其配有自然语言描述(面部图像到文本)。该数据集旨在促进面部任务的研究。FaceCaption-15M包括超过1500万对面部图像及其对应的面部特征自然语言描述,使其成为迄今为止最大的面部图像标题数据集。我们对图像质量、文本自然性、文本复杂性和文本图像相关性进行了全面分析,以展示FaceCaption-15M的优越性。为验证FaceCaption-15M的有效性,我们首先训练了一个面部语言-图像预训练模型(FLIP,类似于CLIP),以在特征空间中将面部图像与其对应的标题对齐。随后,使用图像和文本编码器,并仅微调线性层,我们基于FLIP的模型在两个具有挑战性的面部任务上取得了最新的成果。我们的目的是通过提供所提出的FaceCaption-15M数据集来推动面部相关任务领域的研究。所有数据、代码和模型都是公开可用的。https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M
更新时间: 2024-07-11 14:00:14
领域: cs.CV,cs.AI
Predict. Optimize. Revise. On Forecast and Policy Stability in Energy Management Systems
This research addresses the challenge of integrating forecasting and optimization in energy management systems, focusing on the impacts of switching costs, forecast accuracy, and stability. It proposes a novel framework for analyzing online optimization problems with switching costs, enabled by deterministic and probabilistic forecasts. Through empirical evaluation and theoretical analysis, the research reveals the balance between forecast accuracy, stability, and switching costs in shaping policy performance. Conducted in the context of battery scheduling within energy management applications, it introduces a metric for evaluating probabilistic forecast stability and examines the effects of forecast accuracy and stability on optimization outcomes using the real-world case of the Citylearn 2022 competition. Findings indicate that switching costs significantly influence the trade-off between forecast accuracy and stability, highlighting the importance of integrated systems that enable collaboration between forecasting and operational units for improved decision-making. The study shows that committing to a policy for longer periods can be advantageous over frequent updates. Results also show a correlation between forecast stability and policy performance, suggesting that stable forecasts can mitigate switching costs. The proposed framework provides valuable insights for energy sector decision-makers and forecast practitioners when designing the operation of an energy management system.
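The trade-off studied above can be made concrete with the standard switching-cost objective from online optimization; a minimal sketch, assuming this textbook cost form rather than the paper's exact formulation:

```python
import numpy as np

def policy_cost(actions: np.ndarray, stage_cost, beta: float) -> float:
    """Total cost of an online policy (sketch): per-step operating cost plus a
    switching penalty beta * |x_t - x_{t-1}|, as in battery (re)scheduling.
    Unstable forecasts induce frequent plan revisions and thus large switching cost."""
    operating = sum(stage_cost(t, x) for t, x in enumerate(actions))
    switching = beta * np.abs(np.diff(actions, prepend=actions[0])).sum()
    return float(operating + switching)
```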
Updated: 2024-07-11 13:58:47
标题: 预测、优化、修订:关于能源管理系统中预测和政策稳定性的研究
摘要: 本研究解决了在能源管理系统中整合预测和优化的挑战,重点关注切换成本、预测准确性和稳定性的影响。它提出了一个新颖的框架,用于分析具有切换成本并且由确定性和概率预测支持的在线优化问题。通过实证评估和理论分析,研究揭示了预测准确性、稳定性和切换成本在塑造政策绩效方面的平衡。在能源管理应用中进行电池调度的背景下,它引入了一个评估概率预测稳定性的指标,并利用Citylearn 2022比赛的真实案例,探讨了预测准确性和稳定性对优化结果的影响。研究结果表明,切换成本显著影响了预测准确性和稳定性之间的权衡,突出了促进预测和运营单元协作以改善决策的集成系统的重要性。研究表明,长期致力于一项政策可能比频繁更新更有利。结果还显示了预测稳定性与政策绩效之间的相关性,表明稳定的预测可以缓解切换成本。所提出的框架为能源领域决策者和预测从业者在设计能源管理系统运营时提供了宝贵的见解。
更新时间: 2024-07-11 13:58:47
领域: eess.SY,cs.AI,cs.SY
Resilience of Entropy Model in Distributed Neural Networks
Distributed deep neural networks (DNNs) have emerged as a key technique to reduce communication overhead without sacrificing performance in edge computing systems. Recently, entropy coding has been introduced to further reduce the communication overhead. The key idea is to train the distributed DNN jointly with an entropy model, which is used as side information during inference time to adaptively encode latent representations into bit streams with variable length. To the best of our knowledge, the resilience of entropy models is yet to be investigated. As such, in this paper we formulate and investigate the resilience of entropy models to intentional interference (e.g., adversarial attacks) and unintentional interference (e.g., weather changes and motion blur). Through an extensive experimental campaign with 3 different DNN architectures, 2 entropy models and 4 rate-distortion trade-off factors, we demonstrate that the entropy attacks can increase the communication overhead by up to 95%. By separating compression features in frequency and spatial domain, we propose a new defense mechanism that can reduce the transmission overhead of the attacked input by about 9% compared to unperturbed data, with only about 2% accuracy loss. Importantly, the proposed defense mechanism is a standalone approach which can be applied in conjunction with approaches such as adversarial training to further improve robustness. Code will be shared for reproducibility.
Updated: 2024-07-11 13:51:56
标题: 分布式神经网络中熵模型的韧性
摘要: 分布式深度神经网络(DNNs)已经成为边缘计算系统中减少通信开销而不损失性能的关键技术。最近,熵编码已被引入以进一步减少通信开销。关键思想是联合训练分布式DNN和熵模型,该模型在推断时用作辅助信息,以自适应地将潜在表示编码为具有可变长度的比特流。据我们所知,熵模型的韧性尚未得到研究。因此,在本文中,我们制定并研究了熵模型对有意干扰(例如,对抗性攻击)和无意干扰(例如,天气变化和运动模糊)的韧性。通过对3种不同的DNN架构、2种熵模型和4种速率失真权衡因子进行广泛的实验,我们表明熵攻击可以将通信开销增加高达95%。通过在频率和空间域中分离压缩特征,我们提出了一种新的防御机制,可以将受攻击输入的传输开销与未受干扰数据相比减少约9%,仅损失约2%的准确性。重要的是,所提出的防御机制是一种独立的方法,可以与对抗性训练等方法结合使用,以进一步提高鲁棒性。为了可再现性,我们将分享代码。
更新时间: 2024-07-11 13:51:56
领域: cs.LG,cs.AI,cs.CR
Adjustment Identification Distance: A gadjid for Causal Structure Learning
Evaluating graphs learned by causal discovery algorithms is difficult: The number of edges that differ between two graphs does not reflect how the graphs differ with respect to the identifying formulas they suggest for causal effects. We introduce a framework for developing causal distances between graphs which includes the structural intervention distance for directed acyclic graphs as a special case. We use this framework to develop improved adjustment-based distances as well as extensions to completed partially directed acyclic graphs and causal orders. We develop new reachability algorithms to compute the distances efficiently and to prove their low polynomial time complexity. In our package gadjid (open source at https://github.com/CausalDisco/gadjid), we provide implementations of our distances; they are orders of magnitude faster with proven lower time complexity than the structural intervention distance and thereby provide a success metric for causal discovery that scales to graph sizes that were previously prohibitive.
Updated: 2024-07-11 13:45:33
标题: 调整识别距离:因果结构学习的一种工具
摘要: 用因果发现算法学到的图形进行评估是困难的:两个图形之间不同的边的数量并不反映它们在提出因果效应的识别公式方面的差异。我们引入了一个框架来开发图形之间的因果距离,其中包括有向无环图的结构干预距离作为一个特殊情况。我们使用这个框架来开发改进的基于调整的距离,以及对完成部分有向无环图和因果顺序的扩展。我们开发了新的可达性算法来有效地计算距离,并证明了它们的低多项式时间复杂度。在我们的软件包gadjid(在https://github.com/CausalDisco/gadjid上开源),我们提供了我们距离的实现;它们比结构干预距离快几个数量级,并且提供了一个适应以前被禁止的图形尺寸的因果发现成功指标。
更新时间: 2024-07-11 13:45:33
领域: stat.ML,cs.LG,stat.ME
Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model
Continuous-Time Dynamic Graph (CTDG) precisely models evolving real-world relationships, drawing heightened interest in dynamic graph learning across academia and industry. However, existing CTDG models encounter challenges stemming from noise and limited historical data. Graph Data Augmentation (GDA) emerges as a critical solution, yet current approaches primarily focus on static graphs and struggle to effectively address the dynamics inherent in CTDGs. Moreover, these methods often demand substantial domain expertise for parameter tuning and lack theoretical guarantees for augmentation efficacy. To address these issues, we propose Conda, a novel latent diffusion-based GDA method tailored for CTDGs. Conda features a sandwich-like architecture, incorporating a Variational Auto-Encoder (VAE) and a conditional diffusion model, aimed at generating enhanced historical neighbor embeddings for target nodes. Unlike conventional diffusion models trained on entire graphs via pre-training, Conda requires historical neighbor sequence embeddings of target nodes for training, thus facilitating more targeted augmentation. We integrate Conda into the CTDG model and adopt an alternating training strategy to optimize performance. Extensive experimentation across six widely used real-world datasets showcases the consistent performance improvement of our approach, particularly in scenarios with limited historical data.
Updated: 2024-07-11 13:35:22
标题: 潜在条件扩散的数据增强方法在连续时间动态图模型中的应用
摘要: Continuous-Time Dynamic Graph (CTDG)精确地模拟了不断发展的现实世界关系,引起了学术界和工业界对动态图学习的极大兴趣。然而,现有的CTDG模型面临来自噪音和有限历史数据的挑战。图数据增强(GDA)出现作为一个关键解决方案,但当前方法主要集中在静态图上,并且难以有效地处理CTDG固有的动态性。此外,这些方法通常需要大量领域专业知识进行参数调整,并且缺乏对增强效果的理论保证。为了解决这些问题,我们提出了一种专门针对CTDG的新型基于潜在扩散的GDA方法Conda。Conda具有类似三明治的架构,包括一个变分自动编码器(VAE)和一个条件扩散模型,旨在为目标节点生成增强的历史邻居嵌入。与传统的通过预训练在整个图上训练的扩散模型不同,Conda需要目标节点的历史邻居序列嵌入进行训练,从而促进更有针对性的增强。我们将Conda集成到CTDG模型中,并采用交替训练策略来优化性能。在六个广泛使用的真实世界数据集上进行的大量实验展示了我们方法的一致性性能提升,特别是在有限历史数据的情况下。
更新时间: 2024-07-11 13:35:22
领域: cs.LG,cs.AI
CE-QArg: Counterfactual Explanations for Quantitative Bipolar Argumentation Frameworks (Technical Report)
There is a growing interest in understanding arguments' strength in Quantitative Bipolar Argumentation Frameworks (QBAFs). Most existing studies focus on attribution-based methods that explain an argument's strength by assigning importance scores to other arguments but fail to explain how to change the current strength to a desired one. To solve this issue, we introduce counterfactual explanations for QBAFs. We discuss problem variants and propose an iterative algorithm named Counterfactual Explanations for Quantitative bipolar Argumentation frameworks (CE-QArg). CE-QArg can identify valid and cost-effective counterfactual explanations based on two core modules, polarity and priority, which help determine the updating direction and magnitude for each argument, respectively. We discuss some formal properties of our counterfactual explanations and empirically evaluate CE-QArg on randomly generated QBAFs.
Updated: 2024-07-11 13:34:11
标题: CE-QArg:量化双极论证框架的反事实解释(技术报告)
摘要: 在量化双极论证框架(QBAFs)中,人们越来越关注理解论点强度。大多数现有研究集中在基于归因的方法上,通过给其他论点分配重要性分数来解释论点的强度,但未能解释如何将当前强度改变为期望的强度。为了解决这个问题,我们引入了对QBAFs的反事实解释。我们讨论了问题的变体,并提出了一个名为量化双极论证框架的反事实解释(CE-QArg)的迭代算法。CE-QArg可以根据两个核心模块,即极性和优先级,识别有效且具有成本效益的反事实解释,这有助于确定每个论点的更新方向和幅度。我们讨论了我们的反事实解释的一些形式属性,并在随机生成的QBAFs上对CE-QArg进行了实证评估。
更新时间: 2024-07-11 13:34:11
领域: cs.AI
QC-Forest: a Classical-Quantum Algorithm to Provably Speedup Retraining of Random Forest
Random Forest (RF) is a popular tree-ensemble method for supervised learning, prized for its ease of use and flexibility. Online RF models must account for new training data to maintain model accuracy. This is particularly important in applications where data is periodically and sequentially generated over time in data streams, such as auto-driving systems and credit card payments. In this setting, performing periodic model retraining with the old and new data accumulated is beneficial as it fully captures possible drifts in the data distribution over time. However, this is impractical with state-of-the-art classical algorithms for RF as they scale linearly with the accumulated number of samples. We propose QC-Forest, a classical-quantum algorithm designed to time-efficiently retrain RF models in the streaming setting for multi-class classification and regression, achieving a runtime poly-logarithmic in the total number of accumulated samples. QC-Forest leverages Des-q, a quantum algorithm for single tree construction and retraining proposed by Kumar et al., expanding it to multi-class classification (the original proposal was limited to binary classes) and introducing an exact classical method to replace an underlying quantum subroutine that incurs a finite error, while maintaining the same poly-logarithmic dependence. Finally, we showcase that QC-Forest achieves competitive accuracy in comparison to state-of-the-art RF methods on widely used benchmark datasets with up to 80,000 samples, while significantly speeding up model retraining.
Updated: 2024-07-11 13:32:59
标题: QC-Forest:一种经典-量子算法,可明显加快随机森林的重新训练速度
摘要: 随机森林(RF)是一种用于监督学习的流行的树集成方法,因其易于使用和灵活性而受到赞赏。在线RF模型需要考虑新的训练数据以保持模型的准确性。这在数据以数据流的形式周期性和顺序地生成的应用中尤为重要,如自动驾驶系统和信用卡支付。在这种情况下,定期使用旧数据和积累的新数据进行模型重新训练是有益的,因为它完全捕捉了随时间可能发生的数据分布漂移。然而,由于现有的经典RF算法与累积样本数量成线性关系,这在实践中是不切实际的。我们提出QC-Forest,这是一种经典量子算法,旨在在流式设置中对多类分类和回归重新训练RF模型,实现总累积样本数的多对数运行时间。QC-Forest利用了Des-q,这是由Kumar等人提出的用于单棵树构建和重新训练的量子算法,通过扩展到多类分类,而原始提案仅限于二元类,并引入了一种确切的经典方法来替代底层的量子子程序,从而产生有限误差,同时保持相同的多对数依赖关系。最后,我们展示了QC-Forest在常用基准数据集上与最先进的RF方法相比具有竞争性的准确性,样本数量最多可达80,000个,并显著加快了模型重新训练的速度。
更新时间: 2024-07-11 13:32:59
领域: quant-ph,cs.LG
A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors
Research into the detection of human activities from wearable sensors is a highly active field, benefiting numerous applications, from ambulatory monitoring of healthcare patients via fitness coaching to streamlining manual work processes. We present an empirical study that evaluates and contrasts four commonly employed annotation methods in user studies focused on in-the-wild data collection. For both the user-driven, in situ annotations, where participants annotate their activities during the actual recording process, and the recall methods, where participants retrospectively annotate their data at the end of each day, the participants had the flexibility to select their own set of activity classes and corresponding labels. Our study illustrates that different labeling methodologies directly impact the annotations' quality, as well as the capabilities of a deep learning classifier trained with the data. We noticed that in situ methods produce fewer but more precise labels than recall methods. Furthermore, we combined an activity diary with a visualization tool that enables the participant to inspect and label their activity data. Due to the introduction of such a tool, we were able to decrease missing annotations and increase annotation consistency, and therefore the F1-Score of the deep learning model by up to 8% (ranging between 82.1 and 90.4% F1-Score). Furthermore, we discuss the advantages and disadvantages of the methods compared in our study, the biases they could introduce, and the consequences of their usage on human activity recognition studies as well as possible solutions.
Updated: 2024-07-11 13:23:32
标题: 一个注释问题:来自可穿戴传感器的就地和自我回忆活动注释的实证研究
摘要: 穿戴式传感器检测人类活动的研究是一个非常活跃的领域,有利于许多应用,从通过健康护理病人的步行监测到通过健身指导简化手工工作流程。 我们提出了一项实证研究,评估并对比了四种常用的注释方法在针对野外数据收集的用户研究中的应用。对于用户驱动的原位注释,在这种情况下,参与者在实际记录过程中注释他们的活动,以及回忆方法,在这种情况下,参与者在每天结束时回顾性地注释他们的数据,参与者有灵活性选择自己的活动类别集和相应的标签。 我们的研究表明,不同的标记方法直接影响注释的质量,以及使用数据训练的深度学习分类器的能力。我们注意到,在原位方法中产生的标签比回忆方法更少但更精确。此外,我们结合了一种活动日记和一个可视化工具,使参与者能够检查和标记他们的活动数据。由于引入这样的工具,我们能够减少遗漏的注释,并增加注释的一致性,从而深度学习模型的F1分数提高了8%(F1分数在82.1%至90.4%之间)。此外,我们讨论了我们研究中比较的方法的优缺点,它们可能引入的偏见,以及它们在人类活动识别研究中的使用对结果的影响以及可能的解决方案。
更新时间: 2024-07-11 13:23:32
领域: cs.HC,cs.LG
Lynx: An Open Source Hallucination Evaluation Model
Retrieval Augmented Generation (RAG) techniques aim to mitigate hallucinations in Large Language Models (LLMs). However, LLMs can still produce information that is unsupported or contradictory to the retrieved contexts. We introduce LYNX, a SOTA hallucination detection LLM that is capable of advanced reasoning on challenging real-world hallucination scenarios. To evaluate LYNX, we present HaluBench, a comprehensive hallucination evaluation benchmark, consisting of 15k samples sourced from various real-world domains. Our experiment results show that LYNX outperforms GPT-4o, Claude-3-Sonnet, and closed and open-source LLM-as-a-judge models on HaluBench. We release LYNX, HaluBench and our evaluation code for public access.
Updated: 2024-07-11 13:22:17
标题: 猞猁:一个开源的幻觉评估模型
摘要: 检索增强生成(RAG)技术旨在减轻大型语言模型(LLMs)中的幻觉。然而,LLMs仍然可能产生不受支持或与检索上下文相矛盾的信息。我们引入LYNX,一个领先的幻觉检测LLM,能够对具有挑战性的现实世界幻觉场景进行高级推理。为了评估LYNX,我们提出了HaluBench,一个包含来自各种现实世界领域的15k个样本的全面幻觉评估基准。我们的实验结果显示LYNX在HaluBench上优于GPT-4o、Claude-3-Sonnet以及闭源和开源的LLM作为评判模型。我们公开发布LYNX、HaluBench和我们的评估代码供公众使用。
更新时间: 2024-07-11 13:22:17
领域: cs.AI,cs.CL
Towards Semantically Enriched Embeddings for Knowledge Graph Completion
Embedding based Knowledge Graph (KG) Completion has gained much attention over the past few years. Most of the current algorithms consider a KG as a multidirectional labeled graph and lack the ability to capture the semantics underlying the schematic information. In a separate development, a vast amount of information has been captured within Large Language Models (LLMs), which have revolutionized the field of Artificial Intelligence. KGs could benefit from these LLMs and vice versa. This vision paper discusses the existing algorithms for KG completion based on variations in how KG embeddings are generated. It starts by discussing various KG completion algorithms such as transductive and inductive link prediction and entity type prediction algorithms. It then moves on to the algorithms utilizing type information within the KGs, LLMs, and finally to algorithms capturing the semantics represented in different description logic axioms. We conclude the paper with a critical reflection on the current state of work in the community and give recommendations for future directions.
Updated: 2024-07-11 13:18:29
标题: 朝着语义丰富的嵌入向知识图谱补全努力前行
摘要: 基于嵌入的知识图谱(KG)补全在过去几年中引起了广泛关注。大部分当前的算法将KG视为一个多方向标记的图,并缺乏捕捉底层语义信息的能力。在另一方面,大量信息已经被捕获在大型语言模型(LLMs)中,这彻底改变了人工智能领域。KG可以从这些LLMs中受益,反之亦然。本文探讨了基于不同生成KG嵌入的变体的KG补全现有算法。它从讨论各种KG补全算法开始,如传导和归纳链接预测以及实体类型预测算法。然后,它转向利用KG中的类型信息、LLMs的算法,最后转向捕捉不同描述逻辑公理中表示的语义的算法。我们以对社区中当前工作状态的批判性反思结束本文,并对未来方向提出建议。
更新时间: 2024-07-11 13:18:29
领域: cs.AI,cs.CL
Robust Generalization of Graph Neural Networks for Carrier Scheduling
Battery-free sensor tags are devices that leverage backscatter techniques to communicate with standard IoT devices, thereby augmenting a network's sensing capabilities in a scalable way. For communicating, a sensor tag relies on an unmodulated carrier provided by a neighboring IoT device, with a schedule coordinating this provisioning across the network. Carrier scheduling--computing schedules to interrogate all sensor tags while minimizing energy, spectrum utilization, and latency--is an NP-Hard optimization problem. Recent work introduces learning-based schedulers that achieve resource savings over a carefully-crafted heuristic, generalizing to networks of up to 60 nodes. However, we find that their advantage diminishes in networks with hundreds of nodes, and degrades further in larger setups. This paper introduces RobustGANTT, a GNN-based scheduler that improves generalization (without re-training) to networks up to 1000 nodes (100x training topology sizes). RobustGANTT not only achieves better and more consistent generalization, but also computes schedules requiring up to 2x less resources than existing systems. Our scheduler exhibits average runtimes of hundreds of milliseconds, allowing it to react fast to changing network conditions. Our work not only improves resource utilization in large-scale backscatter networks, but also offers valuable insights in learning-based scheduling.
Updated: 2024-07-11 13:13:24
标题: 图神经网络在载波调度中的稳健泛化
摘要: 无电池传感器标签是利用回波技术与标准物联网设备进行通信的设备,从而以可扩展的方式增强网络的感知能力。为了通信,传感器标签依赖于邻近物联网设备提供的未调制载波,并通过调度协调网络中的供应。载波调度-计算调度以在最小化能量、频谱利用和延迟的同时对所有传感器标签进行询问-是一个NP-Hard优化问题。最近的工作引入了基于学习的调度器,实现了资源节省,超越了精心设计的启发式方法,在最多60个节点的网络中泛化。然而,我们发现它们在拥有数百个节点的网络中的优势减少,并在更大规模的设置中进一步退化。本文介绍了RobustGANTT,一种基于GNN的调度器,可以改进泛化能力(无需重新训练)至多达1000个节点(训练拓扑大小的100倍)。RobustGANTT不仅实现了更好和更一致的泛化,而且计算出需要的资源比现有系统少最多2倍的调度表。我们的调度器平均运行时间为数百毫秒,使其能够快速应对网络条件的变化。我们的工作不仅改进了大规模回波网络中的资源利用,还为基于学习的调度提供了宝贵的见解。
更新时间: 2024-07-11 13:13:24
领域: cs.LG,cs.AI,cs.NI
Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation
Natural language interfaces have exhibited considerable potential in the automation of Verilog generation derived from high-level specifications through the utilization of large language models, garnering significant attention. Nevertheless, this paper elucidates that visual representations contribute essential contextual information critical to design intent for hardware architectures possessing spatial complexity, potentially surpassing the efficacy of natural-language-only inputs. Expanding upon this premise, our paper introduces an open-source benchmark for multi-modal generative models tailored for Verilog synthesis from visual-linguistic inputs, addressing both singular and complex modules. Additionally, we introduce an open-source visual and natural language Verilog query language framework to facilitate efficient and user-friendly multi-modal queries. To evaluate the performance of the proposed multi-modal hardware generative AI in Verilog generation tasks, we compare it with a popular method that relies solely on natural language. Our results demonstrate a significant accuracy improvement in the multi-modal generated Verilog compared to queries based solely on natural language. We hope to reveal a new approach to hardware design in the large-hardware-design-model era, thereby fostering a more diversified and productive approach to hardware design.
Updated: 2024-07-11 13:10:09
标题: 自然语言不足以:为Verilog生成进行多模态生成AI基准测试
摘要: 自然语言接口在通过利用大型语言模型从高级规范生成Verilog方面展现出相当大的潜力,引起了相当大的关注。然而,本文阐明了视觉表示对于具有空间复杂性的硬件架构的设计意图至关重要的背景信息,潜在地超越了仅依赖自然语言输入的效果。基于这一前提,本文介绍了一个针对从视觉-语言输入中合成Verilog的多模态生成模型的开源基准,同时解决了单一和复杂模块的问题。此外,我们还引入了一个开源的视觉和自然语言Verilog查询语言框架,以促进高效且用户友好的多模态查询。为了评估所提出的多模态硬件生成AI在Verilog生成任务中的性能,我们将其与仅依赖自然语言的流行方法进行比较。我们的结果表明,与仅基于自然语言的查询相比,多模态生成的Verilog在准确性上取得了显著的改进。我们希望揭示在大型硬件设计模型时代的硬件设计中一种新的方法,从而促进更多元化和有效的硬件设计方法。
更新时间: 2024-07-11 13:10:09
领域: cs.AR,cs.AI
Deep Learning Safety Concerns in Automated Driving Perception
Recent advances in the field of deep learning and impressive performance of deep neural networks (DNNs) for perception have resulted in an increased demand for their use in automated driving (AD) systems. The safety of such systems is of utmost importance and thus requires considering the unique properties of DNNs. In order to achieve safety of AD systems with DNN-based perception components in a systematic and comprehensive manner, so-called safety concerns have been introduced as a suitable structuring element. On the one hand, the concept of safety concerns is -- by design -- well aligned with existing standards relevant for safety of AD systems such as ISO 21448 (SOTIF). On the other hand, it has already inspired several academic publications and upcoming standards on AI safety such as ISO PAS 8800. While the concept of safety concerns has been previously introduced, this paper extends and refines it, leveraging feedback from various domain and safety experts in the field. In particular, this paper introduces an additional categorization for a better understanding as well as enabling cross-functional teams to jointly address the concerns.
Updated: 2024-07-11 13:07:47
标题: 自动驾驶感知中的深度学习安全问题
摘要: 最近在深度学习领域取得的进展以及深度神经网络(DNNs)在感知方面的出色表现,导致人们对将其用于自动驾驶(AD)系统的需求增加。这些系统的安全性至关重要,因此需要考虑DNNs的独特特性。 为了以系统化和全面的方式实现基于DNN感知组件的AD系统的安全性,所谓的安全关注点被引入作为一个合适的结构元素。一方面,安全关注点的概念从设计上与现有的与AD系统安全性相关的标准(如ISO 21448(SOTIF))保持良好的协调。另一方面,它已经激发了几篇学术论文和即将发布的关于AI安全的标准,如ISO PAS 8800。 虽然安全关注点的概念之前已经被引入,但本文扩展并完善了它,利用来自各个领域和安全专家的反馈。特别是,本文引入了一个额外的分类,以便更好地理解,并使跨职能团队能够共同解决这些问题。
更新时间: 2024-07-11 13:07:47
领域: cs.LG,cs.CV,cs.SY,eess.SY
Natural Language Interaction with a Household Electricity Knowledge-based Digital Twin
Domain specific digital twins, representing a digital replica of various segments of the smart grid, are foreseen as able to model, simulate, and control the respective segments. At the same time, knowledge-based digital twins, coupled with AI, may also empower humans to understand aspects of the system through natural language interaction in view of planning and policy making. This paper is the first to assess and report on the potential of Retrieval Augmented Generation (RAG) for answering questions related to household electrical energy measurement, leveraging a knowledge-based energy digital twin. Relying on the recently published electricity consumption knowledge graph that effectively represents a knowledge-based digital twin, we study the capabilities of ChatGPT, Gemini and Llama in answering electricity related questions. Furthermore, we compare the answers with the ones generated through a RAG technique that leverages an existing electricity knowledge-based digital twin. Our findings illustrate that the RAG approach not only reduces the incidence of incorrect information typically generated by LLMs but also significantly improves the quality of the output by grounding responses in verifiable data. This paper details our methodology, presents a comparative analysis of responses with and without RAG, and discusses the implications of our findings for future applications of AI in specialized sectors like energy data analysis.
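A minimal sketch of the RAG loop evaluated above, with retrieve and llm_complete as placeholders for the knowledge-graph retriever and the LLM API; the prompt wording is illustrative only.

```python
def rag_answer(question: str, retrieve, llm_complete, k: int = 5) -> str:
    """Retrieval Augmented Generation (sketch): fetch k facts from the
    knowledge-graph-backed digital twin, then ground the LLM's answer in them."""
    facts = retrieve(question, k)  # e.g. triples from the electricity consumption knowledge graph
    context = "\n".join(f"- {fact}" for fact in facts)
    prompt = (
        "Answer the question using only the facts below; say so if they are "
        f"insufficient.\n\nFacts:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```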
Updated: 2024-07-11 13:05:37
标题: 与家庭电力知识库数字孪生体自然语言交互
摘要: 特定领域数字孪生体被视为能够建模、模拟和控制智能电网各个部分的数字副本。同时,基于知识的数字孪生体,结合人工智能,也可以使人类通过自然语言交互来理解系统的各个方面,以便规划和政策制定。本文首次评估并报告了利用基于知识的能源数字孪生体来提升家庭电气能源测量方面问题答案的检索增强生成(RAG)潜力。依靠最近发布的代表基于知识的数字孪生体的电力消耗知识图,我们研究了ChatGPT、Gemini和Llama在回答与电力相关的问题方面的能力。此外,我们将答案与通过利用现有电力基于知识的数字孪生体的RAG技术生成的答案进行比较。我们的研究结果表明,RAG方法不仅减少了通常由LLM生成的错误信息的发生率,而且通过基于可验证数据来支持响应,显著提高了输出质量。本文详细介绍了我们的方法论,提出了对比分析响应与不使用RAG的情况,并讨论了我们的发现对未来应用于专业领域如能源数据分析的人工智能的影响。
更新时间: 2024-07-11 13:05:37
领域: cs.CL,cs.AI,cs.LG
Brain Tumor Segmentation in MRI Images with 3D U-Net and Contextual Transformer
This research presents an enhanced approach for precise segmentation of brain tumor masses in magnetic resonance imaging (MRI) using an advanced 3D-UNet model combined with a Context Transformer (CoT). By architecturally expanding CoT to a 3D format, the proposed model integrates it smoothly with the base model to utilize the complex contextual information found in MRI scans, emphasizing how elements rely on each other across an extended spatial range. The proposed model synchronizes tumor mass characteristics from CoT, mutually reinforcing feature extraction and facilitating the precise capture of detailed tumor mass structures, including location, size, and boundaries. Experimental results demonstrate the outstanding segmentation performance of the proposed method in comparison to current state-of-the-art approaches, achieving Dice scores of 82.0%, 81.5%, and 89.0% for Enhancing Tumor, Tumor Core, and Whole Tumor, respectively, on BraTS2019.
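The reported numbers use the Dice coefficient; for reference, a minimal sketch of how it is computed on binary masks for a single tumor sub-region:

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient 2|P∩T| / (|P| + |T|) on binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))
```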
Updated: 2024-07-11 13:04:20
标题: MRI图像中的脑瘤分割:使用3D U-Net和上下文转换器
摘要: 这项研究提出了一种增强的方法,用于在磁共振成像(MRI)中使用先进的3D-UNet模型结合上下文变换器(CoT)精确分割脑肿瘤块。通过架构扩展CoT,所提出的模型将其架构扩展为3D格式,与基础模型平滑地集成在一起,以利用MRI扫描中发现的复杂上下文信息,强调元素如何依赖于扩展的空间范围内的彼此。所提出的模型从CoT中同步肿瘤块的特征,相互加强特征提取,有助于精确捕获详细的肿瘤块结构,包括位置、大小和边界。几项实验结果展示了所提出方法在与当前最先进方法的比较中的出色分割性能,分别达到了BraTS2019上强化肿瘤、肿瘤核心和整个肿瘤的Dice分数82.0%、81.5%、89.0%。
更新时间: 2024-07-11 13:04:20
领域: cs.CV,cs.AI
TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations
Unsupervised goal-conditioned reinforcement learning (GCRL) is a promising paradigm for developing diverse robotic skills without external supervision. However, existing unsupervised GCRL methods often struggle to cover a wide range of states in complex environments due to their limited exploration and sparse or noisy rewards for GCRL. To overcome these challenges, we propose a novel unsupervised GCRL method that leverages TemporaL Distance-aware Representations (TLDR). TLDR selects faraway goals to initiate exploration and computes intrinsic exploration rewards and goal-reaching rewards, based on temporal distance. Specifically, our exploration policy seeks states with large temporal distances (i.e. covering a large state space), while the goal-conditioned policy learns to minimize the temporal distance to the goal (i.e. reaching the goal). Our experimental results in six simulated robotic locomotion environments demonstrate that our method significantly outperforms previous unsupervised GCRL methods in achieving a wide variety of states.
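As a rough sketch of how rewards could be derived from a temporal-distance-aware embedding phi, assuming (as an illustration, not the paper's exact definition) that temporal distance is approximated by the norm between embeddings:

```python
import torch

def tldr_style_rewards(phi, s, s_next, goal):
    """Rewards from a temporal-distance embedding (sketch): the goal-conditioned
    policy is rewarded for reducing temporal distance to the goal, while the
    exploration policy is rewarded for covering temporally distant states."""
    d_now = torch.linalg.vector_norm(phi(s) - phi(goal), dim=-1)
    d_next = torch.linalg.vector_norm(phi(s_next) - phi(goal), dim=-1)
    goal_reward = d_now - d_next                                   # progress toward the goal
    explore_reward = torch.linalg.vector_norm(phi(s_next) - phi(s), dim=-1)
    return explore_reward, goal_reward
```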
Updated: 2024-07-11 13:01:18
标题: TLDR:基于时间距离感知表征的无监督目标条件强化学习
摘要: 无监督目标条件强化学习(GCRL)是一种有前途的范式,可以在没有外部监督的情况下开发多样化的机器人技能。然而,现有的无监督GCRL方法经常很难覆盖复杂环境中的广泛状态,原因是它们的有限探索和GCRL的稀疏或嘈杂奖励。为了克服这些挑战,我们提出了一种利用时间距离感知表示(TLDR)的新型无监督GCRL方法。TLDR选择远离的目标来启动探索,并基于时间距离计算内在探索奖励和到达目标的奖励。具体来说,我们的探索策略寻找具有较大时间距离的状态(即覆盖大的状态空间),而目标条件策略学习最小化到目标的时间距离(即达到目标)。我们在六个模拟机器人运动环境中的实验结果表明,我们的方法在实现各种状态方面明显优于先前的无监督GCRL方法。
更新时间: 2024-07-11 13:01:18
领域: cs.LG,cs.AI
Distributed Deep Reinforcement Learning Based Gradient Quantization for Federated Learning Enabled Vehicle Edge Computing
Federated Learning (FL) can protect the privacy of the vehicles in vehicle edge computing (VEC) to a certain extent through sharing the gradients of vehicles' local models instead of local data. The gradients of vehicles' local models are usually large for vehicular artificial intelligence (AI) applications, thus transmitting such large gradients would cause large per-round latency. Gradient quantization has been proposed as one effective approach to reduce the per-round latency in FL enabled VEC through compressing gradients and reducing the number of bits, i.e., the quantization level, used to transmit gradients. The selection of the quantization level and thresholds determines the quantization error, which further affects the model accuracy and training time. As a result, the total training time and quantization error (QE) become two key metrics for FL enabled VEC. It is critical to jointly optimize the total training time and QE for FL enabled VEC. However, the time-varying channel condition causes more challenges to solve this problem. In this paper, we propose a distributed deep reinforcement learning (DRL)-based quantization level allocation scheme to optimize the long-term reward in terms of the total training time and QE. Extensive simulations identify the optimal weighted factors between the total training time and QE, and demonstrate the feasibility and effectiveness of the proposed scheme.
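A minimal sketch of stochastic uniform quantization, the generic mechanism behind the quantization-level trade-off discussed above (the paper's exact quantizer and threshold selection may differ):

```python
import torch

def quantize_gradient(g: torch.Tensor, levels: int) -> torch.Tensor:
    """Map each gradient entry to one of `levels` grid points in [min, max],
    rounding up or down at random so the quantizer is unbiased. Fewer levels
    means fewer bits per round but a larger quantization error."""
    lo, hi = g.min(), g.max()
    scale = (hi - lo) / (levels - 1)
    if scale == 0:
        return g.clone()
    x = (g - lo) / scale                                      # position on the grid
    floor = torch.floor(x)
    x_q = floor + (torch.rand_like(x) < (x - floor)).float()  # unbiased stochastic rounding
    return lo + x_q * scale
```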
Updated: 2024-07-11 12:58:47
标题: 基于分布式深度强化学习的梯度量化,用于支持联邦学习的车辆边缘计算
摘要: 联邦学习(FL)可以通过共享车辆本地模型的梯度而不是本地数据,在一定程度上保护车辆边缘计算(VEC)中车辆的隐私。车辆本地模型的梯度通常较大,适用于车辆人工智能(AI)应用,因此传输这些大梯度将导致每轮延迟较大。梯度量化被提出作为一种有效的方法,在FL启用的VEC中通过压缩梯度和减少比特数,即量化级别,来减少每轮延迟。量化级别和阈值的选择决定了量化误差,进而影响模型准确性和训练时间。因此,总训练时间和量化误差(QE)成为FL启用的VEC的两个关键指标。同时优化总训练时间和QE对于FL启用的VEC至关重要。然而,时变信道条件给解决这个问题带来更多挑战。本文提出了一种基于分布式深度强化学习(DRL)的量化级别分配方案,以优化长期奖励,即总训练时间和QE。大量模拟确定了总训练时间和QE之间的最佳加权因子,并展示了所提出方案的可行性和有效性。
更新时间: 2024-07-11 12:58:47
领域: cs.LG,cs.NI
Graph Expansions of Deep Neural Networks and their Universal Scaling Limits
We present a unified approach to obtain scaling limits of neural networks using the genus expansion technique from random matrix theory. This approach begins with a novel expansion of neural networks which is reminiscent of Butcher series for ODEs, and is obtained through a generalisation of Faà di Bruno's formula to an arbitrary number of compositions. In this expansion, the role of monomials is played by random multilinear maps indexed by directed graphs whose edges correspond to random matrices, which we call operator graphs. This expansion linearises the effect of the activation functions, allowing for the direct application of Wick's principle to compute the expectation of each of its terms. We then determine the leading contribution to each term by embedding the corresponding graphs onto surfaces, and computing their Euler characteristic. Furthermore, by developing a correspondence between analytic and graphical operations, we obtain similar graph expansions for the neural tangent kernel as well as the input-output Jacobian of the original neural network, and derive their infinite-width limits with relative ease. Notably, we find explicit formulae for the moments of the limiting singular value distribution of the Jacobian. We then show that all of these results hold for networks with more general weights, such as general matrices with i.i.d. entries satisfying moment assumptions, complex matrices and sparse matrices.
Updated: 2024-07-11 12:58:07
标题: 深度神经网络的图扩展及其通用缩放极限
摘要: 我们提出了一种统一的方法,利用随机矩阵理论中的亏格展开技术来获得神经网络的尺度极限。这种方法从一个新颖的神经网络展开开始,这种展开类似于常微分方程的Butcher级数,并通过对Faà di Bruno的公式进行推广得到。在这种展开中,单项式的作用由以有向图为索引的随机多线性映射扮演,其中边对应于随机矩阵,我们称之为操作图。这种展开使激活函数的效应线性化,从而允许直接应用Wick原理来计算每个项的期望。然后,通过将相应的图嵌入到表面上,并计算它们的欧拉特征,我们确定了每个项的主要贡献。此外,通过建立解析和图形操作之间的对应关系,我们为神经切线核以及原始神经网络的输入-输出雅可比矩阵得到了类似的图展开,并相对容易地推导出它们的无限宽度极限。值得注意的是,我们找到了原始神经网络雅可比矩阵的极限奇异值分布的矩的显式公式。然后,我们展示了所有这些结果对于具有更一般权重的网络的适用性,例如满足矩假设的独立同分布条目的一般矩阵,复矩阵和稀疏矩阵。
更新时间: 2024-07-11 12:58:07
领域: math.PR,cs.LG
Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning
Autonomous driving may be the most important application scenario of the next generation, so the development of wireless access technologies enabling reliable and low-latency vehicle communication becomes crucial. To address this, 3GPP has developed Vehicle-to-Everything (V2X) specifications based on 5G New Radio (NR) technology, where Mode 2 Side-Link (SL) communication resembles Mode 4 in LTE-V2X, allowing direct communication between vehicles. This supplements SL communication in LTE-V2X and represents the latest advancement in cellular V2X (C-V2X) with improved performance of NR-V2X. However, in NR-V2X Mode 2, resource collisions still occur, and thus degrade the age of information (AoI). Therefore, an interference cancellation method is employed to mitigate this impact by combining NR-V2X with Non-Orthogonal Multiple Access (NOMA) technology. In NR-V2X, when vehicles select a smaller resource reservation interval (RRI), higher-frequency transmissions take more energy to reduce AoI. Hence, it is important to jointly consider AoI and communication energy consumption in NR-V2X communication. We then formulate such an optimization problem and employ a Deep Reinforcement Learning (DRL) algorithm to compute the optimal transmission RRI and transmission power for each transmitting vehicle, so as to reduce the energy consumption of each transmitting vehicle and the AoI of each receiving vehicle. Extensive simulations have demonstrated the performance of our proposed algorithm.
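The RRI/AoI trade-off can be illustrated with a small slot-based simulation; a sketch assuming periodic updates with a fixed delivery delay, which is a simplification of the NR-V2X model above:

```python
def average_aoi(rri: int, horizon: int, delay: int = 1) -> float:
    """Average age of information (sketch): a vehicle generates an update every
    `rri` slots and each update is received `delay` slots later; AoI at slot t is
    t minus the generation time of the freshest delivered update. Smaller RRI
    lowers AoI but, as noted above, costs more transmission energy."""
    total = 0.0
    for t in range(delay, horizon):
        newest = ((t - delay) // rri) * rri  # newest update already delivered by slot t
        total += t - newest
    return total / (horizon - delay)
```

For example, average_aoi(20, 10000) comes out near 10.5 slots while average_aoi(100, 10000) is near 50.5, at one fifth as many transmissions, which is exactly the energy/AoI tension the DRL agent is asked to balance.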
Updated: 2024-07-11 12:54:38
标题: 基于深度强化学习的NR-V2X系统中信息时延和能耗的联合优化
摘要: 自动驾驶可能是下一代最重要的应用场景,使可靠且低延迟的车辆通信成为可能的无线接入技术的发展变得至关重要。为应对这一挑战,3GPP基于5G New Radio(NR)技术制定了基于车辆对一切(V2X)的规范,其中Mode 2 Side-Link(SL)通信类似于LTE-V2X中的Mode 4,允许车辆之间直接通信。这一补充了LTE-V2X中的SL通信,并代表了蜂窝V2X(C-V2X)的最新进展,具有改进的NR-V2X性能。然而,在NR-V2X Mode 2中,资源冲突仍然会发生,从而降低信息时代(AOI)。因此,采用干扰消除方法来结合NR-V2X和非正交多址接入(NOMA)技术以减轻这一影响。在NR-V2X中,当车辆选择较小的资源保留间隔(RRI)时,更高频率的传输需要更多的能量来减少AOI。因此,基于NR-V2X通信,重要的是联合考虑AOI和通信能耗。然后,我们制定了这样一个优化问题,并采用深度强化学习(DRL)算法来计算每个传输车辆的最佳传输RRI和传输功率,以减少每个传输车辆的能耗和每个接收车辆的AOI。大量模拟已经证明了我们提出的算法的性能。
更新时间: 2024-07-11 12:54:38
领域: cs.LG,cs.NI,eess.SP
Paving the way toward foundation models for irregular and unaligned Satellite Image Time Series
Although several foundation models for satellite remote sensing imagery have recently been proposed, they fail to address major challenges of real/operational applications. Indeed, embeddings that don't take into account the spectral, spatial and temporal dimensions of the data, as well as the irregular or unaligned temporal sampling, are of little use for most real-world applications. As a consequence, we propose an ALIgned Sits Encoder (ALISE), a novel approach that leverages the spatial, spectral, and temporal dimensions of irregular and unaligned SITS while producing aligned latent representations. Unlike SSL models currently available for SITS, ALISE incorporates a flexible query mechanism to project the SITS into a common and learned temporal projection space. Additionally, thanks to a multi-view framework, we explore the integration of instance discrimination alongside a masked autoencoding task for SITS. The quality of the produced representation is assessed through three downstream tasks: crop segmentation (PASTIS), land cover segmentation (MultiSenGE), and a novel crop change detection dataset. Furthermore, the change detection task is performed without supervision. The results suggest that the use of aligned representations is more effective than previous SSL methods for linear probing segmentation tasks.
Updated: 2024-07-11 12:42:10
标题: 为不规则和不对齐的卫星图像时间序列打下基础模型的道路
摘要: 尽管最近提出了几种卫星遥感图像的基础模型,但它们未能解决实际/操作应用中的主要挑战。事实上,不考虑数据的光谱、空间和时间维度以及不规则或不对齐的时间采样的嵌入对大多数真实世界用途几乎没有用处。因此,我们提出了一种ALIgned Sits Encoder(ALISE)的新方法,它利用不规则和不对齐的SITS的空间、光谱和时间维度,同时生成对齐的潜在表示。与目前可用于SITS的SSL模型不同,ALISE结合了灵活的查询机制,将SITS投影到一个共同的、经过学习的时间投影空间中。此外,通过多视图框架,我们探索了将实例区分融入到SITS的掩蔽自动编码任务中。通过三个下游任务评估生成的表示质量:作物分割(PASTIS)、土地覆盖分割(MultiSenGE)以及一种新颖的作物变化检测数据集。此外,变化检测任务是无监督完成的。结果表明,与以前的SSL方法相比,使用对齐表示对线性探测分割任务更有效。
更新时间: 2024-07-11 12:42:10
领域: cs.AI,cs.CV
Protein intrinsic disorder prediction using Attention U-Net and ProtTrans protein language model
The prediction of intrinsic disorder regions has significant implications for understanding protein function, structure, and dynamics. It can help to discover novel functions or protein-protein interactions essential to designing new drugs, therapies, or enzymes. Recently, a new generation of predictors based on protein language models is emerging. These algorithms reach state-of-the-art accuracy without calculating time-consuming multiple sequence alignments (MSAs). The article presents a new protein intrinsic disorder predictor, DisorderUnetLM, based on the Attention U-Net convolutional neural network and using features from the protein language model ProtTrans. DisorderUnetLM shows top results in a direct comparison with the flDPnn and IDP-CRF predictors, which use MSAs, and with the SETH predictor, which uses features from the same ProtTrans model. Moreover, among 41 predictors from the latest Critical Assessment of Protein Intrinsic Disorder Prediction (CAID-2) benchmark, it ranks 9th for the Disorder-PDB subset (with ROC-AUC of 0.924) and 1st for the Disorder-NOX subset (with ROC-AUC of 0.844), which confirms its potential to perform well in the upcoming CAID-3 challenge, for which DisorderUnetLM was submitted.
Updated: 2024-07-11 12:41:51
标题: 使用Attention U-Net和ProtTrans蛋白质语言模型进行蛋白质内在无序预测
摘要: 预测内在无序区域对于理解蛋白质的功能、结构和动态具有重要意义。它可以帮助发现设计新药物、疗法或酶所必需的新功能或蛋白质-蛋白质相互作用。最近,基于蛋白质语言模型的新一代预测器正在出现。这些算法在不计算耗时的多序列比对(MSAs)的情况下达到了最先进的准确性。本文介绍了一种基于Attention U-Net卷积神经网络的新蛋白质内在无序预测器DisorderUnetLM,使用来自蛋白质语言模型ProtTrans的特征。DisorderUnetLM在直接与flDPnn和IDP-CRF预测器使用MSAs进行比较时取得了最佳结果,并与使用相同ProtTrans模型特征的SETH预测器进行了比较。此外,在最新的蛋白质内在无序预测关键评估(CAID-2)基准测试中,它在Disorder-PDB子集中排名第9位(ROC-AUC为0.924),在Disorder-NOX子集中排名第1位(ROC-AUC为0.844),这证实了其在即将到来的CAID-3挑战中表现良好的潜力,DisorderUnetLM已被提交。
更新时间: 2024-07-11 12:41:51
领域: cs.LG,q-bio.BM
Helios: An extremely low power event-based gesture recognition for always-on smart eyewear
This paper introduces Helios, the first extremely low-power, real-time, event-based hand gesture recognition system designed for all-day use on smart eyewear. As augmented reality (AR) evolves, current smart glasses like the Meta Ray-Bans prioritize visual and wearable comfort at the expense of functionality. Existing human-machine interfaces (HMIs) in these devices, such as capacitive touch and voice controls, present limitations in ergonomics, privacy and power consumption. Helios addresses these challenges by leveraging natural hand interactions for a more intuitive and comfortable user experience. Our system utilizes an extremely low-power and compact 3mm x 4mm / 20mW event camera to perform natural hand-based gesture recognition for always-on smart eyewear. The camera's output is processed by a convolutional neural network (CNN) running on an NXP Nano UltraLite compute platform, consuming less than 350mW. Helios can recognize seven classes of gestures, including subtle microgestures like swipes and pinches, with 91% accuracy. We also demonstrate real-time performance across 20 users at a remarkably low latency of 60ms. Our user testing results align with the positive feedback we received during our recent successful demo at AWE-USA-2024.
Updated: 2024-07-11 12:33:53
标题: Helios:一种极低功耗的事件驱动手势识别技术,用于始终处于开启状态的智能眼镜
摘要: 这篇论文介绍了Helios,这是第一个专为全天候智能眼镜设计的极低功耗、实时、事件驱动的手势识别系统。随着增强现实(AR)的发展,像Meta Ray-Bans这样的当前智能眼镜更注重视觉和佩戴舒适,但功能性却受损。这些设备中现有的人机界面(HMIs),如电容触摸和语音控制,在人体工程学、隐私和功耗方面存在限制。Helios通过利用自然手部互动来解决这些挑战,为用户提供更直观、更舒适的体验。我们的系统利用一款极低功耗且紧凑的3mmx4mm/20mW事件相机,在全天候智能眼镜上进行基于自然手势的识别。相机的输出由在NXP Nano UltraLite计算平台上运行的卷积神经网络(CNN)进行处理,功耗不到350mW。Helios能够识别七类手势,包括轻微的微手势,如滑动和捏合,准确率达到91%。我们还展示了在20名用户中实时性能,在惊人的低延迟60ms。我们的用户测试结果与我们最近在AWE-USA-2024成功演示时收到的积极反馈一致。
更新时间: 2024-07-11 12:33:53
领域: cs.CV,cs.HC,cs.LG
How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation
We introduce a novel classification framework for time-series imputation using deep learning, with a particular focus on clinical data. By identifying conceptual gaps in the literature and existing reviews, we devise a taxonomy grounded on the inductive bias of neural imputation frameworks, resulting in a classification of existing deep imputation strategies based on their suitability for specific imputation scenarios and data-specific properties. Our review further examines the existing methodologies employed to benchmark deep imputation models, evaluating their effectiveness in capturing the missingness scenarios found in clinical data and emphasising the importance of reconciling mathematical abstraction with clinical insights. Our classification aims to serve as a guide for researchers to facilitate the selection of appropriate deep learning imputation techniques tailored to their specific clinical data. Our novel perspective also highlights the significance of bridging the gap between computational methodologies and medical insights to achieve clinically sound imputation models.
Updated: 2024-07-11 12:33:28
标题: 你的猜测有多深?对医疗时间序列填充的深度学习新视角
摘要: 我们介绍了一个新颖的基于深度学习的时间序列插补分类框架,特别关注临床数据。通过识别文献和现有综述中的概念空白,我们设计了一个基于神经插补框架归纳偏差的分类法,从而对现有的深度插补策略进行分类,根据它们适用于特定插补场景和数据特性的情况。我们的综述进一步审查了用于基准测试深度插补模型的现有方法论,评估它们在捕捉临床数据中的缺失情况方面的有效性,并强调了在数学抽象和临床见解之间进行调和的重要性。我们的分类旨在为研究人员提供指南,以便选择适合其特定临床数据的深度学习插补技术。我们的新颖观点还强调了弥合计算方法和医学见解之间的差距以实现临床合理的插补模型的重要性。
更新时间: 2024-07-11 12:33:28
领域: cs.LG,cs.AI
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to effectively reveal hidden biases of LLMs, testing their adversarial robustness against jailbreak prompts specially crafted for bias elicitation. Extensive experiments are conducted using the most widespread LLMs at different scales, confirming that LLMs can still be manipulated to produce biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of enhancing mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.
Updated: 2024-07-11 12:30:19
标题: 大型语言模型真的没有偏见吗?用来评估对偏见引导的敌对鲁棒性的越狱提示
摘要: 大型语言模型(LLMs)已经彻底改变了人工智能,展示了令人瞩目的计算能力和语言能力。然而,这些模型在本质上容易受到训练数据中各种偏见的影响。这些偏见包括选择、语言和确认偏见,以及与性别、种族、性取向、宗教、社会经济地位、残疾和年龄相关的常见陈规。本研究探讨了最新LLMs提供的回应中这些偏见的存在,分析了对它们的公正性和可靠性的影响。我们还调查了如何利用已知的提示工程技术来有效地揭示LLMs的隐藏偏见,测试它们对专门设计用于引发偏见的越狱提示的对抗鲁棒性。通过使用不同规模的最广泛使用的LLMs进行广泛实验,证实了LLMs仍然可以被操纵以产生有偏见或不当的回应,尽管它们具有先进的能力和复杂的对齐过程。我们的研究结果强调了增强缓解技术以解决这些安全问题的重要性,朝着更可持续和包容的人工智能发展。
更新时间: 2024-07-11 12:30:19
领域: cs.CL,cs.AI
Beyond Instruction Following: Evaluating Rule Following of Large Language Models
Although Large Language Models (LLMs) have demonstrated strong instruction-following ability, making them helpful, in real-world scenarios they are further expected to be controlled and guided by rules so that they are safe and accurate in their responses. This demands that LLMs possess rule-following capability. However, few works have clearly evaluated the rule-following capability of LLMs. Previous studies that try to evaluate the rule-following capability of LLMs fail to distinguish rule-following scenarios from instruction-following scenarios. Therefore, this paper first clarifies the concept of rule-following and curates a comprehensive benchmark, RuleBench, to evaluate a diversified range of rule-following abilities. Our experimental results on a variety of LLMs show that they are still limited in following rules. Our further analysis provides insights into improvements that could move LLMs toward better rule-following intelligent agents. The data and code can be found at: https://anonymous.4open.science/r/llm-rule-following-B3E3/
Updated: 2024-07-11 12:26:55
标题: 超越指令遵循:评估大型语言模型的规则遵循
摘要: 尽管大型语言模型(LLMs)已经展示出强大的遵循指令能力,但在现实场景中,它们进一步被认为需要受规则控制和引导以确保安全,并准确地响应。这要求LLMs具有遵循规则的能力。然而,很少有研究对LLMs的遵循规则能力进行明确评估。先前的研究试图评估LLMs的遵循规则能力却未能区分遵循规则场景和遵循指令场景。因此,本文首先澄清了遵循规则的概念,并策划了一个全面的基准测试,RuleBench,以评估各种规则遵循能力。我们在多种LLMs上的实验结果表明它们仍然在遵循规则方面存在局限性。我们进一步的分析提供了对LLMs改进的见解,以朝着一个更好的遵循规则的智能代理。数据和代码可以在以下链接找到:https://anonymous.4open.science/r/llm-rule-following-B3E3/
更新时间: 2024-07-11 12:26:55
领域: cs.CL,cs.AI
Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model
Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations getting assigned a higher score by the model. However, research has shown that this assumption does not always hold, and generation quality can be improved by decoding to optimize a utility function backed by a metric or quality-estimation signal, as is done by Minimum Bayes Risk (MBR) or quality-aware decoding. The main disadvantage of these approaches is that they require an additional model to calculate the utility function during decoding, significantly increasing the computational cost. In this paper, we propose to make the NMT models themselves quality-aware by training them to estimate the quality of their own output. Using this approach for MBR decoding we can drastically reduce the size of the candidate list, resulting in a speed-up of two orders of magnitude. When applying our method to MAP decoding we obtain quality gains similar or even superior to quality reranking approaches, but with the efficiency of single-pass decoding.
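To make the decoding step concrete, here is a minimal sketch of MBR over a candidate list, where the utility is a stand-in for any metric or quality-estimation signal (in the paper, the NMT model scores its own outputs, which is what allows the drastic candidate-list pruning); the unigram-overlap utility and the toy candidates are placeholders.

```python
# Minimal sketch of MBR decoding: pick the hypothesis with the highest expected
# utility against the other candidates, which act as pseudo-references.
def mbr_decode(candidates, utility):
    best, best_score = None, float("-inf")
    for h in candidates:
        score = sum(utility(h, r) for r in candidates if r is not h) / (len(candidates) - 1)
        if score > best_score:
            best, best_score = h, score
    return best

def toy_utility(hyp, ref):
    # placeholder: unigram Jaccard overlap; a real system would use a learned quality score
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

cands = ["the cat sat on the mat", "a cat sat on a mat", "the dog sat on the mat"]
print(mbr_decode(cands, toy_utility))
```

Because each candidate is scored against all the others, shrinking the list from hundreds to a handful is where the two-orders-of-magnitude speed-up comes from.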
Updated: 2024-07-11 12:25:06
标题: 质量感知翻译模型:在单一模型中实现高效生成和质量评估
摘要: 最大后验(MAP)解码是神经机器翻译(NMT)模型中最广泛使用的解码策略。其基本假设是模型概率与人类判断相关性强,翻译质量更高的会被模型赋予更高的分数。然而,研究表明这一假设并不总是成立,通过优化一个由指标或质量估计信号支持的效用函数进行解码,如最小贝叶斯风险(MBR)或质量感知解码,可以改善生成质量。这些方法的主要缺点是需要额外的模型在解码过程中计算效用函数,显著增加了计算成本。本文提出通过训练NMT模型自身来评估其输出质量,使模型本身具备质量感知能力。使用这种方法进行MBR解码可以大幅减少候选列表的大小,实现两个数量级的加速。当将我们的方法应用于MAP解码时,我们获得了与质量重新排序方法相似甚至更优的质量提升,但同时保持了单遍解码的效率。
更新时间: 2024-07-11 12:25:06
领域: cs.CL,cs.AI
Improve Load Forecasting in Energy Communities through Transfer Learning using Open-Access Synthetic Profiles
According to a conservative estimate, a 1% reduction in forecast error for a 10 GW energy utility can save up to $1.6 million annually. In our context, achieving precise forecasts of future power consumption is crucial for operating flexible energy assets using model predictive control approaches. Specifically, this work focuses on the load profile forecast of a first-year energy community with the common practical challenge of limited historical data availability. We propose to pre-train the load prediction models with open-access synthetic load profiles using transfer learning techniques to tackle this challenge. Results show that this approach improves both training stability and prediction error. In a test case with 74 households, the prediction mean squared error (MSE) decreased from 0.34 to 0.13, showing transfer learning based on synthetic load profiles to be a viable approach to compensate for a lack of historical data.
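A minimal sketch of the pre-train/fine-tune recipe, assuming PyTorch and synthetic daily load shapes as stand-ins for both the open-access profiles and the community data; the window lengths, model size, and noise levels are illustrative assumptions.

```python
# Minimal sketch: pre-train a load forecaster on abundant synthetic profiles,
# then fine-tune on a small "real" set. All data here is randomly generated.
import torch
import torch.nn as nn

def make_profiles(n, noise):
    t = torch.arange(96).float() / 96                       # one day at 15-min resolution
    base = 1 + 0.5 * torch.sin(2 * torch.pi * (t - 0.3))    # assumed daily shape
    return base + noise * torch.randn(n, 96)

model = nn.Sequential(nn.Linear(72, 128), nn.ReLU(), nn.Linear(128, 24))
loss_fn = nn.MSELoss()

def train(profiles, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    x, y = profiles[:, :72], profiles[:, 72:]               # forecast the last 6 hours
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

train(make_profiles(2000, 0.05), epochs=200, lr=1e-3)       # synthetic pre-training
mse = train(make_profiles(74, 0.10), epochs=50, lr=1e-4)    # fine-tune on scarce data
print(f"fine-tuned MSE: {mse:.3f}")
```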
Updated: 2024-07-11 12:17:31
标题: 通过使用开放式合成配置文件的迁移学习改进能源社区的负荷预测
摘要: 根据保守估计,对于一个10GW的能源公用事业单位,预测误差的1%降低可以每年节省高达160万美元。在我们的背景下,精确预测未来能源消耗对于利用模型预测控制方法操作灵活能源资产至关重要。具体来说,本研究关注具有有限历史数据可用性的首年能源社区的负荷曲线预测这一共同的实际挑战。我们提出使用迁移学习技术预训练负荷预测模型,利用开放获取的合成负荷曲线来解决这一挑战。结果显示,这种方法提高了训练稳定性和预测误差。在一个包含74个家庭的测试案例中,预测均方误差(MSE)从0.34降低到0.13,显示基于合成负荷曲线的迁移学习是一种可行的方法来弥补历史数据不足。
更新时间: 2024-07-11 12:17:31
领域: cs.LG
Chunking: Continual Learning is not just about Distribution Shift
Work on continual learning (CL) has thus far largely focused on the problems arising from shifts in the data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) dealing with the fact that the data is split into chunks and so only a part of the data is available to be trained on at any point in time. In this work, we look at the latter sub-problem, the chunking of data. We show that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in our experiments. Furthermore, our results reveal that current CL algorithms do not address the chunking sub-problem, only performing as well as plain SGD training when there is no shift in the data distribution. Therefore, we show that chunking is both an important and currently unaddressed sub-problem and until it is addressed CL methods will be capped in performance. Additionally, we analyse why performance drops when learning occurs on identically distributed chunks of data, and find that forgetting, which is often seen to be a problem due to distribution shift, still arises and is a significant problem. We also show that performance on the chunking sub-problem can be increased and that this performance transfers to the full CL setting, where there is distribution shift. Hence, we argue that work on chunking can help advance CL in general.
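To make the protocol distinction concrete, below is a minimal sketch contrasting offline training with training on a sequence of identically distributed chunks (no distribution shift, no revisiting); the linear-regression task, chunk sizes, and learning rates are illustrative assumptions, and a toy convex problem will not reproduce the performance gap the paper measures on deep networks.

```python
# Minimal sketch of the chunking setting: same i.i.d. data, either available all
# at once (offline) or arriving in disjoint chunks trained on one after another.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=6000)

def sgd(batches, lr=0.01, w=None):
    w = np.zeros(20) if w is None else w
    for xb, yb in batches:
        w -= lr * xb.T @ (xb @ w - yb) / len(yb)
    return w

def minibatches(Xp, yp, bs=32, epochs=5):
    for _ in range(epochs):
        idx = rng.permutation(len(yp))
        for i in range(0, len(yp), bs):
            j = idx[i:i + bs]
            yield Xp[j], yp[j]

w_offline = sgd(minibatches(X, y))                        # sees all data throughout
w_chunked = np.zeros(20)
for c in range(6):                                        # six disjoint chunks, in sequence
    sl = slice(c * 1000, (c + 1) * 1000)
    w_chunked = sgd(minibatches(X[sl], y[sl]), w=w_chunked)

for name, w in [("offline", w_offline), ("chunked", w_chunked)]:
    print(name, "MSE:", np.mean((X @ w - y) ** 2).round(4))
```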
Updated: 2024-07-11 12:13:38
标题: 分块:持续学习不仅仅是关于分布转移
摘要: 迄今为止,关于持续学习(CL)的研究主要集中在由数据分布变化引起的问题上。然而,CL可以分解为两个子问题:(a)数据分布的变化,和(b)处理数据被分成块并且在任何时间点只有部分数据可用于训练的事实。在这项工作中,我们看到后者子问题,即数据分块。我们展示了数据分块是CL的一个重要部分,在我们的实验中占离线学习性能下降的大约一半。此外,我们的结果显示,当前的CL算法并未解决数据分块子问题,只有在数据分布没有变化时才能表现得像普通的SGD训练一样好。因此,我们指出数据分块既是一个重要的且目前未解决的子问题,直到解决了这个问题,CL方法的性能将受到限制。此外,我们分析了当学习发生在相同分布的数据块上时性能下降的原因,并发现遗忘,通常被认为是由于数据分布变化而出现的问题,仍然存在并且是一个重要问题。我们还展示了在数据分块子问题上的性能可以提高,并且这种性能可以转移到完整的CL设置中,其中存在数据分布变化。因此,我们认为对数据分块的研究可以帮助推动CL的进展。
更新时间: 2024-07-11 12:13:38
领域: cs.LG,stat.ML
Mind the Gap! Bridging Explainable Artificial Intelligence and Human Understanding with Luhmann's Functional Theory of Communication
Over the past decade explainable artificial intelligence has evolved from a predominantly technical discipline into a field that is deeply intertwined with social sciences. Insights such as human preference for contrastive -- more precisely, counterfactual -- explanations have played a major role in this transition, inspiring and guiding the research in computer science. Other observations, while equally important, have nevertheless received much less consideration. The desire of human explainees to communicate with artificial intelligence explainers through a dialogue-like interaction has been mostly neglected by the community. This poses many challenges for the effectiveness and widespread adoption of such technologies as delivering a single explanation optimised according to some predefined objectives may fail to engender understanding in its recipients and satisfy their unique needs given the diversity of human knowledge and intention. Using insights elaborated by Niklas Luhmann and, more recently, Elena Esposito we apply social systems theory to highlight challenges in explainable artificial intelligence and offer a path forward, striving to reinvigorate the technical research in the direction of interactive and iterative explainers. Specifically, this paper demonstrates the potential of systems theoretical approaches to communication in elucidating and addressing the problems and limitations of human-centred explainable artificial intelligence.
Updated: 2024-07-11 12:13:04
标题: 注意差距!用卢曼的功能性沟通理论弥合可解释人工智能和人类理解之间的鸿沟
摘要: 在过去的十年里,可解释的人工智能已经从一个主要是技术性的学科发展成为与社会科学深度交织的领域。人类对对比性解释(更准确地说是反事实的解释)的偏好等见解在这一转变中发挥了重要作用,激发和引导了计算机科学领域的研究。其他同样重要的观察虽然也存在,但却受到了较少的关注。人类被解释者希望通过对话式互动与人工智能解释者进行交流的愿望在学术界大多被忽视。这给这些技术的有效性和广泛应用带来了许多挑战,因为根据一些预先设定的目标优化的单一解释可能无法使接收者理解并满足他们独特的需求,考虑到人类知识和意图的多样性。利用尼克拉斯·卢曼和最近埃琳娜·埃斯波西托的见解,我们运用社会系统理论来强调可解释的人工智能面临的挑战,并提出一条前进之路,努力将技术研究重新引向交互式和迭代式解释者的方向。具体来说,本文展示了系统理论方法在阐明和解决以人为中心的可解释人工智能的问题和局限性方面的潜力。
更新时间: 2024-07-11 12:13:04
领域: cs.CY,cs.AI,cs.LG
Subgroup-Specific Risk-Controlled Dose Estimation in Radiotherapy
Cancer remains a leading cause of death, highlighting the importance of effective radiotherapy (RT). Magnetic resonance-guided linear accelerators (MR-Linacs) enable imaging during RT, allowing for inter-fraction, and perhaps even intra-fraction, adjustments of treatment plans. However, achieving this requires fast and accurate dose calculations. While Monte Carlo simulations offer accuracy, they are computationally intensive. Deep learning frameworks show promise, yet lack the uncertainty quantification crucial for high-risk applications like RT. Risk-controlling prediction sets (RCPS) offer model-agnostic uncertainty quantification with mathematical guarantees. However, we show that naive application of RCPS may lead to only certain subgroups such as the image background being risk-controlled. In this work, we extend RCPS to provide prediction intervals with coverage guarantees for multiple subgroups with unknown subgroup membership at test time. We evaluate our algorithm on real clinical planning volumes from five different anatomical regions and show that our novel subgroup RCPS (SG-RCPS) algorithm leads to prediction intervals that jointly control the risk for multiple subgroups. In particular, our method controls the risk of the crucial voxels along the radiation beam significantly better than conventional RCPS.
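As a rough illustration of the calibration idea (not the authors' SG-RCPS procedure), the sketch below grows an interval width multiplier lambda until an upper confidence bound on the empirical miscoverage risk falls below a target alpha for every subgroup in held-out calibration data; the data model, Hoeffding-style correction, and grid are assumptions.

```python
# Minimal sketch: pick the smallest lambda whose per-subgroup miscoverage bound
# is below alpha, so that every subgroup (not just the easy one) is risk-controlled.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 3, n)                       # membership known on calibration data
sigma = np.array([0.5, 1.0, 2.0])[group]            # heteroscedastic subgroups, assumed
y = rng.normal(0, sigma)
pred, spread = np.zeros(n), np.ones(n)              # model mean and raw uncertainty, toy
counts = np.array([(group == g).sum() for g in range(3)])

def subgroup_risks(lam):
    miss = np.abs(y - pred) > lam * spread           # outside the prediction interval
    return np.array([miss[group == g].mean() for g in range(3)])

alpha, delta = 0.1, 0.05
for lam in np.arange(0.1, 10.0, 0.05):
    risks = subgroup_risks(lam)
    bound = risks + np.sqrt(np.log(1 / delta) / (2 * counts))   # crude Hoeffding slack
    if bound.max() <= alpha:                         # every subgroup is risk-controlled
        break
print(f"chosen lambda={lam:.2f}, per-subgroup risks={np.round(risks, 3)}")
```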
Updated: 2024-07-11 12:12:55
标题: 放射治疗中亚组特异风险控制的剂量估计
摘要: 癌症仍然是导致死亡的主要原因,突显了有效放疗(RT)的重要性。磁共振引导直线加速器(MR-Linacs)使放疗过程中进行成像成为可能,允许在分次之间,甚至可能在分次内进行治疗计划的调整。然而,实现这一点需要快速准确的剂量计算。虽然蒙特卡罗模拟提供了准确性,但计算量大。深度学习框架显示出潜力,但缺乏对于高风险应用如RT至关重要的不确定性量化。风险控制预测集(RCPS)提供了数学保证的模型无关不确定性量化。然而,我们表明,对RCPS的天真应用可能只导致某些子组如背景图像被风险控制。在这项工作中,我们扩展了RCPS以为多个在测试时未知的子组提供覆盖保证的预测区间。我们在来自五个不同解剖区域的真实临床计划体积上评估了我们的算法,并展示了我们的新颖子组RCPS(SG-RCPS)算法导致联合控制多个子组的风险的预测区间。特别地,我们的方法在辐射束沿线的关键体素的风险控制方面明显优于传统RCPS。
更新时间: 2024-07-11 12:12:55
领域: cs.LG
WineGraph: A Graph Representation For Food-Wine Pairing
We present WineGraph, an extended version of FlavorGraph, a heterogeneous graph incorporating wine data into its structure. This integration enables food-wine pairing based on taste and sommelier-defined rules. Leveraging a food dataset comprising 500,000 reviews and a wine reviews dataset with over 130,000 entries, we computed taste descriptors for both food and wine. This information was then utilised to pair food items with wine and augment FlavorGraph with additional data. The results demonstrate the potential of heterogeneous graphs to acquire supplementary information, proving beneficial for wine pairing.
Updated: 2024-07-11 12:12:48
标题: WineGraph: 一种用于食物与葡萄酒搭配的图表示方法
摘要: 我们提出WineGraph,这是FlavorGraph的扩展版本,是一个包含葡萄酒数据的异质图。这种整合使得基于口味和侍酒师定义的规则进行食物和葡萄酒搭配成为可能。利用包含50万条评论的食物数据集和包含超过13万条条目的葡萄酒评论数据集,我们为食物和葡萄酒计算了口味描述符。然后利用这些信息将食物与葡萄酒进行搭配,并向FlavorGraph添加额外数据。结果展示了异质图获取补充信息的潜力,证明对于葡萄酒搭配是有益的。
更新时间: 2024-07-11 12:12:48
领域: cs.LG,cs.AI
A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is critical. Recent advancements in generative models have laid a solid foundation for the growing interest in this area. Despite the significant progress, the task of human video generation remains challenging due to the consistency of characters, the complexity of human motion, and difficulties in their relationship with the environment. This survey provides a comprehensive review of the current state of human video generation, marking, to the best of our knowledge, the first extensive literature review in this domain. We start with an introduction to the fundamentals of human video generation and the evolution of generative models that have facilitated the field's growth. We then examine the main methods employed for three key sub-tasks within human video generation: text-driven, audio-driven, and pose-driven motion generation. These areas are explored concerning the conditions that guide the generation process. Furthermore, we offer a collection of the most commonly utilized datasets and the evaluation metrics that are crucial in assessing the quality and realism of generated videos. The survey concludes with a discussion of the current challenges in the field and suggests possible directions for future research. The goal of this survey is to offer the research community a clear and holistic view of the advancements in human video generation, highlighting the milestones achieved and the challenges that lie ahead.
Updated: 2024-07-11 12:09:05
标题: 一个关于人类视频生成的全面调查:挑战、方法和见解
摘要: 人类视频生成是一项动态且快速发展的任务,旨在利用生成模型合成2D人体视频序列,给定控制条件如文本、音频和姿势。在电影、游戏和虚拟通信等广泛应用领域,生成自然而逼真的人类视频的能力至关重要。生成模型的最新进展为这一领域的日益增长的兴趣奠定了坚实基础。尽管取得了显著进展,人类视频生成的任务仍然具有挑战性,原因在于角色的一致性、人类运动的复杂性以及与环境之间的关系困难。本调查对人类视频生成的当前状态进行了全面回顾,据我们所知,这是该领域首次进行的广泛文献综述。我们首先介绍了人类视频生成的基础知识和促进该领域发展的生成模型的演变。然后,我们研究了在人类视频生成的三个关键子任务中所采用的主要方法:基于文本驱动、基于音频驱动和基于姿势驱动的运动生成。这些领域是根据引导生成过程的条件来探讨的。此外,我们提供了最常用的数据集和评估指标的集合,这些评估指标对评估生成视频的质量和逼真度至关重要。调查最后讨论了该领域的当前挑战,并提出了未来研究的可能方向。本调查的目标是为研究社区提供对人类视频生成进展的清晰和整体性视图,突出取得的里程碑和未来面临的挑战。
更新时间: 2024-07-11 12:09:05
领域: cs.CV,cs.AI
DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series
In time series anomaly detection (TSAD), the scarcity of labeled data poses a challenge to the development of accurate models. Unsupervised domain adaptation (UDA) offers a solution by leveraging labeled data from a related domain to detect anomalies in an unlabeled target domain. However, existing UDA methods assume consistent anomalous classes across domains. To address this limitation, we propose a novel Domain Adaptation Contrastive learning model for Anomaly Detection in multivariate time series (DACAD), combining UDA with contrastive learning. DACAD utilizes an anomaly injection mechanism that enhances generalization across unseen anomalous classes, improving adaptability and robustness. Additionally, our model employs supervised contrastive loss for the source domain and self-supervised contrastive triplet loss for the target domain, ensuring comprehensive feature representation learning and domain-invariant feature extraction. Finally, an effective Centre-based Entropy Classifier (CEC) accurately learns normal boundaries in the source domain. Extensive evaluations on multiple real-world datasets and a synthetic dataset highlight DACAD's superior performance in transferring knowledge across domains and mitigating the challenge of limited labeled data in TSAD.
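The following is a minimal sketch, on toy data, of two ingredients the abstract names: an anomaly injection mechanism that corrupts a normal window into a synthetic negative, and a self-supervised triplet loss that pulls a window toward a jittered view and away from its corrupted view; the encoder, augmentation strengths, and margin are assumptions rather than DACAD's actual design.

```python
# Minimal sketch: anomaly injection + self-supervised triplet loss on multivariate windows.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(100 * 5, 64), nn.ReLU(), nn.Linear(64, 32))
triplet = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def inject_anomaly(x, magnitude=5.0):
    """Plant a short spike in one random channel per window to synthesise an anomaly."""
    x = x.clone()
    b, t, c = x.shape
    pos = torch.randint(0, t - 5, (b,))
    ch = torch.randint(0, c, (b,))
    for i in range(b):
        x[i, pos[i]:pos[i] + 5, ch[i]] += magnitude
    return x

for step in range(200):
    window = torch.randn(16, 100, 5)                               # toy multivariate windows
    anchor = encoder(window)
    positive = encoder(window + 0.05 * torch.randn_like(window))   # mild jitter
    negative = encoder(inject_anomaly(window))                     # injected anomaly
    loss = triplet(anchor, positive, negative)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final triplet loss: {loss.item():.3f}")
```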
Updated: 2024-07-11 12:04:13
标题: DACAD:多元时间序列异常检测的领域适应对比学习
摘要: 在时间序列异常检测(TSAD)中,标记数据的稀缺性给准确建模带来了挑战。无监督领域自适应(UDA)通过利用相关领域的标记数据来检测未标记目标领域中的异常,提供了一种解决方案。然而,现有的UDA方法假设跨领域存在一致的异常类别。为了解决这一局限性,我们提出了一种新颖的多变量时间序列异常检测领域自适应对比学习模型(DACAD),将UDA与对比学习相结合。DACAD利用异常注入机制增强了对未见异常类别的泛化能力,提高了适应性和鲁棒性。此外,我们的模型在源领域采用了监督对比损失,并在目标领域采用了自监督对比三元损失,确保了全面的特征表示学习和领域不变特征提取。最后,一种有效的基于中心熵分类器(CEC)准确地学习了源领域的正常边界。在多个真实世界数据集和一个合成数据集上进行的广泛评估突显了DACAD在跨领域知识转移和缓解TSAD中有限标记数据挑战方面的优越性能。
更新时间: 2024-07-11 12:04:13
领域: cs.LG,cs.AI
On the (In)Security of LLM App Stores
LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we collected 786,036 LLM apps from six major app stores: GPT Store, FlowGPT, Poe, Coze, Cici, and Character.AI. Our research integrates static and dynamic analysis, the development of a large-scale toxic word dictionary (i.e., ToxicDict) comprising over 31,783 entries, and automated monitoring tools to identify and mitigate threats. We uncovered that 15,146 apps had misleading descriptions, 1,366 collected sensitive personal information against their privacy policies, and 15,996 generated harmful content such as hate speech, self-harm, extremism, etc. Additionally, we evaluated the potential for LLM apps to facilitate malicious activities, finding that 616 apps could be used for malware generation, phishing, etc. Our findings highlight the urgent need for robust regulatory frameworks and enhanced enforcement mechanisms.
Updated: 2024-07-11 12:03:32
标题: 关于LLM应用商店的(不)安全性
摘要: LLM应用商店已经迅速增长,导致了大量定制的LLM应用的泛滥。然而,这种扩展引发了安全方面的担忧。在本研究中,我们提出了一个三层关注框架,用于识别LLM应用的潜在安全风险,即具有滥用潜力的LLM应用、恶意意图的LLM应用和具有可利用漏洞的LLM应用。在五个月的时间里,我们从六个主要应用商店(GPT Store、FlowGPT、Poe、Coze、Cici和Character.AI)收集了786,036个LLM应用。我们的研究整合了静态和动态分析,开发了一个包含31,783个条目以上的大规模有毒词典(即ToxicDict),并使用自动化监控工具来识别和缓解威胁。我们发现有15,146个应用有误导性描述,1,366个应用违反了其隐私政策收集了敏感个人信息,15,996个应用生成了有害内容,如仇恨言论、自残、极端主义等。此外,我们评估了LLM应用促进恶意活动的潜力,发现有616个应用可以用于生成恶意软件、网络钓鱼等。我们的发现强调了制定健全的监管框架和加强执法机制的迫切需要。
更新时间: 2024-07-11 12:03:32
领域: cs.CR,cs.AI
Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models
We propose masked particle modeling (MPM) as a self-supervised method for learning generic, transferable, and reusable representations on unordered sets of inputs for use in high energy physics (HEP) scientific data. This work provides a novel scheme to perform masked modeling based pre-training to learn permutation invariant functions on sets. More generally, this work provides a step towards building large foundation models for HEP that can be generically pre-trained with self-supervised learning and later fine-tuned for a variety of down-stream tasks. In MPM, particles in a set are masked and the training objective is to recover their identity, as defined by a discretized token representation of a pre-trained vector quantized variational autoencoder. We study the efficacy of the method in samples of high energy jets at collider physics experiments, including studies on the impact of discretization, permutation invariance, and ordering. We also study the fine-tuning capability of the model, showing that it can be adapted to tasks such as supervised and weakly supervised jet classification, and that the model can transfer efficiently with small fine-tuning data sets to new classes and new data domains.
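A minimal sketch of the masked-set objective, assuming PyTorch: particle feature vectors are tokenized by nearest-neighbour lookup in a codebook (random here, learned by a VQ-VAE in the paper), a fraction of particles is replaced by a mask embedding, and a transformer encoder without positional encodings (so the set stays unordered) is trained to recover the masked token ids; all sizes are illustrative.

```python
# Minimal sketch of masked particle modeling on sets of toy "particles".
import torch
import torch.nn as nn

V, D, F = 512, 64, 4                                  # codebook size, width, features
codebook = torch.randn(V, F)                          # stand-in for a trained VQ-VAE codebook
embed = nn.Linear(F, D)
mask_tok = nn.Parameter(torch.randn(D))
enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(D, 4, batch_first=True), 2)
head = nn.Linear(D, V)
opt = torch.optim.Adam([mask_tok, *embed.parameters(), *enc.parameters(), *head.parameters()], lr=1e-4)

for step in range(100):
    parts = torch.randn(8, 30, F)                     # toy jets: 30 particles each
    tokens = torch.cdist(parts, codebook.expand(8, V, F)).argmin(-1)   # (8, 30) target ids
    x = embed(parts)
    mask = torch.rand(8, 30) < 0.3                    # mask 30% of the particles
    x = torch.where(mask.unsqueeze(-1), mask_tok.expand_as(x), x)
    logits = head(enc(x))                             # no positional encoding: input is a set
    loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"masked-token loss: {loss.item():.3f}")
```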
Updated: 2024-07-11 11:55:25
标题: 集合上的掩蔽粒子建模:走向自监督高能物理基础模型
摘要: 我们提出了掩模粒子建模(MPM)作为一种自监督方法,用于学习高能物理(HEP)科学数据中无序输入的通用、可转移和可重复使用的表示。这项工作提供了一种新颖的方案,通过基于掩模建模的预训练来学习集合上的排列不变函数。更一般地,这项工作是朝着构建能够通过自监督学习进行通用预训练,并随后对各种下游任务进行微调的HEP大型基础模型迈出的一步。在MPM中,集合中的粒子被掩模,训练目标是恢复它们的身份,即通过预训练的向量量化变分自动编码器的离散标记表示定义。我们研究了这种方法在对撞机物理实验中的高能喷流样本中的有效性,包括对离散化、排列不变性和排序的影响的研究。我们还研究了模型的微调能力,表明它可以适应监督和弱监督喷流分类等任务,并且该模型可以通过少量微调数据集高效地转移到新的类别和新的数据领域。
更新时间: 2024-07-11 11:55:25
领域: hep-ph,cs.LG,hep-ex,physics.data-an
SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference
Neural Architecture Search (NAS) for Federated Learning (FL) is an emerging field. It automates the design and training of Deep Neural Networks (DNNs) when data cannot be centralized due to privacy, communication costs, or regulatory restrictions. Recent federated NAS methods not only reduce manual effort but also help achieve higher accuracy than traditional FL methods like FedAvg. Despite the success, existing federated NAS methods still fall short in satisfying diverse deployment targets common in on-device inference like hardware, latency budgets, or variable battery levels. Most federated NAS methods search for only a limited range of neuro-architectural patterns and repeat them in a DNN, thereby restricting achievable performance. Moreover, these methods incur prohibitive training costs to satisfy deployment targets. They perform the training and search of DNN architectures repeatedly for each case. SuperFedNAS addresses these challenges by decoupling the training and search in federated NAS. SuperFedNAS co-trains a large number of diverse DNN architectures contained inside one supernet in the FL setting. Post-training, clients perform NAS locally to find specialized DNNs by extracting different parts of the trained supernet with no additional training. SuperFedNAS takes O(1) (instead of O(N)) cost to find specialized DNN architectures in FL for any N deployment targets. As part of SuperFedNAS, we introduce MaxNet - a novel FL training algorithm that performs multi-objective federated optimization of a large number of DNN architectures ($\approx 5*10^8$) under different client data distributions. Overall, SuperFedNAS achieves up to 37.7% higher accuracy for the same MACs or up to 8.13x reduction in MACs for the same accuracy than existing federated NAS methods.
Updated: 2024-07-11 11:53:21
标题: SuperFedNAS:面向设备推断的成本高效联合神经架构搜索
摘要: 神经架构搜索(NAS)用于联邦学习(FL)是一个新兴领域。它自动化了当数据由于隐私、通信成本或监管限制而无法集中时,深度神经网络(DNNs)的设计和训练。最近的联邦NAS方法不仅减少了手动工作量,还帮助实现比传统FL方法如FedAvg更高的准确性。尽管取得了成功,现有的联邦NAS方法在满足在设备推断中常见的各种部署目标方面仍有不足,如硬件、延迟预算或可变电池电量。大多数联邦NAS方法仅搜索有限范围的神经结构模式,并在DNN中重复它们,从而限制了可实现的性能。此外,这些方法为满足部署目标而产生了不可接受的训练成本。它们为每种情况重复执行DNN架构的训练和搜索。SuperFedNAS通过在联邦NAS中解耦训练和搜索来解决这些挑战。SuperFedNAS在FL设置中同时训练大量包含在一个超网络中的多样化DNN架构。训练后,客户端通过在训练好的超网络中提取不同部分,本地执行NAS以找到专门的DNN,而无需额外的训练。SuperFedNAS在FL中寻找专门化的DNN架构的成本为O(1)(而不是O(N))来满足任何N个部署目标。作为SuperFedNAS的一部分,我们引入了MaxNet - 一种新颖的FL训练算法,它在不同客户端数据分布下对大量DNN架构(约5*10^8)进行多目标联邦优化。总体而言,SuperFedNAS实现了比现有联邦NAS方法更高达37.7%的准确性,或者在相同准确性下最多减少8.13倍的MACs。
更新时间: 2024-07-11 11:53:21
领域: cs.LG,cs.DC
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and appropriate experimental settings, and implementing multi-dimensional evaluations. This benchmark integrates 12 widely adopted methods with 15 diverse datasets across multiple application domains, offering extensive evaluation of contemporary spatio-temporal prediction networks. Through meticulous calibration of prediction settings across various applications, PredBench ensures evaluations relevant to their intended use and enables fair comparisons. Moreover, its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics, providing deep insights into the capabilities of models. The findings from our research offer strategic directions for future developments in the field. Our codebase is available at https://github.com/WZDTHU/PredBench.
Updated: 2024-07-11 11:51:36
标题: PredBench:在不同学科中对时空预测进行基准测试
摘要: 在本文中,我们介绍了PredBench,这是一个专为对时空预测网络进行整体评估而量身定制的基准测试。尽管在这个领域取得了显著进展,但仍然缺乏一个详细和比较分析各种预测网络架构的标准化框架。PredBench通过进行大规模实验、坚持标准化和适当的实验设置,并实施多维评估来填补这一空白。这个基准测试集成了12种广泛采用的方法和15个不同领域的多样化数据集,提供了对当代时空预测网络的广泛评估。通过在各种应用中精心校准预测设置,PredBench确保对其预期用途相关的评估,并实现公平比较。此外,它的多维评估框架通过全面的指标集拓宽了分析范围,提供了对模型能力的深入洞察。我们研究的发现为未来发展方向提供了战略指导。我们的代码库可在https://github.com/WZDTHU/PredBench 上找到。
更新时间: 2024-07-11 11:51:36
领域: cs.LG,cs.CV
Threats and Defenses in Federated Learning Life Cycle: A Comprehensive Survey and Challenges
Federated Learning (FL) offers innovative solutions for privacy-preserving collaborative machine learning (ML). Despite its promising potential, FL is vulnerable to various attacks due to its distributed nature, affecting the entire life cycle of FL services. These threats can harm the model's utility or compromise participants' privacy, either directly or indirectly. In response, numerous defense frameworks have been proposed, demonstrating effectiveness in specific settings and scenarios. To provide a clear understanding of the current research landscape, this paper reviews the most representative and state-of-the-art threats and defense frameworks throughout the FL service life cycle. We start by identifying FL threats that harm utility and privacy, including those with potential or direct impacts. Then, we dive into the defense frameworks, analyze the relationship between threats and defenses, and compare the trade-offs among different defense strategies. Finally, we summarize current research bottlenecks and offer insights into future research directions to conclude this survey. We hope this survey sheds light on trustworthy FL research and contributes to the FL community.
Updated: 2024-07-11 11:50:03
标题: 《联邦学习生命周期中的威胁与防御:全面调查与挑战》
摘要: 联邦学习(FL)为隐私保护的协作式机器学习(ML)提供了创新解决方案。尽管具有巨大潜力,但由于其分布式性质,FL容易受到各种攻击,影响FL服务的整个生命周期。这些威胁可能损害模型的效用或者牵涉到参与者的隐私,无论是直接还是间接的。为应对这些威胁,已经提出了许多防御框架,在特定环境和场景中显示出了有效性。为了清晰了解当前研究现状,本文回顾了整个FL服务生命周期中最具代表性和最先进的威胁和防御框架。我们首先识别了损害效用和隐私的FL威胁,包括那些具有潜在或直接影响的威胁。然后,我们深入研究防御框架,分析威胁与防御之间的关系,并比较不同防御策略之间的权衡。最后,我们总结了当前研究的瓶颈,并提供了未来研究方向的见解以结束本次调查。我们希望这次调查能够为值得信赖的FL研究提供启示,并为FL社区做出贡献。
更新时间: 2024-07-11 11:50:03
领域: cs.DC,cs.AI
Unveiling the Potential of BERTopic for Multilingual Fake News Analysis -- Use Case: Covid-19
Topic modeling is frequently used for analysing large text corpora such as news articles or social media data. BERTopic, consisting of sentence embedding, dimension reduction, clustering, and topic extraction, is the newest and currently the state-of-the-art topic modeling method. However, current topic modeling methods have room for improvement because, as unsupervised methods, they require careful tuning and selection of hyperparameters, e.g., for dimension reduction and clustering. This paper aims to analyse the technical application of BERTopic in practice. For this purpose, it compares and selects different methods and hyperparameters for each stage of BERTopic through density-based clustering validation and six different topic coherence measures. Moreover, it also aims to analyse the results of topic modeling on real-world data as a use case. For this purpose, we created the German fake news dataset on Covid-19 (GermanFakeNCovid) and combined it with the FakeCovid dataset in order to experiment with topic modeling in a multilingual (English and German) setting. With the final results, we were able to determine thematic similarities between the United States and Germany. Distinguishing the topics of fake news from India, however, proved to be more challenging.
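For reference, a minimal sketch of the BERTopic pipeline stages the abstract lists, assuming the bertopic, sentence-transformers, umap-learn and hdbscan packages; the tiny synthetic corpus and the hyperparameter values stand in for the tuning and validation the paper performs.

```python
# Minimal sketch: sentence embedding -> UMAP dimension reduction -> HDBSCAN
# clustering -> topic extraction, all wired through BERTopic.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

docs = ([f"the vaccine rollout reduced covid infections in region {i}" for i in range(100)]
        + [f"masks and distancing rules were debated online in week {i}" for i in range(100)]
        + [f"false claims about miracle cures spread faster than fact checks, case {i}" for i in range(100)])

topic_model = BERTopic(
    embedding_model=SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2"),
    umap_model=UMAP(n_neighbors=10, n_components=5, min_dist=0.0, metric="cosine"),
    hdbscan_model=HDBSCAN(min_cluster_size=20, metric="euclidean"),
)
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info())
```

A multilingual embedding model is used here because the paper's use case mixes English and German; the paper's actual model and hyperparameter choices come from its validation study.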
Updated: 2024-07-11 11:47:43
标题: 揭示BERTopic在多语言假新闻分析中的潜力 -- 案例研究:Covid-19
摘要: 主题建模经常被用于分析大型文本语料库,如新闻文章或社交媒体数据。BERTopic由句子嵌入、降维、聚类和主题提取组成,是最新的并且目前是SOTA主题建模方法。然而,当前的主题建模方法仍有改进的空间,因为作为无监督方法,它们需要仔细调整和选择超参数,例如降维和聚类。本文旨在分析BERTopic在实践中的技术应用。为此,通过基于密度的聚类验证和六种不同的主题连贯性度量,比较和选择了BERTopic每个阶段的不同方法和超参数。此外,本文还旨在分析主题建模在真实世界数据上的结果作为一个用例。为此,我们创建了德国虚假新闻数据集(GermanFakeNCovid)关于Covid-19,并为了在多语言(英语和德语)设置中与FakeCovid数据集结合进行主题建模实验。通过最终结果,我们能够确定美国和德国之间的主题相似性。然而,区分来自印度的虚假新闻主题被证明更具挑战性。
更新时间: 2024-07-11 11:47:43
领域: cs.LG
Parallelizing Autoregressive Generation with Variational State Space Models
Attention-based models such as Transformers and recurrent models like state space models (SSMs) have emerged as successful methods for autoregressive sequence modeling. Although both enable parallel training, neither enables parallel generation due to their autoregressiveness. We propose the variational SSM (VSSM), a variational autoencoder (VAE) where both the encoder and decoder are SSMs. Since sampling the latent variables and decoding them with the SSM can be parallelized, both training and generation can be conducted in parallel. Moreover, the decoder recurrence allows generation to be resumed without reprocessing the whole sequence. Finally, we propose the autoregressive VSSM, which can be conditioned on a partial realization of the sequence, as is common in language generation tasks. Interestingly, the autoregressive VSSM still enables parallel generation. On toy problems (MNIST, CIFAR) we highlight the empirical gains in speed-up and show that it competes with traditional models (Transformer, Mamba SSM) in terms of generation quality.
Updated: 2024-07-11 11:41:29
标题: 使用变分状态空间模型并行化自回归生成
摘要: 基于注意力的模型,如Transformer和基于循环的模型,如状态空间模型(SSMs),已经成为自回归序列建模的成功方法。尽管两者都支持并行训练,但由于它们的自回归特性,都不能支持并行生成。我们提出了变分SSM(VSSM),这是一个变分自动编码器(VAE),其中编码器和解码器都是SSMs。由于可以并行对潜变量进行采样和用SSM解码,因此训练和生成都可以并行进行。此外,解码器的循环性允许生成在不重新处理整个序列的情况下恢复。最后,我们提出了可以根据序列的部分实现进行条件化的自回归VSSM,这在语言生成任务中很常见。有趣的是,自回归VSSM仍然可以支持并行生成。我们在玩具问题(MNIST,CIFAR)上突显了加速的实证收益,并展示了它在生成质量方面与传统模型(Transformer,Mamba SSM)竞争的能力。
更新时间: 2024-07-11 11:41:29
领域: cs.LG,stat.ML
Specialist vision-language models for clinical ophthalmology
Clinicians spend a significant amount of time reviewing medical images and transcribing their findings regarding patient diagnosis, referral and treatment in text form. Vision-language models (VLMs), which automatically interpret images and summarize their findings as text, have enormous potential to alleviate clinical workloads and increase patient access to high-quality medical care. While foundational models have stirred considerable interest in the medical community, it is unclear whether their general capabilities translate to real-world clinical utility. In this work, we show that foundation VLMs markedly underperform compared to practicing ophthalmologists on specialist tasks crucial to the care of patients with age-related macular degeneration (AMD). To address this, we initially identified the essential capabilities required for image-based clinical decision-making, and then developed a curriculum to selectively train VLMs in these skills. The resulting model, RetinaVLM, can be instructed to write reports that significantly outperform those written by leading foundation medical VLMs in disease staging (F1 score of 0.63 vs. 0.11) and patient referral (0.67 vs. 0.39), and approaches the diagnostic performance of junior ophthalmologists (who achieve 0.77 and 0.78 on the respective tasks). Furthermore, in a reader study involving two senior ophthalmologists with up to 32 years of experience, RetinaVLM's reports were found to be similarly correct (78.6% vs. 82.1%) and complete (both 78.6%) as reports written by junior ophthalmologists with up to 10 years of experience. These results demonstrate that our curriculum-based approach provides a blueprint for specializing generalist foundation medical VLMs to handle real-world clinical tasks.
Updated: 2024-07-11 11:31:48
标题: 临床眼科专科视觉语言模型
摘要: 临床医生花费大量时间审查医学影像,并将他们关于患者诊断、转诊和治疗的发现转录为文本形式。视觉语言模型(VLMs)可以自动解释图像并总结其发现为文本,具有巨大潜力减轻临床工作负担,并增加患者获取高质量医疗护理的机会。尽管基础模型引起了医学界的极大兴趣,但不清楚它们的通用能力是否能够转化为真实世界的临床效用。在这项工作中,我们展示了基础VLM与从事眼科专业任务的医生相比表现明显不佳,这些任务对于关注年龄相关黄斑变性(AMD)患者护理至关重要。为了解决这个问题,我们首先确定了基于影像的临床决策所需的基本能力,然后开发了一个课程,有选择地培训VLMs这些技能。结果模型RetinaVLM可以被指导编写报告,其在疾病分期(F1分数为0.63对0.11)和患者转诊(0.67对0.39)方面明显优于主要基础医学VLMs编写的报告,并且接近于初级眼科医生的诊断性能(在相应任务上分别达到0.77和0.78)。此外,在涉及两名具有长达32年经验的资深眼科医生的读者研究中,发现RetinaVLM的报告与具有长达10年经验的初级眼科医生编写的报告在正确性(78.6%对82.1%)和完整性(均为78.6%)方面相似。这些结果表明,我们基于课程的方法为让通用基础医学VLMs专门处理真实世界临床任务提供了一个蓝图。
更新时间: 2024-07-11 11:31:48
领域: cs.AI
A Comparison of Vulnerability Feature Extraction Methods from Textual Attack Patterns
Nowadays, threat reports from cybersecurity vendors incorporate detailed descriptions of attacks within unstructured text. Knowing the vulnerabilities that are related to these reports helps cybersecurity researchers and practitioners understand and adjust to evolving attacks and develop mitigation plans. This paper aims to aid cybersecurity researchers and practitioners in choosing attack extraction methods to enhance the monitoring and sharing of threat intelligence. In this work, we examine five feature extraction methods (TF-IDF, LSI, BERT, MiniLM, RoBERTa) and find that Term Frequency-Inverse Document Frequency (TF-IDF) outperforms the other four methods with a precision of 75% and an F1 score of 64%. The findings offer valuable insights to the cybersecurity community, and our research can aid cybersecurity researchers in evaluating and comparing the effectiveness of upcoming extraction methods.
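As a sketch of the TF-IDF route, assuming scikit-learn: vectorize attack descriptions and candidate vulnerability texts into one space and rank by cosine similarity; the two tiny text lists are hypothetical placeholders, not the paper's data.

```python
# Minimal sketch: TF-IDF features shared between attack descriptions and CVE texts,
# with cosine similarity used to rank the most related vulnerabilities.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

attacks = ["adversary sends crafted SQL statements to the login form"]
cves = [
    "SQL injection in the authentication module allows remote attackers to bypass login",
    "buffer overflow in the image parser allows remote code execution",
    "cross-site scripting in the comment field allows script injection",
]

vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
matrix = vec.fit_transform(attacks + cves)
sims = cosine_similarity(matrix[:1], matrix[1:]).ravel()
for score, text in sorted(zip(sims, cves), reverse=True):
    print(f"{score:.3f}  {text}")
```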
Updated: 2024-07-11 11:31:15
标题: 文献标题翻译:从文本攻击模式中提取脆弱性特征的方法比较
摘要: 现今,来自网络安全厂商的威胁报告中包含了对攻击的详细描述,这些描述都是非结构化文本。了解与这些报告相关的漏洞有助于网络安全研究人员和从业者理解和适应不断演变的攻击,并制定应对计划。本文旨在帮助网络安全研究人员和从业者选择攻击提取方法,以增强威胁情报的监控和分享。在这项工作中,我们研究了五种特征提取方法(TF-IDF、LSI、BERT、MiniLM、RoBERTa),发现逆文档频率-词项频率(TF-IDF)方法表现优异,精度达到75\%,F1分数为64\%。这些发现为网络安全社区提供了宝贵的见解,我们的研究可以帮助网络安全研究人员评估和比较未来提取方法的有效性。
更新时间: 2024-07-11 11:31:15
领域: cs.CR,cs.SE
Cybersecurity Defenses: Exploration of CVE Types through Attack Descriptions
Vulnerabilities in software security can remain undiscovered even after being exploited. Linking attacks to vulnerabilities helps experts identify and respond promptly to the incident. This paper introduces VULDAT, a classification tool using the sentence transformer MPNET to identify system vulnerabilities from attack descriptions. Our model was applied to 100 attack techniques from the ATT&CK repository and 685 issues from the CVE repository. We then compare the performance of VULDAT against eight other state-of-the-art classifiers based on sentence transformers. Our findings indicate that our model achieves the best performance, with an F1 score of 0.85, a precision of 0.86, and a recall of 0.83. Furthermore, VULDAT identified 56% of the CVE-reported vulnerabilities associated with an attack, and 61% of the vulnerabilities it identified were in the CVE repository.
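A minimal sketch of the embedding route VULDAT builds on, assuming the sentence-transformers package and the public all-mpnet-base-v2 checkpoint (an assumption; the paper may use a different MPNET variant): encode an attack description and candidate CVE texts, then rank by cosine similarity.

```python
# Minimal sketch: MPNet sentence embeddings link an ATT&CK-style technique
# description to the most semantically similar CVE descriptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")
attack = "adversary abuses scheduled tasks to persist and execute payloads"
cves = [
    "improper permissions on the task scheduler allow local privilege escalation",
    "heap overflow in the font renderer allows remote code execution",
]
scores = util.cos_sim(model.encode(attack), model.encode(cves))[0]
for s, c in sorted(zip(scores.tolist(), cves), reverse=True):
    print(f"{s:.3f}  {c}")
```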
Updated: 2024-07-11 11:28:51
标题: 网络安全防御:通过攻击描述探索CVE类型
摘要: 软件安全中的漏洞即使在被利用后仍可能未被发现。将攻击与漏洞联系起来有助于专家及时识别并应对事件。本文介绍了一种名为VULDAT的分类工具,该工具使用句子转换器MPNET来从攻击描述中识别系统漏洞。我们的模型应用于ATT&CK存储库中的100种攻击技术和CVE存储库中的685个问题。然后,我们将VULDAT与其他八种基于句子转换器的最先进分类器的性能进行了比较。我们的研究结果表明,我们的模型在F1分数为0.85,精度为0.86,召回率为0.83时取得了最佳性能。此外,我们发现,VULDAT识别出与攻击相关的CVE报告漏洞中的56%,并且识别出的漏洞中有61%在CVE存储库中。
更新时间: 2024-07-11 11:28:51
领域: cs.CR,cs.SE
A Two-Stage Machine Learning-Aided Approach for Quench Identification at the European XFEL
This paper introduces a machine learning-aided fault detection and isolation method applied to the case study of quench identification at the European X-Ray Free-Electron Laser. The plant utilizes 800 superconducting radio-frequency cavities in order to accelerate electron bunches to high energies of up to 17.5 GeV. Various faulty events can disrupt the nominal functioning of the accelerator, including quenches, which can lead to a loss of the superconductivity of the cavities and the interruption of their operation. In this context, our solution consists in analyzing signals reflecting the dynamics of the cavities in a two-stage approach. (I) Fault detection, which uses analytical redundancy to process the data and generate a residual; evaluating the residual through the generalized likelihood ratio allows detecting faulty behaviors. (II) Fault isolation, which distinguishes quenches from the other faults. To this end, we proceed with a data-driven model based on the k-medoids algorithm that explores different similarity measures, namely the Euclidean distance and dynamic time warping. Finally, we evaluate the new method and compare it to the currently deployed quench detection system; the results show the improved performance achieved by our method.
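To illustrate the detection stage on synthetic signals, the sketch below forms a residual against a nominal model and flags a mean shift with a sliding-window generalized likelihood ratio (GLR) statistic; the signal model, window length, noise variance, and threshold are assumptions, not the deployed system's values.

```python
# Minimal sketch: residual generation + sliding-window GLR test for a mean shift
# (for Gaussian noise with known variance, GLR = w * mean(residual)^2 / (2 * sigma^2)).
import numpy as np

rng = np.random.default_rng(0)
nominal = np.sin(np.linspace(0, 20, 2000))              # analytical-redundancy model output
measured = nominal + 0.05 * rng.normal(size=2000)
measured[1200:] += 0.3                                  # injected fault: a step at t=1200

residual = measured - nominal
w, sigma2, thresh = 50, 0.05 ** 2, 30.0
glr = np.array([w * residual[t - w:t].mean() ** 2 / (2 * sigma2)
                for t in range(w, len(residual))])
alarms = np.nonzero(glr > thresh)[0] + w
print("first alarm at sample:", alarms[0] if len(alarms) else None)
```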
Updated: 2024-07-11 11:21:41
标题: 一个两阶段机器学习辅助方法用于在欧洲XFEL中识别淬火
摘要: 这篇论文介绍了一种应用于欧洲X射线自由电子激光器的淬火识别案例研究的机器学习辅助故障检测和隔离方法。该装置利用800个超导射频腔加速电子束至高达17.5 GeV的能量。各种故障事件可能会干扰加速器的正常运行,包括可能导致腔体失超导性并中断其运行的淬火。在这种情况下,我们的解决方案是通过两阶段方法分析反映腔体动态的信号。(一)使用分析冗余来处理数据并生成残差的故障检测。通过广义似然比对残差进行评估,可以检测到故障行为。(二)故障隔离涉及将淬火与其他故障区分开来。为此,我们采用基于数据的k-medoids算法的模型,探索不同的相似度度量,即欧氏距离和动态时间扭曲。最后,我们评估了新方法并将其与当前部署的淬火检测系统进行了比较,结果显示我们的方法取得了改进的性能。
更新时间: 2024-07-11 11:21:41
领域: physics.ins-det,cs.AI
Cyber Attacks on Maritime Assets and their Impacts on Health and Safety Aboard: A Holistic View
There has been an unprecedented digitization drive in the industrial sector, especially in the maritime industry. The profusion of intelligent electronic devices and IoT-enabled cyber-physical systems (CPS) has helped in the efficient use of resources and increased convenience. CPS has enabled real-time remote command and control of industrial assets. Unlike the relatively isolated legacy systems, the intertwined nature of Information Technology (IT) and Operations Technology (OT) brought by Industry 4.0 has increased the complexity of the systems, thereby increasing the attack surface. This work explores the possible consequences of these attacks from a more holistic view, focusing on high-risk assets such as offshore oil rigs, offshore wind farms, and autonomous vessels. The attacks have become more aggressive with the proliferation of such technologies, disrupting the physical process, causing fire and explosion hazards, and endangering human life and environmental health. The possible attack scenarios, the attack vectors, and their physical consequences have been discussed from the perspective of personnel safety and health, along with known security breaches of such nature. To the best of the authors' knowledge, seldom has any work been done that accentuates the possible human and environmental impacts of such attacks.
Updated: 2024-07-11 11:20:36
标题: 网络攻击对海上资产的影响及其对船员健康与安全的影响:整体视角
摘要: 在工业部门,特别是在海运行业,数字化推动前所未有。智能电子设备和物联网(IOT)使得物理系统(CPS)的普及有助于资源的有效利用和提高便利性。CPS实现了对工业资产的实时远程指挥和控制。与相对孤立的传统系统不同,由工业4.0带来的信息技术(IT)和运营技术(OT)的交织增加了系统的复杂性,从而增加了攻击面。本文从更全面的视角探讨了这些攻击的可能后果,重点关注高风险资产,如海上油井、海上风电场和自主船只。随着这些技术的蔓延,攻击变得更加激烈,破坏了物理过程,引发火灾和爆炸危险,危及人类生命和环境健康。从人员安全和健康的角度讨论了可能的攻击场景、攻击向量和其物理后果,以及已知的此类安全漏洞。据作者所知,很少有研究着重于这些攻击可能对人类和环境造成的影响。
更新时间: 2024-07-11 11:20:36
领域: cs.CR,cs.ET
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP
Improvements in language models' capabilities have pushed their applications towards longer contexts, making long-context evaluation and development an active research area. However, many disparate use-cases are grouped together under the umbrella term of "long-context", defined simply by the total length of the model's input, including - for example - Needle-in-a-Haystack tasks, book summarization, and information aggregation. Given their varied difficulty, in this position paper we argue that conflating different tasks by their context length is unproductive. As a community, we require a more precise vocabulary to understand what makes long-context tasks similar or different. We propose to unpack the taxonomy of long-context based on the properties that make them more difficult with longer contexts. We propose two orthogonal axes of difficulty: (I) Diffusion: How hard is it to find the necessary information in the context? (II) Scope: How much necessary information is there to find? We survey the literature on long-context, provide justification for this taxonomy as an informative descriptor, and situate the literature with respect to it. We conclude that the most difficult and interesting settings, whose necessary information is very long and highly diffused within the input, is severely under-explored. By using a descriptive vocabulary and discussing the relevant properties of difficulty in long-context, we can implement more informed research in this area. We call for a careful design of tasks and benchmarks with distinctly long context, taking into account the characteristics that make it qualitatively different from shorter context.
Updated: 2024-07-11 11:17:09
标题: 如果您只需要检索,是否真的需要长篇背景?走向真正困难的长篇背景自然语言处理(NLP)
摘要: 对语言模型能力的改进推动了它们应用于更长上下文的发展,使得长上下文的评估和开发成为一个活跃的研究领域。然而,许多不同的用例都被归为“长上下文”这一术语的范畴下,仅仅通过模型输入的总长度来定义,包括比如寻找大海捞针任务、书籍摘要和信息聚合等。鉴于它们的不同难度,我们在这篇观点论文中认为通过上下文长度将不同任务混为一谈是没有效果的。作为一个社区,我们需要一个更加精确的词汇来理解长上下文任务的相似性和差异性。我们提议根据使长上下文任务随着长度增加更加困难的特性来拆解长上下文的分类法。我们提出两个正交的难度轴:(一)扩散:在上下文中找到必要信息有多难?(二)范围:有多少必要信息需要找到?我们调查了长上下文的文献,为这个分类法提供了理由,并将文献与之联系起来。我们得出结论,最困难和有趣的设置是那些必要信息非常长且在输入中高度扩散的情况,这种情况目前被严重忽视。通过使用描述性的词汇并讨论长上下文中困难的相关特性,我们可以在这一领域开展更加明智的研究。我们呼吁谨慎设计具有明显长上下文的任务和基准,考虑到使其在质量上与较短上下文有所不同的特征。
更新时间: 2024-07-11 11:17:09
领域: cs.CL,cs.AI
Shedding More Light on Robust Classifiers under the lens of Energy-based Models
By reinterpreting a robust discriminative classifier as an Energy-based Model (EBM), we offer a new take on the dynamics of adversarial training (AT). Our analysis of the energy landscape during AT reveals that untargeted attacks generate adversarial images that are much more in-distribution (lower energy) than the original data from the point of view of the model. Conversely, we observe the opposite for targeted attacks. On the ground of our thorough analysis, we present new theoretical and practical results that show how interpreting AT energy dynamics unlocks a better understanding: (1) the AT dynamic is governed by three phases, and robust overfitting occurs in the third phase with a drastic divergence between natural and adversarial energies; (2) by rewriting the loss of TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization (TRADES) in terms of energies, we show that TRADES implicitly alleviates overfitting by aligning the natural energy with the adversarial one; (3) we empirically show that all recent state-of-the-art robust classifiers smooth the energy landscape, and we reconcile a variety of studies on understanding AT and weighting the loss function under the umbrella of EBMs. Motivated by rigorous evidence, we propose Weighted Energy Adversarial Training (WEAT), a novel sample weighting scheme that yields robust accuracy matching the state-of-the-art on multiple benchmarks such as CIFAR-10 and SVHN, and going beyond it on CIFAR-100 and Tiny-ImageNet. We further show that robust classifiers vary in the intensity and quality of their generative capabilities, and offer a simple method to push this capability, reaching a remarkable Inception Score (IS) and FID using a robust classifier without training for generative modeling. The code to reproduce our results is available at http://github.com/OmnAI-Lab/Robust-Classifiers-under-the-lens-of-EBM/.
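For concreteness, a minimal sketch of the EBM reading, assuming PyTorch: a classifier's logits define an energy E(x) = -logsumexp f(x), which can be compared between natural inputs and adversarial ones from a single untargeted FGSM step; the untrained toy model and random data are placeholders.

```python
# Minimal sketch: energy of inputs under a classifier-as-EBM, natural vs. FGSM-adversarial.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

def energy(x):
    # E(x) = -logsumexp over class logits
    return -torch.logsumexp(model(x), dim=-1)

x = torch.rand(64, 3, 32, 32)
y = torch.randint(0, 10, (64,))

x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), y)
loss.backward()
with torch.no_grad():
    x_adv = (x_adv + (8 / 255) * x_adv.grad.sign()).clamp(0, 1)   # untargeted FGSM step

print(f"mean energy, natural:     {energy(x).mean().item():.3f}")
print(f"mean energy, adversarial: {energy(x_adv).mean().item():.3f}")
```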
Updated: 2024-07-11 11:11:03
标题: 在基于能量模型的视角下更深入理解强健分类器
摘要: 通过将一个强大的判别分类器重新解释为基于能量的模型(EBM),我们提出了对对抗训练(AT)动态的新看法。我们对AT期间的能量景观进行分析,发现无目标攻击生成的对抗图像在模型的视角下比原始数据更加分布在内部(能量更低)。相反,我们观察到有针对性攻击则相反。通过我们的彻底分析,我们提出了新的理论和实践结果,展示了如何解释AT能量动态可以带来更深入的理解:(1)AT动态由三个阶段控制,鲁棒过拟合发生在第三阶段,自然能量和对抗能量之间出现了明显的分歧(2)通过将基于TRadeoff启发的对抗性防御通过替代损失最小化(TRADES)的损失重新定义为能量,我们展示TRADES通过将自然能量与对抗能量对齐的方式隐式地缓解了过拟合(3)我们在实验中展示所有最近的最先进的鲁棒分类器都在平滑能量景观,并将多种关于理解AT和在EBM的框架下加权损失函数的研究和解释调和在一起。受到严格证据的启发,我们提出了加权能量对抗训练(WEAT),这是一种新颖的样本加权方案,可以使得在多个基准测试中达到与最先进技术相匹配的鲁棒准确性,例如CIFAR-10和SVHN,并且在CIFAR-100和Tiny-ImageNet上超越。我们进一步展示了鲁棒分类器在其生成能力的强度和质量上有所变化,并提供了一种简单的方法来提升这种能力,使用鲁棒分类器而不是训练生成模型,达到了显著的感知分数(IS)和FID。我们的结果可在http://github.com/OmnAI-Lab/Robust-Classifiers-under-the-lens-of-EBM/找到。
更新时间: 2024-07-11 11:11:03
领域: cs.CV,cs.LG
Self-training Language Models for Arithmetic Reasoning
Language models achieve impressive results in tasks involving complex multistep reasoning, but scaling these capabilities further traditionally requires expensive collection of more annotated data. In this work, we explore the potential of improving the capabilities of language models without new data, merely using automated feedback to the validity of their predictions in arithmetic reasoning (self-training). We find that models can substantially improve in both single-round (offline) and online self-training. In the offline setting, supervised methods are able to deliver gains comparable to preference optimization, but in online self-training, preference optimization shows to largely outperform supervised training thanks to superior stability and robustness on unseen types of problems.
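A minimal sketch of the self-training loop with a toy stand-in for the model: sample several candidate solutions per problem, keep only those whose final answer passes an automated check, and treat the kept ones as fine-tuning data; the sampler and checker below are hypothetical placeholders for LLM generations and exact-match answer validation.

```python
# Minimal sketch: one round of self-training via automated answer checking.
import random

random.seed(0)

def sample_solution(question):
    a, b = question
    guess = a + b + random.choice([-1, 0, 0, 0, 1])     # imperfect "reasoning", toy
    return f"{a}+{b}={guess}", guess

def self_train_round(questions, k=8):
    kept = []
    for q in questions:
        for _ in range(k):                               # k samples per question
            text, answer = sample_solution(q)
            if answer == q[0] + q[1]:                    # automated validity check
                kept.append((q, text))                   # pseudo-label for fine-tuning
    return kept

questions = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(100)]
data = self_train_round(questions)
print(f"kept {len(data)} of {8 * len(questions)} sampled solutions for fine-tuning")
```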
Updated: 2024-07-11 11:06:05
标题: 自我训练的语言模型用于算术推理
摘要: 语言模型在涉及复杂多步推理的任务中取得了令人印象深刻的成果,但将这些能力进一步扩展传统上需要更多注释数据的收集。在这项工作中,我们探讨了提高语言模型能力的潜力,而不需要新数据,仅使用自动反馈来验证其在算术推理中的预测的有效性(自我训练)。我们发现模型可以在单轮(离线)和在线自我训练中显著改善。在离线环境中,监督方法能够实现与偏好优化相当的增益,但在在线自我训练中,由于在未知问题类型上具有更高的稳定性和鲁棒性,偏好优化表现出了明显优于监督训练。
更新时间: 2024-07-11 11:06:05
领域: cs.CL,cs.AI
Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool
While extensive research has focused on ChatGPT in recent years, very few studies have systematically quantified and compared linguistic features between human-written and Artificial Intelligence (AI)-generated language. This study aims to investigate how various linguistic components are represented in both types of texts, assessing the ability of AI to emulate human writing. Using human-authored essays as a benchmark, we prompted ChatGPT to generate essays of equivalent length. These texts were analyzed using Open Brain AI, an online computational tool, to extract measures of phonological, morphological, syntactic, and lexical constituents. Despite AI-generated texts appearing to mimic human speech, the results revealed significant differences across multiple linguistic features such as consonants, word stress, nouns, verbs, pronouns, direct objects, prepositional modifiers, and use of difficult words among others. These findings underscore the importance of integrating automated tools for efficient language assessment, reducing time and effort in data analysis. Moreover, they emphasize the necessity for enhanced training methodologies to improve the capacity of AI for producing more human-like text.
Updated: 2024-07-11 10:56:01
标题: 使用从在线计算工具自动提取的语言特征区分人类写作和人工智能生成的文本
摘要: 虽然近年来广泛的研究集中在ChatGPT上,但很少有研究系统地量化和比较人类撰写和人工智能(AI)生成的语言之间的语言特征。本研究旨在调查各种语言组成部分在这两种文本中的表现方式,评估AI模拟人类写作的能力。我们以人类创作的论文作为基准,促使ChatGPT生成等长的论文。利用在线计算工具Open Brain AI对这些文本进行分析,提取音韵、形态、句法和词汇成分的度量。尽管AI生成的文本似乎模仿人类语言,但结果显示在多个语言特征上存在显著差异,如辅音、重音、名词、动词、代词、直接宾语、介词修饰语和使用困难词等。这些发现强调了整合自动化工具以提高语言评估效率的重要性,减少数据分析的时间和精力。此外,它们强调了增强培训方法的必要性,以提高AI生成更具人类风格文本的能力。
更新时间: 2024-07-11 10:56:01
领域: cs.CL,cs.AI
On the attribution of confidence to large language models
Credences are mental states corresponding to degrees of confidence in propositions. Attribution of credences to Large Language Models (LLMs) is commonplace in the empirical literature on LLM evaluation. Yet the theoretical basis for LLM credence attribution is unclear. We defend three claims. First, our semantic claim is that LLM credence attributions are (at least in general) correctly interpreted literally, as expressing truth-apt beliefs on the part of scientists that purport to describe facts about LLM credences. Second, our metaphysical claim is that the existence of LLM credences is at least plausible, although current evidence is inconclusive. Third, our epistemic claim is that LLM credence attributions made in the empirical literature on LLM evaluation are subject to non-trivial sceptical concerns. It is a distinct possibility that even if LLMs have credences, LLM credence attributions are generally false because the experimental techniques used to assess LLM credences are not truth-tracking.
Updated: 2024-07-11 10:51:06
标题: 关于将自信归因于大型语言模型
摘要: 信念是对命题的信心程度所对应的心理状态。在大型语言模型(LLMs)评估的实证文献中,将信念归因于LLMs是司空见惯的。然而,LLM信念归因的理论基础并不清晰。我们提出三点主张。首先,我们的语义主张是,LLM信念归因(至少在一般情况下)应当被直接解释为科学家表达的真实信念,旨在描述关于LLM信念的事实。其次,我们的形而上主张是,LLM信念的存在至少是合理的,尽管目前的证据并不一致。第三,我们的认识主张是,LLM评估的实证文献中进行的LLM信念归因存在着非平凡的怀疑。有一种明显的可能性,即使LLMs有信念,由于用于评估LLM信念的实验技术并非追踪真相,LLM信念归因通常是错误的。
更新时间: 2024-07-11 10:51:06
领域: cs.AI,cs.CL
Differentially Private Multiway and $k$-Cut
In this paper, we address the challenge of differential privacy in the context of graph cuts, specifically focusing on the minimum $k$-cut and multiway cut problems. We introduce edge-differentially private algorithms that achieve nearly optimal performance for these problems. For the multiway cut problem, we first provide a private algorithm with a multiplicative approximation ratio that matches the state-of-the-art non-private algorithm. We then present a tight information-theoretic lower bound on the additive error, demonstrating that our algorithm on weighted graphs is near-optimal for constant $k$. For the minimum $k$-cut problem, our algorithms leverage a known bound on the number of approximate $k$-cuts, resulting in a private algorithm with optimal additive error $O(k\log n)$ for fixed privacy parameter. We also establish a information-theoretic lower bound that matches this additive error. Additionally, we give an efficient private algorithm for $k$-cut even for non-constant $k$, including a polynomial-time 2-approximation with an additive error of $\widetilde{O}(k^{1.5})$.
Updated: 2024-07-11 10:44:40
标题: 差分隐私多路和$k$-割
摘要: 在这篇论文中,我们讨论了图割中差分隐私的挑战,特别关注最小$k$-割和多路割问题。我们引入了边差分隐私算法,为这些问题实现了几乎最佳的性能。 对于多路割问题,我们首先提供了一个具有与最先进非私密算法匹配的乘法逼近比的私密算法。然后,我们提出了一个紧密的信息论下界,证明我们在加权图上的算法对于恒定$k$几乎是最优的。对于最小$k$-割问题,我们的算法利用了对近似$k$-割数量的已知上界,从而实现了一个具有固定隐私参数的最佳加法误差$O(k\log n)$的私密算法。我们还建立了一个与这个加法误差匹配的信息论下界。此外,我们为非恒定$k$甚至提供了一个高效的私密算法,包括一个具有$\widetilde{O}(k^{1.5})$的加法误差的多项式时间2近似算法。
更新时间: 2024-07-11 10:44:40
领域: cs.CR,cs.DS
Digital twins to alleviate the need for real field data in vision-based vehicle speed detection systems
Accurate vision-based speed estimation is much more cost-effective than traditional methods based on radar or LiDAR. However, it is also challenging due to the limitations of perspective projection on a discrete sensor, as well as the high sensitivity to calibration, lighting and weather conditions. Interestingly, deep learning approaches (which dominate the field of computer vision) are very limited in this context due to the lack of available data. Indeed, obtaining video sequences of real road traffic with accurate speed values associated with each vehicle is very complex and costly, and the number of available datasets is very limited. Recently, some approaches are focusing on the use of synthetic data. However, it is still unclear how models trained on synthetic data can be effectively applied to real world conditions. In this work, we propose the use of digital twins using the CARLA simulator to generate a large dataset representative of a specific real-world camera. The synthetic dataset contains a large variability of vehicle types, colours, speeds, lighting and weather conditions. A 3D CNN model is trained on the digital twin and tested on the real sequences. Unlike previous approaches that generate multi-camera sequences, we found that the gap between the real and the virtual conditions is a key factor in obtaining low speed estimation errors. Even with a preliminary approach, the mean absolute error obtained remains below 3km/h.
Updated: 2024-07-11 10:41:20
Categories: cs.CV,cs.AI
Scalar Function Topology Divergence: Comparing Topology of 3D Objects
We propose a new topological tool for computer vision - Scalar Function Topology Divergence (SFTD), which measures the dissimilarity of multi-scale topology between sublevel sets of two functions having a common domain. Functions can be defined on an undirected graph or Euclidean space of any dimensionality. Most of the existing methods for comparing topology are based on Wasserstein distance between persistence barcodes and they don't take into account the localization of topological features. On the other hand, the minimization of SFTD ensures that the corresponding topological features of scalar functions are located in the same places. The proposed tool provides useful visualizations depicting areas where functions have topological dissimilarities. We provide applications of the proposed method to 3D computer vision. In particular, experiments demonstrate that SFTD improves the reconstruction of cellular 3D shapes from 2D fluorescence microscopy images, and helps to identify topological errors in 3D segmentation.
Updated: 2024-07-11 10:18:54
Categories: cs.CV,cs.LG,math.AT
SubspaceNet: Deep Learning-Aided Subspace Methods for DoA Estimation
Direction of arrival (DoA) estimation is a fundamental task in array processing. A popular family of DoA estimation algorithms are subspace methods, which operate by dividing the measurements into distinct signal and noise subspaces. Subspace methods, such as Multiple Signal Classification (MUSIC) and Root-MUSIC, rely on several restrictive assumptions, including narrowband non-coherent sources and fully calibrated arrays, and their performance is considerably degraded when these do not hold. In this work we propose SubspaceNet, a data-driven DoA estimator which learns how to divide the observations into distinguishable subspaces. This is achieved by utilizing a dedicated deep neural network to learn the empirical autocorrelation of the input, training it as part of the Root-MUSIC method and leveraging the inherent differentiability of this specific DoA estimator, while removing the need to provide a ground-truth decomposable autocorrelation matrix. Once trained, the resulting SubspaceNet serves as a universal surrogate covariance estimator that can be applied in combination with any subspace-based DoA estimation method, allowing its successful application in challenging setups. SubspaceNet is shown to enable various DoA estimation algorithms to cope with coherent sources, wideband signals, low SNR, array mismatches, and limited snapshots, while preserving the interpretability and the suitability of classic subspace methods.
Updated: 2024-07-11 10:16:41
Categories: eess.SP,cs.LG
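To make the subspace pipeline concrete, here is a minimal NumPy sketch of the classical MUSIC step that a learned covariance estimate such as SubspaceNet's would feed into; the half-wavelength uniform linear array, the angle grid, and the random snapshots are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def music_spectrum(R, num_sources, num_sensors, grid=np.linspace(-90, 90, 361)):
        # Eigendecomposition of the (learned or empirical) covariance estimate.
        _, eigvecs = np.linalg.eigh(R)
        # Noise subspace: eigenvectors of the smallest eigenvalues.
        En = eigvecs[:, : num_sensors - num_sources]
        spectrum = []
        for theta in grid:
            # Steering vector for a half-wavelength uniform linear array.
            a = np.exp(-1j * np.pi * np.arange(num_sensors) * np.sin(np.deg2rad(theta)))
            spectrum.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
        return grid, np.asarray(spectrum)

    # Empirical covariance from T snapshots X (num_sensors x T); SubspaceNet would
    # instead output a learned surrogate for R.
    X = (np.random.randn(8, 200) + 1j * np.random.randn(8, 200)) / np.sqrt(2)
    R = X @ X.conj().T / X.shape[1]
    angles, p = music_spectrum(R, num_sources=2, num_sensors=8)
    print(angles[np.argsort(p)[-2:]])  # two largest peaks ~ estimated DoAs (toy data)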
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
As the size of large language models continues to scale, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverage sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we are yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement `SpikeGPT', a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model: 45M and 216M parameters. To the best of our knowledge, SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block to replace multi-head self attention, reducing the quadratic computational complexity O(N^2) to linear complexity O(N) with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while requiring 20x fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.
Updated: 2024-07-11 10:16:12
Categories: cs.CL,cs.LG,cs.NE
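As a minimal sketch of the binary, event-driven activation such models depend on, the following PyTorch snippet implements a Heaviside spike trained through a surrogate gradient; the threshold and the fast-sigmoid surrogate are illustrative choices, not SpikeGPT's exact configuration (see the linked repository for the official code).

    import torch

    class SpikeFn(torch.autograd.Function):
        """Heaviside spike with a fast-sigmoid surrogate gradient."""
        @staticmethod
        def forward(ctx, v, threshold=1.0):
            ctx.save_for_backward(v)
            ctx.threshold = threshold
            return (v >= threshold).float()          # binary, event-driven output

        @staticmethod
        def backward(ctx, grad_out):
            (v,) = ctx.saved_tensors
            # Surrogate: d(spike)/dv ~ 1 / (1 + |v - thr|)^2 (illustrative choice).
            surrogate = 1.0 / (1.0 + (v - ctx.threshold).abs()) ** 2
            return grad_out * surrogate, None

    v = torch.randn(4, 16, requires_grad=True)       # membrane potentials
    spikes = SpikeFn.apply(v)
    spikes.sum().backward()                          # gradients flow via the surrogate
    print(spikes.mean().item(), v.grad.abs().mean().item())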
STAL: Spike Threshold Adaptive Learning Encoder for Classification of Pain-Related Biosignal Data
This paper presents the first application of spiking neural networks (SNNs) for the classification of chronic lower back pain (CLBP) using the EmoPain dataset. Our work has two main contributions. We introduce Spike Threshold Adaptive Learning (STAL), a trainable encoder that effectively converts continuous biosignals into spike trains. Additionally, we propose an ensemble of Spiking Recurrent Neural Network (SRNN) classifiers for the multi-stream processing of sEMG and IMU data. To tackle the challenges of small sample size and class imbalance, we implement minority over-sampling with weighted sample replacement during batch creation. Our method achieves outstanding performance with an accuracy of 80.43%, AUC of 67.90%, F1 score of 52.60%, and Matthews Correlation Coefficient (MCC) of 0.437, surpassing traditional rate-based and latency-based encoding methods. The STAL encoder shows superior performance in preserving temporal dynamics and adapting to signal characteristics. Importantly, our approach (STAL-SRNN) outperforms the best deep learning method in terms of MCC, indicating better balanced class prediction. This research contributes to the development of neuromorphic computing for biosignal analysis. It holds promise for energy-efficient, wearable solutions in chronic pain management.
Updated: 2024-07-11 10:15:52
Categories: cs.LG,cs.NE
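A toy sketch of the encoding idea, assuming a learnable per-channel threshold and a straight-through estimator; the real STAL encoder is more elaborate, so treat this as a schematic only.

    import torch
    import torch.nn as nn

    class ThresholdEncoder(nn.Module):
        """Toy adaptive-threshold encoder: continuous signal -> binary spike train.

        A learnable per-channel threshold is compared against the input; a sigmoid
        surrogate keeps the comparison differentiable during training. This is an
        illustrative stand-in for STAL, not the authors' code."""
        def __init__(self, channels, sharpness=10.0):
            super().__init__()
            self.threshold = nn.Parameter(torch.zeros(channels))
            self.sharpness = sharpness

        def forward(self, x):                  # x: (batch, time, channels)
            soft = torch.sigmoid(self.sharpness * (x - self.threshold))
            hard = (soft > 0.5).float()        # binary spikes at inference
            # Straight-through estimator: forward the hard spikes, backprop the soft ones.
            return hard + soft - soft.detach()

    enc = ThresholdEncoder(channels=6)
    semg = torch.randn(8, 100, 6)              # fake sEMG/IMU windows
    spikes = enc(semg)
    print(spikes.shape, spikes.mean().item())  # spike rate of the encoding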
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models
Evaluation is critical for assessing capabilities, tracking scientific progress, and informing model selection. In this paper, we present three desiderata for a good benchmark for language models: (i) salience (e.g., knowledge about World War II is more salient than a random day in history), (ii) novelty (i.e., the benchmark reveals new trends in model rankings not shown by previous benchmarks), and (iii) difficulty (i.e., the benchmark should be difficult for existing models, leaving headroom for future improvement). We operationalize these three desiderata and cast benchmark creation as a search problem: that of finding benchmarks that satisfy all three desiderata. To tackle this search problem, we present AutoBencher, which uses a language model to automatically search for datasets that meet the three desiderata. AutoBencher uses privileged information (e.g. relevant documents) to construct reliable datasets, and adaptivity with reranking to optimize for the search objective. We use AutoBencher to create datasets for math, multilingual, and knowledge-intensive question answering. The scalability of AutoBencher allows it to test fine-grained categories and tail knowledge, creating datasets that are on average 27% more novel and 22% more difficult than existing benchmarks. A closer investigation of our constructed datasets shows that we can identify specific gaps in language models' knowledge that are not captured by existing benchmarks, such as Gemini Pro performing much worse on question answering about the Permian Extinction and Fordism, while OpenAGI-7B performs surprisingly well on QA about COVID-19.
Updated: 2024-07-11 10:03:47
Categories: cs.CL,cs.LG
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model series, supervised fine-tuned (SFT) on common 7B LLMs using our proposed 2.5M-instance Skywork-MathQA dataset. Skywork-Math 7B has achieved impressive accuracies of 51.2% on the competition-level MATH benchmark and 83.9% on the GSM8K benchmark using only SFT data, outperforming an early version of GPT-4 on MATH. The superior performance of Skywork-Math models contributes to our novel two-stage data synthesis and model SFT pipelines, which include three different augmentation methods and a diverse seed problem set, ensuring both the quantity and quality of Skywork-MathQA dataset across varying difficulty levels. Most importantly, we provide several practical takeaways to enhance math reasoning abilities in LLMs for both research and industry applications.
Updated: 2024-07-11 09:56:51
Categories: cs.AI,cs.CL,cs.LG
SLRL: Structured Latent Representation Learning for Multi-view Clustering
In recent years, Multi-View Clustering (MVC) has attracted increasing attention for its potential to reduce the annotation burden associated with large datasets. The aim of MVC is to exploit the inherent consistency and complementarity among different views, thereby integrating information from multiple perspectives to improve clustering outcomes. Despite extensive research in MVC, most existing methods focus predominantly on harnessing complementary information across views to enhance clustering effectiveness, often neglecting the structural information among samples, which is crucial for exploring sample correlations. To address this gap, we introduce a novel framework, termed Structured Latent Representation Learning based Multi-View Clustering method (SLRL). SLRL leverages both the complementary and structural information. Initially, it learns a common latent representation for all views. Subsequently, to exploit the structural information among samples, a k-nearest neighbor graph is constructed from this common latent representation. This graph facilitates enhanced sample interaction through graph learning techniques, leading to a structured latent representation optimized for clustering. Extensive experiments demonstrate that SLRL not only competes well with existing methods but also sets new benchmarks in various multi-view datasets.
Updated: 2024-07-11 09:43:57
Categories: cs.LG
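The structural step described above reduces, in its simplest form, to building a k-nearest-neighbour graph over the common latent representation; a minimal scikit-learn sketch with illustrative dimensions follows.

    import numpy as np
    from sklearn.neighbors import kneighbors_graph

    # Common latent representation learned from all views (random stand-in here).
    Z = np.random.randn(500, 64)               # 500 samples, 64-d latent codes

    # Symmetric k-NN affinity graph capturing sample-to-sample structure.
    A = kneighbors_graph(Z, n_neighbors=10, mode="connectivity", include_self=False)
    A = 0.5 * (A + A.T)                        # symmetrize for undirected graph learning

    print(A.shape, A.nnz)                      # adjacency consumed by downstream clustering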
FedLog: Personalized Federated Classification with Less Communication and More Flexibility
In federated learning (FL), the common paradigm that FedAvg proposes and most algorithms follow is that clients train local models with their private data, and the model parameters are shared for central aggregation, mostly averaging. In this paradigm, the communication cost is often a challenge, as modern massive neural networks can contain millions to billions parameters. We suggest that clients do not share model parameters but local data summaries, to decrease the cost of sharing. We develop a new algorithm FedLog with Bayesian inference, which shares only sufficient statistics of local data. FedLog transmits messages as small as the last layer of the original model. We conducted comprehensive experiments to show we outperform other FL algorithms that aim at decreasing the communication cost. To provide formal privacy guarantees, we further extend FedLog with differential privacy and show the trade-off between privacy budget and accuracy.
Updated: 2024-07-11 09:40:29
Categories: cs.LG,cs.DC,stat.ML
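A toy illustration of the communication pattern: clients ship sufficient statistics rather than parameters, and the server recovers pooled estimates exactly. The Gaussian statistics below are a stand-in assumption; FedLog's actual statistics come from its Bayesian model.

    import numpy as np

    def client_summary(X):
        # Sufficient statistics of a Gaussian model: count, sum, sum of squares.
        return len(X), X.sum(axis=0), (X ** 2).sum(axis=0)

    def server_aggregate(summaries):
        n = sum(s[0] for s in summaries)
        sx = sum(s[1] for s in summaries)
        sxx = sum(s[2] for s in summaries)
        mean = sx / n
        var = sxx / n - mean ** 2
        return mean, var                       # pooled estimate from tiny messages

    clients = [np.random.randn(100, 8) + i for i in range(5)]   # heterogeneous data
    summaries = [client_summary(X) for X in clients]            # messages, not weights
    mean, var = server_aggregate(summaries)
    print(mean.round(2))                       # matches pooled-data statistics exactly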
Provably Good Solutions to the Knapsack Problem via Neural Networks of Bounded Size
The development of a satisfying and rigorous mathematical understanding of the performance of neural networks is a major challenge in artificial intelligence. Against this background, we study the expressive power of neural networks through the example of the classical NP-hard Knapsack Problem. Our main contribution is a class of recurrent neural networks (RNNs) with rectified linear units that are iteratively applied to each item of a Knapsack instance and thereby compute optimal or provably good solution values. We show that an RNN of depth four and width depending quadratically on the profit of an optimum Knapsack solution is sufficient to find optimum Knapsack solutions. We also prove the following tradeoff between the size of an RNN and the quality of the computed Knapsack solution: for Knapsack instances consisting of $n$ items, an RNN of depth five and width $w$ computes a solution of value at least $1-\mathcal{O}(n^2/\sqrt{w})$ times the optimum solution value. Our results build upon a classical dynamic programming formulation of the Knapsack Problem as well as a careful rounding of profit values that are also at the core of the well-known fully polynomial-time approximation scheme for the Knapsack Problem. A carefully conducted computational study qualitatively supports our theoretical size bounds. Finally, we point out that our results can be generalized to many other combinatorial optimization problems that admit dynamic programming solution methods, such as various Shortest Path Problems, the Longest Common Subsequence Problem, and the Traveling Salesperson Problem.
Updated: 2024-07-11 09:39:30
Categories: cs.LG,cs.CC,cs.DM,cs.NE,stat.ML
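For reference, the profit-indexed dynamic program that the paper's RNN construction emulates computes, for every achievable profit, the minimum weight required; a standard sketch:

    def knapsack_dp(profits, weights, capacity):
        """Profit-indexed DP: min_weight[p] = least weight achieving profit p."""
        max_profit = sum(profits)
        INF = float("inf")
        min_weight = [0] + [INF] * max_profit
        for p, w in zip(profits, weights):
            for q in range(max_profit, p - 1, -1):   # reverse scan: each item used once
                if min_weight[q - p] + w < min_weight[q]:
                    min_weight[q] = min_weight[q - p] + w
        # Optimal value: largest profit whose minimum weight fits the capacity.
        return max(q for q in range(max_profit + 1) if min_weight[q] <= capacity)

    print(knapsack_dp([6, 10, 12], [1, 2, 3], capacity=5))   # -> 22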
Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
Large Language Models (LLMs) drive current AI breakthroughs despite very little being known about their internal representations. In this work, we propose to shed the light on LLMs inner mechanisms through the lens of geometry. In particular, we develop in closed form $(i)$ the intrinsic dimension in which the Multi-Head Attention embeddings are constrained to exist and $(ii)$ the partition and per-region affine mappings of the feedforward (MLP) network of LLMs' layers. Our theoretical findings further enable the design of novel principled solutions applicable to state-of-the-art LLMs. First, we show that, through our geometric understanding, we can bypass LLMs' RLHF protection by controlling the embedding's intrinsic dimension through informed prompt manipulation. Second, we derive interpretable geometrical features that can be extracted from any (pre-trained) LLM, providing a rich abstract representation of their inputs. We observe that these features are sufficient to help solve toxicity detection, and even allow the identification of various types of toxicity. Our results demonstrate how, even in large-scale regimes, exact theoretical results can answer practical questions in LLMs. Code: https://github.com/RandallBalestriero/SplineLLM
Updated: 2024-07-11 09:32:19
Categories: cs.AI,cs.CL,cs.LG
AbstractBeam: Enhancing Bottom-Up Program Synthesis using Library Learning
LambdaBeam is a state-of-the-art execution-guided algorithm for program synthesis that incorporates higher-order functions, lambda functions, and iterative loops into the Domain-Specific Language (DSL). LambdaBeam generates every program from scratch. Yet, many program blocks or subprograms occur frequently in a given domain, e.g., loops to traverse a list. Thus, such recurring programs can be reused to enhance the synthesis algorithm. However, LambdaBeam fails to leverage this potential. For this purpose, we introduce AbstractBeam: a novel program synthesis framework that employs Library Learning to identify such program repetitions, integrates them into the DSL, and thus utilizes their potential to boost LambdaBeam's synthesis algorithm. Our experimental evaluations demonstrate that AbstractBeam significantly improves LambdaBeam's performance in the LambdaBeam integer list manipulation domain. Additionally, AbstractBeam's program generation is more efficient than LambdaBeam's synthesis. Finally, our findings indicate that Library Learning is effective in domains not specifically crafted to highlight its benefits.
Updated: 2024-07-11 09:29:29
Categories: cs.SE,cs.AI,cs.PL
Towards Explainable Evolution Strategies with Large Language Models
This paper introduces an approach that integrates self-adaptive Evolution Strategies (ES) with Large Language Models (LLMs) to enhance the explainability of complex optimization processes. By employing a self-adaptive ES equipped with a restart mechanism, we effectively navigate the challenging landscapes of benchmark functions, capturing detailed logs of the optimization journey, including fitness evolution, step-size adjustments, and restart events due to stagnation. An LLM is then utilized to process these logs, generating concise, user-friendly summaries that highlight key aspects such as convergence behavior, optimal fitness achievements, and encounters with local optima. Our case study on the Rastrigin function demonstrates how our approach makes the complexities of ES optimization transparent and accessible. Our findings highlight the potential of using LLMs to bridge the gap between advanced optimization algorithms and their interpretability.
Updated: 2024-07-11 09:28:27
Categories: cs.NE,cs.AI,cs.CL
HDT: Hierarchical Document Transformer
In this paper, we propose the Hierarchical Document Transformer (HDT), a novel sparse Transformer architecture tailored for structured hierarchical documents. Such documents are extremely important in numerous domains, including science, law or medicine. However, most existing solutions are inefficient and fail to make use of the structure inherent to documents. HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy. This approach facilitates information exchange between tokens at different levels while maintaining sparsity, thereby enhancing computational and memory efficiency while exploiting the document structure as an inductive bias. We address the technical challenge of implementing HDT's sample-dependent hierarchical attention pattern by developing a novel sparse attention kernel that considers the hierarchical structure of documents. As demonstrated by our experiments, utilizing structural information present in documents leads to faster convergence, higher sample efficiency and better performance on downstream tasks.
Updated: 2024-07-11 09:28:04
Categories: cs.LG
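One way to picture the resulting sparsity is as an attention mask in which tokens attend within their own sentence plus a few anchor tokens that bridge levels; the sketch below illustrates that masking idea only and is not HDT's exact pattern.

    import numpy as np

    def hierarchical_mask(sentence_ids, anchor_flags):
        """Boolean attention mask: attend within the same sentence,
        and always to/from anchor tokens that bridge hierarchy levels."""
        same_sentence = sentence_ids[:, None] == sentence_ids[None, :]
        via_anchor = anchor_flags[None, :] | anchor_flags[:, None]
        return same_sentence | via_anchor

    # 8 tokens in 3 sentences; tokens 0 and 4 act as auxiliary anchors.
    sent = np.array([0, 0, 0, 1, 1, 1, 2, 2])
    anchor = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=bool)
    mask = hierarchical_mask(sent, anchor)
    print(mask.astype(int))     # sparse multi-level pattern instead of a dense N x N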
Unveiling Disparities in Maternity Care: A Topic Modelling Approach to Analysing Maternity Incident Investigation Reports
This study applies Natural Language Processing techniques, including Latent Dirichlet Allocation, to analyse anonymised maternity incident investigation reports from the Healthcare Safety Investigation Branch. The reports underwent preprocessing, annotation using the Safety Intelligence Research taxonomy, and topic modelling to uncover prevalent topics and detect differences in maternity care across ethnic groups. A combination of offline and online methods was utilised to ensure data protection whilst enabling advanced analysis, with offline processing for sensitive data and online processing for non-sensitive data using the `Claude 3 Opus' language model. Interactive topic analysis and semantic network visualisation were employed to extract and display thematic topics and visualise semantic relationships among keywords. The analysis revealed disparities in care among different ethnic groups, with distinct focus areas for the Black, Asian, and White British ethnic groups. The study demonstrates the effectiveness of topic modelling and NLP techniques in analysing maternity incident investigation reports and highlighting disparities in care. The findings emphasise the crucial role of advanced data analysis in improving maternity care quality and equity.
Updated: 2024-07-11 09:26:05
Categories: cs.AI,cs.LG
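A minimal sketch of the LDA stage, with toy documents standing in for the confidential reports (the taxonomy annotation and the 'Claude 3 Opus' step are out of scope here):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    reports = [
        "delay in escalation of abnormal fetal heart rate",
        "communication failure during handover between teams",
        "fetal monitoring interpretation and escalation delay",
        "staffing pressure affected handover communication",
    ]
    counts = CountVectorizer(stop_words="english").fit(reports)
    X = counts.transform(reports)

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    vocab = counts.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = [vocab[i] for i in topic.argsort()[-4:]]
        print(f"topic {k}:", top)             # prevalent themes per topic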
Virtual Context: Enhancing Jailbreak Attacks with Special Token Injection
Jailbreak attacks on large language models (LLMs) involve inducing these models to generate harmful content that violates ethics or laws, posing a significant threat to LLM security. Current jailbreak attacks face two main challenges: low success rates due to defensive measures and high resource requirements for crafting specific prompts. This paper introduces Virtual Context, which leverages special tokens, previously overlooked in LLM security, to improve jailbreak attacks. Virtual Context addresses these challenges by significantly increasing the success rates of existing jailbreak methods and requiring minimal background knowledge about the target model, thus enhancing effectiveness in black-box settings without additional overhead. Comprehensive evaluations show that Virtual Context-assisted jailbreak attacks can improve the success rates of four widely used jailbreak methods by approximately 40% across various LLMs. Additionally, applying Virtual Context to original malicious behaviors still achieves a notable jailbreak effect. In summary, our research highlights the potential of special tokens in jailbreak attacks and recommends including this threat in red-teaming testing to comprehensively enhance LLM security.
Updated: 2024-07-11 09:21:31
Categories: cs.CR
Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness
The growing interest in fair AI development is evident. The ''Leave No One Behind'' initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation and service scheme development, across various sectors such as health, energy, and housing. Therefore, exploring joint inequalities in these sectors is significant and valuable for thoroughly understanding overall inequality and unfairness. This research introduces an innovative approach to quantify cross-sectoral intersecting discrepancies among user-defined groups using latent class analysis. These discrepancies can be used to approximate inequality and provide valuable insights to fairness issues. We validate our approach using both proprietary and public datasets, including EVENS and Census 2021 (England & Wales) datasets, to examine cross-sectoral intersecting discrepancies among different ethnic groups. We also verify the reliability of the quantified discrepancy by conducting a correlation analysis with a government public metric. Our findings reveal significant discrepancies between minority ethnic groups, highlighting the need for targeted interventions in real-world AI applications. Additionally, we demonstrate how the proposed approach can be used to provide insights into the fairness of machine learning.
Updated: 2024-07-11 09:19:11
Categories: cs.CY,cs.AI,cs.LG,stat.ML
A Cantor-Kantorovich Metric Between Markov Decision Processes with Application to Transfer Learning
We extend the notion of Cantor-Kantorovich distance between Markov chains introduced by Banse et al. (2023) to the context of Markov Decision Processes (MDPs). The proposed metric is well-defined and can be efficiently approximated given a finite horizon. Then, we provide numerical evidence that the latter metric can lead to interesting applications in the field of reinforcement learning. In particular, we show that it could be used for forecasting the performance of transfer learning algorithms.
Updated: 2024-07-11 09:13:11
Categories: cs.LG,cs.AI
Intelligent Multi-Document Summarisation for Extracting Insights on Racial Inequalities from Maternity Incident Investigation Reports
In healthcare, thousands of safety incidents occur every year, but learning from these incidents is not effectively aggregated. Analysing incident reports using AI could uncover critical insights to prevent harm by identifying recurring patterns and contributing factors. To aggregate and extract valuable information, natural language processing (NLP) and machine learning techniques can be employed to summarise and mine unstructured data, potentially surfacing systemic issues and priority areas for improvement. This paper presents I-SIRch:CS, a framework designed to facilitate the aggregation and analysis of safety incident reports while ensuring traceability throughout the process. The framework integrates concept annotation using the Safety Intelligence Research (SIRch) taxonomy with clustering, summarisation, and analysis capabilities. Utilising a dataset of 188 anonymised maternity investigation reports annotated with 27 SIRch human factors concepts, I-SIRch:CS groups the annotated sentences into clusters using sentence embeddings and k-means clustering, maintaining traceability via file and sentence IDs. Summaries are generated for each cluster using offline state-of-the-art abstractive summarisation models (BART, DistilBART, T5), which are evaluated and compared using metrics assessing summary quality attributes. The generated summaries are linked back to the original file and sentence IDs, ensuring traceability and allowing for verification of the summarised information. Results demonstrate BART's strengths in creating informative and concise summaries.
Updated: 2024-07-11 09:11:20
Categories: cs.AI
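A minimal sketch of the cluster-with-traceability step, using TF-IDF vectors as a dependency-light stand-in for the sentence embeddings and keying each sentence by (file ID, sentence ID) so cluster summaries can be traced back:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    sentences = [
        ("report_01", 3, "escalation of concerns was delayed"),
        ("report_01", 9, "handover between shifts lacked detail"),
        ("report_02", 1, "delay in acting on abnormal observations"),
        ("report_03", 5, "incomplete handover documentation"),
    ]
    ids = [(f, s) for f, s, _ in sentences]
    X = TfidfVectorizer().fit_transform(t for _, _, t in sentences)

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    for (file_id, sent_id), c in zip(ids, labels):
        print(f"cluster {c}: {file_id}#{sent_id}")   # traceable cluster membership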
Latent Dataset Distillation with Diffusion Models
Machine learning traditionally relies on increasingly larger datasets. Yet, such datasets pose major storage challenges and usually contain non-influential samples, which could be ignored during training without negatively impacting the training quality. In response, the idea of distilling a dataset into a condensed set of synthetic samples, i.e., a distilled dataset, emerged. One key aspect is the selected architecture, usually ConvNet, for linking the original and synthetic datasets. However, the final accuracy is lower if the employed model architecture differs from that used during distillation. Another challenge is the generation of high-resolution images (128x128 and higher). To address both challenges, this paper proposes Latent Dataset Distillation with Diffusion Models (LD3M) that combine diffusion in latent space with dataset distillation. Our novel diffusion process is tailored for this task and significantly improves the gradient flow for distillation. By adjusting the number of diffusion steps, LD3M also offers a convenient way of controlling the trade-off between distillation speed and dataset quality. Overall, LD3M consistently outperforms state-of-the-art methods by up to 4.8 p.p. and 4.2 p.p. for 1 and 10 images per class, respectively, and on several ImageNet subsets and high resolutions (128x128 and 256x256).
Updated: 2024-07-11 09:10:10
Categories: cs.CV,cs.AI,cs.LG
A Machine Learning Approach to Detecting Albedo Anomalies on the Lunar Surface
This study introduces a data-driven approach using machine learning (ML) techniques to explore and predict albedo anomalies on the Moon's surface. The research leverages diverse planetary datasets, including high-spatial-resolution albedo maps and element maps (LPFe, LPK, LPTh, LPTi) derived from laser and gamma-ray measurements. The primary objective is to identify relationships between chemical elements and albedo, thereby expanding our understanding of planetary surfaces and offering predictive capabilities for areas with incomplete datasets. To bridge the gap in resolution between the albedo and element maps, we employ Gaussian blurring techniques, including an innovative adaptive Gaussian blur. Our methodology culminates in the deployment of an Extreme Gradient Boosting Regression Model, optimized to predict full albedo based on elemental composition. Furthermore, we present an interactive analytical tool to visualize prediction errors, delineating their spatial and chemical characteristics. The findings not only pave the way for a more comprehensive understanding of the Moon's surface but also provide a framework for similar studies on other celestial bodies.
Updated: 2024-07-11 09:10:09
Categories: astro-ph.EP,cs.LG
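Schematically, the regression stage maps elemental abundances to albedo and reads off feature importances; scikit-learn's gradient boosting stands in for the paper's XGBoost model, and the arrays below are synthetic.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for per-pixel element maps (Fe, K, Th, Ti).
    elements = rng.random((2000, 4))
    albedo = (0.3 - 0.2 * elements[:, 0] + 0.05 * elements[:, 3]
              + 0.01 * rng.standard_normal(2000))

    model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
    model.fit(elements, albedo)

    residual = albedo - model.predict(elements)   # large residuals flag albedo anomalies
    for name, imp in zip(["Fe", "K", "Th", "Ti"], model.feature_importances_):
        print(f"{name}: {imp:.2f}")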
Enhancing ADHD Diagnosis with EEG: The Critical Role of Preprocessing and Key Features
Background: Attention-Deficit/Hyperactivity Disorder (ADHD) is a prevalent neurodevelopmental disorder that significantly impacts various key aspects of life, requiring accurate diagnostic methods. Electroencephalogram (EEG) signals are used in diagnosing ADHD, but proper preprocessing is crucial to avoid noise and artifacts that could lead to unreliable results. Method: This study utilized a public EEG dataset from children diagnosed with ADHD and typically developing (TD) children. Four preprocessing techniques were applied: no preprocessing (Raw), Finite Impulse Response (FIR) filtering, Artifact Subspace Reconstruction (ASR), and Independent Component Analysis (ICA). EEG recordings were segmented, and features were extracted and selected based on statistical significance. Classification was performed using Machine Learning models, as XGBoost, Support Vector Machine, and K-Nearest Neighbors. Results: The absence of preprocessing leads to artificially high classification accuracy due to noise. In contrast, ASR and ICA preprocessing techniques significantly improved the reliability of results. Segmenting EEG recordings revealed that later segments provided better classification accuracy, likely due to the manifestation of ADHD symptoms over time. The most relevant EEG channels were P3, P4, and C3. The top features for classification included Kurtosis, Katz fractal dimension, and power spectral density of Delta, Theta, and Alpha bands. Conclusions: Effective preprocessing is essential in EEG-based ADHD diagnosis to prevent noise-induced biases. This study identifies crucial EEG channels and features, providing a foundation for further research and improving ADHD diagnostic accuracy. Future work should focus on expanding datasets, refining preprocessing methods, and enhancing feature interpretability to improve diagnostic accuracy and model robustness for clinical use.
Updated: 2024-07-11 09:07:22
Categories: eess.SP,cs.LG
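For concreteness, the three feature families reported as most discriminative can be computed as follows (standard definitions on a synthetic signal; band powers are left as relative values):

    import numpy as np
    from scipy.signal import welch
    from scipy.stats import kurtosis

    def katz_fd(x):
        # Katz fractal dimension: L = curve length, d = max distance from x[0].
        L = np.abs(np.diff(x)).sum()
        d = np.abs(x - x[0]).max()
        n = len(x) - 1
        return np.log10(n) / (np.log10(n) + np.log10(d / L))

    def band_power(x, fs, lo, hi):
        f, psd = welch(x, fs=fs, nperseg=fs * 2)
        return psd[(f >= lo) & (f < hi)].sum()    # relative band power

    fs = 128                                   # sampling rate (Hz), illustrative
    eeg = np.random.randn(fs * 4)              # one 4-second channel segment
    features = {
        "kurtosis": kurtosis(eeg),
        "katz_fd": katz_fd(eeg),
        "delta": band_power(eeg, fs, 0.5, 4),
        "theta": band_power(eeg, fs, 4, 8),
        "alpha": band_power(eeg, fs, 8, 13),
    }
    print(features)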
Improving Molecular Modeling with Geometric GNNs: an Empirical Study
Rapid advancements in machine learning (ML) are transforming materials science by significantly speeding up material property calculations. However, the proliferation of ML approaches has made it challenging for scientists to keep up with the most promising techniques. This paper presents an empirical study on Geometric Graph Neural Networks for 3D atomic systems, focusing on the impact of different (1) canonicalization methods, (2) graph creation strategies, and (3) auxiliary tasks, on performance, scalability and symmetry enforcement. Our findings and insights aim to guide researchers in selecting optimal modeling components for molecular modeling tasks.
Updated: 2024-07-11 09:04:12
Categories: cs.LG
Preventing Radio Fingerprinting through Friendly Jamming
Radio Frequency fingerprinting enables a passive receiver to recognize and authenticate a transmitter without the need for cryptographic tools. Authentication is achieved by isolating specific features of the transmitted signal that are unique to the transmitter's hardware. Much research has focused on improving the effectiveness and efficiency of radio frequency fingerprinting to maximize its performance in various scenarios and conditions, while little research has examined how to protect devices from being subjected to radio fingerprinting in the wild. In this paper, we explore a novel point of view. We examine the hostile usage of radio frequency fingerprinting, which facilitates the unauthorized tracking of wireless devices in the field by malicious entities. We also suggest a method to sanitize the transmitted signal of its fingerprint using a jammer, deployed on purpose to improve devices' anonymity on the channel while still guaranteeing the link's quality of service. Our experimental results and subsequent analysis demonstrate that a friendly jammer can effectively block a malicious eavesdropper from recognizing and tracking a device without affecting the quality of the wireless link, thereby restoring the privacy of the user when accessing the radio spectrum.
Updated: 2024-07-11 09:01:46
Categories: cs.CR
Adversarial-MidiBERT: Symbolic Music Understanding Model Based on Unbias Pre-training and Mask Fine-tuning
As an important part of Music Information Retrieval (MIR), Symbolic Music Understanding (SMU) has gained substantial attention, as it can assist musicians and amateurs in learning and creating music. Recently, pre-trained language models have been widely adopted in SMU because the symbolic music shares a huge similarity with natural language, and the pre-trained manner also helps make full use of limited music data. However, the issue of bias, such as sexism, ageism, and racism, has been observed in pre-trained language models, which is attributed to the imbalanced distribution of training data. It also has a significant influence on the performance of downstream tasks, which also happens in SMU. To address this challenge, we propose Adversarial-MidiBERT, a symbolic music understanding model based on Bidirectional Encoder Representations from Transformers (BERT). We introduce an unbiased pre-training method based on adversarial learning to minimize the participation of tokens that lead to biases during training. Furthermore, we propose a mask fine-tuning method to narrow the data gap between pre-training and fine-tuning, which can help the model converge faster and perform better. We evaluate our method on four music understanding tasks, and our approach demonstrates excellent performance in all of them. The code for our model is publicly available at https://github.com/RS2002/Adversarial-MidiBERT.
Updated: 2024-07-11 08:54:38
Categories: cs.SD,cs.AI,eess.AS
TAKT: Target-Aware Knowledge Transfer for Whole Slide Image Classification
Transferring knowledge from a source domain to a target domain can be crucial for whole slide image classification, since the number of samples in a dataset is often limited due to high annotation costs. However, domain shift and task discrepancy between datasets can hinder effective knowledge transfer. In this paper, we propose a Target-Aware Knowledge Transfer framework, employing a teacher-student paradigm. Our framework enables the teacher model to learn common knowledge from the source and target domains by actively incorporating unlabelled target images into the training of the teacher model. The teacher bag features are subsequently adapted to supervise the training of the student model on the target domain. Despite incorporating the target features during training, the teacher model tends to overlook them under the inherent domain shift and task discrepancy. To alleviate this, we introduce a target-aware feature alignment module to establish a transferable latent relationship between the source and target features by solving the optimal transport problem. Experimental results show that models employing knowledge transfer outperform those trained from scratch, and our method achieves state-of-the-art performance among other knowledge transfer methods on various datasets, including TCGA-RCC, TCGA-NSCLC, and Camelyon16.
Updated: 2024-07-11 08:52:14
Categories: cs.CV,cs.AI
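A minimal NumPy sketch of the optimal-transport alignment step, using entropy-regularized Sinkhorn iterations between uniform marginals; the feature dimensions and regularization strength are illustrative assumptions.

    import numpy as np

    def sinkhorn(C, eps=0.05, iters=200):
        """Entropy-regularized OT plan between uniform marginals for cost matrix C."""
        n, m = C.shape
        K = np.exp(-C / eps)
        u = np.ones(n) / n
        v = np.ones(m) / m
        for _ in range(iters):
            u = (1.0 / n) / (K @ v)
            v = (1.0 / m) / (K.T @ u)
        return u[:, None] * K * v[None, :]     # transport plan: rows ~ source features

    src = np.random.randn(32, 128)              # teacher/source bag features
    tgt = np.random.randn(48, 128)              # target-domain bag features
    C = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)   # squared-L2 cost
    P = sinkhorn(C / C.max())                   # normalize cost for numerical stability
    aligned = (P / P.sum(axis=1, keepdims=True)) @ tgt       # barycentric mapping
    print(P.shape, aligned.shape)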
What should be observed for optimal reward in POMDPs?
Partially observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system designers can often control an agent's observation capabilities, e.g. by placing or selecting sensors. This raises the question of how one should select an agent's sensors cost-effectively such that it achieves the desired goals. In this paper, we study the novel optimal observability problem OOP: Given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when considering positional strategies only. We present two algorithms for a decidable fragment of the OOP: one based on optimal strategies of M's underlying Markov decision process and one based on parameter synthesis with SMT. We report promising results for variants of typical examples from the POMDP literature.
Updated: 2024-07-11 08:48:48
Categories: cs.AI
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Existing Multimodal Large Language Models (MLLMs) increasingly emphasize complex understanding of various visual elements, including multiple objects, text information, and spatial relations. Their development for comprehensive visual perception hinges on the availability of high-quality image-text datasets that offer diverse visual elements and thorough image descriptions. However, the scarcity of such hyper-detailed datasets currently hinders progress within the MLLM community. The bottleneck stems from the limited perceptual capabilities of current caption engines, which fall short in providing complete and accurate annotations. To facilitate the cutting-edge research of MLLMs on comprehensive vision perception, we thereby propose Perceptual Fusion, using a low-budget but highly effective caption engine for complete and accurate image descriptions. Specifically, Perceptual Fusion integrates diverse perception experts as image priors to provide explicit information on visual elements and adopts an efficient MLLM as a centric pivot to mimic advanced MLLMs' perception abilities. We carefully select 1M highly representative images from the uncurated LAION dataset and generate dense descriptions using our engine, dubbed DenseFusion-1M. Extensive experiments validate that our engine outperforms its counterparts, where the resulting dataset significantly improves the perception and cognition abilities of existing MLLMs across diverse vision-language benchmarks, especially with high-resolution images as inputs. The dataset and code are publicly available at https://github.com/baaivision/DenseFusion.
Updated: 2024-07-11 08:48:06
Categories: cs.CV,cs.AI
Impact Measures for Gradual Argumentation Semantics
Argumentation is a formalism for reasoning with contradictory information by modeling arguments and their interactions. An increasing number of gradual semantics and impact measures have emerged to facilitate the interpretation of their outcomes. An impact measure assesses, for each argument, the impact of other arguments on its score. In this paper, we refine an existing impact measure from Delobelle and Villata and introduce a new impact measure rooted in Shapley values. We introduce several principles to evaluate those two impact measures w.r.t. some well-known gradual semantics. This comprehensive analysis provides deeper insights into their functionality and desirability.
Updated: 2024-07-11 08:47:44
Categories: cs.AI
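As a toy instance, a Shapley-style impact of each attacker on an argument's score can be computed exactly over coalitions of attackers; the h-categorizer semantics and fixed attacker scores below are illustrative assumptions, not the measure defined in the paper.

    from itertools import combinations
    from math import factorial

    def hcat_score(attacker_scores):
        # h-categorizer score of an argument given its attackers' scores.
        return 1.0 / (1.0 + sum(attacker_scores))

    def shapley_impact(attackers):
        """Exact Shapley value of each attacker on the target's score."""
        n = len(attackers)
        names = list(attackers)
        def value(coalition):
            return hcat_score([attackers[a] for a in coalition])
        impact = {}
        for a in names:
            others = [b for b in names if b != a]
            phi = 0.0
            for k in range(len(others) + 1):
                for S in combinations(others, k):
                    w = factorial(k) * factorial(n - k - 1) / factorial(n)
                    phi += w * (value(S + (a,)) - value(S))
            impact[a] = phi
        return impact

    # Target argument attacked by b (strong) and c (weak), scores assumed fixed.
    print(shapley_impact({"b": 0.9, "c": 0.3}))   # both impacts negative, b larger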
Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining
Large language models exhibit exceptional generalization capabilities, primarily attributed to the utilization of diversely sourced data. However, conventional practices in integrating this diverse data heavily rely on heuristic schemes, lacking theoretical guidance. This research tackles these limitations by investigating strategies based on low-cost proxies for data mixtures, with the aim of streamlining data curation to enhance training efficiency. Specifically, we propose a unified scaling law, termed $\textbf{BiMix}$, which accurately models the bivariate scaling behaviors of both data quantity and mixing proportions. We conduct systematic experiments and provide empirical evidence for the predictive power and fundamental principles of $\textbf{BiMix}$. Notably, our findings reveal that entropy-driven training-free data mixtures can achieve comparable or even better performance than more resource-intensive methods. We hope that our quantitative insights can shed light on further judicious research and development in cost-effective language modeling.
Updated: 2024-07-11 08:44:45
Categories: cs.LG,cs.AI,cs.CL
XAI-Guided Enhancement of Vegetation Indices for Crop Mapping
Vegetation indices allow to efficiently monitor vegetation growth and agricultural activities. Previous generations of satellites were capturing a limited number of spectral bands, and a few expert-designed vegetation indices were sufficient to harness their potential. New generations of multi- and hyperspectral satellites can however capture additional bands, but are not yet efficiently exploited. In this work, we propose an explainable-AI-based method to select and design suitable vegetation indices. We first train a deep neural network using multispectral satellite data, then extract feature importance to identify the most influential bands. We subsequently select suitable existing vegetation indices or modify them to incorporate the identified bands and retrain our model. We validate our approach on a crop classification task. Our results indicate that models trained on individual indices achieve comparable results to the baseline model trained on all bands, while the combination of two indices surpasses the baseline in certain cases.
Updated: 2024-07-11 08:44:43
Categories: cs.CV,cs.LG
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
Training Large Language Models (LLMs) is memory-intensive due to the large number of parameters and associated optimization states. GaLore, a recent method, reduces memory usage by projecting weight gradients into a low-rank subspace without compromising performance. However, GaLore relies on time-consuming Singular Value Decomposition (SVD) operations to identify the subspace, and the frequent subspace updates lead to significant training time overhead. Moreover, GaLore offers minimal improvements in accuracy and efficiency compared to LoRA in more accessible fine-tuning scenarios. To address these limitations, we introduce Q-Galore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection, surpassing the benefits of GaLore. Our method is based on two key observations: (i) the gradient subspace exhibits diverse properties, with some layers converging early in training while others are subject to frequent changes; (ii) the projection matrices are highly resilient to low-bit quantization. Leveraging these insights, Q-GaLore adaptively updates the gradient subspace based on its convergence statistics, achieving comparable performance while significantly reducing the number of SVD operations. We maintain the projection matrices in INT4 format and weights in INT8 format, incorporating stochastic rounding to capture accumulated gradient information. This approach enables a high-precision training trajectory using only low-precision weights. We demonstrate that Q-GaLore achieves highly competitive performance with exceptional memory efficiency. At pre-training, Q-GaLore facilitates training a LLaMA-7B model from scratch on a single NVIDIA RTX 4060 Ti with only 16 GB memory. At fine-tuning, it reduces memory consumption by up to 50% compared to LoRA and GaLore, while consistently outperforming QLoRA at the same memory cost.
Updated: 2024-07-11 08:42:58
Domains: cs.LG
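As a rough illustration of two ingredients named in the abstract — a low-rank gradient projector kept in low precision, and stochastic rounding — here is a self-contained PyTorch sketch. The shapes, the single SVD, and the bit-width choices are illustrative assumptions; the actual Q-GaLore adapts the update frequency per layer from convergence statistics and keeps weights in INT8.

```python
import torch

def stochastic_round(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    # Quantize to a uniform grid, rounding up with probability equal to the
    # fractional residual so the quantization error is zero-mean on average.
    scale = x.abs().max() / (2 ** (n_bits - 1) - 1) + 1e-12
    q = x / scale
    lo = torch.floor(q)
    q = lo + (torch.rand_like(q) < (q - lo)).float()
    return q * scale

G = torch.randn(512, 512)                    # a weight gradient (synthetic)
U, _, _ = torch.linalg.svd(G, full_matrices=False)
rank = 32
P = stochastic_round(U[:, :rank], n_bits=4)  # low-bit projector, as with INT4

G_low = P.T @ G                              # optimizer state lives in this subspace
update = P @ G_low                           # map the processed update back
print(G.shape, G_low.shape, update.shape)
```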
Predicting Heart Failure with Attention Learning Techniques Utilizing Cardiovascular Data
Cardiovascular diseases (CVDs) encompass a group of disorders affecting the heart and blood vessels, including conditions such as coronary artery disease, heart failure, stroke, and hypertension. Among cardiovascular diseases, heart failure is one of the main causes of death and long-term suffering in patients worldwide. Accurate risk prediction is highly valuable for treatment and intervention to minimize heart failure. In this work, an attention learning-based heart failure prediction approach is proposed on EHR (electronic health record) cardiovascular data such as ejection fraction and serum creatinine. Moreover, different optimizers with various learning rates are applied to fine-tune the proposed approach. Serum creatinine and ejection fraction are the two most important features for predicting a patient's heart failure. The computational results show that the RMSProp optimizer with a 0.001 learning rate yields better predictions based on serum creatinine, while the SGD optimizer with a 0.01 learning rate exhibits optimal performance based on ejection fraction features. Overall, the proposed attention learning-based approach predicts heart failure very efficiently compared to existing state-of-the-art methods such as the LSTM approach.
Updated: 2024-07-11 08:33:42
Domains: cs.AI,cs.LG
Performance Evaluation of Hashing Algorithms on Commodity Hardware
Hashing functions, designed to produce short, unpredictable digests of an input message, are the primary cryptographic primitives used in blockchain networks. Hashing is employed in blockchain networks to create linked block lists, which offer safe and secure distributed repository storage for critical information. Due to the unique nature of the hash search problem in blockchain networks, the computations can be parallelized to a large degree. This technical report presents a performance evaluation of three popular hashing algorithms: Blake3, SHA-256, and SHA-512. These hashing algorithms are widely used in various applications, such as digital signatures, message authentication, and password storage. The report discusses the performance metrics used to evaluate the algorithms, such as hash rate/throughput and memory usage. The evaluation is conducted on a range of hardware platforms, including desktops and virtual machines, using synthetic benchmarks. The results of the evaluation show that Blake3 generally outperforms both SHA-256 and SHA-512 in terms of throughput and latency. However, the performance advantage of Blake3 varies depending on the specific hardware platform and the size of the input data. The report concludes with recommendations for selecting the most suitable hashing algorithm for a given application, based on its performance requirements and security needs. The evaluation results can also inform future research and development efforts to improve the performance and security of hashing algorithms.
Updated: 2024-07-11 08:31:02
Domains: cs.CR,cs.DC
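A throughput measurement of the kind the report describes can be reproduced in a few lines. The sketch below uses Python's hashlib for SHA-256/SHA-512; BLAKE3 is not in the standard library, so the third-party blake3 package is assumed and skipped if absent. Message size and iteration count are arbitrary choices.

```python
import hashlib
import time

try:
    import blake3  # pip install blake3
    algos = {"sha256": hashlib.sha256, "sha512": hashlib.sha512,
             "blake3": blake3.blake3}
except ImportError:
    algos = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}

payload = b"\x5a" * (4 * 1024 * 1024)   # 4 MiB synthetic message
iters = 50
for name, ctor in algos.items():
    start = time.perf_counter()
    for _ in range(iters):
        ctor(payload).digest()
    elapsed = time.perf_counter() - start
    print(f"{name}: {iters * len(payload) / elapsed / 2**20:.1f} MiB/s")
```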
AoA-Based Physical Layer Authentication in Analog Arrays under Impersonation Attacks
We discuss the use of angle of arrival (AoA) as an authentication measure in analog array multiple-input multiple-output (MIMO) systems. A base station equipped with an analog array authenticates users based on the AoA estimated from certified pilot transmissions, while active attackers manipulate their transmitted signals to mount impersonation attacks. We study several attacks of increasing intensity (captured through the availability of side information at the attackers) and assess the performance of AoA-based authentication using one-class classifiers. Our results show that some attack techniques with knowledge of the combiners at the verifier are effective in falsifying the AoA and compromising the security of the considered type of physical layer authentication.
Updated: 2024-07-11 08:30:04
Domains: cs.CR,cs.LG
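To make the setup concrete, here is a toy version of the defense side under stated assumptions: scalar AoA estimates with Gaussian noise, a legitimate user at one angle, a naive impersonator at another, and scikit-learn's OneClassSVM as the one-class classifier. The paper's stronger attacks, which use knowledge of the verifier's combiners to steer the estimated AoA, would defeat exactly this kind of check.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
true_aoa = 30.0                                    # degrees, legitimate user
train = true_aoa + rng.normal(0, 1.0, (200, 1))    # certified pilot AoA estimates
legit = true_aoa + rng.normal(0, 1.0, (50, 1))
attack = 55.0 + rng.normal(0, 1.0, (50, 1))        # impersonator at another angle

clf = OneClassSVM(nu=0.05, gamma="scale").fit(train)
print("legit accept rate :", (clf.predict(legit) == 1).mean())
print("attack reject rate:", (clf.predict(attack) == -1).mean())
```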
Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments
Learning continually from a stream of non-i.i.d. data is an open challenge in deep learning, even more so when working in resource-constrained environments such as embedded devices. Visual models that are continually updated through supervised learning are often prone to overfitting, catastrophic forgetting, and biased representations. On the other hand, large language models contain knowledge about multiple concepts and their relations, which can foster a more robust, informed and coherent learning process. This work proposes Continual Visual Mapping (CVM), an approach that continually grounds vision representations in a knowledge space extracted from a fixed language model. Specifically, CVM continually trains a small and efficient visual model to map its representations into a conceptual space established by a fixed Large Language Model. Due to its smaller size, CVM can be used when directly adapting large visual pre-trained models is unfeasible due to computational or data constraints. CVM outperforms state-of-the-art continual learning methods on five benchmarks and offers a promising avenue for addressing generalization capabilities in continual learning, even on computationally constrained devices.
Updated: 2024-07-11 08:28:40
Domains: cs.AI
Explainability of Sub-Field Level Crop Yield Prediction using Remote Sensing
Crop yield forecasting plays a significant role in addressing growing concerns about food security and guiding decision-making for policymakers and farmers. When deep learning is employed, understanding the learning and decision-making processes of the models, as well as their interaction with the input data, is crucial for establishing trust in the models and gaining insight into their reliability. In this study, we focus on the task of crop yield prediction, specifically for soybean, wheat, and rapeseed crops in Argentina, Uruguay, and Germany. Our goal is to develop and explain predictive models for these crops, using a large dataset of satellite images, additional data modalities, and crop yield maps. We employ a long short-term memory network and investigate the impact of using different temporal samplings of the satellite data and the benefit of adding more relevant modalities. For model explainability, we utilize feature attribution methods to quantify input feature contributions, identify critical growth stages, analyze yield variability at the field level, and explain less accurate predictions. The modeling results show an improvement when adding more modalities or using all available instances of satellite data. The explainability results reveal distinct feature importance patterns for each crop and region. We further found that the most influential growth stages on the prediction are dependent on the temporal sampling of the input data. We demonstrated how these critical growth stages, which hold significant agronomic value, closely align with the existing literature in agronomy and crop development biology.
Updated: 2024-07-11 08:23:46
Domains: cs.LG
Gaussian process interpolation with conformal prediction: methods and comparative analysis
This article advocates the use of conformal prediction (CP) methods for Gaussian process (GP) interpolation to enhance the calibration of prediction intervals. We begin by illustrating that using a GP model with parameters selected by maximum likelihood often results in predictions that are not optimally calibrated. CP methods can adjust the prediction intervals, leading to better uncertainty quantification while maintaining the accuracy of the underlying GP model. We compare different CP variants and introduce a novel variant based on an asymmetric score. Our numerical experiments demonstrate the effectiveness of CP methods in improving calibration without compromising accuracy. This work aims to facilitate the adoption of CP methods in the GP community.
Updated: 2024-07-11 08:15:57
Domains: cs.LG,stat.CO,stat.ME,stat.ML
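A minimal sketch of the core idea, assuming plain split conformal prediction with an absolute-residual score on top of scikit-learn's GP (the paper also studies other CP variants, including a novel asymmetric score not shown here): fit the GP on one split, compute calibration residuals on another, and widen intervals by the appropriate residual quantile.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, (120, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 120)
fit, cal = slice(0, 80), slice(80, 120)

gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2).fit(X[fit], y[fit])
resid = np.abs(y[cal] - gp.predict(X[cal]))      # conformity scores

alpha = 0.1                                      # target 90% coverage
n = len(resid)
q = np.quantile(resid, np.ceil((n + 1) * (1 - alpha)) / n)

x_new = np.array([[3.3]])
mu = gp.predict(x_new)
print(f"90% conformal interval: [{mu[0] - q:.3f}, {mu[0] + q:.3f}]")
```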
SciQu: Accelerating Materials Properties Prediction with Automated Literature Mining for Self-Driving Laboratories
Assessing different material properties to predict specific attributes, such as band gap, resistivity, Young's modulus, work function, and refractive index, is a fundamental requirement for materials science-based applications. However, the process is time-consuming and often requires extensive literature reviews and numerous experiments. Our study addresses these challenges by leveraging machine learning to analyze material properties with greater precision and efficiency. By automating the data extraction process and using the extracted information to train machine learning models, our developed model, SciQu, optimizes material properties. As a proof of concept, we predicted the refractive index of materials using data extracted from numerous research articles with SciQu, considering input descriptors such as space group, volume, and bandgap, achieving a Root Mean Square Error (RMSE) of 0.068 and an R2 of 0.94. Thus, SciQu not only predicts the properties of materials but also plays a key role in self-driving laboratories by optimizing the synthesis parameters to achieve the precise shape, size, and phase of the materials specified by the input parameters.
Updated: 2024-07-11 08:12:46
Domains: cond-mat.mtrl-sci,cs.AI,cs.LG,physics.app-ph
Early Explorations of Lightweight Models for Wound Segmentation on Mobile Devices
The aging population poses numerous challenges to healthcare, including the increase in chronic wounds in the elderly. The current approach to wound assessment by therapists based on photographic documentation is subjective, highlighting the need for computer-aided wound recognition from smartphone photos. This offers objective and convenient therapy monitoring, while being accessible to patients from their home at any time. However, despite research in mobile image segmentation, there is a lack of focus on mobile wound segmentation. To address this gap, we conduct initial research on three lightweight architectures to investigate their suitability for smartphone-based wound segmentation. Using public datasets and UNet as a baseline, our results are promising, with both ENet and TopFormer, as well as the larger UNeXt variant, showing comparable performance to UNet. Furthermore, we deploy the models into a smartphone app for visual assessment of live segmentation, where results demonstrate the effectiveness of TopFormer in distinguishing wounds from wound-coloured objects. While our study highlights the potential of transformer models for mobile wound segmentation, future work should aim to further improve the mask contours.
Updated: 2024-07-11 08:01:58
Domains: cs.CV,cs.AI
Knowledge distillation to effectively attain both region-of-interest and global semantics from an image where multiple objects appear
Models based on convolutional neural networks (CNNs) and transformers have steadily improved and have been applied in various computer vision downstream tasks. However, in object detection tasks, accurately localizing and classifying almost infinite categories of foods in images remains challenging. To address these problems, we first segmented the food as the region-of-interest (ROI) by using the segment-anything model (SAM) and masked the rest of the region except the ROI as black pixels. This process simplified the problem into a single classification task, for which annotation and training were much simpler than object detection. The images in which only the ROI was preserved were fed as inputs to fine-tune various off-the-shelf models that encoded their own inductive biases. Among them, Data-efficient image Transformers (DeiTs) had the best classification performance. Nonetheless, when foods' shapes and textures were similar, the contextual features of the ROI-only images were not enough for accurate classification. Therefore, we introduced a novel type of combined architecture, RveRNet, which consists of ROI, extra-ROI, and integration modules that allow it to account for both the ROI's and global contexts. The RveRNet's F1 score was 10% better than other individual models when classifying ambiguous food images, and the RveRNet performed best when its modules were DeiTs with knowledge distillation from the CNN. We investigated how architectures can be made robust against input noise caused by permutation and translocation. The results indicated that there was a trade-off between how much the CNN teacher's knowledge could be distilled to the DeiT and the DeiT's innate strength. Code is publicly available at: https://github.com/Seonwhee-Genome/RveRNet.
Updated: 2024-07-11 07:57:33
Domains: cs.CV,cs.AI,cs.LG
Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling
Compressed Sensing (CS) facilitates rapid image acquisition by selecting a small subset of measurements sufficient for high-fidelity reconstruction. Adaptive CS seeks to further enhance this process by dynamically choosing future measurements based on information gleaned from data that is already acquired. However, many existing frameworks are often tailored to specific tasks and require intricate training procedures. We propose AdaSense, a novel Adaptive CS approach that leverages zero-shot posterior sampling with pre-trained diffusion models. By sequentially sampling from the posterior distribution, we can quantify the uncertainty of each possible future linear measurement throughout the acquisition process. AdaSense eliminates the need for additional training and boasts seamless adaptation to diverse domains with minimal tuning requirements. Our experiments demonstrate the effectiveness of AdaSense in reconstructing facial images from a small number of measurements. Furthermore, we apply AdaSense for active acquisition of medical images in the domains of magnetic resonance imaging (MRI) and computed tomography (CT), highlighting its potential for tangible real-world acceleration.
Updated: 2024-07-11 07:56:17
Domains: eess.IV,cs.CV,cs.LG
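The selection rule can be illustrated without the diffusion model: given posterior samples of the unknown signal, score each candidate linear measurement by how much the samples disagree on its outcome and acquire the most uncertain one. Everything below is synthetic stand-in data; in AdaSense the samples come from zero-shot diffusion posterior sampling.

```python
import numpy as np

rng = np.random.default_rng(3)
samples = rng.normal(0, 1, (64, 256))               # 64 posterior samples of a signal
samples[:, 100:110] += rng.normal(0, 3, (64, 10))   # a deliberately uncertain region

candidates = np.eye(256)                 # candidate measurement vectors (pixel probes)
proj = samples @ candidates.T            # value each measurement would observe
scores = proj.var(axis=0)                # disagreement across posterior samples
best = int(np.argmax(scores))
print("next measurement index:", best)   # falls inside the uncertain region
```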
GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification
Efficient extraction of spectral sequences and geospatial information has always been a hot topic in hyperspectral image classification. In terms of spectral sequence feature capture, RNN and Transformer have become mainstream classification frameworks due to their long-range feature capture capabilities. In terms of spatial information aggregation, CNN enhances the receptive field to retain integrated spatial information as much as possible. However, the spectral feature-capturing architectures exhibit low computational efficiency, and CNNs lack the flexibility to perceive spatial contextual information. To address these issues, this paper proposes GraphMamba--an efficient graph structure learning vision Mamba classification framework that fully considers HSI characteristics to achieve deep spatial-spectral information mining. Specifically, we propose a novel hyperspectral visual GraphMamba processing paradigm (HVGM) that preserves spatial-spectral features by constructing spatial-spectral cubes and utilizes linear spectral encoding to enhance the operability of subsequent tasks. The core components of GraphMamba include the HyperMamba module for improving computational efficiency and the SpectralGCN module for adaptive spatial context awareness. The HyperMamba mitigates clutter interference by employing the global mask (GM) and introduces a parallel training inference architecture to alleviate computational bottlenecks. The SpatialGCN incorporates weighted multi-hop aggregation (WMA) spatial encoding to focus on highly correlated spatial structural features, thus flexibly aggregating contextual information while mitigating spatial noise interference. Extensive experiments were conducted on three different scales of real HSI datasets, and compared with the state-of-the-art classification frameworks, GraphMamba achieved optimal performance.
Updated: 2024-07-11 07:56:08
Domains: cs.CV,cs.LG
United We Stand: Decentralized Multi-Agent Planning With Attrition
Decentralized planning is a key element of cooperative multi-agent systems for information gathering tasks. However, despite the high frequency of agent failures in realistic large deployment scenarios, current approaches perform poorly in the presence of failures, by not converging at all, and/or by making very inefficient use of resources (e.g. energy). In this work, we propose Attritable MCTS (A-MCTS), a decentralized MCTS algorithm capable of timely and efficient adaptation to changes in the set of active agents. It is based on the use of a global reward function for the estimation of each agent's local contribution, and regret matching for coordination. We evaluate its effectiveness in realistic data-harvesting problems under different scenarios. We show both theoretically and experimentally that A-MCTS enables efficient adaptation even under high failure rates. Results suggest that, in the presence of frequent failures, our solution improves substantially over the best existing approaches in terms of global utility and scalability.
Updated: 2024-07-11 07:55:50
Domains: cs.MA,cs.AI
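For the coordination component, regret matching is a standard rule: play each action with probability proportional to its positive cumulative regret. A minimal single-agent toy version follows, with stationary synthetic payoffs standing in for the MCTS value estimates A-MCTS would actually use.

```python
import numpy as np

def regret_matching_policy(cum_regret: np.ndarray) -> np.ndarray:
    # Probabilities proportional to positive regret; uniform if none is positive.
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(pos), 1.0 / len(pos))

rng = np.random.default_rng(4)
payoff = np.array([0.2, 0.5, 0.3])        # stationary toy payoff per action
cum_regret = np.zeros(3)
for _ in range(2000):
    pi = regret_matching_policy(cum_regret)
    a = rng.choice(3, p=pi)
    cum_regret += payoff - payoff[a]      # regret vs. the action actually played
print(regret_matching_policy(cum_regret))  # concentrates on the best action
```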
Length independent generalization bounds for deep SSM architectures
Many state-of-the-art models trained on long-range sequences, for example S4, S5 or LRU, are made of sequential blocks combining State-Space Models (SSMs) with neural networks. In this paper we provide a PAC bound that holds for these kind of architectures with stable SSM blocks and does not depend on the length of the input sequence. Imposing stability of the SSM blocks is a standard practice in the literature, and it is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
Updated: 2024-07-11 07:55:14
Domains: cs.LG,cs.AI,stat.ML,68,I.2.6
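For orientation, a standard linear SSM block and the stability condition at play can be written as follows; this is a generic formulation under my own notation, not the paper's exact statement.

```latex
% State recursion of a linear SSM block with input u_t, state x_t, output y_t:
\[
  x_{t+1} = A x_t + B u_t, \qquad y_t = C x_t + D u_t .
\]
% Stability of the block means the state map is a contraction,
\[
  \lVert A \rVert \le \lambda < 1,
\]
% so the influence of u_{t-k} on y_t decays like \lambda^k. This geometric
% forgetting is what permits a generalization bound that does not grow with
% sequence length, and the bound tightens as \lambda decreases.
```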
Gradient Boosting Reinforcement Learning
Neural networks (NN) achieve remarkable results in various tasks, but lack key characteristics: interpretability, support for categorical features, and lightweight implementations suitable for edge devices. While ongoing efforts aim to address these challenges, Gradient Boosting Trees (GBT) inherently meet these requirements. As a result, GBTs have become the go-to method for supervised learning tasks in many real-world applications and competitions. However, their application in online learning scenarios, notably in reinforcement learning (RL), has been limited. In this work, we bridge this gap by introducing Gradient-Boosting RL (GBRL), a framework that extends the advantages of GBT to the RL domain. Using the GBRL framework, we implement various actor-critic algorithms and compare their performance with their NN counterparts. Inspired by shared backbones in NN we introduce a tree-sharing approach for policy and value functions with distinct learning rates, enhancing learning efficiency over millions of interactions. GBRL achieves competitive performance across a diverse array of tasks, excelling in domains with structured or categorical features. Additionally, we present a high-performance, GPU-accelerated implementation that integrates seamlessly with widely-used RL libraries (available at https://github.com/NVlabs/gbrl). GBRL expands the toolkit for RL practitioners, demonstrating the viability and promise of GBT within the RL paradigm, particularly in domains characterized by structured or categorical features.
Updated: 2024-07-11 07:52:33
Domains: cs.LG,cs.AI
GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and Configuration
Communication network engineering in enterprise environments is traditionally a complex, time-consuming, and error-prone manual process. Most research on network engineering automation has concentrated on configuration synthesis, often overlooking changes in the physical network topology. This paper introduces GeNet, a multimodal co-pilot for enterprise network engineers. GeNet is a novel framework that leverages a large language model (LLM) to streamline network design workflows. It uses visual and textual modalities to interpret and update network topologies and device configurations based on user intents. GeNet was evaluated on enterprise network scenarios adapted from Cisco certification exercises. Our results demonstrate GeNet's ability to interpret network topology images accurately, potentially reducing network engineers' efforts and accelerating network design processes in enterprise environments. Furthermore, we show the importance of precise topology understanding when handling intents that require modifications to the network's topology.
Updated: 2024-07-11 07:51:57
Domains: cs.NI,cs.AI
Toward accessible comics for blind and low vision readers
This work explores how to fine-tune large language models using prompt engineering techniques with contextual information for generating an accurate text description of the full story, ready to be forwarded to off-the-shelf speech synthesis tools. We propose to use existing computer vision and optical character recognition techniques to build a grounded context from the comic strip image content, such as panels, characters, text, reading order, and the association of bubbles and characters. Then we infer character identification and generate a comic book script with context-aware panel descriptions including characters' appearance, posture, mood, dialogues, etc. We believe that such enriched content descriptions can easily be used to produce audiobooks and eBooks with various voices for characters and captions, and with sound effects.
Updated: 2024-07-11 07:50:25
Domains: cs.AI
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications
The advent of immersive Virtual Reality applications has transformed various domains, yet their integration with advanced artificial intelligence technologies like Visual Language Models remains underexplored. This study introduces a pioneering approach utilizing VLMs within VR environments to enhance user interaction and task efficiency. Leveraging the Unity engine and a custom-developed VLM, our system facilitates real-time, intuitive user interactions through natural language processing, without relying on visual text instructions. The incorporation of speech-to-text and text-to-speech technologies allows for seamless communication between the user and the VLM, enabling the system to guide users through complex tasks effectively. Preliminary experimental results indicate that utilizing VLMs not only reduces task completion times but also improves user comfort and task engagement compared to traditional VR interaction methods.
Updated: 2024-07-11 07:46:14
Domains: cs.RO,cs.AI,cs.ET
Feature Diversification and Adaptation for Federated Domain Generalization
Federated learning, a distributed learning paradigm, utilizes multiple clients to build a robust global model. In real-world applications, local clients often operate within their limited domains, leading to a `domain shift' across clients. Privacy concerns limit each client's learning to its own domain data, which increases the risk of overfitting. Moreover, the process of aggregating models trained on these limited domains can potentially lead to a significant degradation in global model performance. To deal with these challenges, we introduce the concept of federated feature diversification. Each client diversifies its own limited domain data by leveraging global feature statistics, i.e., the aggregated average statistics over all participating clients, shared through the global model's parameters. This data diversification helps local models to learn client-invariant representations while preserving privacy. Our resultant global model shows robust performance on unseen test domain data. To enhance performance further, we develop an instance-adaptive inference approach tailored for test domain data. Our proposed instance feature adapter dynamically adjusts feature statistics to align with the test input, thereby reducing the domain gap between the test and training domains. We show that our method achieves state-of-the-art performance on several domain generalization benchmarks within a federated learning setting.
Updated: 2024-07-11 07:45:10
Domains: cs.LG,cs.CV
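One plausible reading of the diversification step — an assumption on my part, since the abstract does not pin down the exact operation — is style interpolation between local per-sample feature statistics and the globally aggregated ones:

```python
import torch

def diversify(feat, global_mu, global_sigma, eps=1e-5):
    # feat: (batch, channels, H, W) activations from a local client model.
    # Hypothetical mechanism: re-normalize with a random mix of local and
    # global statistics, exposing the client to styles beyond its own domain.
    mu = feat.mean(dim=(2, 3), keepdim=True)
    sigma = feat.std(dim=(2, 3), keepdim=True) + eps
    lam = torch.rand(feat.size(0), 1, 1, 1)          # per-sample mixing weight
    mix_mu = lam * mu + (1 - lam) * global_mu
    mix_sigma = lam * sigma + (1 - lam) * global_sigma
    return (feat - mu) / sigma * mix_sigma + mix_mu

feat = torch.randn(8, 64, 16, 16)
out = diversify(feat,
                global_mu=torch.zeros(1, 64, 1, 1),   # shared via the global model
                global_sigma=torch.ones(1, 64, 1, 1))
print(out.shape)
```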
Evaluating Copyright Takedown Methods for Language Models
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. These models can memorize and generate content similar to their training data, posing potential concerns. Therefore, model creators are motivated to develop mitigation methods that prevent generating protected content. We term this procedure copyright takedowns for LMs, noting the conceptual similarity to (but legal distinction from) the DMCA takedown. This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model's ability to retain uncopyrightable factual knowledge from the training data whose recitation is embargoed, and how well the model maintains its general utility and efficiency. We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches. Our findings indicate that no tested method excels across all metrics, showing significant room for research in this unique problem setting and indicating potential unresolved challenges for live policy proposals.
Updated: 2024-07-11 07:45:04
Domains: cs.CL,cs.LG
Generalizable Sleep Staging via Multi-Level Domain Alignment
Automatic sleep staging is essential for sleep assessment and disorder diagnosis. Most existing methods depend on one specific dataset and generalize poorly to other unseen datasets, since their training and testing data come from the same dataset. In this paper, we introduce domain generalization into automatic sleep staging and propose the task of generalizable sleep staging, which aims to improve the model's generalization ability to unseen datasets. Inspired by existing domain generalization methods, we adopt the feature alignment idea and propose a framework called SleepDG to solve it. Considering that both local salient features and sequential features are important for sleep staging, we propose a Multi-level Feature Alignment combining epoch-level and sequence-level feature alignment to learn domain-invariant feature representations. Specifically, we design an Epoch-level Feature Alignment to align the feature distribution of each single sleep epoch among different domains, and a Sequence-level Feature Alignment to minimize the discrepancy of sequential features among different domains. SleepDG is validated on five public datasets, achieving state-of-the-art performance.
Updated: 2024-07-11 07:38:32
Domains: eess.SP,cs.LG
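A toy rendering of the two alignment terms, with simple moment-matching penalties standing in for whatever discrepancy measures the paper actually uses (an assumption): epoch-level alignment matches per-epoch feature means across domains, and sequence-level alignment matches second-order statistics along the sequence axis.

```python
import torch

def epoch_align(f_a, f_b):
    # f_*: (n_epochs, dim) features of single sleep epochs from two domains.
    return (f_a.mean(0) - f_b.mean(0)).pow(2).sum()

def sequence_align(s_a, s_b):
    # s_*: (n_seq, seq_len, dim); compare feature covariances along sequences.
    def cov(s):
        s = s - s.mean(dim=1, keepdim=True)
        return torch.einsum("bld,ble->de", s, s) / (s.size(0) * s.size(1))
    return (cov(s_a) - cov(s_b)).pow(2).mean()

f_a, f_b = torch.randn(32, 128), torch.randn(32, 128)
s_a, s_b = torch.randn(8, 20, 128), torch.randn(8, 20, 128)
loss = epoch_align(f_a, f_b) + sequence_align(s_a, s_b)
print(loss.item())   # would be added to the supervised staging loss
```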
Leveraging LLMs to Predict Affective States via Smartphone Sensor Features
As mental health issues for young adults present a pressing public health concern, daily digital mood monitoring for early detection has become an important prospect. An active research area, digital phenotyping, involves collecting and analysing data from personal digital devices such as smartphones (usage and sensors) and wearables to infer behaviours and mental health. Whilst this data is standardly analysed using statistical and machine learning approaches, the emergence of large language models (LLMs) offers a new approach to make sense of smartphone sensing data. Despite their effectiveness across various domains, LLMs remain relatively unexplored in digital mental health, particularly in integrating mobile sensor data. Our study aims to bridge this gap by employing LLMs to predict affect outcomes based on smartphone sensing data from university students. We demonstrate the efficacy of zero-shot and few-shot embedding LLMs in inferring general wellbeing. Our findings reveal that LLMs can make promising predictions of affect measures using solely smartphone sensing data. This research sheds light on the potential of LLMs for affective state prediction, emphasizing the intricate link between smartphone behavioral patterns and affective states. To our knowledge, this is the first work to leverage LLMs for affective state prediction and digital phenotyping tasks.
Updated: 2024-07-11 07:37:52
Domains: cs.HC,cs.AI
An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio
When the task of locating manipulation regions in partially-fake audio (PFA) involves cross-domain datasets, the performance of deep learning models drops significantly due to the shift between the source and target domains. To address this issue, existing approaches often employ data augmentation before training. However, they overlook the characteristics in target domain that are absent in source domain. Inspired by the mixture-of-experts model, we propose an unsupervised method named Samples mining with Diversity and Entropy (SDE). Our method first learns from a collection of diverse experts that achieve great performance from different perspectives in the source domain, but with ambiguity on target samples. We leverage these diverse experts to select the most informative samples by calculating their entropy. Furthermore, we introduced a label generation method tailored for these selected samples that are incorporated in the training process in source domain integrating the target domain information. We applied our method to a cross-domain partially fake audio detection dataset, ADD2023Track2. By introducing 10% of unknown samples from the target domain, we achieved an F1 score of 43.84%, which represents a relative increase of 77.2% compared to the second-best method.
Updated: 2024-07-11 07:32:16
Domains: cs.SD,cs.LG,eess.AS
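The entropy-based selection itself is compact enough to show directly. Expert outputs below are random stand-ins, and the 10% budget mirrors the fraction of unknown target samples introduced in the experiments.

```python
import numpy as np

rng = np.random.default_rng(5)
n_experts, n_samples, n_classes = 4, 1000, 2
logits = rng.normal(0, 1, (n_experts, n_samples, n_classes))  # expert outputs
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
mean_p = probs.mean(axis=0)                       # ensemble prediction per sample

# High entropy = the diverse experts disagree = most informative target sample.
entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(-1)
k = int(0.10 * n_samples)
selected = np.argsort(entropy)[::-1][:k]
print("selected", len(selected), "target samples for label generation")
```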
Step-Back Profiling: Distilling User History for Personalized Scientific Writing
Large language models (LLM) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals, particularly in real-world scenarios like scientific writing. Addressing this challenge, we introduce STEP-BACK PROFILING to personalize LLMs by distilling user history into concise profiles, including essential traits and preferences of users. To conduct the experiments, we construct a Personalized Scientific Writing (PSW) dataset to study multi-user personalization. PSW requires the models to write scientific papers given specialized author groups with diverse academic backgrounds. As for the results, we demonstrate the effectiveness of capturing user characteristics via STEP-BACK PROFILING for collaborative writing. Moreover, our approach outperforms the baselines by up to 3.6 points on the general personalization benchmark (LaMP), including 7 personalization LLM tasks. Our ablation studies validate the contributions of different components in our method and provide insights into our task definition. Our dataset and code are available at https://github.com/gersteinlab/step-back-profiling.
Updated: 2024-07-11 07:29:12
Domains: cs.CL,cs.AI
Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System
Multi-agent debate system (MAD), imitating the process of human discussion in pursuit of truth, aims to align the correct cognition of different agents for the optimal solution. It is challenging to make various agents perform right and highly consistent cognition due to their limited and different knowledge backgrounds (i.e., cognitive islands), which hinders the search for the optimal solution. To address the challenge, we propose a novel Multi-Agent Debate with Knowledge-Enhanced framework (MADKE) to promote the system to find the solution. First, we involve a shared retrieval knowledge pool in the debate process to solve the problem of limited and different knowledge backgrounds. Then, we propose an adaptive knowledge selection method to guarantee the accuracy and personalization of knowledge. This method allows agents to choose whether to use external knowledge in each conversation round according to their own needs. Our experimental results on six datasets show that our method achieves state-of-the-art results compared to existing single-agent and multi-agent methods. Further analysis reveals that the introduction of retrieval knowledge can help the agent to break cognitive islands in the debate process and effectively improve the consistency and correctness of the model. Moreover, MADKE using Qwen1.5-72B-Chat surpasses GPT-4 by +1.26% on average in six datasets, which validates that our method can help open-source LLMs achieve or even surpass the performance of GPT-4. Our code is available at https://github.com/FutureForMe/MADKE.
Updated: 2024-07-11 07:28:56
Domains: cs.CL,cs.AI
Differentially Private Neural Network Training under Hidden State Assumption
We present a novel approach called differentially private stochastic block coordinate descent (DP-SBCD) for training neural networks with provable guarantees of differential privacy under the hidden state assumption. Our methodology incorporates Lipschitz neural networks and decomposes the training process of the neural network into sub-problems, each corresponding to the training of a specific layer. By doing so, we extend the analysis of differential privacy under the hidden state assumption to encompass non-convex problems and algorithms employing proximal gradient descent. Furthermore, in contrast to existing methods, we adopt a novel approach by utilizing calibrated noise sampled from adaptive distributions, yielding improved empirical trade-offs between utility and privacy.
Updated: 2024-07-11 07:14:40
Domains: cs.LG
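A heavily simplified sketch of one block-coordinate step with DP-SGD-style clipping and fixed Gaussian noise; the paper's calibrated noise from adaptive distributions and its Lipschitz network structure are abstracted away here, so treat every constant as an assumption.

```python
import torch

def dp_block_step(params, per_example_grads, layer, lr=0.1, clip=1.0, sigma=1.0):
    # Update only one block (layer): clip per-example gradients, add noise.
    g = per_example_grads[layer]                       # (batch, *param_shape)
    norms = g.flatten(1).norm(dim=1).clamp(min=1e-12)
    scale = (clip / norms).clamp(max=1.0)
    g = g * scale.view(-1, *([1] * (g.dim() - 1)))     # per-example clipping
    noisy = g.sum(0) + sigma * clip * torch.randn_like(params[layer])
    params[layer] -= lr * noisy / g.size(0)

params = {"layer0": torch.zeros(4, 3), "layer1": torch.zeros(2, 4)}
grads = {k: torch.randn(16, *v.shape) for k, v in params.items()}  # batch of 16
for layer in params:                                   # one pass over the blocks
    dp_block_step(params, grads, layer)
print(params["layer0"])
```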
SwishReLU: A Unified Approach to Activation Functions for Enhanced Deep Neural Networks Performance
ReLU, a commonly used activation function in deep neural networks, is prone to the "Dying ReLU" problem. Several enhanced variants, such as ELU, SeLU, and Swish, have been introduced but remain less commonly used. Replacing ReLU can be somewhat challenging because its alternatives offer inconsistent advantages: while Swish provides a smoother transition than ReLU, its use generally incurs a greater computational burden. This paper proposes SwishReLU, a novel activation function combining elements of ReLU and Swish. Our findings reveal that SwishReLU outperforms ReLU with a lower computational cost than Swish. This paper examines and compares different ReLU variants with SwishReLU; specifically, we compare ELU, SeLU, and Tanh on three datasets: CIFAR-10, CIFAR-100, and MNIST. Notably, applying SwishReLU in the VGG16 model described in Algorithm 2 yields a 6% accuracy improvement on the CIFAR-10 dataset.
Updated: 2024-07-11 07:14:34
Domains: cs.LG
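The abstract does not spell out the exact formula, so the module below is a hypothetical blend for illustration only: ReLU's hard zero for negative inputs combined with Swish's smooth gating x·sigmoid(x) for non-negative ones. The paper's actual definition of SwishReLU may differ.

```python
import torch
import torch.nn as nn

class SwishReLU(nn.Module):
    # Hypothetical combination for illustration (not the paper's verified formula):
    # hard zero like ReLU for x < 0, Swish-style smooth gating for x >= 0.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.where(x >= 0, x * torch.sigmoid(x), torch.zeros_like(x))

act = SwishReLU()
x = torch.linspace(-3, 3, 7)
print(act(x))   # zero on the negative side, smooth self-gated values otherwise
```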
Towards Realistic Incremental Scenario in Class Incremental Semantic Segmentation
This paper addresses the unrealistic aspect of the commonly adopted Continuous Incremental Semantic Segmentation (CISS) scenario, termed overlapped. We point out that overlapped allows the same image to reappear in future tasks with different pixel labels, which is far from practical incremental learning scenarios. Moreover, we identified that this flawed scenario may lead to biased results for two commonly used techniques in CISS, pseudo-labeling and exemplar memory, resulting in unintended advantages or disadvantages for certain techniques. To mitigate this, a practical scenario called partitioned is proposed, in which the dataset is first divided into distinct subsets representing each class, and then the subsets are assigned to each corresponding task. This efficiently addresses the issue above while meeting the requirement of CISS scenario, such as capturing the background shifts. Furthermore, we identify and address the code implementation issues related to retrieving data from the exemplar memory, which was ignored in previous works. Lastly, we introduce a simple yet competitive memory-based baseline, MiB-AugM, that handles background shifts of current tasks in the exemplar memory. This baseline achieves state-of-the-art results across multiple tasks involving learning numerous new classes.
Updated: 2024-07-11 07:09:00
Domains: cs.CV,cs.LG
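The proposed partitioned protocol is easy to state in code: group sample indices by class into disjoint subsets, then hand each subset to the task that introduces its class, so that no image can reappear later with different labels. A toy sketch with made-up labels and a made-up class schedule:

```python
from collections import defaultdict

labels = ["cat", "dog", "car", "dog", "cat", "car", "bus", "bus"]  # per image
tasks = [["cat", "dog"], ["car"], ["bus"]]                          # class schedule

by_class = defaultdict(list)
for idx, lab in enumerate(labels):
    by_class[lab].append(idx)            # disjoint index subsets, one per class

# Each task gets exactly the subsets of the classes it introduces.
task_data = [sorted(i for c in task for i in by_class[c]) for task in tasks]
print(task_data)   # [[0, 1, 3, 4], [2, 5], [6, 7]] -- no index repeats across tasks
```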
DALL-M: Context-Aware Clinical Data Augmentation with LLMs
X-ray images are vital in medical diagnostics, but their effectiveness is limited without clinical context. Radiologists often find chest X-rays insufficient for diagnosing underlying diseases, necessitating comprehensive clinical features and data integration. We present a novel technique to enhance the clinical context through augmentation techniques with clinical tabular data, thereby improving its applicability and reliability in AI medical diagnostics. To address this, we introduce a pioneering approach to clinical data augmentation that employs large language models (LLMs) to generate patient contextual synthetic data. This methodology is crucial for training more robust deep learning models in healthcare. It preserves the integrity of real patient data while enriching the dataset with contextually relevant synthetic features, significantly enhancing model performance. DALL-M uses a three-phase feature generation process: (i) clinical context storage, (ii) expert query generation, and (iii) context-aware feature augmentation. DALL-M generates new, clinically relevant features by synthesizing chest X-ray images and reports. Applied to 799 cases using nine features from the MIMIC-IV dataset, it created an augmented set of 91 features. This is the first work to generate contextual values for existing and new features based on patients' X-ray reports, gender, and age and to produce new contextual knowledge during data augmentation. Empirical validation with machine learning models, including Decision Trees, Random Forests, XGBoost, and TabNET, showed significant performance improvements. Incorporating augmented features increased the F1 score by 16.5% and Precision and Recall by approximately 25%. DALL-M addresses a critical gap in clinical data augmentation, offering a robust framework for generating contextually enriched datasets.
Updated: 2024-07-11 07:01:50
Domains: cs.AI,cs.IR,cs.LG,I.5.1; J.3; H.3.3; I.2.7
stEnTrans: Transformer-based deep learning for spatial transcriptomics enhancement
The spatial location of cells within tissues and organs is crucial for the manifestation of their specific functions. Spatial transcriptomics technology enables comprehensive measurement of the gene expression patterns in tissues while retaining spatial information. However, current popular spatial transcriptomics techniques suffer from either shallow sequencing depth or low resolution. We present stEnTrans, a deep learning method based on the Transformer architecture that provides comprehensive predictions for gene expression in unmeasured or unexpectedly lost areas and enhances gene expression in original and imputed spots. Utilizing a self-supervised learning approach, stEnTrans establishes proxy tasks on the gene expression profile without requiring additional data, mining intrinsic features of the tissues as supervisory information. We evaluate stEnTrans on six datasets, and the results indicate superior performance in enhancing spot resolution and predicting gene expression in unmeasured areas compared to other deep learning and traditional interpolation methods. Additionally, our method can also help discover spatial patterns in spatial transcriptomics and enrich analyses with more biologically significant pathways. Our source code is available at https://github.com/shuailinxue/stEnTrans.
Updated: 2024-07-11 06:50:34
Domains: q-bio.QM,cs.AI
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Speculative RAG - a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that Speculative RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth.
Updated: 2024-07-11 06:50:19
Domains: cs.CL,cs.AI
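The control flow is simple enough to sketch end to end. Both model calls below are placeholders (any drafter/verifier pair could be substituted), and slicing the retrieved documents into disjoint subsets is just one way to realize the "distinct subset per draft" idea.

```python
from concurrent.futures import ThreadPoolExecutor

def small_lm(question: str, docs: list) -> str:
    # Hypothetical distilled specialist: drafts an answer from its doc subset.
    return f"draft grounded in {len(docs)} docs for: {question}"

def large_lm_score(question: str, draft: str) -> float:
    # Hypothetical generalist verifier: one scoring pass per draft.
    return float(len(draft))   # stand-in for a real verification score

def speculative_rag(question: str, retrieved: list, n_drafts: int = 3) -> str:
    subsets = [retrieved[i::n_drafts] for i in range(n_drafts)]  # disjoint views
    with ThreadPoolExecutor() as pool:                           # drafts in parallel
        drafts = list(pool.map(lambda d: small_lm(question, d), subsets))
    return max(drafts, key=lambda d: large_lm_score(question, d))

print(speculative_rag("Who proposed RAG?", [f"doc{i}" for i in range(9)]))
```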
Benchmarking Complex Instruction-Following with Multiple Constraints Composition
Instruction following is one of the fundamental capabilities of large language models (LLMs). As the ability of LLMs is constantly improving, they have been increasingly applied to deal with complex human instructions in real-world scenarios. Therefore, how to evaluate the ability of complex instruction-following of LLMs has become a critical research problem. Existing benchmarks mainly focus on modeling different types of constraints in human instructions while neglecting the composition of different constraints, which is an indispensable constituent in complex instructions. To this end, we propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints. We propose a hierarchical taxonomy for complex instructions, including 4 constraint types, 19 constraint dimensions, and 4 composition types, and manually collect a high-quality dataset accordingly. To make the evaluation reliable, we augment LLM-based evaluators with rules to effectively verify whether generated texts can satisfy each constraint and composition. Furthermore, we obtain the final evaluation score based on the dependency structure determined by different composition types. ComplexBench identifies significant deficiencies in existing LLMs when dealing with complex instructions with multiple constraints composition.
Updated: 2024-07-11 06:44:47
领域: cs.CL,cs.AI
Multimodal contrastive learning for spatial gene expression prediction using histology images
In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images (WSIs) stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning with Transformer and Densenet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. Our extensive evaluation of mclSTExp on two breast cancer datasets and a skin squamous cell carcinoma dataset demonstrates its superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.
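The contrastive step described above can be sketched as a CLIP-style InfoNCE objective between spot embeddings and image-feature embeddings; the embedding dimension and temperature below are illustrative assumptions, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(spot_emb: torch.Tensor,
                     img_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """spot_emb, img_emb: (batch, dim) embeddings of matched spot/image pairs."""
    spot = F.normalize(spot_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    logits = spot @ img.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(spot.size(0))    # matched pairs sit on the diagonal
    # Symmetric InfoNCE: align spots to images and images to spots.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```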
Updated: 2024-07-11 06:33:38
标题: 多模式对比学习在利用组织学图像进行空间基因表达预测中的应用
摘要: 近年来,空间转录组学(ST)技术的出现为深入研究复杂生物系统内基因表达模式提供了前所未有的机会。尽管其具有颠覆性潜力,但ST技术的昂贵成本仍然是其在大规模研究中广泛应用的重要障碍。一种替代、更具成本效益的策略是利用人工智能来预测使用易获得的带有Hematoxylin和Eosin(H&E)染色的全切片图像(WSIs)的基因表达水平。然而,现有方法尚未充分利用H&E图像和具有空间位置的ST数据提供的多模态信息。在本文中,我们提出了一种名为mclSTExp的多模态对比学习方法,该方法结合了Transformer和Densenet-121编码器,用于空间转录组表达预测。我们将每个点概念化为一个“单词”,通过Transformer编码器的自注意机制将其内在特征与空间上下文结合起来。通过对比学习进一步丰富了这种集成,从而增强了我们模型的预测能力。我们在两个乳腺癌数据集和一个皮肤鳞状细胞癌数据集上对mclSTExp进行了广泛评估,证明了其在预测空间基因表达方面的优越性能。此外,mclSTExp在解释癌症特异性过表达基因、阐明免疫相关基因以及识别病理学家注释的专门空间区域方面显示出潜力。我们的源代码可在https://github.com/shizhiceng/mclSTExp获取。
更新时间: 2024-07-11 06:33:38
领域: eess.IV,cs.AI,cs.CV,q-bio.QM
Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach
In today's fast-paced world, accurately monitoring stress levels is crucial. Sensor-based stress monitoring systems often need large datasets for training effective models. However, individual-specific models are necessary for personalized and interactive scenarios. Traditional methods like Ecological Momentary Assessments (EMAs) assess stress but struggle with efficient data collection without burdening users. The challenge is to timely send EMAs, especially during stress, balancing monitoring efficiency and user convenience. This paper introduces a novel context-aware active reinforcement learning (RL) algorithm for enhanced stress detection using Photoplethysmography (PPG) data from smartwatches and contextual data from smartphones. Our approach dynamically selects optimal times for deploying EMAs, utilizing the user's immediate context to maximize label accuracy and minimize intrusiveness. Initially, the study was executed in an offline environment to refine the label collection process, aiming to increase accuracy while reducing user burden. Later, we integrated a real-time label collection mechanism, transitioning to an online methodology. This shift resulted in an 11% improvement in stress detection efficiency. Incorporating contextual data improved model accuracy by 4%. Personalization studies indicated a 10% enhancement in AUC-ROC scores, demonstrating better stress level differentiation. This research marks a significant move towards personalized, context-driven real-time stress monitoring methods.
Updated: 2024-07-11 06:33:11
标题: 提升日常压力监测的性能和用户参与度:一种上下文感知的主动强化学习方法
摘要: 在当今快节奏的世界中,准确监测压力水平至关重要。基于传感器的压力监测系统通常需要大量数据集来训练有效的模型。然而,在个性化和互动场景中,个体特定的模型是必要的。传统方法如生态瞬时评估(EMAs)可以评估压力,但难以在不给用户增加负担的情况下高效地收集数据。挑战在于及时发送EMAs,特别是在压力情况下,平衡监测效率和用户便利性。本文介绍了一种新颖的基于上下文感知的主动强化学习(RL)算法,利用智能手表的光电容积描记(PPG)数据和智能手机的上下文数据,以提高压力检测效果。我们的方法动态选择最佳时间部署EMAs,利用用户的即时上下文最大化标签准确性和最小化干扰性。最初,研究在离线环境中执行,以完善标签收集过程,旨在提高准确性同时减轻用户负担。后来,我们整合了实时标签收集机制,过渡到在线方法。这一转变导致了压力检测效率提高了11%。整合上下文数据提高了模型准确性4%。个性化研究表明AUC-ROC分数提高了10%,展示了更好的压力水平区分能力。这项研究标志着朝着个性化、上下文驱动的实时压力监测方法迈出了重要的一步。
更新时间: 2024-07-11 06:33:11
领域: cs.LG
Towards stable training of parallel continual learning
Parallel Continual Learning (PCL) tasks investigate training methods for continual learning with multi-source input, where data from different tasks are learned as they arrive. PCL offers high training efficiency and is well-suited for complex multi-source data systems, such as autonomous vehicles equipped with multiple sensors. However, multiple tasks must be trained simultaneously at any given time, leading to severe training instability in PCL. This instability manifests during both forward and backward propagation, where features become entangled and gradients conflict. This paper introduces Stable Parallel Continual Learning (SPCL), a novel approach that enhances the training stability of PCL in both forward and backward propagation. For the forward propagation, we apply Doubly-block Toeplitz (DBT) matrix-based orthogonality constraints to the network parameters to ensure stable and consistent propagation. For the backward propagation, we employ orthogonal decomposition for gradient management, which stabilizes backpropagation and mitigates gradient conflicts across tasks. By enforcing gradient orthogonality and minimizing the condition number, SPCL effectively stabilizes gradient descent in complex optimization tasks. Experimental results demonstrate that SPCL outperforms state-of-the-art methods and achieves better training stability.
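One plausible reading of the backward-propagation step is a PCGrad-style orthogonal decomposition: each task gradient is stripped of the components that conflict with the other tasks' gradients before averaging. The sketch below is our illustration of that idea, not SPCL's exact procedure:

```python
import numpy as np

def deconflict(grads):
    """Remove from each task gradient the components that conflict
    (negative inner product) with other tasks' gradients, then average
    the projected gradients."""
    projected = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        for j, h in enumerate(grads):
            if i != j and g @ h < 0:        # conflicting component detected
                g -= (g @ h) / (h @ h) * h  # project it out (orthogonalize)
        projected.append(g)
    return np.mean(projected, axis=0)

g = deconflict([np.array([1.0, 0.5]), np.array([-0.5, 1.0])])
```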
Updated: 2024-07-11 06:31:04
标题: 朝着稳定的并行持续学习训练方向
摘要: 并行持续学习(PCL)任务探讨了多源输入下的持续学习训练方法,其中来自不同任务的数据在到达时进行学习。PCL具有高训练效率,非常适用于复杂的多源数据系统,例如配备多个传感器的自动驾驶车辆。然而,任何时候,多个任务需要同时进行训练,导致PCL中的训练不稳定性严重。这种不稳定性在前向和后向传播过程中表现出来,特征纠缠在一起,梯度发生冲突。本文介绍了稳定的并行持续学习(SPCL),这是一种增强PCL在前向和后向传播中训练稳定性的新方法。对于前向传播,我们将基于Doubly-block Toeplitz(DBT)矩阵的正交约束应用于网络参数,以确保稳定和一致的传播。对于后向传播,我们采用正交分解来进行梯度管理,稳定反向传播并减轻跨任务的梯度冲突。通过确保正交性和最小化条件数来优化梯度,SPCL有效地稳定了复杂优化任务中的梯度下降。实验结果表明,SPCL优于最先进的方法,并实现了更好的训练稳定性。
更新时间: 2024-07-11 06:31:04
领域: cs.LG,cs.AI
Potential Societal Biases of ChatGPT in Higher Education: A Scoping Review
Purpose: Generative Artificial Intelligence (GAI) models, such as ChatGPT, may inherit or amplify societal biases due to their training on extensive datasets. With the increasing usage of GAI by students, faculty, and staff in higher education institutions (HEIs), it is urgent to examine the ethical issues and potential biases associated with these technologies. Design/Approach/Methods: This scoping review aims to elucidate how biases related to GAI in HEIs have been researched and discussed in recent academic publications. We categorized the potential societal biases that GAI might cause in the field of higher education. Our review includes articles written in English, Chinese, and Japanese across four main databases, focusing on GAI usage in higher education and bias. Findings: Our findings reveal that while there is meaningful scholarly discussion around bias and discrimination concerning LLMs in the AI field, most articles addressing higher education approach the issue superficially. Few articles identify specific types of bias under different circumstances, and there is a notable lack of empirical research. Most papers in our review focus primarily on educational and research fields related to medicine and engineering, with some addressing English education. However, there is almost no discussion regarding the humanities and social sciences. Additionally, a significant portion of the current discourse is in English and primarily addresses English-speaking contexts. Originality/Value: To the best of our knowledge, our study is the first to summarize the potential societal biases in higher education. This review highlights the need for more in-depth studies and empirical work to understand the specific biases that GAI might introduce or amplify in educational settings, guiding the development of more ethical AI applications in higher education.
Updated: 2024-07-11 06:25:36
标题: ChatGPT在高等教育中潜在的社会偏见:一个范围审查
摘要: 目的: 生成式人工智能(GAI)模型,如ChatGPT,可能会因为在广泛数据集上的训练而继承或放大社会偏见。随着在高等教育机构(HEIs)中学生、教职员工对GAI的使用日益增加,迫切需要研究这些技术所涉及的伦理问题和潜在偏见。 设计/途径/方法: 这项范围性综述旨在阐明近期学术出版物中如何研究和讨论与GAI在HEIs中相关的偏见。我们对GAI可能在高等教育领域引起的潜在社会偏见进行了分类。我们的综述包括英文、中文和日文撰写的文章,跨越四个主要数据库,重点关注高等教育中的GAI使用与偏见。结果: 我们的研究结果显示,虽然在AI领域关于LLMs偏见和歧视的学术讨论有意义,但大多数涉及高等教育问题的文章都是表面化处理。很少有文章能在不同情况下识别特定类型的偏见,并且缺乏实证研究。我们综述中的大多数论文主要关注与医学和工程相关的教育和研究领域,有些涉及英语教育。然而,几乎没有关于人文和社会科学的讨论。此外,目前讨论的一个重要部分是以英语为主,主要涉及英语环境。 独创性/价值: 据我们所知,我们的研究是首次总结高等教育中潜在的社会偏见。这个综述突出了需要进行更深入的研究和实证工作,以了解GAI在教育环境中可能引入或放大的具体偏见,引导更具伦理的AI应用在高等教育中的发展。
更新时间: 2024-07-11 06:25:36
领域: cs.CY,cs.AI
Gated Ensemble of Spatio-temporal Mixture of Experts for Multi-task Learning in Ride-hailing System
Ride-hailing system requires efficient management of dynamic demand and supply to ensure optimal service delivery, pricing strategies, and operational efficiency. Designing spatio-temporal forecasting models separately in a task-wise and city-wise manner to forecast demand and supply-demand gap in a ride-hailing system poses a burden for the expanding transportation network companies. Therefore, a multi-task learning architecture is proposed in this study by developing gated ensemble of spatio-temporal mixture of experts network (GESME-Net) with convolutional recurrent neural network (CRNN), convolutional neural network (CNN), and recurrent neural network (RNN) for simultaneously forecasting these spatio-temporal tasks in a city as well as across different cities. Furthermore, a task adaptation layer is integrated with the architecture for learning joint representation in multi-task learning and revealing the contribution of the input features utilized in prediction. The proposed architecture is tested with data from Didi Chuxing for: (i) simultaneously forecasting demand and supply-demand gap in Beijing, and (ii) simultaneously forecasting demand across Chengdu and Xian. In both scenarios, models from our proposed architecture outperformed the single-task and multi-task deep learning benchmarks and ensemble-based machine learning algorithms.
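A toy version of the gated-ensemble idea, with plain MLP experts standing in for the CRNN/CNN/RNN experts and a learned softmax gate mixing their outputs; all layer sizes are assumptions, not GESME-Net's:

```python
import torch
import torch.nn as nn

class GatedEnsemble(nn.Module):
    """Gated mixture of experts: each expert maps the shared input to a
    forecast, and a learned gate produces per-sample mixing weights."""
    def __init__(self, in_dim: int, out_dim: int, n_experts: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                           nn.Linear(64, out_dim)) for _ in range(n_experts)])
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)             # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], -1)   # (B, D, E)
        return (outputs * weights.unsqueeze(1)).sum(-1)           # gated mix

y = GatedEnsemble(in_dim=16, out_dim=4)(torch.randn(2, 16))
```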
Updated: 2024-07-11 06:18:12
标题: 门控时空专家混合集成用于网约车系统中的多任务学习
摘要: 出行拼车系统需要有效管理动态需求和供应,以确保最佳的服务交付、定价策略和运营效率。在任务和城市层面分别设计空间-时间预测模型来预测出行拼车系统中的需求和供需差距,给不断扩大的运输网络公司带来了负担。因此,本研究提出了一种多任务学习架构,通过开发具有卷积循环神经网络(CRNN)、卷积神经网络(CNN)和循环神经网络(RNN)的空间-时间混合专家门控集成网络(GESME-Net),同时预测城市内以及跨不同城市的这些空间-时间任务。此外,该架构集成了任务适应层,用于在多任务学习中学习联合表示,并揭示在预测中利用的输入特征的贡献。该提出的架构在滴滴出行的数据上进行了测试:(一)同时预测北京的需求和供需差距,以及(二)同时预测成都和西安的需求。在这两种情况下,我们提出的架构模型表现优于单任务和多任务深度学习基准模型以及集成机器学习算法。
更新时间: 2024-07-11 06:18:12
领域: cs.LG
OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration
Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, DRAM-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. Additionally, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.
Updated: 2024-07-11 06:12:04
标题: OPIMA:用于卷积神经网络加速的光学存内计算
摘要: 机器学习(ML)的最新进展突出了需要建立在内存带宽和处理能力之间的计算架构的迫切性。深度神经网络的出现推动传统的冯·诺伊曼体系结构达到极限,因为数据在处理器和内存之间移动的高延迟和能耗成本。克服这一瓶颈的解决方案之一是通过内存中的处理进行计算(PIM),从而限制数据移动及其相关成本。然而,基于DRAM的PIM由于内部数据移动瓶颈和频繁刷新操作的需要,难以实现高吞吐量和能效。在这项工作中,我们介绍了OPIMA,一个基于PIM的ML加速器,设计在光学主存储器中。OPIMA被设计为利用主存储器中的固有大规模并行性,同时通过高速、低能耗的光学计算加速基于卷积神经网络的ML模型。我们对OPIMA进行了全面分析,以指导设计选择和操作机制。此外,我们评估了OPIMA的性能和能耗,将其与传统电子计算系统和新兴的光子PIM体系结构进行了比较。实验结果显示,OPIMA的吞吐量比已知的最佳先前工作高出2.98倍,能效比提高137倍。
更新时间: 2024-07-11 06:12:04
领域: cs.AR,cs.ET,cs.LG
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
It is well known that LLMs cannot generalize well to long contexts whose lengths are larger than the training sequence length. This poses challenges when employing LLMs for processing long input sequences during inference. In this work, we argue that LLMs themselves have inherent capabilities to handle long contexts without fine-tuning. To achieve this goal, we propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information: the grouped attention and the neighbor attention. The grouped attention captures the dependencies among tokens that are far apart, while neighbor attention captures dependencies among adjacent tokens within a specified range. The two-level attentions are computed based on the original model's self-attention mechanism during inference. With minor code modification, our SelfExtend can effortlessly extend existing LLMs' context window without any fine-tuning. We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length. The code can be found at https://github.com/datamllab/LongLM.
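The two-level position scheme can be sketched as a relative-position map: exact distances inside the neighbor window, floor-divided (grouped) distances outside it. The shift constant below is our approximation of how the two regimes are merged; consult the repository above for the exact rule:

```python
import numpy as np

def self_extend_rel_positions(seq_len: int, window: int, group: int) -> np.ndarray:
    """Relative-position matrix sketching SelfExtend's bi-level scheme.
    A causal mask would still be applied separately during attention."""
    pos = np.arange(seq_len)
    rel = pos[:, None] - pos[None, :]                     # exact distances
    grouped = pos[:, None] // group - pos[None, :] // group
    shift = window - window // group                      # roughly align regimes
    return np.where(rel <= window, rel, grouped + shift)

print(self_extend_rel_positions(seq_len=12, window=4, group=2))
```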
Updated: 2024-07-11 06:11:46
标题: LLM或许是LongLM:无需微调即可自我扩展LLM上下文窗口
摘要: 众所周知,当上下文长度超过训练序列长度时,大型语言模型(LLMs)难以很好地泛化。这在推理过程中处理长输入序列时给使用LLMs带来了挑战。在这项工作中,我们认为LLMs本身具有处理长上下文的固有能力,无需微调。为了实现这一目标,我们提出了SelfExtend,通过构建双层注意力信息来扩展LLMs的上下文窗口:分组注意力和邻居注意力。分组注意力捕获了相距较远的标记之间的依赖关系,而邻居注意力则捕获了指定范围内相邻标记之间的依赖关系。这两级注意力是在推理过程中基于原始模型的自注意机制计算的。通过轻微的代码修改,我们的SelfExtend可以轻松地扩展现有LLMs的上下文窗口,而无需进行任何微调。我们在多个基准测试上进行了全面的实验,结果表明我们的SelfExtend可以有效地扩展现有LLMs的上下文窗口长度。代码可在https://github.com/datamllab/LongLM找到。
更新时间: 2024-07-11 06:11:46
领域: cs.CL,cs.AI,cs.LG
GLBench: A Comprehensive Benchmark for Graph with Large Language Models
The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios. GLBench provides a fair and thorough evaluation of different categories of GraphLLM methods, along with traditional baselines such as graph neural networks. Through extensive experiments on a collection of real-world datasets with consistent data processing and splitting strategies, we have uncovered several key findings. Firstly, GraphLLM methods outperform traditional baselines in supervised settings, with LLM-as-enhancers showing the most robust performance. However, using LLMs as predictors is less effective and often leads to uncontrollable output issues. We also notice that no clear scaling laws exist for current GraphLLM methods. In addition, both structures and semantics are crucial for effective zero-shot transfer, and our proposed simple baseline can even outperform several models tailored for zero-shot scenarios. The data and code of the benchmark can be found at https://github.com/NineAbyss/GLBench.
Updated: 2024-07-11 06:06:33
标题: GLBench:面向图与大型语言模型的综合基准测试
摘要: 大型语言模型(LLMs)的出现彻底改变了我们与图形互动的方式,引领了一种被称为GraphLLM的新范式。尽管近年来GraphLLM方法迅速发展,但由于缺乏具有一致实验协议的基准,该领域的进展和理解仍不清晰。为了弥补这一差距,我们引入了GLBench,这是第一个全面评估GraphLLM方法在监督和零样本情景下的基准。GLBench为不同类别的GraphLLM方法以及传统基线模型如图神经网络提供了公平和彻底的评估。通过对一系列真实数据集进行广泛实验,并采用一致的数据处理和分割策略,我们发现了一些关键结果。首先,在监督设置中,GraphLLM方法优于传统基线模型,LLM作为增强器表现最为稳健。然而,将LLMs用作预测器效果较差,通常会导致无法控制的输出问题。我们还注意到当前GraphLLM方法没有明确的扩展规律。此外,结构和语义对于有效的零样本转移至关重要,我们提出的简单基线甚至可以胜过为零样本情景量身定制的几个模型。基准的数据和代码可在https://github.com/NineAbyss/GLBench找到。
更新时间: 2024-07-11 06:06:33
领域: cs.LG,cs.CL
Chromosomal Structural Abnormality Diagnosis by Homologous Similarity
Pathogenic chromosome abnormalities are very common among the general population. While numerical chromosome abnormalities can be quickly and precisely detected, structural chromosome abnormalities are far more complex and typically require considerable efforts by human experts for identification. This paper focuses on investigating the modeling of chromosome features and the identification of chromosomes with structural abnormalities. Most existing data-driven methods concentrate on a single chromosome and consider each chromosome independently, overlooking the crucial aspect of homologous chromosomes. In normal cases, homologous chromosomes share identical structures, with the exception that one of them is abnormal. Therefore, we propose an adaptive method to align homologous chromosomes and diagnose structural abnormalities through homologous similarity. Inspired by the process of human expert diagnosis, we incorporate information from multiple pairs of homologous chromosomes simultaneously, aiming to reduce noise disturbance and improve prediction performance. Extensive experiments on real-world datasets validate the effectiveness of our model compared to baselines.
Updated: 2024-07-11 06:04:21
标题: 同源相似性诊断染色体结构异常
摘要: 致病性染色体异常在普通人群中非常常见。虽然数目染色体异常可以被快速准确地检测出来,结构染色体异常则更为复杂,通常需要人类专家通过大量工作来识别。本文重点研究染色体特征建模和识别具有结构异常的染色体。大多数现有的数据驱动方法集中在单个染色体上,并且将每个染色体独立考虑,忽略了同源染色体这一关键方面。在正常情况下,同源染色体具有相同的结构,唯一例外是其中一个异常。因此,我们提出了一种自适应方法来对齐同源染色体,并通过同源相似性诊断结构异常。受到人类专家诊断过程的启发,我们同时整合了来自多对同源染色体的信息,旨在减少噪音干扰并提高预测性能。对真实数据集进行的大量实验验证了我们的模型相对于基线的有效性。
更新时间: 2024-07-11 06:04:21
领域: cs.AI
Quantum Generative Diffusion Model: A Fully Quantum-Mechanical Model for Generating Quantum State Ensemble
Classical diffusion models have shown superior generative results. Exploring them in the quantum domain can advance the field of quantum generative learning. This work introduces the Quantum Generative Diffusion Model (QGDM) as their simple and elegant quantum counterpart. Through a non-unitary forward process, any target quantum state can be transformed into a completely mixed state that has the highest entropy and maximum uncertainty about the system. A trainable backward process is used to recover the former from the latter. The design requirements for the backward process include non-unitarity and a small parameter count. We introduce partial trace operations to enforce non-unitarity, and we reduce the number of trainable parameters by using a parameter-sharing strategy and by incorporating temporal information as an input to the backward process. We also present a resource-efficient version of QGDM that reduces the number of auxiliary qubits while preserving generative capabilities. QGDM converges faster than the Quantum Generative Adversarial Network (QGAN) because it adopts convex-based optimization. Comparisons with QGAN demonstrate its effectiveness in generating both pure and mixed quantum states: it achieves 53.02% higher fidelity in mixed-state generation than QGAN. These results highlight its great potential for tackling challenging quantum generation tasks.
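The non-unitary forward process can be illustrated on density matrices directly: mix the target state toward the maximally mixed state I/d. A linear schedule is assumed here for simplicity; QGDM's actual schedule may differ:

```python
import numpy as np

def forward_diffuse(rho0: np.ndarray, t: float) -> np.ndarray:
    """Mix the target density matrix rho0 toward the maximally mixed
    state I/d as t runs from 0 (target) to 1 (fully mixed)."""
    d = rho0.shape[0]
    return (1.0 - t) * rho0 + t * np.eye(d) / d

# Example: a pure qubit state |0><0| half-way to full mixture.
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
print(forward_diffuse(rho0, t=0.5))   # -> diag(0.75, 0.25)
```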
Updated: 2024-07-11 05:46:04
标题: 量子生成扩散模型:一种完全量子机械模型用于生成量子态集合
摘要: 经典扩散模型显示出更好的生成结果。将它们应用于量子领域可以推动量子生成学习的发展。本文介绍了量子生成扩散模型(QGDM)作为它们简单而优雅的量子对应物。通过非幺正的前向过程,任何目标量子态都可以被转化为一个具有最高熵和最大系统不确定性的完全混合态。一个可训练的反向过程被用来从混合态中恢复原来的状态。其反向过程的设计要求包括非幺正性和较少的参数数量。我们引入了偏迹操作来强制非幺正性,并通过使用参数共享策略和将时间信息作为反向过程的输入来减少可训练参数的数量。我们提出了QGDM的资源高效版本,以减少辅助量子比特同时保持生成能力。QGDM比量子生成对抗网络(QGAN)表现出更快的收敛速度,因为其采用的基于凸优化的方法可以导致更快的收敛。与QGAN的比较结果展示了其在生成纯量子态和混合量子态方面的有效性。它在混合态生成方面可以比QGAN的保真度高出53.02%。这些结果突显了它在应对具有挑战性的量子生成任务方面的巨大潜力。
更新时间: 2024-07-11 05:46:04
领域: quant-ph,cs.LG
Quantum Curriculum Learning
Quantum machine learning (QML) requires significant quantum resources to achieve quantum advantage. Research should prioritize both the efficient design of quantum architectures and the development of learning strategies to optimize resource usage. We propose a framework called quantum curriculum learning (Q-CurL) for quantum data, where the curriculum introduces simpler tasks or data to the learning model before progressing to more challenging ones. We define the curriculum criteria based on the data density ratio between tasks to determine the curriculum order. We also implement a dynamic learning schedule to emphasize the significance of quantum data in optimizing the loss function. Empirical evidence shows that Q-CurL significantly enhances the training convergence and the generalization for unitary learning tasks and improves the robustness of quantum phase recognition tasks. Our framework provides a general learning strategy, bringing QML closer to realizing practical advantages.
Updated: 2024-07-11 05:42:23
标题: 量子课程学习
摘要: 量子机器学习(QML)需要大量的量子资源才能实现量子优势。研究应该优先考虑量子架构的高效设计和优化资源使用的学习策略的发展。我们提出了一个称为量子课程学习(Q-CurL)的框架,用于量子数据,其中课程在将学习模型引入更具挑战性的任务之前引入更简单的任务或数据。我们根据任务之间的数据密度比率定义课程标准,以确定课程顺序。我们还实施了一个动态学习计划,强调量子数据在优化损失函数方面的重要性。经验证据显示,Q-CurL显著增强了对酉学习任务的训练收敛性和泛化能力,并改善了量子相位识别任务的鲁棒性。我们的框架提供了一种通用的学习策略,使QML更接近实现实际优势。
更新时间: 2024-07-11 05:42:23
领域: quant-ph,cs.LG,stat.ML
SoupLM: Model Integration in Large Language and Multi-Modal Models
Training large language models (LLMs) and multimodal LLMs necessitates significant computing resources, and existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks. For instance, LLaMA, Vicuna, and LLaVA are three LLM variants trained with LLaMA base models using very different training recipes, tasks, and data modalities. The training cost and complexity for such LLM variants grow rapidly. In this study, we propose to use a soup strategy to assemble these LLM variants into a single well-generalized multimodal LLM (SoupLM) in a cost-efficient manner. Assembling these LLM variants efficiently brings knowledge and specialities trained from different domains and data modalities into an integrated one (e.g., chatbot speciality from user-shared conversations for Vicuna, and visual capacity from vision-language data for LLaVA), thereby avoiding the computing costs of repetitive training on several different domains. We propose a series of soup strategies to systematically benchmark performance gains across various configurations, and probe the soup behavior across base models in the interpolation space.
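At its core, a soup is a weighted parameter-space average of models that share one architecture. A minimal sketch follows; the mixing coefficients are placeholders, whereas SoupLM's soup strategies search over such coefficients and configurations:

```python
import torch

def soup(state_dicts, weights):
    """Weighted parameter-space average of models sharing one architecture."""
    assert abs(sum(weights) - 1.0) < 1e-6
    return {k: sum(w * sd[k] for sd, w in zip(state_dicts, weights))
            for k in state_dicts[0]}

# Toy usage with two random "models" of the same shape:
a = {"w": torch.randn(4, 4)}
b = {"w": torch.randn(4, 4)}
merged = soup([a, b], [0.5, 0.5])   # uniform two-model soup
```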
Updated: 2024-07-11 05:38:15
标题: SoupLM:大型语言与多模态模型中的模型集成
摘要: 培训大型语言模型(LLMs)和多模态LLMs需要大量计算资源,现有的公开可用的LLMs通常是在各种任务上使用私人策划的数据集进行预训练的。例如,LLaMA、Vicuna和LLaVA是三种使用LLaMA基础模型训练的LLM变体,它们使用非常不同的训练配方、任务和数据模态。这些LLM变体的训练成本和复杂性迅速增长。在这项研究中,我们提出使用一种汤策略以一种成本效益的方式将这些LLM变体组装成一个单一的良好泛化的多模态LLM(SoupLM)。高效地组装这些LLM变体将来自不同领域和数据模态的知识和特长集成到一个整体中(例如,从用户共享对话中的聊天机器人特长到Vicuna,以及从视觉语言数据中的视觉能力到LLaVA),因此,避免在多个不同领域上进行重复训练的计算成本。我们提出了一系列汤策略,系统地评估各种配置的性能增益,并探索基础模型在插值空间中的汤行为。
更新时间: 2024-07-11 05:38:15
领域: cs.AI
A Text-to-Game Engine for UGC-Based Role-Playing Games
The shift from professionally generated content (PGC) to user-generated content (UGC) has revolutionized various media formats, from text to video. With the rapid advancements in generative AI, a similar shift is set to transform the game industry, particularly in the realm of role-playing games (RPGs). This paper introduces a new framework for a text-to-game engine that utilizes foundation models to convert simple textual inputs into complex, interactive RPG experiences. The engine dynamically renders the game story in a multi-modal format and adjusts the game character, environment, and mechanics in real-time in response to player actions. Using this framework, we developed the "Zagii" game engine, which has successfully supported hundreds of RPG games across a diverse range of genres and facilitated tens of thousands of online user gameplay instances. This validates the effectiveness of our framework. Our work showcases the potential for a more open and democratized gaming paradigm, highlighting the transformative impact of generative AI on the game life cycle.
Updated: 2024-07-11 05:33:19
标题: 面向基于UGC的角色扮演游戏的文本到游戏引擎
摘要: 从专业生成内容(PGC)向用户生成内容(UGC)的转变已经彻底改变了各种媒体格式,从文本到视频。随着生成AI的快速发展,类似的转变定将改变游戏产业,特别是在角色扮演游戏(RPG)领域。本文介绍了一种新的文本游戏引擎框架,利用基础模型将简单的文本输入转换为复杂的互动RPG体验。该引擎动态呈现游戏故事以多模态格式,并根据玩家的操作实时调整游戏角色、环境和机制。通过这个框架,我们开发了“Zagii”游戏引擎,成功支持了数百款跨越各种流派的RPG游戏,并促进了成千上万次在线用户游玩。这验证了我们框架的有效性。我们的工作展示了更加开放和民主化的游戏范式的潜力,突显了生成AI对游戏生命周期的转变影响。
更新时间: 2024-07-11 05:33:19
领域: cs.AI,cs.CL,cs.MA
Approximating G(t)/GI/1 queues with deep learning
In this paper, we apply a supervised machine-learning approach to solve a fundamental problem in queueing theory: estimating the transient distribution of the number in the system for a G(t)/GI/1 queue. We develop a neural network mechanism that provides a fast and accurate predictor of these distributions for moderate horizon lengths and practical settings. It uses a Recurrent Neural Network (RNN) architecture whose inputs are the first several moments of the time-dependent inter-arrival and the stationary service time distributions; we call it the Moment-Based Recurrent Neural Network method (MBRNN). Our empirical study suggests that MBRNN requires only the first four inter-arrival and service time moments. We use simulation to generate a substantial training dataset and present a thorough performance evaluation to examine the accuracy of our method using two different test sets. We show that even under the configuration with the worst performance errors, the mean number of customers over the entire timeline has an error of less than 3%. While simulation modeling can achieve high accuracy, the advantage of the MBRNN over simulation is runtime: the MBRNN analyzes hundreds of systems within a fraction of a second. This paper focuses on the G(t)/GI/1 queue; however, the MBRNN approach demonstrated here can be extended to other queueing systems, as the training data labeling is based on simulations (which can be applied to more complex systems) and the training is based on deep learning, which can capture very complex time-sequence tasks. In summary, the MBRNN can potentially revolutionize our ability to perform transient analyses of queueing systems.
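A minimal sketch of the moment-based architecture: a GRU consumes the per-step moment vector and a softmax head emits the transient queue-length distribution. All sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class MBRNN(nn.Module):
    """At each step, the input holds the first four moments of the
    time-varying inter-arrival distribution (the stationary service-time
    moments can be appended as constants), and the head emits a
    distribution over queue lengths 0..max_n."""
    def __init__(self, n_moments: int = 8, hidden: int = 64, max_n: int = 50):
        super().__init__()
        self.rnn = nn.GRU(n_moments, hidden, batch_first=True)
        self.head = nn.Linear(hidden, max_n + 1)

    def forward(self, moments: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(moments)                    # (B, T, hidden)
        return torch.softmax(self.head(h), dim=-1)  # P(N(t) = n) per step

probs = MBRNN()(torch.randn(2, 30, 8))              # -> (2, 30, 51)
```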
Updated: 2024-07-11 05:25:45
标题: 用深度学习逼近G(t)/GI/1队列
摘要: 在这篇论文中,我们应用监督机器学习方法来解决排队理论中的一个基本问题:估计G(t)/GI/1系统中的人数瞬时分布。我们开发了一种神经网络机制,可以针对中等时间范围和实际设置提供快速准确的这些分布预测器。该方法基于使用基于时间相关的到达间隔和稳态服务时间分布的前几个时刻的循环神经网络(RNN)结构;我们称之为基于时刻的循环神经网络(RNN)方法(MBRNN)。我们的实证研究表明,MBRNN仅需要前四个到达间隔和服务时间时刻。我们使用模拟生成大量训练数据集,并进行了彻底的性能评估,以检验我们方法的准确度,使用了两个不同的测试集。我们展示即使在性能错误最严重的配置下,整个时间线上的顾客平均数的误差也小于3%。虽然模拟建模可以实现高准确性,但MBRNN相对于模拟的优势在于运行时间,因为MBRNN可以在一小部分时间内分析数百个系统。本文重点研究了G(t)/GI/1系统;然而,这里展示的MBRNN方法可以扩展到其他排队系统,因为训练数据标记基于模拟(可以应用于更复杂的系统),训练基于深度学习,可以捕捉非常复杂的时间序列任务。总之,MBRNN有可能彻底改变我们进行排队系统瞬时分析的能力。
更新时间: 2024-07-11 05:25:45
领域: cs.LG,math.PR
Automatic Outlier Rectification via Optimal Transport
In this paper, we propose a novel conceptual framework to detect outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: first, outliers are detected and removed, and then estimation is performed on the cleaned data. However, this approach does not inform outlier removal with the estimation task, leaving room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. We take the first step to utilize the optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. Then, we select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function we introduced in this paper is the key to making our estimator effectively identify the outlier during the optimization process. We demonstrate the effectiveness of our approach over conventional approaches in simulations and empirical analyses for mean estimation, least absolute regression, and the fitting of option implied volatility surfaces.
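Schematically, the joint rectification-estimation problem can be written as a nested minimization over model parameters and over distributions inside an optimal-transport ball around the empirical distribution; the notation below is ours, for illustration only:

```latex
% Schematic nested formulation (our notation, illustrative only):
%   P_n    : empirical distribution of the observed data
%   W_c    : optimal transport distance with concave cost c
%   \delta : rectification budget
\min_{\theta}\; \min_{Q \,:\, W_c(P_n,\, Q) \le \delta}\; \mathbb{E}_{X \sim Q}\!\left[ \ell(\theta; X) \right]
```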
Updated: 2024-07-11 05:22:42
标题: 通过最优输运实现自动异常值矫正
摘要: 在这篇论文中,我们提出了一个新颖的概念框架,利用具有凹成本函数的最优输运来检测异常值。传统的异常值检测方法通常使用两阶段程序:首先检测和移除异常值,然后在清洁数据上进行估计。然而,这种方法并未将异常值的移除与估计任务联系起来,留下改进的空间。为了解决这一局限性,我们提出了一个自动异常值矫正机制,将矫正和估计整合在一个联合优化框架中。我们首先利用具有凹成本函数的最优输运距离,构建一个概率分布空间中的矫正集。然后,在矫正集中选择最佳分布来执行估计任务。值得注意的是,我们在本文中引入的凹成本函数是使我们的估计器在优化过程中有效识别异常值的关键。我们通过模拟和实证分析展示了我们的方法在均值估计、最小绝对回归以及期权隐含波动率曲面拟合方面相对于传统方法的有效性。
更新时间: 2024-07-11 05:22:42
领域: stat.ML,cs.LG,math.OC,stat.ME
ARCO:Adaptive Multi-Agent Reinforcement Learning-Based Hardware/Software Co-Optimization Compiler for Improved Performance in DNN Accelerator Design
This paper presents ARCO, an adaptive Multi-Agent Reinforcement Learning (MARL)-based co-optimizing compilation framework designed to enhance the efficiency of mapping machine learning (ML) models - such as Deep Neural Networks (DNNs) - onto diverse hardware platforms. The framework incorporates three specialized actor-critic agents within MARL, each dedicated to a distinct aspect of compilation/optimization at an abstract level: one agent focuses on hardware, while two agents focus on software optimizations. This integration results in a collaborative hardware/software co-optimization strategy that improves the precision and speed of DNN deployments. Concentrating on high-confidence configurations simplifies the search space and delivers superior performance compared to current optimization methods. The ARCO framework surpasses existing leading frameworks, achieving a throughput increase of up to 37.95% while reducing the optimization time by up to 42.2% across various DNNs.
Updated: 2024-07-11 05:22:04
标题: ARCO:用于改进深度神经网络加速器设计性能的自适应多智能体强化学习硬件/软件协同优化编译器
摘要: 本文介绍了ARCO,一种自适应的基于多智能体强化学习(MARL)的共优化编译框架,旨在增强将机器学习(ML)模型(如深度神经网络(DNNs))映射到不同硬件平台的效率。该框架在MARL中集成了三个专门的演员-评论者智能体,每个智能体专门负责不同抽象级别的编译/优化方面:一个智能体专注于硬件,而另外两个智能体专注于软件优化。这种整合产生了一种协作的硬件/软件共优化策略,提高了DNN部署的精度和速度。专注于高置信度配置简化了搜索空间,并与当前优化方法相比提供了更优越的性能。ARCO框架超越了现有的主要框架,实现了多个DNN的吞吐量增加高达37.95%,同时将优化时间减少高达42.2%。
更新时间: 2024-07-11 05:22:04
领域: cs.LG,cs.AI,cs.AR
fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations
Pre-trained language models (PLMs) have revolutionized both the natural language processing research and applications. However, stereotypical biases (e.g., gender and racial discrimination) encoded in PLMs have raised negative ethical implications for PLMs, which critically limits their broader applications. To address the aforementioned unfairness issues, we present fairBERTs, a general framework for learning fair fine-tuned BERT series models by erasing the protected sensitive information via semantic and fairness-aware perturbations generated by a generative adversarial network. Through extensive qualitative and quantitative experiments on two real-world tasks, we demonstrate the great superiority of fairBERTs in mitigating unfairness while maintaining the model utility. We also verify the feasibility of transferring adversarial components in fairBERTs to other conventionally trained BERT-like models for yielding fairness improvements. Our findings may shed light on further research on building fairer fine-tuned PLMs.
Updated: 2024-07-11 05:13:38
标题: 公平BERTs:通过语义和公平感知扰动消除敏感信息
摘要: 预训练语言模型(PLMs)彻底改变了自然语言处理研究和应用。然而,PLMs中编码的陈旧偏见(例如性别和种族歧视)引发了对PLMs的负面伦理影响,严重限制了它们的广泛应用。为了解决上述不公平问题,我们提出了fairBERTs,一个通过生成对抗网络生成的语义和公平感知扰动来消除受保护的敏感信息,从而学习公平微调的BERT系列模型的通用框架。通过对两个真实任务进行大量定性和定量实验,我们展示了fairBERTs在减轻不公平性方面的巨大优势,同时保持模型的实用性。我们还验证了将fairBERTs中的对抗成分转移给其他传统训练的类似BERT模型以获得公平性改进的可行性。我们的发现可能为进一步研究构建更公平微调的PLMs提供启示。
更新时间: 2024-07-11 05:13:38
领域: cs.CL,cs.AI
Position: Measure Dataset Diversity, Don't Just Claim It
Machine learning (ML) datasets, often perceived as neutral, inherently encapsulate abstract and disputed social constructs. Dataset curators frequently employ value-laden terms such as diversity, bias, and quality to characterize datasets. Despite their prevalence, these terms lack clear definitions and validation. Our research explores the implications of this issue by analyzing "diversity" across 135 image and text datasets. Drawing from social sciences, we apply principles from measurement theory to identify considerations and offer recommendations for conceptualizing, operationalizing, and evaluating diversity in datasets. Our findings have broader implications for ML research, advocating for a more nuanced and precise approach to handling value-laden properties in dataset construction.
Updated: 2024-07-11 05:13:27
标题: 立场:衡量数据集的多样性,而不仅仅是声称它
摘要: 机器学习(ML)数据集通常被视为中立,但实质上包含了抽象和有争议的社会构建。数据集的策划者经常使用带有价值观的术语,如多样性、偏见和质量来描述数据集。尽管这些术语很普遍,但它们缺乏清晰的定义和验证。我们的研究通过分析135个图像和文本数据集中的“多样性”,探讨了这个问题的影响。借鉴社会科学,我们应用测量理论的原则来识别考虑因素,并提出概念化、操作化和评估数据集中多样性的建议。我们的研究结果对机器学习研究具有更广泛的影响,倡导在数据集构建中处理带有价值观属性的更加细致和精确的方法。
更新时间: 2024-07-11 05:13:27
领域: cs.LG,cs.CY
A Temporal-Spectral Fusion Transformer with Subject-Specific Adapter for Enhancing RSVP-BCI Decoding
The Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interface (BCI) is an efficient technology for target retrieval using electroencephalography (EEG) signals. The performance improvement of traditional decoding methods relies on a substantial amount of training data from new test subjects, which increases preparation time for BCI systems. Several studies introduce data from existing subjects to reduce the dependence of performance improvement on data from new subjects, but their optimization strategy based on adversarial learning with extensive data increases training time during the preparation procedure. Moreover, most previous methods only focus on the single-view information of EEG signals, but ignore the information from other views which may further improve performance. To enhance decoding performance while reducing preparation time, we propose a Temporal-Spectral fusion transformer with Subject-specific Adapter (TSformer-SA). Specifically, a cross-view interaction module is proposed to facilitate information transfer and extract common representations across two-view features extracted from EEG temporal signals and spectrogram images. Then, an attention-based fusion module fuses the features of two views to obtain comprehensive discriminative features for classification. Furthermore, a multi-view consistency loss is proposed to maximize the feature similarity between two views of the same EEG signal. Finally, we propose a subject-specific adapter to rapidly transfer the knowledge of the model trained on data from existing subjects to decode data from new subjects. Experimental results show that TSformer-SA significantly outperforms comparison methods and achieves outstanding performance with limited training data from new subjects. This facilitates efficient decoding and rapid deployment of BCI systems in practical use.
Updated: 2024-07-11 05:07:54
标题: 一种具有受试者特定适配器的时间-频谱融合Transformer,用于增强RSVP-BCI解码
摘要: 基于快速连续视觉呈现(RSVP)的脑机接口(BCI)是一种利用脑电图(EEG)信号进行目标检索的高效技术。传统解码方法的性能改进依赖于来自新测试对象的大量训练数据,这增加了BCI系统的准备时间。一些研究介绍了现有受试者的数据,以减少性能改进对来自新受试者数据的依赖,但基于对抗学习的优化策略增加了准备过程中的训练时间。此外,大多数先前的方法只关注脑电图信号的单一视图信息,而忽略了其他视图的信息,这可能进一步提高性能。为了提高解码性能并减少准备时间,我们提出了一种带有特定于受试者适配器的时间-频谱融合变压器(TSformer-SA)。具体来说,提出了一个跨视图交互模块,以促进信息传递并从脑电图时间信号和频谱图像提取的两个视图特征中提取共同表示。然后,一个基于注意力的融合模块融合两个视图的特征,获得用于分类的全面区分特征。此外,提出了一种多视图一致性损失,用于最大化同一脑电信号的两个视图之间的特征相似性。最后,我们提出了一个特定于受试者的适配器,快速将在现有受试者数据上训练的模型的知识转移给解码新受试者数据。实验结果表明,TSformer-SA明显优于比较方法,并在有限的新受试者训练数据下取得了出色的性能。这有助于在实际使用中高效解码并快速部署BCI系统。
更新时间: 2024-07-11 05:07:54
领域: cs.HC,cs.AI,68T07,I.5.4
Automatic Generation of Web Censorship Probe Lists
Domain probe lists, used to determine which URLs to probe for Web censorship, play a critical role in Internet censorship measurement studies. Indeed, the size and accuracy of the domain probe list limit the set of censored pages that can be detected; inaccurate lists can lead to an incomplete view of the censorship landscape or biased results. Previous efforts to generate domain probe lists have been mostly manual or crowdsourced. This approach is time-consuming, prone to errors, and does not scale well to the ever-changing censorship landscape. In this paper, we explore methods for automatically generating probe lists that are both comprehensive and up-to-date for Web censorship measurement. We start from an initial set of 139,957 unique URLs drawn from various existing test lists, consisting of pages in a variety of languages, to generate new candidate pages. By analyzing content from these URLs (i.e., performing topic and keyword extraction), expanding these topics, and using them as a feed to search engines, our method produces 119,255 new URLs across 35,147 domains. We then test the new candidate pages by attempting to access each URL from servers in eleven different global locations over a span of four months to check for their connectivity and potential signs of censorship. Our measurements reveal that our method discovered over 1,400 domains, not present in the original dataset, that we suspect to be blocked. In short, automatically updating probe lists is possible and can help further automate censorship measurements at scale.
Updated: 2024-07-11 05:04:52
标题: Web审查探针列表的自动生成
摘要: 域名探测列表——用于确定应该探测哪些网址以进行网络审查——在互联网审查测量研究中发挥关键作用。实际上,域名探测列表的大小和准确性限制了可以检测到的被审查页面集合;不准确的列表可能导致对审查景观的不完整视图或有偏见的结果。先前生成域名探测列表的努力大多是手动的或众包的。这种方法耗时,容易出错,并且无法很好地适应不断变化的审查环境。 在本文中,我们探索了用于自动生成既全面又及时的Web审查测量探测列表的方法。我们从包含各种语言页面的各种现有测试列表中的139,957个唯一URL的初始集合开始生成新的候选页面。通过分析这些URL的内容(即执行主题和关键字提取),扩展这些主题,并将它们作为搜索引擎的输入,我们的方法在35,147个域中生成了119,255个新的URL。然后,我们通过尝试从全球11个不同位置的服务器访问每个URL,在四个月的时间跨度内检查它们的连通性和潜在的审查迹象。我们的测量结果显示,我们的方法发现了超过1,400个域——原始数据集中没有的——我们怀疑被封锁。简而言之,自动更新探测列表是可能的,并且可以帮助进一步自动化大规模的审查测量。
更新时间: 2024-07-11 05:04:52
领域: cs.CR,cs.CL,cs.CY,cs.NI
CoGS: Causality Constrained Counterfactual Explanations using goal-directed ASP
Machine learning models are increasingly used in areas such as loan approvals and hiring, yet they often function as black boxes, obscuring their decision-making processes. Transparency is crucial, and individuals need explanations to understand decisions, especially for the ones not desired by the user. Ethical and legal considerations require informing individuals of changes in input attribute values (features) that could lead to a desired outcome for the user. Our work aims to generate counterfactual explanations by considering causal dependencies between features. We present the CoGS (Counterfactual Generation with s(CASP)) framework that utilizes the goal-directed Answer Set Programming system s(CASP) to generate counterfactuals from rule-based machine learning models, specifically the FOLD-SE algorithm. CoGS computes realistic and causally consistent changes to attribute values taking causal dependencies between them into account. It finds a path from an undesired outcome to a desired one using counterfactuals. We present details of the CoGS framework along with its evaluation.
Updated: 2024-07-11 04:50:51
标题: CoGS:使用面向目标的ASP进行因果受限反事实解释
摘要: 机器学习模型在贷款批准和招聘等领域的应用越来越多,然而它们通常作为黑匣子,模糊了它们的决策过程。透明度至关重要,个人需要解释来理解决策,尤其是对用户不希望的决策。伦理和法律考虑要求告知个人输入属性值(特征)的变化,这些变化可能导致用户期望的结果。我们的工作旨在通过考虑特征之间的因果依赖关系生成反事实解释。我们提出了利用目标导向答案集编程系统s(CASP)来生成反事实的CoGS(Counterfactual Generation with s(CASP))框架,该框架特别适用于基于规则的机器学习模型,特别是FOLD-SE算法。CoGS考虑了因果依赖关系,计算出对属性值进行现实和因果一致的变化。它通过反事实找到从不希望的结果到期望结果的路径。我们提供了CoGS框架的详细信息以及评估结果。
更新时间: 2024-07-11 04:50:51
领域: cs.AI,cs.LG,cs.LO
Tuning Vision-Language Models with Candidate Labels by Prompt Alignment
Vision-language models (VLMs) can learn high-quality representations from a large-scale training dataset of image-text pairs. Prompt learning is a popular approach to fine-tuning VLM to adapt them to downstream tasks. Despite the satisfying performance, a major limitation of prompt learning is the demand for labelled data. In real-world scenarios, we may only obtain candidate labels (where the true label is included) instead of the true labels due to data privacy or sensitivity issues. In this paper, we provide the first study on prompt learning with candidate labels for VLMs. We empirically demonstrate that prompt learning is more advantageous than other fine-tuning methods, for handling candidate labels. Nonetheless, its performance drops when the label ambiguity increases. In order to improve its robustness, we propose a simple yet effective framework that better leverages the prior knowledge of VLMs to guide the learning process with candidate labels. Specifically, our framework disambiguates candidate labels by aligning the model output with the mixed class posterior jointly predicted by both the learnable and the handcrafted prompt. Besides, our framework can be equipped with various off-the-shelf training objectives for learning with candidate labels to further improve their performance. Extensive experiments demonstrate the effectiveness of our proposed framework.
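The candidate-label setting can be illustrated with a standard partial-label objective that maximizes the probability mass the model assigns to the candidate set; the paper's framework additionally reweights candidates by the mixed class posterior of the learnable and handcrafted prompts, which this sketch omits:

```python
import torch

def candidate_label_loss(logits: torch.Tensor,
                         candidate_mask: torch.Tensor) -> torch.Tensor:
    """Partial-label objective: push probability mass onto the candidate
    set, which is known to contain the true label."""
    probs = torch.softmax(logits, dim=-1)
    cand_mass = (probs * candidate_mask).sum(-1).clamp_min(1e-12)
    return -torch.log(cand_mass).mean()

logits = torch.randn(4, 10)
mask = torch.zeros(4, 10)
mask[:, :3] = 1.0   # first 3 classes are the candidate set here
loss = candidate_label_loss(logits, mask)
```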
Updated: 2024-07-11 04:46:24
标题: 通过提示对齐来调整具有候选标签的视觉-语言模型
摘要: 视觉语言模型(VLMs)可以从一个大规模的图像-文本配对训练数据集中学习高质量的表示。提示学习是一种流行的方法,用于微调VLM以适应下游任务。尽管表现令人满意,但提示学习的一个主要限制是对标记数据的需求。在现实场景中,由于数据隐私或敏感性问题,我们可能只能获得候选标签(包含真实标签)而不是真实标签。在本文中,我们首次对VLM使用候选标签进行提示学习进行了研究。我们通过实验证明,与其他微调方法相比,提示学习在处理候选标签时更具优势。然而,当标签模糊度增加时,其性能会下降。为了提高其鲁棒性,我们提出了一个简单而有效的框架,更好地利用VLM的先验知识来引导使用候选标签进行学习过程。具体来说,我们的框架通过将模型输出与由可学习的提示和手工制作的提示共同预测的混合类后验进行对齐,从而消除候选标签的歧义。此外,我们的框架可以配备各种现成的训练目标,用于学习候选标签以进一步提高其性能。大量实验证明了我们提出的框架的有效性。
更新时间: 2024-07-11 04:46:24
领域: cs.CV,cs.AI
Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling
We present a novel time series anomaly detection method that achieves excellent detection accuracy while offering a superior level of explainability. Our proposed method, TimeVQVAE-AD, leverages masked generative modeling adapted from the cutting-edge time series generation method known as TimeVQVAE. The prior model is trained on the discrete latent space of a time-frequency domain. Notably, the dimensional semantics of the time-frequency domain are preserved in the latent space, enabling us to compute anomaly scores across different frequency bands, which provides a better insight into the detected anomalies. Additionally, the generative nature of the prior model allows for sampling likely normal states for detected anomalies, enhancing the explainability of the detected anomalies through counterfactuals. Our experimental evaluation on the UCR Time Series Anomaly archive demonstrates that TimeVQVAE-AD significantly surpasses the existing methods in terms of detection accuracy and explainability. We provide our implementation on GitHub: https://github.com/ML4ITS/TimeVQVAE-AnomalyDetection.
Updated: 2024-07-11 04:45:41
标题: 可解释的时间序列异常检测:使用遮蔽潜在生成建模进行解释
摘要: 我们提出了一种新颖的时间序列异常检测方法,能够在提供卓越的检测准确性的同时具有更高水平的可解释性。我们提出的方法,TimeVQVAE-AD,利用了来自最前沿的时间序列生成方法TimeVQVAE的掩蔽生成建模。先前的模型是在时间-频率域的离散潜在空间上进行训练的。值得注意的是,时间-频率域的维度语义在潜在空间中得以保留,使我们能够计算跨不同频段的异常分数,从而更好地了解检测到的异常。此外,先前模型的生成性质允许对检测到的异常进行采样,增强了通过反事实推理来解释检测到的异常。我们在UCR时间序列异常存档上的实验评估表明,TimeVQVAE-AD在检测准确性和可解释性方面明显优于现有方法。我们在GitHub上提供了我们的实现:https://github.com/ML4ITS/TimeVQVAE-AnomalyDetection。
更新时间: 2024-07-11 04:45:41
领域: cs.LG,stat.ML
Foundation Model Engineering: Engineering Foundation Models Just as Engineering Software
By treating data and models as the source code, Foundation Models (FMs) become a new type of software. Mirroring the concept of the software crisis, the increasing complexity of FMs makes an FM crisis a tangible concern in the coming decade, calling for new theories and methodologies from the field of software engineering. In this paper, we outline our vision of introducing Foundation Model (FM) engineering, a strategic response to the anticipated FM crisis with principled engineering methodologies. FM engineering aims to mitigate potential issues in FM development and application through the introduction of declarative, automated, and unified programming interfaces for both data and model management, reducing the complexities involved in working with FMs by providing a more structured and intuitive process for developers. Through the establishment of FM engineering, we aim to provide a robust, automated, and extensible framework that addresses the imminent challenges and uncovers new research opportunities for the software engineering field.
Updated: 2024-07-11 04:40:02
标题: 基础模型工程:像工程化软件一样工程化基础模型
摘要: 通过将数据和模型视为源代码,基础模型(FMs)成为一种新型软件。反映了软件危机的概念,FMs日益复杂的特性使得FM危机在未来十年成为一个切实关注的问题,需要软件工程领域提供新的理论和方法论。在本文中,我们概述了引入基础模型(FM)工程的愿景,这是对预期的FM危机做出的战略性响应,采用原则性的工程方法。FM工程旨在通过引入声明式、自动化和统一的数据和模型管理编程接口,减少与FMs工作中的复杂性,为开发人员提供更结构化和直观的工作流程,以缓解FM开发和应用中可能出现的问题。通过建立FM工程,我们旨在提供一个强大、自动化和可扩展的框架,解决即将面临的挑战,并发现软件工程领域的新研究机会。
更新时间: 2024-07-11 04:40:02
领域: cs.SE,cs.AI,cs.LG
Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario
Current research on tool learning primarily focuses on selecting the most effective tool from a wide array of options, often overlooking cost-effectiveness, a crucial factor in human problem-solving. In this paper, we address the selection of homogeneous tools by predicting both their performance and the associated cost required to accomplish a given task. We then assign queries to the optimal tools in a cost-effective manner. Our experimental results demonstrate that our method achieves higher performance at a lower cost compared to strong baseline approaches.
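A minimal sketch of cost-effective assignment: score each homogeneous tool by predicted performance minus a cost penalty and route the query to the best one. The predictor functions and the lambda weight are placeholders for the learned components:

```python
from typing import Callable, List, Tuple

def assign_tool(query: str,
                tools: List[str],
                predict_perf: Callable[[str, str], float],
                predict_cost: Callable[[str, str], float],
                lam: float = 0.1) -> Tuple[str, float]:
    """Pick the tool with the best predicted performance/cost trade-off."""
    scored = [(t, predict_perf(query, t) - lam * predict_cost(query, t))
              for t in tools]
    return max(scored, key=lambda pair: pair[1])

# Toy usage: the cheaper tool wins once cost is penalized.
tool, utility = assign_tool(
    "summarize this report",
    ["small-retriever", "large-retriever"],
    predict_perf=lambda q, t: 0.9 if "large" in t else 0.8,
    predict_cost=lambda q, t: 5.0 if "large" in t else 1.0,
)
```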
Updated: 2024-07-11 04:37:47
标题: 同质化工具的自适应选择:在RAG场景中的实例化
摘要: 目前关于工具学习的研究主要集中在从众多选项中选择最有效的工具,往往忽视了成本效益,这是人类问题解决中的一个关键因素。在本文中,我们通过预测同类工具的性能和完成特定任务所需的相关成本,来解决同类工具的选择问题。然后以一种成本效益的方式将查询分配给最佳工具。我们的实验结果表明,与强基准方法相比,我们的方法在更低的成本下实现了更高的性能。
更新时间: 2024-07-11 04:37:47
领域: cs.AI
Faster Machine Unlearning via Natural Gradient Descent
We address the challenge of efficiently and reliably deleting data from machine learning models trained using Empirical Risk Minimization (ERM), a process known as machine unlearning. To avoid retraining models from scratch, we propose a novel algorithm leveraging Natural Gradient Descent (NGD). Our theoretical framework ensures strong privacy guarantees for convex models, while a practical Min/Max optimization algorithm is developed for non-convex models. Comprehensive evaluations show significant improvements in privacy, computational efficiency, and generalization compared to state-of-the-art methods, advancing both the theoretical and practical aspects of machine unlearning.
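For intuition, a single natural-gradient unlearning step can be sketched as ascending the loss of the forget set, preconditioned by the inverse Fisher information; the actual algorithm's privacy noise and the Min/Max refinement for non-convex models are omitted in this illustration:

```python
import numpy as np

def ngd_unlearn(theta: np.ndarray,
                fisher: np.ndarray,
                grad_forget: np.ndarray,
                lr: float = 1.0) -> np.ndarray:
    """Ascend the forget-set loss along the natural gradient, i.e. the
    raw gradient preconditioned by the inverse Fisher information."""
    return theta + lr * np.linalg.solve(fisher, grad_forget)

theta = np.zeros(3)
fisher = np.eye(3) * 2.0
new_theta = ngd_unlearn(theta, fisher, grad_forget=np.ones(3))
```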
Updated: 2024-07-11 04:19:28
标题: 自然梯度下降加速机器遗忘
摘要: 我们面临的挑战是如何高效可靠地从使用经验风险最小化(ERM)训练的机器学习模型中删除数据,这个过程被称为机器遗忘。为了避免从头开始重新训练模型,我们提出了一种利用自然梯度下降(NGD)的新算法。我们的理论框架确保了凸模型的强隐私保证,同时为非凸模型开发了一个实用的最小/最大优化算法。综合评估显示,与最先进的方法相比,我们的方法在隐私、计算效率和泛化方面都取得了显著的改进,推动了机器遗忘的理论和实践方面的发展。
更新时间: 2024-07-11 04:19:28
领域: cs.LG,cs.AI
Missile detection and destruction robot using detection algorithm
This research surveys present missile detection technologies around the world and analyzes them to find a cost-effective solution for implementing such a system in Bangladesh. The paper gives an overview of missile detection technologies that use electro-optical sensors and pulse Doppler radar. The proposed system detects the target missile and performs automatic detection and destruction with the help of an ultrasonic sonar, a metal detector sensor, and a smoke detector sensor. The system is mainly based on an ultrasonic sonar sensor. It has a transducer, a transmitter, and a receiver. The transducer is connected to the controller. When the system detects an object by following the detection algorithm, it computes the object's distance and angle. It can also determine whether the system can destroy the object by simulating a second algorithm.
Updated: 2024-07-11 04:18:17
标题: 使用检测算法的导弹检测和摧毁机器人
摘要: 这项研究基于世界上当前的导弹探测技术,分析这些技术,寻找一种成本有效的解决方案来在孟加拉国实施该系统。本文将介绍利用电光传感器和脉冲多普勒雷达的导弹探测技术。该系统旨在探测目标导弹。利用超声声纳、金属探测器传感器和烟雾探测器传感器进行自动探测和摧毁。该系统主要基于超声声纳传感器。它具有换能器、发射器和接收器。换能器与控制器连接。当它通过遵循算法检测到一个物体时,它会找到物体的距离和角度。它还可以通过使用另一个算法的模拟来确保系统是否可以摧毁该物体。
更新时间: 2024-07-11 04:18:17
领域: cs.RO,cs.AI
Synthetic Electroretinogram Signal Generation Using Conditional Generative Adversarial Network for Enhancing Classification of Autism Spectrum Disorder
The electroretinogram (ERG) is a clinical test that records the retina's electrical response to light. The ERG is a promising way to study different neurodevelopmental and neurodegenerative disorders, including autism spectrum disorder (ASD) - a neurodevelopmental condition that impacts language, communication, and reciprocal social interactions. However, in heterogeneous populations, such as ASD, where the ability to collect large datasets is limited, the application of artificial intelligence (AI) is complicated. Synthetic ERG signals generated from real ERG recordings carry similar information as natural ERGs and, therefore, could be used as an extension for natural data to increase datasets so that AI applications can be fully utilized. As proof of principle, this study presents a Generative Adversarial Network capable of generating synthetic ERG signals of children with ASD and typically developing control individuals. We applied a Time Series Transformer and Visual Transformer with Continuous Wavelet Transform to enhance classification results on the extended synthetic signals dataset. This approach may support classification models in related psychiatric conditions where the ERG may help classify disorders.
Updated: 2024-07-11 04:11:52
标题: 使用条件生成对抗网络生成合成视网膜电图信号以增强自闭症谱系障碍的分类
摘要: 电子视网膜图(ERG)是一种临床检测方法,记录视网膜对光的电响应。 ERG是研究不同神经发育和神经退行性疾病的一种有前途的方法,包括自闭症谱系障碍(ASD)-一种影响语言、沟通和互惠社交互动的神经发育状况。然而,在异质人群中,如ASD,由于收集大型数据集的能力有限,人工智能(AI)的应用变得复杂。从真实ERG记录生成的合成ERG信号携带与自然ERG类似的信息,因此可以作为自然数据的扩展,以增加数据集,使得AI应用能够充分利用。作为原则证明,本研究提出了一种能够生成ASD患儿和通常发展正常个体合成ERG信号的生成对抗网络。我们应用了时间序列转换器和可视化转换器与连续小波变换来提高对扩展合成信号数据集的分类结果。这种方法可能支持相关精神疾病条件中的分类模型,其中ERG可能有助于分类疾病。
更新时间: 2024-07-11 04:11:52
领域: cs.LG,cs.AI,eess.SP
From Supervised to Generative: A Novel Paradigm for Tabular Deep Learning with Large Language Models
Tabular data is foundational to predictive modeling in various crucial industries, including healthcare, finance, retail, sustainability, etc. Despite the progress made in specialized models, there is an increasing demand for universal models that can transfer knowledge, generalize from limited data, and follow human instructions. These are challenges that current tabular deep learning approaches have not fully tackled. Here we introduce Generative Tabular Learning (GTL), a novel framework that integrates the advanced functionalities of large language models (LLMs)-such as prompt-based zero-shot generalization and in-context learning-into tabular deep learning. GTL capitalizes on the pre-training of LLMs on diverse tabular data, enhancing their understanding of domain-specific knowledge, numerical sequences, and statistical dependencies critical for accurate predictions. Our empirical study spans 384 public datasets, rigorously analyzing GTL's convergence and scaling behaviors and assessing the impact of varied data templates. The GTL-enhanced LLaMA-2 model demonstrates superior zero-shot and in-context learning capabilities across numerous classification and regression tasks. Notably, it achieves this without fine-tuning, outperforming traditional methods and rivaling state-of-the-art models like GPT-4 in certain cases. Through GTL, we not only foster a deeper integration of LLMs' sophisticated abilities into tabular data comprehension and application but also offer a new training resource and a test bed for LLMs to enhance their ability to comprehend tabular data. To facilitate reproducible research, we release our code, data, and model checkpoints at https://github.com/microsoft/Industrial-Foundation-Models.
Updated: 2024-07-11 04:09:19
标题: 从监督到生成:一种新的范式用于具有大型语言模型的表格深度学习
摘要: 表格数据是各个重要行业中预测建模的基础,包括医疗保健、金融、零售、可持续性等。尽管专门模型取得了进展,但对于可以转移知识、从有限数据中概括并遵循人类指令的通用模型的需求正在增加。这些是当前表格深度学习方法尚未完全解决的挑战。在这里,我们介绍了生成式表格学习(GTL),这是一个将大型语言模型(LLMs)的先进功能(如基于提示的零样本泛化和上下文学习)整合到表格深度学习中的新框架。GTL利用LLMs在各种表格数据上的预训练,增强了它们对于领域特定知识、数字序列和统计依赖的理解,这对准确预测至关重要。我们的实证研究涵盖了384个公共数据集,严格分析了GTL的收敛性和扩展行为,并评估了不同数据模板的影响。GTL增强的LLaMA-2模型在众多分类和回归任务中展现了优越的零样本和上下文学习能力。值得注意的是,在某些情况下,它可以在没有微调的情况下胜过传统方法,并与GPT-4等最先进模型媲美。通过GTL,我们不仅促进了LLMs复杂能力与表格数据理解和应用的深度整合,还为LLMs提供了一个新的训练资源和测试基础,以提升它们理解表格数据的能力。为了促进可重现性研究,我们在https://github.com/microsoft/Industrial-Foundation-Models发布了我们的代码、数据和模型检查点。
更新时间: 2024-07-11 04:09:19
领域: cs.LG
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
We present SeaEval, a benchmark for multilingual foundation models. In addition to characterizing how these models understand and reason with natural language, we also investigate how well they comprehend cultural practices, nuances, and values. Alongside standard accuracy metrics, we investigate the brittleness of foundation models in the dimensions of semantics and multilinguality. Our analyses span both open-sourced and closed models, leading to empirical results across classic NLP tasks, reasoning, and cultural comprehension. Key findings indicate (1) Most models exhibit varied behavior when given paraphrased instructions. (2) Many models still suffer from exposure bias (e.g., positional bias, majority label bias). (3) For questions rooted in factual, scientific, and commonsense knowledge, consistent responses are expected across multilingual queries that are semantically equivalent. Yet, most models surprisingly demonstrate inconsistent performance on these queries. (4) Multilingually-trained models have not attained "balanced multilingual" capabilities. Our endeavors underscore the need for more generalizable semantic representations and enhanced multilingual contextualization. SeaEval can serve as a launchpad for more thorough investigations and evaluations for multilingual and multicultural scenarios.
Updated: 2024-07-11 04:01:41
标题: SeaEval用于多语言基础模型:从跨语言对齐到文化推理
摘要: 我们提出SeaEval,这是一个用于多语言基础模型的基准测试。除了表征这些模型如何理解和推理自然语言之外,我们还调查它们对文化实践、细微差别和价值观的理解能力。除了标准准确度指标外,我们还研究了基础模型在语义和多语言性维度上的脆弱性。我们的分析涵盖了开源和封闭模型,得出了跨经典NLP任务、推理和文化理解的实证结果。主要发现包括:(1)大多数模型在给出释义指令时表现出不同行为。 (2) 许多模型仍然受到暴露偏见的影响(例如,位置偏见、大多数标签偏见)。 (3)对于根植于事实、科学和常识知识的问题,我们期望在语义上等效的多语言查询中获得一致的回应。然而,大多数模型在这些查询上令人惊讶地表现出不一致的性能。 (4) 在多语言训练的模型尚未达到“平衡的多语言”能力。我们的努力强调了对更具普遍性的语义表示和增强的多语言上下文化的需求。SeaEval可以作为更彻底调查和评估多语言和多文化情境的跳板。
更新时间: 2024-07-11 04:01:41
领域: cs.CL,cs.AI
Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks
In multi-agent reinforcement learning (MARL), the Centralized Training with Decentralized Execution (CTDE) framework is pivotal but struggles due to a gap: global state guidance in training versus reliance on local observations in execution, lacking global signals. Inspired by human societal consensus mechanisms, we introduce the Hierarchical Consensus-based Multi-Agent Reinforcement Learning (HC-MARL) framework to address this limitation. HC-MARL employs contrastive learning to foster a global consensus among agents, enabling cooperative behavior without direct communication. This approach enables agents to form a global consensus from local observations, using it as an additional piece of information to guide collaborative actions during execution. To cater to the dynamic requirements of various tasks, consensus is divided into multiple layers, encompassing both short-term and long-term considerations. Short-term observations prompt the creation of an immediate, low-layer consensus, while long-term observations contribute to the formation of a strategic, high-layer consensus. This process is further refined through an adaptive attention mechanism that dynamically adjusts the influence of each consensus layer. This mechanism optimizes the balance between immediate reactions and strategic planning, tailoring it to the specific demands of the task at hand. Extensive experiments and real-world applications in multi-robot systems showcase our framework's superior performance, marking significant advancements over baselines.
Updated: 2024-07-11 03:55:55
标题: 基于层次共识的多智能体强化学习用于多机器人协作任务
摘要: 在多智能体强化学习(MARL)中,集中式训练与分散式执行(CTDE)框架是至关重要的,但由于训练中的全局状态引导与执行中依赖局部观测之间存在差距而遇到困难,缺乏全局信号。受人类社会共识机制的启发,我们引入了基于层次共识的多智能体强化学习(HC-MARL)框架来解决这一限制。HC-MARL采用对比学习来促进智能体之间的全局共识,实现合作行为而无需直接通信。这种方法使智能体能够从局部观察中形成全局共识,并将其作为指导执行期间协作行动的额外信息。为了满足各种任务的动态需求,共识被划分为多个层次,包括短期和长期考虑。短期观察促使形成即时、低层次的共识,而长期观察则有助于形成战略性、高层次的共识。通过自适应注意机制进一步完善了这一过程,动态调整每个共识层的影响力。这种机制优化了即时反应和战略规划之间的平衡,使其适应当前任务的特定需求。在多机器人系统中进行的大量实验和真实应用展示了我们框架卓越的性能,标志着与基准线相比的重大进展。
更新时间: 2024-07-11 03:55:55
领域: cs.AI,cs.MA,cs.RO
Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard
We introduce a novel and extensible benchmark for large language models (LLMs) through grid-based games such as Tic-Tac-Toe, Connect Four, and Gomoku. The open-source game simulation code, available on GitHub, allows LLMs to compete and generates detailed data files in JSON, CSV, TXT, and PNG formats for leaderboard rankings and further analysis. We present the results of games among leading LLMs, including Claude 3.5 Sonnet and Claude 3 Sonnet by Anthropic, Gemini 1.5 Pro and Gemini 1.5 Flash by Google, GPT-4 Turbo and GPT-4o by OpenAI, and Llama3-70B by Meta. We also encourage submissions of results from other LLMs. In total, we simulated 2,310 matches (5 sessions for each pair among 7 LLMs and a random player) across three types of games, using three distinct prompt types: list, illustration, and image. The results revealed significant variations in LLM performance across different games and prompt types, with analysis covering win and disqualification rates, missed opportunity analysis, and invalid move analysis. The details of the leaderboard and result matrix data are available as open-access data on GitHub. This study enhances our understanding of LLMs' capabilities in playing games they were not specifically trained for, helping to assess their rule comprehension and strategic thinking. On the path to Artificial General Intelligence (AGI), this study lays the groundwork for future exploration into their utility in complex decision-making scenarios, illuminating their strategic thinking abilities and offering directions for further inquiry into the limits of LLMs within game-based frameworks.
Updated: 2024-07-11 03:46:35
标题: 使用基于网格的游戏竞赛评估大型语言模型:一种可扩展的LLM基准和排行榜
摘要: 我们通过基于网格的游戏(如井字棋、四子棋和五子棋)引入了一个新颖且可扩展的大型语言模型(LLMs)基准。这些开源游戏模拟代码可在GitHub上获得,允许LLMs竞争并生成详细的JSON、CSV、TXT和PNG格式的数据文件,用于排行榜排名和进一步分析。我们展示了领先的LLMs之间的游戏结果,包括Anthropic的Claude 3.5 Sonnet和Claude 3 Sonnet,Google的Gemini 1.5 Pro和Gemini 1.5 Flash,OpenAI的GPT-4 Turbo和GPT-4o,以及Meta的Llama3-70B。我们还鼓励其他LLMs提交结果。总共,我们在三种类型的游戏中模拟了2,310场比赛(每对7个LLMs和一个随机玩家进行5次会话),使用了三种不同的提示类型:列表、插图和图片。结果显示LLMs在不同游戏和提示类型中表现出显著的性能变化,分析涵盖了胜率和取消资格率、错失机会分析以及无效移动分析。排行榜和结果矩阵数据的详细信息可在GitHub上作为开放获取数据获取。该研究增进了我们对LLMs在未经专门训练的游戏中的能力的理解,有助于评估它们对规则理解和战略思维的能力。在通往人工通用智能(AGI)的道路上,该研究为未来探索它们在复杂决策情景中的实用性奠定了基础,揭示了它们的战略思维能力,并为进一步探讨LLMs在基于游戏的框架中的极限提供了方向。
更新时间: 2024-07-11 03:46:35
领域: cs.AI,cs.CL,cs.LG,cs.NE
Graph convolutional network for predicting abnormal grain growth in Monte Carlo simulations of microstructural evolution
Recent developments in graph neural networks show promise for predicting the occurrence of abnormal grain growth, which has been a particularly challenging area of research due to its apparent stochastic nature. In this study, we generate a large dataset of Monte Carlo simulations of abnormal grain growth. We train simple graph convolution networks to predict which initial microstructures will exhibit abnormal grain growth, and compare the results to a standard computer vision approach for the same task. The graph neural network outperformed the computer vision method and achieved 73% prediction accuracy and fewer false positives. It also provided some physical insight into feature importance and the relevant length scale required to maximize predictive performance. Analysis of the uncertainty in the Monte Carlo simulations provides additional insights for ongoing work in this area.
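The core operation such a model relies on, a graph convolution over the grain adjacency graph, can be sketched in a few lines. The per-grain features and random weights below are hypothetical stand-ins, not the paper's setup.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer on a grain adjacency graph:
    add self-loops, symmetrically normalize, then ReLU(A_hat X W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # D^{-1/2} A_hat D^{-1/2}
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy microstructure: 4 grains, features = [size, number of neighbors]
A = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], float)
X = np.array([[1.0, 2], [3.0, 3], [0.5, 3], [2.0, 2]])
H = gcn_layer(A, X, np.random.default_rng(0).normal(size=(2, 8)))
print(H.shape)  # (4, 8) hidden features per grain
```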
Updated: 2024-07-11 03:45:01
标题: 图卷积网络用于预测蒙特卡罗模拟中微观结构演变中异常晶粒生长
摘要: 最近发展的图神经网络显示出在预测异常晶粒长大的发生方面具有潜力,这是一个特别具有挑战性的研究领域,因为它表现出明显的随机性质。在这项研究中,我们生成了一个大型的 Monte Carlo 模拟异常晶粒长大的数据集。我们训练简单的图卷积网络来预测哪些初始微观结构将表现出异常晶粒长大,并将结果与用于相同任务的标准计算机视觉方法进行比较。图神经网络的表现优于计算机视觉方法,实现了73%的预测准确率和更少的误报。它还提供了一些有关特征重要性和最大化预测性能所需的相关长度尺度的物理见解。对 Monte Carlo 模拟中的不确定性进行分析为这一领域的正在进行的工作提供了额外的见解。
更新时间: 2024-07-11 03:45:01
领域: cond-mat.mtrl-sci,cs.LG
In-Context Explainers: Harnessing LLMs for Explaining Black Box Models
Recent advancements in Large Language Models (LLMs) have demonstrated exceptional capabilities in complex tasks like machine translation, commonsense reasoning, and language understanding. One of the primary reasons for the adaptability of LLMs in such diverse tasks is their in-context learning (ICL) capability, which allows them to perform well on new tasks by simply using a few task samples in the prompt. Despite their effectiveness in enhancing the performance of LLMs on diverse language and tabular tasks, these methods have not been thoroughly explored for their potential to generate post hoc explanations. In this work, we carry out one of the first explorations to analyze the effectiveness of LLMs in explaining other complex predictive models using ICL. To this end, we propose a novel framework, In-Context Explainers, comprising three novel approaches that exploit the ICL capabilities of LLMs to explain the predictions made by other predictive models. We conduct extensive analysis with these approaches on real-world tabular and text datasets and demonstrate that LLMs are capable of explaining other predictive models similar to state-of-the-art post hoc explainers, opening up promising avenues for future research into LLM-based post hoc explanations of complex predictive models.
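Since the three approaches are not detailed in the abstract, here is a generic sketch of how an ICL prompt for post hoc explanation might be assembled; the prompt wording and feature encoding are assumptions.

```python
def icl_explanation_prompt(examples, query_x, feature_names):
    """Assemble an in-context prompt asking an LLM to explain a black-box
    model's behavior: each demonstration pairs an input with the model's
    prediction, and the LLM is asked which features drive the outputs."""
    lines = ["Below are inputs and a black-box model's predictions."]
    for x, y in examples:
        desc = ", ".join(f"{n}={v}" for n, v in zip(feature_names, x))
        lines.append(f"Input: {desc} -> Prediction: {y}")
    desc = ", ".join(f"{n}={v}" for n, v in zip(feature_names, query_x))
    lines.append(f"Input: {desc} -> Prediction: ?")
    lines.append("Which features most influenced these predictions? Explain.")
    return "\n".join(lines)

print(icl_explanation_prompt([((25, 50_000), "approve"),
                              ((60, 12_000), "deny")],
                             (40, 30_000), ("age", "income")))
```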
Updated: 2024-07-11 03:42:12
标题: 上下文解释器:利用LLMs解释黑盒模型
摘要: 最近大型语言模型(LLMs)的进展展示了在机器翻译、常识推理和语言理解等复杂任务中的出色能力。LLMs在如此多样的任务中具有适应性的主要原因之一是它们的上下文学习(ICL)能力,这使它们能够通过简单地使用提示中的一些任务样本在新任务上表现良好。尽管这些方法在增强LLMs在多样的语言和表格任务上的性能方面非常有效,但尚未充分探索它们在生成事后解释方面的潜力。在这项工作中,我们进行了对LLMs在使用ICL解释其他复杂预测模型方面的有效性的首次探索之一。为此,我们提出了一个新颖的框架,In-Context Explainers,包括三种利用LLMs的ICL能力解释其他预测模型所做预测的新方法。我们在真实的表格和文本数据集上使用这些方法进行了广泛的分析,并展示了LLMs能够解释其他预测模型,类似于最先进的事后解释器,为未来基于LLMs的复杂预测模型的事后解释打开了有前途的研究途径。
更新时间: 2024-07-11 03:42:12
领域: cs.CL,cs.AI,cs.LG
Don't Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion
Imitation learning empowers artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which have the ability to model high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise. However, the target policy to be learned is often significantly different from Gaussian and this mismatch can result in poor performance when using a small number of diffusion steps (to improve inference speed) and under limited data. The key idea in this work is that initiating from a more informative source than Gaussian enables diffusion methods to mitigate the above limitations. We contribute theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGER, leverages the stochastic interpolants framework to bridge arbitrary policies, thus enabling a flexible approach towards imitation learning. It generalizes prior work in that standard Gaussians can still be applied, but other source policies can be used if available. In experiments on challenging simulation benchmarks and on real robots, BRIDGER outperforms state-of-the-art diffusion policies. We provide further analysis on design considerations when applying BRIDGER. Code for BRIDGER is available at https://github.com/clear-nus/bridger.
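The stochastic-interpolants idea of bridging a source policy to target actions admits a compact sketch. The linear path and Brownian-bridge-style noise below are one standard choice, not necessarily BRIDGER's exact parameterization.

```python
import numpy as np

def interpolant(x_src, x_tgt, t, rng, sigma=0.1):
    """Stochastic interpolant between a source-policy action sample x_src
    and a target (demonstration) action x_tgt at time t in [0, 1]:
        x_t = (1 - t) * x_src + t * x_tgt + sigma * sqrt(t * (1 - t)) * eps
    At t=0 it equals the informative source; at t=1 the target."""
    eps = rng.normal(size=x_src.shape)
    return (1 - t) * x_src + t * x_tgt + sigma * np.sqrt(t * (1 - t)) * eps

rng = np.random.default_rng(0)
x_src = rng.normal(loc=0.5, size=4)      # sample from a source policy
x_tgt = np.array([1.0, -0.3, 0.2, 0.8])  # demonstrated action
for t in (0.0, 0.5, 1.0):
    print(t, interpolant(x_src, x_tgt, t, rng))
```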
Updated: 2024-07-11 03:41:42
标题: 不要从零开始:基于插值的策略扩散的行为细化
摘要: 模仿学习使人工智能代理能够通过学习示范来模仿行为。最近,扩散模型展现出在模仿学习任务上的出色表现,这些模型具有建模高维和多模态分布的能力。这些模型通过从标准高斯噪声中扩散操作(或状态)来学习形成策略。然而,要学习的目标策略通常与高斯分布显著不同,这种不匹配可能导致在使用少量扩散步骤(以提高推断速度)和有限数据时性能不佳。这项工作的关键思想是,从比高斯更具信息性的源开始,可以使扩散方法缓解上述限制。我们提出了理论结果、一种新方法和实证发现,显示了使用信息源策略的好处。我们的方法被称为BRIDGER,利用随机插值框架来连接任意策略,从而实现对模仿学习的灵活方法。它推广了之前的工作,因为仍然可以使用标准高斯,但如果可用,还可以使用其他源策略。在具有挑战性的仿真基准和真实机器人上的实验中,BRIDGER 超越了最先进的扩散策略。我们对应用 BRIDGER 时的设计考虑进行了进一步分析。BRIDGER 的代码可以在 https://github.com/clear-nus/bridger 上找到。
更新时间: 2024-07-11 03:41:42
领域: cs.LG,cs.AI,cs.RO
MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation
Amid the rising intersection of generative AI and human artistic processes, this study probes the critical yet less-explored terrain of alignment in human-centric automatic song composition. We propose a novel task of Colloquial Description-to-Song Generation, which focuses on aligning the generated content with colloquial human expressions. This task is aimed at bridging the gap between colloquial language understanding and auditory expression within an AI model, with the ultimate goal of creating songs that accurately satisfy human auditory expectations and structurally align with musical norms. Current datasets are limited due to their narrow descriptive scope, semantic gaps and inaccuracies. To overcome data scarcity in this domain, we present the Caichong Music Dataset (CaiMD). CaiMD is manually annotated by both professional musicians and amateurs, offering diverse perspectives and a comprehensive understanding of colloquial descriptions. Unlike existing datasets pre-set with expert annotations or auto-generated ones with inherent biases, CaiMD caters more sufficiently to our purpose of aligning AI-generated music with widespread user-desired results. Moreover, we propose an innovative single-stage framework called MuDiT/MuSiT for enabling effective human-machine alignment in song creation. This framework not only achieves cross-modal comprehension between colloquial language and auditory music perceptions but also ensures generated songs align with user-desired results. MuDiT/MuSiT employs one DiT/SiT model for end-to-end generation of musical components like melody, harmony, rhythm, vocals, and instrumentation. The approach ensures harmonious sonic cohesiveness amongst all generated musical components, facilitating better resonance with human auditory expectations.
Updated: 2024-07-11 03:32:44
标题: MuDiT和MuSiT:在描述到歌曲生成中与口语表达的对齐
摘要: 在生成AI和人类艺术过程交集日益增长的背景下,本研究探讨了人类中心自动创作歌曲领域中关键但较少探索的领域——对齐。我们提出了一个新颖的Colloquial Description-to-Song Generation任务,重点关注将生成内容与口语化人类表达对齐。该任务旨在弥合口语理解和AI模型内听觉表达之间的鸿沟,最终目标是创作出能准确满足人类听觉期望并在音乐规范上结构对齐的歌曲。由于当前数据集在描述范围、语义差距和不准确性方面存在限制,我们提出了Caichong音乐数据集(CaiMD)。CaiMD由专业音乐人和业余爱好者手动标注,提供多样化的观点和对口语描述的全面理解。与现有数据集预设专家注释或具有固有偏见的自动生成注释不同,CaiMD更充分地满足了我们将AI生成的音乐与广泛用户期望结果对齐的目的。此外,我们提出了一个名为MuDiT/MuSiT的创新的单阶段框架,用于实现歌曲创作中的有效人机对齐。该框架不仅实现了口语语言和听觉音乐感知之间的跨模态理解,还确保生成的歌曲与用户期望结果对齐。MuDiT/MuSiT采用一个DiT/SiT模型,用于端到端生成如旋律、和声、节奏、人声和器乐等音乐组件。该方法确保了所有生成的音乐组件之间的和谐音频连贯性,有利于更好地与人类听觉期望产生共鸣。
更新时间: 2024-07-11 03:32:44
领域: cs.SD,cs.AI,cs.MM,eess.AS,68Txx(Primary)14F05, 91Fxx(Secondary),I.2.7; J.5
MCSD: An Efficient Language Model with Diverse Fusion
Transformers excel in Natural Language Processing (NLP) due to their prowess in capturing long-term dependencies but suffer from exponential resource consumption with increasing sequence lengths. To address these challenges, we propose MCSD model, an efficient language model with linear scaling and fast inference speed. MCSD model leverages diverse feature fusion, primarily through the multi-channel slope and decay (MCSD) block, to robustly represent features. This block comprises slope and decay sections that extract features across diverse temporal receptive fields, facilitating capture of both local and global information. In addition, MCSD block conducts element-wise fusion of diverse features to further enhance the delicate feature extraction capability. For inference, we formulate the inference process into a recurrent representation, slashing space complexity to $O(1)$ and time complexity to $O(N)$ respectively. Our experiments show that MCSD attains higher throughput and lower GPU memory consumption compared to Transformers, while maintaining comparable performance to larger-scale language learning models on benchmark tests. These attributes position MCSD as a promising base for edge deployment and embodied intelligence.
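The recurrent reformulation that yields O(1) space and O(N) time can be illustrated with a single exponential-decay channel. The actual MCSD block combines multiple slope and decay sections, so the parameterization below is an assumption, kept deliberately minimal.

```python
import numpy as np

def decay_recurrence(x, lam):
    """Recurrent view of an exponential-decay feature mixer:
        h_t = lam * h_{t-1} + (1 - lam) * x_t
    Space is O(1) in sequence length; time is O(N)."""
    h = np.zeros(x.shape[-1])
    outputs = []
    for x_t in x:                  # one step per token
        h = lam * h + (1 - lam) * x_t
        outputs.append(h.copy())
    return np.stack(outputs)

x = np.random.default_rng(1).normal(size=(6, 4))  # 6 tokens, dim 4
print(decay_recurrence(x, lam=0.9).shape)          # (6, 4)
```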
Updated: 2024-07-11 03:29:19
标题: MCSD: 一种具有多元融合的高效语言模型
摘要: Transformer在自然语言处理(NLP)中表现出色,因为它们擅长捕捉长期依赖关系,但随着序列长度的增加,资源消耗呈指数增长。为了解决这些挑战,我们提出了MCSD模型,这是一个具有线性扩展和快速推理速度的高效语言模型。MCSD模型利用多通道斜率和衰减(MCSD)块进行多样特征融合,主要用于稳健地表示特征。该块包括斜率和衰减部分,可跨越各种时间感受野提取特征,有助于捕捉局部和全局信息。此外,MCSD块进行元素级特征融合,进一步增强了精细特征提取能力。对于推理,我们将推理过程表述为递归表示,将空间复杂度削减至$O(1)$,时间复杂度削减至$O(N)$。我们的实验表明,与Transformer相比,MCSD在吞吐量和GPU内存消耗方面表现更优,同时在基准测试中保持与更大规模语言学习模型相媲美的性能。这些特性使MCSD成为边缘部署和体现智能的有前途的基础。
更新时间: 2024-07-11 03:29:19
领域: cs.CL,cs.AI
CAV-AHDV-CAV: Mitigating Traffic Oscillations for CAVs through a Novel Car-Following Structure and Reinforcement Learning
Connected and Automated Vehicles (CAVs) offer a promising solution to the challenges of mixed traffic with both CAVs and Human-Driven Vehicles (HDVs). A significant hurdle in such scenarios is traffic oscillation, or the "stop-and-go" pattern, during car-following situations. While HDVs rely on limited information, CAVs can leverage data from other CAVs for better decision-making. This allows CAVs to anticipate and mitigate the spread of deceleration waves that worsen traffic flow. We propose a novel "CAV-AHDV-CAV" car-following framework that treats the sequence of HDVs between two CAVs as a single entity, eliminating noise from individual driver behaviors. This deep reinforcement learning approach analyzes vehicle equilibrium states and employs a state fusion strategy. Trained and tested on diverse datasets (HighD, NGSIM, SPMD, Waymo, Lyft) encompassing over 70,000 car-following instances, our model outperforms baselines in collision avoidance, maintaining equilibrium with both preceding and leading vehicles and achieving the lowest standard deviation of time headway. These results demonstrate the effectiveness of our approach in developing robust CAV control strategies for mixed traffic. Our model has the potential to mitigate traffic oscillation, improve traffic flow efficiency, and enhance overall safety.
Updated: 2024-07-11 03:27:50
标题: CAV-AHDV-CAV:通过一种新颖的车辆跟随结构和强化学习缓解自动驾驶车辆的交通振荡
摘要: Connected and Automated Vehicles(CAVs)提供了一个有希望的解决方案,来解决混合交通中CAVs和Human-Driven Vehicles(HDVs)的挑战。在这种情况下的一个重要障碍是交通振荡,或者在跟车情况下的“停-走”模式。虽然HDVs依赖有限的信息,但CAVs可以利用来自其他CAVs的数据进行更好的决策。这使得CAVs能够预见并减轻恶化交通流的减速波。我们提出了一个新颖的“CAV-AHDV-CAV”跟车框架,将两辆CAVs之间的HDVs序列视为一个单一实体,消除了个体驾驶行为的噪音。这种深度强化学习方法分析车辆平衡状态,并采用状态融合策略。在多样化数据集(HighD,NGSIM,SPMD,Waymo,Lyft)上进行训练和测试,包括超过70,000个跟车实例,我们的模型在避免碰撞、与前车和后车保持平衡以及实现最低时间间隔的标准差方面优于基线。这些结果展示了我们方法在开发混合交通的稳健CAV控制策略方面的有效性。我们的模型有潜力减轻交通振荡,提高交通流效率,并增强整体安全性。
更新时间: 2024-07-11 03:27:50
领域: cs.RO,cs.AI
Model-agnostic clean-label backdoor mitigation in cybersecurity environments
The training phase of machine learning models is a delicate step, especially in cybersecurity contexts. Recent research has surfaced a series of insidious training-time attacks that inject backdoors in models designed for security classification tasks without altering the training labels. With this work, we propose new techniques that leverage insights in cybersecurity threat models to effectively mitigate these clean-label poisoning attacks, while preserving the model utility. By performing density-based clustering on a carefully chosen feature subspace, and progressively isolating the suspicious clusters through a novel iterative scoring procedure, our defensive mechanism can mitigate the attacks without requiring many of the common assumptions in the existing backdoor defense literature. To show the generality of our proposed mitigation, we evaluate it on two clean-label model-agnostic attacks on two different classic cybersecurity data modalities: network flows classification and malware classification, using gradient boosting and neural network models.
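A toy sketch of the density-based clustering step follows, assuming a 2-D feature subspace and a made-up suspicion score; the paper's actual subspace selection and iterative scoring procedure are more involved.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(200, 2))    # bulk of training points
poisoned = rng.normal(4.0, 0.1, size=(10, 2))   # tight suspicious cluster
X = np.vstack([benign, poisoned])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Toy suspicion score: small, unusually dense clusters score high.
for c in set(labels) - {-1}:
    members = X[labels == c]
    density = len(members) / (members.std(axis=0).prod() + 1e-9)
    print(f"cluster {c}: size={len(members)}, suspicion={density:.1f}")
```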
Updated: 2024-07-11 03:25:40
标题: 在网络安全环境中,与模型无关的干净标签后门缓解
摘要: 机器学习模型的训练阶段是一个非常关键的步骤,特别是在网络安全的背景下。最近的研究揭示了一系列阴险的训练时攻击,这些攻击在为安全分类任务设计的模型中注入后门,而不改变训练标签。通过这项工作,我们提出了一些新的技术,利用网络安全威胁模型的见解,有效地减轻这些干净标签污染攻击,同时保留模型的效用。通过在精心选择的特征子空间上执行基于密度的聚类,并通过一种新颖的迭代评分程序逐渐隔离可疑的聚类,我们的防御机制可以减轻攻击,而不需要现有后门防御文献中的许多常见假设。为了展示我们提出的减轻方法的普适性,我们在两种不同的经典网络安全数据模态上评估了它,分别是网络流分类和恶意软件分类,使用梯度提升和神经网络模型。
更新时间: 2024-07-11 03:25:40
领域: cs.CR,cs.LG
$R^3$: "This is My SQL, Are You With Me?" A Consensus-Based Multi-Agent System for Text-to-SQL Tasks
Large Language Models (LLMs) have demonstrated strong performance on various tasks. To unleash their power on the Text-to-SQL task, we propose $R^3$ (Review-Rebuttal-Revision), a consensus-based multi-agent system for Text-to-SQL tasks. $R^3$ outperforms the existing single LLM Text-to-SQL systems as well as the multi-agent Text-to-SQL systems by $1.3\%$ to $8.1\%$ on Spider and Bird. Surprisingly, we find that for Llama-3-8B, $R^3$ outperforms chain-of-thought prompting by over 20\%, even outperforming GPT-3.5 on the development set of Spider.
Updated: 2024-07-11 03:14:54
标题: $R^3$: “这是我的SQL,你跟上我吗?”一种基于共识的文本到SQL任务的多代理系统
摘要: 大型语言模型(LLMs)在各种任务上表现出色。为了发挥它们在文本到SQL任务上的潜力,我们提出了一个基于共识的多智能体系统$R^3$(Review-Rebuttal-Revision)。$R^3$在Spider和Bird上比现有的单一LLM文本到SQL系统以及多智能体文本到SQL系统表现出$1.3\%$到$8.1\%$的优势。令人惊讶的是,我们发现对于Llama-3-8B,$R^3$比思维链提示高出超过20\%,甚至在Spider的开发集上超过了GPT-3.5。
更新时间: 2024-07-11 03:14:54
领域: cs.CL,cs.AI,cs.DB
Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models
Deduplication is a vital preprocessing step that enhances machine learning model performance and saves training time and energy. However, enhancing federated learning through deduplication poses challenges, especially regarding scalability and potential privacy violations if deduplication involves sharing all clients' data. In this paper, we address the problem of deduplication in a federated setup by introducing a pioneering protocol, Efficient Privacy-Preserving Multi-Party Deduplication (EP-MPD). It efficiently removes duplicates from multiple clients' datasets without compromising data privacy. EP-MPD is constructed in a modular fashion, utilizing two novel variants of the Private Set Intersection protocol. Our extensive experiments demonstrate the significant benefits of deduplication in federated learning of large language models. For instance, we observe up to 19.61% improvement in perplexity and up to 27.95% reduction in running time. EP-MPD effectively balances privacy and performance in federated learning, making it a valuable solution for large-scale applications.
Updated: 2024-07-11 03:10:27
标题: 隐私保护数据去重以增强语言模型的联邦学习
摘要: 重复数据删除是增强机器学习模型性能并节省训练时间和能量的关键预处理步骤。然而,通过重复数据删除增强联邦学习存在挑战,特别是在可扩展性和潜在的隐私侵犯方面,如果重复数据删除涉及共享所有客户端数据。本文介绍了一个创新的协议,即高效的隐私保护多方重复数据删除(EP-MPD),来解决联邦设置中的重复数据删除问题。EP-MPD可以有效地从多个客户端数据集中删除重复数据,而不会损害数据隐私。EP-MPD采用模块化设计,利用两种新颖的私有集合交集协议。我们的广泛实验展示了在大型语言模型的联邦学习中进行重复数据删除的显著好处。例如,我们观察到困惑度提高了高达19.61%,运行时间减少了高达27.95%。EP-MPD在联邦学习中有效地平衡了隐私和性能,使其成为大规模应用的宝贵解决方案。
更新时间: 2024-07-11 03:10:27
领域: cs.CR,cs.AI,cs.CL,cs.LG
Entropy Law: The Story Behind Data Compression and LLM Performance
Data is the cornerstone of large language models (LLMs), but not all data is useful for model learning. Carefully selected data can better elicit the capabilities of LLMs with much less computational overhead. Most methods concentrate on evaluating the quality of individual samples in data selection, while the combinatorial effects among samples are neglected. Even if each sample is of perfect quality, their combinations may be suboptimal in teaching LLMs due to their intrinsic homogeneity or contradiction. In this paper, we aim to uncover the underlying relationships between LLM performance and data selection. Inspired by the information compression nature of LLMs, we uncover an "entropy law" that connects LLM performance with data compression ratio and first-epoch training loss, which reflect the information redundancy of a dataset and the mastery of inherent knowledge encoded in this dataset, respectively. Through both theoretical deduction and empirical evaluation, we find that model performance is negatively correlated to the compression ratio of training data, which usually yields a lower training loss. Based on the findings of the entropy law, we propose a quite efficient and universal data selection method named ZIP for training LLMs, which aims to prioritize data subsets exhibiting a low compression ratio. Based on a multi-stage algorithm that selects diverse data in a greedy manner, we can obtain a good data subset with satisfactory diversity. Extensive experiments have been conducted to validate the entropy law and the superiority of ZIP across different LLM backbones and alignment stages. We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
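A sketch of the selection idea is given below, assuming the conventional definition of compression ratio (raw size over compressed size, so diverse text scores low) and a naive greedy loop; the authors' multi-stage ZIP algorithm is more elaborate.

```python
import zlib

def compression_ratio(samples):
    """Conventional compression ratio: raw bytes / compressed bytes.
    Redundant text compresses well and scores high; diverse text scores low."""
    raw = "\n".join(samples).encode("utf-8")
    return len(raw) / len(zlib.compress(raw))

def greedy_low_ratio_subset(pool, k):
    """Greedily grow a subset whose joint compression ratio stays low,
    i.e. whose samples add non-redundant information."""
    pool, selected = list(pool), []
    for _ in range(k):
        best = min(pool, key=lambda s: compression_ratio(selected + [s]))
        selected.append(best)
        pool.remove(best)
    return selected

pool = ["the cat sat on the mat", "the cat sat on the mat",
        "gauge symmetry breaking", "gradient descent converges",
        "the cat sat on the rug"]
print(greedy_low_ratio_subset(pool, 3))  # avoids near-duplicate samples
```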
Updated: 2024-07-11 03:06:45
标题: 熵定律:数据压缩和LLM性能背后的故事
摘要: 数据是大型语言模型(LLMs)的基石,但并非所有数据对模型学习都有用。精心选择的数据可以更好地引发LLMs的能力,而且计算开销要小得多。大多数方法集中在评估数据选择中单个样本的质量,而忽略了样本之间的组合效应。即使每个样本质量完美,它们的组合可能由于固有的同质性或矛盾而无法最佳地教授LLMs。在本文中,我们旨在揭示LLM性能和数据选择之间的潜在关系。受LLMs信息压缩特性的启发,我们揭示了一个将LLM性能与数据压缩比率和首轮训练损失相联系的“熵定律”,分别反映了数据集的信息冗余和编码在数据集中的内在知识的掌握程度。通过理论推导和实证评估,我们发现模型性能与训练数据的压缩比率呈负相关,通常导致较低的训练损失。基于熵定律的发现,我们提出了一种名为ZIP的非常高效和通用的数据选择方法,用于训练LLMs,旨在优先考虑展示低压缩比率的数据子集。通过以贪婪方式选择多样化数据的多阶段算法,我们可以获得一个具有令人满意多样性的良好数据子集。已进行了广泛实验以验证熵定律和ZIP在不同LLM骨干和对齐阶段上的优越性。我们还提出了熵定律的一个有趣应用,可以在模型训练的开始阶段检测潜在的性能风险。
更新时间: 2024-07-11 03:06:45
领域: cs.LG,cs.CL
SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis
Singing voice conversion (SVC) aims to convert a singer's voice in a given music piece to another singer while keeping the original content. We propose an end-to-end feature disentanglement-based model, which we named SaMoye, to enable zero-shot many-to-many singing voice conversion. SaMoye disentangles the features of the singing voice into content features, timbre features, and pitch features respectively. The content features are enhanced using a GPT-based model to perform cross-prediction with the phonemes of the lyrics. SaMoye can generate music with the converted voice by replacing the timbre features with those of the target singer. We also establish an unparalleled large-scale dataset to guarantee zero-shot performance. The dataset consists of 1500k pure singing vocal clips containing at least 10,000 singers.
Updated: 2024-07-11 03:06:21
标题: 萨莫耶:基于特征解耦和合成的零样本歌声转换
摘要: 歌声转换(SVC)旨在将一位歌手在给定音乐作品中的声音转换为另一位歌手的声音,同时保留原始内容。我们提出了一种基于端到端特征解缠的模型,命名为SaMoye,以实现零样本多对多歌声转换。SaMoye将歌声的特征分解为内容特征、音色特征和音高特征。内容特征使用基于GPT的模型增强,以与歌词的音素进行交叉预测。SaMoye可以通过用目标歌手的音色特征替换来生成具有转换声音的音乐。我们还建立了一个空前大规模的数据集,以确保零样本性能。该数据集包含1500k个纯歌唱声音片段,至少包含10,000名歌手。
更新时间: 2024-07-11 03:06:21
领域: cs.SD,cs.AI,cs.MM,eess.AS,68Txx(Primary)14F05, 91Fxx(Secondary),I.2.7; J.5
Looks can be Deceptive: Distinguishing Repetition Disfluency from Reduplication
Reduplication and repetition, though similar in form, serve distinct linguistic purposes. Reduplication is a deliberate morphological process used to express grammatical, semantic, or pragmatic nuances, while repetition is often unintentional and indicative of disfluency. This paper presents the first large-scale study of reduplication and repetition in speech using computational linguistics. We introduce IndicRedRep, a new publicly available dataset containing Hindi, Telugu, and Marathi text annotated with reduplication and repetition at the word level. We evaluate transformer-based models for multi-class reduplication and repetition token classification, utilizing the Reparandum-Interregnum-Repair structure to distinguish between the two phenomena. Our models achieve macro F1 scores of up to 85.62% in Hindi, 83.95% in Telugu, and 84.82% in Marathi for reduplication-repetition classification.
Updated: 2024-07-11 03:00:14
标题: 外表可能欺人:区分重复的不流畅与重叠
摘要: 复制和重复虽然在形式上相似,但具有不同的语言目的。复制是一种有意的形态学过程,用于表达语法、语义或语用细微差别,而重复则经常是无意的,表明不流畅。本文利用计算语言学首次对语音中的复制和重复进行了大规模研究。我们介绍了IndicRedRep,一个新的公开可用数据集,其中包含用印地语、泰卢固语和马拉地语标注的具有复制和重复的文本,以单词级别进行注释。我们评估了基于变压器的模型,用于多类复制和重复令牌分类,利用Reparandum-Interregnum-Repair结构来区分这两种现象。我们的模型在印地语中达到了最高85.62%的宏F1分数,在泰卢固语中达到了83.95%,在马拉地语中达到了84.82%,用于复制和重复分类。
更新时间: 2024-07-11 03:00:14
领域: cs.CL,cs.AI
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning
In recent years, advancements in representation learning and language models have propelled Automated Captioning (AC) to new heights, enabling the generation of human-level descriptions. Leveraging these advancements, we propose AVCap, an Audio-Visual Captioning framework, a simple yet powerful baseline approach applicable to audio-visual captioning. AVCap utilizes audio-visual features as text tokens, which has many advantages not only in performance but also in the extensibility and scalability of the model. AVCap is designed around three pivotal dimensions: the exploration of optimal audio-visual encoder architectures, the adaptation of pre-trained models according to the characteristics of generated text, and the investigation into the efficacy of modality fusion in captioning. Our method outperforms existing audio-visual captioning methods across all metrics and the code is available on https://github.com/JongSuk1/AVCap
Updated: 2024-07-11 02:38:14
标题: AVCap:利用音频视觉特征作为字幕的文本标记
摘要: 近年来,表示学习和语言模型的进步推动了自动字幕(AC)达到了新的高度,使得生成类人级描述成为可能。利用这些进步,我们提出了AVCap,一个音频-视觉字幕框架,这是一个简单但强大的基线方法,适用于音频-视觉字幕。AVCap利用音频-视觉特征作为文本标记,这不仅在性能上有很多优势,而且在模型的可扩展性和可伸缩性上也有很多优势。AVCap设计围绕着三个关键维度:探索最佳的音频-视觉编码器架构,根据生成文本的特征调整预训练模型,以及探讨在字幕中的模态融合的有效性。我们的方法在所有指标上都优于现有的音频-视觉字幕方法,代码可在https://github.com/JongSuk1/AVCap上找到。
更新时间: 2024-07-11 02:38:14
领域: eess.AS,cs.CL,cs.LG,cs.SD
Fish-bone diagram of research issue: Gain a bird's-eye view on a specific research topic
Novice researchers often face difficulties in understanding a multitude of academic papers and grasping the fundamentals of a new research field. To solve such problems, knowledge graphs supporting research surveys are gradually being developed. Existing keyword-based knowledge graphs make it difficult for researchers to deeply understand abstract concepts. Meanwhile, novice researchers may find it difficult to use ChatGPT effectively for research surveys due to their limited understanding of the research field. Without the ability to ask proficient questions that align with key concepts, obtaining desired and accurate answers from this large language model (LLM) could be inefficient. This study aims to help novice researchers by providing a fish-bone diagram that includes causal relationships, offering an overview of the research topic. The diagram is constructed using the issue ontology from academic papers, and it offers a broad, highly generalized perspective of the research field, based on relevance and logical factors. Furthermore, we evaluate the strengths and points for improvement of the fish-bone diagram derived from this study's development pattern, emphasizing its potential as a viable tool for supporting research surveys.
Updated: 2024-07-11 02:18:54
标题: 研究问题的鱼骨图:从鸟瞰视角了解特定研究主题
摘要: 初学者研究人员经常面临理解大量学术论文和把握新研究领域基础知识的困难。为解决这些问题,支持研究调查的知识图逐渐被开发。现有基于关键词的知识图使研究人员难以深入理解抽象概念。同时,由于对研究领域的理解有限,初学者研究人员可能会发现难以有效地利用ChatGPT进行研究调查。如果没有能力提出与关键概念一致的熟练问题,从这个大型语言模型(LLM)获得期望和准确的答案可能会效率低下。本研究旨在帮助初学者研究人员,通过提供包含因果关系的鱼骨图,提供研究主题的概述。该图是使用学术论文的问题本体构建的,它基于相关性和逻辑因素提供了对研究领域的广泛、高度概括的视角。此外,我们评估了从本研究开发模式中得出的鱼骨图的优势和可改进点,强调其作为支持研究调查的可行工具的潜力。
更新时间: 2024-07-11 02:18:54
领域: cs.AI
Highway Networks for Improved Surface Reconstruction: The Role of Residuals and Weight Updates
Surface reconstruction from point clouds is a fundamental challenge in computer graphics and medical imaging. In this paper, we explore the application of advanced neural network architectures for the accurate and efficient reconstruction of surfaces from data points. We introduce a novel variant of the Highway network (Hw) called Square-Highway (SqrHw) within the context of multilayer perceptrons and investigate its performance alongside plain neural networks and a simplified Hw in various numerical examples. These examples include the reconstruction of simple and complex surfaces, such as spheres, human hands, and intricate models like the Stanford Bunny. We analyze the impact of factors such as the number of hidden layers, interior and exterior points, and data distribution on surface reconstruction quality. Our results show that the proposed SqrHw architecture outperforms other neural network configurations, achieving faster convergence and higher-quality surface reconstructions. Additionally, we demonstrate the SqrHw's ability to predict surfaces over missing data, a valuable feature for challenging applications like medical imaging. Furthermore, our study delves into further details, demonstrating that the proposed method based on highway networks yields more stable weight norms and backpropagation gradients compared to the Plain Network architecture. This research not only advances the field of computer graphics but also holds utility for other purposes such as function interpolation and physics-informed neural networks, which integrate multilayer perceptrons into their algorithms.
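For reference, the standard highway block that the paper's variants build on looks like the sketch below. The abstract does not define the "Square" modification, so only the plain Hw gate is shown, with hypothetical layer sizes.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Standard highway layer: y = T(x) * H(x) + (1 - T(x)) * x,
    where T is a learned gate and the (1 - T(x)) * x term is the residual
    path that stabilizes gradients and weight updates."""
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Linear(dim, dim)
        self.T = nn.Linear(dim, dim)

    def forward(self, x):
        t = torch.sigmoid(self.T(x))  # transform gate in (0, 1)
        return t * torch.relu(self.H(x)) + (1 - t) * x

# Toy use: map 3-D points to an implicit surface value.
net = nn.Sequential(nn.Linear(3, 32), HighwayLayer(32),
                    HighwayLayer(32), nn.Linear(32, 1))
print(net(torch.randn(5, 3)).shape)  # torch.Size([5, 1])
```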
Updated: 2024-07-11 02:15:21
标题: 公路网络用于改善地表重建:残差和权重更新的作用
摘要: 从点云中进行表面重建是计算机图形学和医学成像中的一个基本挑战。本文探讨了先进神经网络架构在准确高效地从数据点重建表面方面的应用。我们在多层感知器的背景下引入了一种新颖的Highway网络(Hw)变体,称为Square-Highway(SqrHw),并通过在各种数值示例中与普通神经网络和简化Hw的性能进行比较。这些示例包括简单和复杂表面的重建,如球体、人手和斯坦福兔等复杂模型。我们分析了隐藏层数量、内部和外部点以及数据分布对表面重建质量的影响。我们的结果显示,所提出的SqrHw架构优于其他神经网络配置,实现更快的收敛速度和更高质量的表面重建。此外,我们展示了SqrHw预测缺失数据表面的能力,这是医学成像等具有挑战性应用的有价值特性。此外,我们的研究深入探讨了更多细节,表明基于Highway网络的提出方法与Plain Network架构相比产生更稳定的权重规范和反向传播梯度。这项研究不仅推进了计算机图形学领域,还对其他用途如函数插值和物理信息神经网络等具有实用价值,这些网络将多层感知器整合到其算法中。
更新时间: 2024-07-11 02:15:21
领域: cs.CV,cs.AI,cs.LG
Nonverbal Interaction Detection
This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the linguistic counterparts, and existing solutions typically examine nonverbal cues in isolation. Our study marks the first systematic effort to enhance the interpretation of multifaceted nonverbal signals. First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and corresponding social groups, along with 22 atomic-level nonverbal behaviors under five broad interaction types. Second, we establish a new task NVI-DET for nonverbal interaction detection, which is formalized as identifying triplets in the form <individual, group, interaction> from images. Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs. Central to the model is a dual multi-scale hypergraph that adeptly addresses individual-to-individual and group-to-group correlations across varying scales, facilitating interactional feature learning and eventually improving interaction prediction. Extensive experiments on NVI show that NVI-DEHR improves various baselines significantly in NVI-DET. It also exhibits leading performance on HOI-DET, confirming its versatility in supporting related tasks and strong generalization ability. We hope that our study will offer the community new avenues to explore nonverbal signals in more depth.
Updated: 2024-07-11 02:14:06
标题: 非言语互动检测
摘要: 这项工作解决了理解社交背景下人类非语言交互的新挑战。非语言信号几乎无处不在地渗透到每一个交际行为中。我们的手势、面部表情、姿势、凝视,甚至身体外貌都在没有说话的情况下传达信息。尽管非语言信号在社交生活中扮演着至关重要的角色,但与语言同行相比,非语言信号受到的关注非常有限,现有解决方案通常仅考虑孤立的非语言暗示。我们的研究标志着首次系统性努力增强多方面非语言信号的解释。首先,我们贡献了一个新颖的大规模数据集,名为NVI,该数据集经过精心注释,包括人类和相应社交群体的边界框,以及五种广泛交互类型下的22种原子级非语言行为。其次,我们建立了一个新任务NVI-DET用于非语言交互检测,该任务被形式化为从图像中识别三元组<个人,群体,交互>。第三,我们提出了一个非语言交互检测超图(NVI-DEHR),这是一种新方法,明确地使用超图来建模高阶非语言交互。模型的核心是一个双重多尺度超图,巧妙地处理了个体与个体和群体与群体之间在不同尺度上的相关性,促进了交互特征学习,并最终改善了交互预测。NVI上的大量实验表明,NVI-DEHR在NVI-DET中显著改善了各种基线。它还在HOI-DET上展现了领先的性能,验证了其在支持相关任务和强大泛化能力方面的多功能性。我们希望我们的研究将为社区提供更深入探索非语言信号的新途径。
更新时间: 2024-07-11 02:14:06
领域: cs.CV,cs.AI
Fairness-aware Vision Transformer via Debiased Self-Attention
Vision Transformer (ViT) has recently gained significant attention in solving computer vision (CV) problems due to its capability of extracting informative features and modeling long-range dependencies through the attention mechanism. Whereas recent works have explored the trustworthiness of ViT, including its robustness and explainability, the issue of fairness has not yet been adequately addressed. We establish that the existing fairness-aware algorithms designed for CNNs do not perform well on ViT, which highlights the need to develop our novel framework via Debiased Self-Attention (DSA). DSA is a fairness-through-blindness approach that enforces ViT to eliminate spurious features correlated with the sensitive label for bias mitigation and simultaneously retain real features for target prediction. Notably, DSA leverages adversarial examples to locate and mask the spurious features in the input image patches with an additional attention weights alignment regularizer in the training objective to encourage learning real features for target prediction. Importantly, our DSA framework leads to improved fairness guarantees over prior works on multiple prediction tasks without compromising target prediction performance. Code is available at \href{https://github.com/qiangyao1988/DSA}{https://github.com/qiangyao1988/DSA}.
Updated: 2024-07-11 02:11:49
标题: 公平感知视觉变换器:通过去偏自注意力算法
摘要: 最近,Vision Transformer(ViT)因其能够通过注意机制提取信息特征和建模长距离依赖关系的能力而在解决计算机视觉(CV)问题方面引起了重大关注。然而,最近的研究已经探讨了ViT的可信度,包括其稳健性和可解释性,但公平性问题尚未得到充分解决。我们发现,为CNN设计的现有公平感知算法在ViT上表现不佳,这突出了通过Debiased Self-Attention(DSA)开发我们的新框架的必要性。DSA是一种通过盲目性实现公平性的方法,它强制ViT消除与敏感标签相关的虚假特征以减轻偏见,并同时保留用于目标预测的真实特征。值得注意的是,DSA利用对抗性示例来定位和掩盖输入图像补丁中的虚假特征,并在训练目标中添加额外的注意权重对齐正则化器,以鼓励学习用于目标预测的真实特征。重要的是,我们的DSA框架在多个预测任务上比以前的作品提供了改进的公平性保证,而不会损害目标预测性能。代码可在\href{https://github.com/qiangyao1988/DSA}{https://github.com/qiangyao1988/DSA}找到。
更新时间: 2024-07-11 02:11:49
领域: cs.CV,cs.AI,cs.LG
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity. SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for over 50 languages, while CosyVoice excels in multi-lingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction-following capabilities. The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, thereby pushing the boundaries of voice interaction technology. Demos are available at https://fun-audio-llm.github.io, and the code can be accessed at https://github.com/FunAudioLLM.
Updated: 2024-07-11 02:08:35
标题: FunAudioLLM:人类和LLM之间自然互动的语音理解和生成基础模型
摘要: 这份报告介绍了FunAudioLLM,一个旨在增强人类与大型语言模型(LLMs)之间自然语音交互的模型系列。其核心包括两种创新模型:SenseVoice,负责多语言语音识别、情感识别和音频事件检测;以及CosyVoice,通过控制多种语言、音色、说话风格和说话者身份,促进自然语音生成。SenseVoice-Small提供了5种语言的极低延迟ASR,SenseVoice-Large支持50多种语言的高精度ASR,而CosyVoice在多语言语音生成、零样本上下文学习、跨语言语音克隆和遵循指令等方面表现出色。与SenseVoice和CosyVoice相关的模型已在Modelscope和Huggingface上开源,并在GitHub上发布了相应的训练、推断和微调代码。通过将这些模型与LLMs集成,FunAudioLLM实现了诸如语音到语音翻译、情感语音聊天、交互式播客和富有表现力的有声读物叙述等应用,从而推动了语音交互技术的边界。演示可在https://fun-audio-llm.github.io上找到,代码可在https://github.com/FunAudioLLM上访问。
更新时间: 2024-07-11 02:08:35
领域: cs.SD,cs.AI,eess.AS
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
The Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improving the early audio-visual encoders to embed more effective features, the decoding phase, crucial for final event classification, often receives less attention. We aim to advance the decoding phase and improve its interpretability. Specifically, we introduce a new decoding paradigm, label semantic-based projection (LEAP), that employs label texts of event categories, each bearing distinct and explicit semantics, for parsing potentially overlapping events. LEAP works by iteratively projecting encoded latent features of audio/visual segments onto semantically independent label embeddings. This process, enriched by modeling cross-modal (audio/visual-label) interactions, gradually disentangles event semantics within video segments to refine relevant label embeddings, guaranteeing a more discriminative and interpretable decoding process. To facilitate the LEAP paradigm, we propose a semantic-aware optimization strategy, which includes a novel audio-visual semantic similarity loss function. This function leverages the Intersection over Union of audio and visual events (EIoU) as a novel metric to calibrate audio-visual similarities at the feature level, accommodating the varied event densities across modalities. Extensive experiments demonstrate the superiority of our method, achieving new state-of-the-art performance for AVVP and also enhancing the relevant audio-visual event localization task.
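The EIoU metric can be sketched as interval-level intersection over union between audio and visual event sets. The abstract does not give the exact definition, so the formulation below, which assumes non-overlapping intervals within each modality, is an assumption.

```python
def event_iou(audio_events, visual_events):
    """Intersection over Union of audio and visual event sets on the
    timeline. Events are (start, end) intervals in seconds."""
    def total(intervals):
        return sum(e - s for s, e in intervals)
    inter = 0.0
    for s1, e1 in audio_events:
        for s2, e2 in visual_events:
            inter += max(0.0, min(e1, e2) - max(s1, s2))
    union = total(audio_events) + total(visual_events) - inter
    return inter / union if union > 0 else 0.0

print(event_iou([(0, 3), (5, 8)], [(2, 6)]))  # 2s overlap / 8s union = 0.25
```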
Updated: 2024-07-11 01:57:08
标题: 音频-视觉视频解析中基于标签预期的事件解缠
摘要: 音频-视觉视频解析(AVVP)任务旨在检测和在音频和视觉模态中暂时定位事件。在时间轴上可能会有多个事件重叠,这使得识别变得具有挑战性。传统方法通常专注于改进早期音频-视觉编码器,以嵌入更有效的特征,解码阶段——对于最终事件分类至关重要,通常受到较少关注。我们旨在推进解码阶段并提高其可解释性。具体而言,我们引入了一种新的解码范式,即基于标签语义的投影(LEAP),它利用事件类别的标签文本,每个类别具有明确的语义,用于解析潜在重叠的事件。LEAP通过将音频/视频段的编码潜在特征迭代地投影到语义独立的标签嵌入上来工作。这一过程通过建模跨模态(音频/视觉-标签)交互而得以丰富,逐渐解开视频段内的事件语义,以精炼相关的标签嵌入,从而确保更具有辨别性和可解释性的解码过程。为了促进LEAP范式,我们提出了一种语义感知的优化策略,其中包括一种新颖的音频-视觉语义相似性损失函数。该函数利用音频和视觉事件的交并比(EIoU)作为一种新颖的度量来校准特征级别上的音频-视觉相似性,以适应模态间各种事件密度的差异。大量实验表明了我们方法的优越性,达到了AVVP的最新性能水平,并增强了相关的音频-视觉事件定位任务。
更新时间: 2024-07-11 01:57:08
领域: cs.AI,cs.CV,cs.MM
Real-Time Summarization of Twitter
In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on the real-time push notification scenario, which requires a system to monitor the stream of sampled tweets and return the tweets that are relevant and novel to given interest profiles. Dirichlet scores, with smoothing and with very little smoothing (the baseline), are employed to classify whether a tweet is relevant to a given interest profile. Using metrics including Mean Average Precision (MAP), cumulative gain (CG), and discounted cumulative gain (DCG), the experiments indicate that our approach performs well. It is also desirable to remove redundant tweets from the push queue. Due to the precision limit, we only describe the algorithm in this paper.
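A sketch of Dirichlet-smoothed relevance scoring of a tweet against an interest profile follows; the smoothing parameter and tokenization are hypothetical, and the paper's exact scoring details are not given in the abstract.

```python
import math
from collections import Counter

def dirichlet_score(tweet, profile, collection, mu=100.0):
    """Dirichlet-smoothed log-likelihood of a tweet under an interest
    profile's language model:
        p(w | profile) = (tf(w) + mu * p(w | collection)) / (|profile| + mu)
    A very small mu means very little smoothing (the baseline variant)."""
    tf, cf = Counter(profile), Counter(collection)
    n_c, n_p = sum(cf.values()), len(profile)
    score = 0.0
    for w in tweet:
        p_c = cf[w] / n_c
        score += math.log((tf[w] + mu * p_c + 1e-12) / (n_p + mu))
    return score

profile = "deep learning neural networks training".split()
collection = "the of deep learning sports news neural music today".split()
print(dirichlet_score("neural networks news".split(), profile, collection))
```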
Updated: 2024-07-11 01:56:31
标题: Twitter的实时摘要
摘要: 在这篇论文中,我们描述了我们在TREC实时推文摘要中的方法。我们专注于实时推送通知场景,这需要一个系统监控采样推文流,并返回与给定兴趣资料相关和新颖的推文。我们使用Dirichlet分数以及几乎没有平滑(基准)来分类推文是否与给定兴趣资料相关。通过使用包括平均精度(MAP)、累积增益(CG)和折扣累积增益(DCG)在内的指标,实验表明我们的方法表现良好。我们还希望从推送队列中删除冗余的推文。由于精度限制,我们在本文中仅描述算法。
更新时间: 2024-07-11 01:56:31
领域: cs.LG
Training toward significance with the decorrelated event classifier transformer neural network
Experimental particle physics uses machine learning for many tasks, where one application is to classify signal and background events. This classification can be used to bin an analysis region to enhance the expected significance for a mass resonance search. In natural language processing, one of the leading neural network architectures is the transformer. In this work, an event classifier transformer is proposed to bin an analysis region, in which the network is trained with special techniques. The techniques developed here can enhance the significance and reduce the correlation between the network's output and the reconstructed mass. It is found that this trained network can perform better than boosted decision trees and feed-forward networks.
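One common way to train toward decorrelation is to add a penalty on the correlation between the classifier output and the reconstructed mass. The Pearson penalty below is an illustrative stand-in, since the paper's specific training techniques are not spelled out in the abstract.

```python
import torch

def pearson_corr(a, b, eps=1e-8):
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).mean() / (a.std() * b.std() + eps)

def decorrelated_loss(logits, labels, mass, lam=1.0):
    """Classification loss plus a penalty on the squared linear
    correlation between the network score and the reconstructed mass."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
    corr = pearson_corr(torch.sigmoid(logits), mass)
    return bce + lam * corr ** 2

logits = torch.randn(256, requires_grad=True)
labels = torch.randint(0, 2, (256,)).float()   # signal vs. background
mass = torch.randn(256)                        # reconstructed mass proxy
loss = decorrelated_loss(logits, labels, mass)
loss.backward()
print(float(loss))
```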
Updated: 2024-07-11 01:50:17
标题: 训练朝着显著性的方向,使用去相关的事件分类器转换神经网络
摘要: 实验粒子物理学在许多任务中使用机器学习,其中一个应用是对信号和背景事件进行分类。这种分类可以用来将分析区域进行分组,以增强对质量共振搜索的预期显著性。在自然语言处理中,领先的神经网络架构之一是变压器。在本研究中,提出了一个事件分类器变压器,用于对分析区域进行分组,网络采用特殊技术进行训练。这里开发的技术可以增强显著性,并减少网络输出与重建质量之间的相关性。发现经过训练的网络可以比增强决策树和前馈网络表现更好。
更新时间: 2024-07-11 01:50:17
领域: hep-ex,cs.AI,cs.LG
Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach
Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most of the existing research efforts assume that all modalities are available during both training and testing, making their algorithms susceptible to the missing modality scenario. In this paper, we propose a novel knowledge-transfer network to translate between different modalities to reconstruct the missing audio modalities. Moreover, we develop a cross-modality attention mechanism to retain the maximal information of the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets demonstrate significant improvements over baselines and achieve comparable results to the previous methods with complete multi-modality supervision.
Updated: 2024-07-11 01:34:37
标题: 多模态情感分析中的缺失模态:一种知识迁移方法
摘要: 多模态情感分析旨在通过视觉、语言和声音线索识别个体表达的情绪。然而,大多数现有的研究工作都假设所有模态在训练和测试期间都是可用的,使得它们的算法容易受到缺失模态情况的影响。本文提出了一种新颖的知识转移网络,用于在不同模态之间进行翻译,以重建缺失的音频模态。此外,我们开发了跨模态注意机制,以保留重建和观察到的模态的最大信息,用于情感预测。在三个公开可用的数据集上进行的大量实验证实了与基线相比的显著改进,并且在完整多模态监督的情况下取得了与先前方法相当的结果。
更新时间: 2024-07-11 01:34:37
领域: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS
Diffusion-EXR: Controllable Review Generation for Explainable Recommendation via Diffusion Models
The Denoising Diffusion Probabilistic Model (DDPM) has shown great competence in image and audio generation tasks. However, there have been few attempts to employ DDPMs in text generation, especially review generation for recommendation systems. Motivated by the fact that explainable predicted reviews, which justify recommendations, could help users better understand the recommended items and increase the transparency of the recommendation system, we propose a Diffusion Model-based Review Generation approach towards EXplainable Recommendation named Diffusion-EXR. Diffusion-EXR corrupts the sequence of review embeddings by incrementally introducing varied levels of Gaussian noise to the sequence of word embeddings and learns to reconstruct the original word representations in the reverse process. The nature of DDPM enables our lightweight Transformer backbone to perform excellently in the recommendation review generation task. Extensive experimental results have demonstrated that Diffusion-EXR can achieve state-of-the-art review generation for recommendation on two publicly available benchmark datasets.
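The forward corruption DDPM applies to the review embeddings has a standard closed form, sketched below with a conventional linear beta schedule; the schedule and embedding dimensions here are assumptions.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # standard linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t):
    """Closed-form forward process: corrupt word embeddings x0 to step t,
        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps, eps

x0 = torch.randn(16, 32)        # a sequence of 16 review-word embeddings
x_t, eps = q_sample(x0, t=500)  # the denoiser is trained to predict eps
print(x_t.shape)                # torch.Size([16, 32])
```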
Updated: 2024-07-11 01:34:09
标题: Diffusion-EXR:通过扩散模型实现可控的解释性推荐评论生成
摘要: Denoising Diffusion Probabilistic Model(DDPM)在图像和音频生成任务中表现出很高的竞争力。然而,在文本生成领域,尤其是在推荐系统下的评论生成方面,很少有尝试使用DDPM。受到预测评论可解释性的启发,可以帮助用户更好地理解推荐的物品并增加推荐系统的透明度,我们提出了一种基于扩散模型的评论生成方法,命名为Diffusion-EXR。Diffusion-EXR通过逐步引入不同级别的高斯噪声来破坏评论嵌入序列,学习在反向过程中重建原始单词表示。DDPM的特性使得我们的轻量级Transformer骨干网络在推荐评论生成任务中表现出色。广泛的实验结果表明,Diffusion-EXR在两个公开可用的基准数据集上可以实现最先进的推荐评论生成。
更新时间: 2024-07-11 01:34:09
领域: cs.IR,cs.AI
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with images that are out of training distribution. Despite its importance, current methods for accurately measuring language priors in LVLMs are poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can partially be used to measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose a new benchmark called VLind-Bench, which is the first benchmark specifically designed to measure the language priors, or blindness, of LVLMs. It not only includes tests on counterfactual images to assess language priors but also involves a series of tests to evaluate more basic capabilities such as commonsense knowledge, visual perception, and commonsense biases. For each instance in our benchmark, we ensure that all these basic tests are passed before evaluating the language priors, thereby minimizing the influence of other factors on the assessment. The evaluation and analysis of recent LVLMs in our benchmark reveal that almost all models exhibit a significant reliance on language priors, presenting a strong challenge in the field.
Updated: 2024-07-11 01:32:48
标题: VLind-Bench: 在大型视觉-语言模型中测量语言先验
摘要: 大型视觉-语言模型(LVLMs)已经在各种多模态任务中展现出卓越的性能。然而,它们存在一个被称为语言先验的问题,即基于文本模式生成响应,而忽视图像信息。解决语言先验问题至关重要,因为当处理训练分布之外的图像时,可能会导致不良偏见或幻觉。尽管其重要性,目前对LVLMs中准确测量语言先验的方法研究不足。虽然基于反事实或分布之外图像的现有基准可以部分用于测量语言先验,但它们未能将语言先验与其他混杂因素进行区分。因此,我们提出了一个名为VLind-Bench的新基准,这是第一个专门设计用于衡量LVLMs的语言先验或盲目性的基准。它不仅包括对反事实图像的测试以评估语言先验,还涉及一系列测试以评估更基本的能力,如常识知识、视觉感知和常识偏见。在我们的基准中的每个实例中,我们确保在评估语言先验之前通过所有这些基本测试,从而最大程度地减少其他因素对评估的影响。我们基准中最近LVLMs的评估和分析表明,几乎所有模型都在很大程度上依赖于语言先验,这在该领域提出了一个严峻的挑战。
更新时间: 2024-07-11 01:32:48
领域: cs.AI,cs.CL,cs.CV
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities
Long sequences occur in abundance within real-world scenarios, hence properly modelling them opens numerous down-stream use-cases. Deep neural networks, however, have often struggled with these for a variety of reasons. Recent advances, both in system engineering as well as model design, have enabled the scaling up of model that are purported to support extended context length. In particular, the state-space and linear recurrent neural network families of models hypothetically can entend to infinite sequence lenth. However, is this too good to be true? We conduct an evaluation to show that while such claims may be sound theoretically, there remain large practical gaps that are empirically observed. In particular, recurrent models still suffer in the same settings as long-context LLMs with attention. We further show that different inductive biases have inconsistent extrapolation capabilities, highlighting the need to further study such paradigms and investigate why long-context models seemingly fail to behave as one might expect.
Updated: 2024-07-11 01:08:39
标题: 长序列模型能够多好地建模长序列?比较长上下文能力的架构归纳偏差
摘要: 长序列在现实世界中丰富多样,因此适当地对其进行建模可以打开许多下游用例。然而,深度神经网络在处理这些序列时往往遇到各种困难。最近的进展,无论是在系统工程还是模型设计方面,都使得可以扩展支持更长上下文长度的模型成为可能。特别是,状态空间和线性循环神经网络系列的模型理论上可以扩展到无限序列长度。然而,这是否太美好了?我们进行了评估,结果表明,虽然这些主张在理论上可能是正确的,但实际上存在着大量的实证观察到的差距。特别是,在相同设置下,循环模型仍然遭受着与具有注意力机制的长上下文LLMs相同的困境。我们进一步展示了不同的归纳偏差具有不一致的外推能力,突出了进一步研究这些范式并探讨为什么长上下文模型似乎未能如人们所期望地行为的必要性。
更新时间: 2024-07-11 01:08:39
领域: cs.LG,cs.AI,cs.CL
Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter
Urban waterlogging poses a major risk to public safety and infrastructure. Conventional methods using water-level sensors need high-maintenance to hardly achieve full coverage. Recent advances employ surveillance camera imagery and deep learning for detection, yet these struggle amidst scarce data and adverse environmental conditions. In this paper, we establish a challenging Urban Waterlogging Benchmark (UW-Bench) under diverse adverse conditions to advance real-world applications. We propose a Large-Small Model co-adapter paradigm (LSM-adapter), which harnesses the substantial generic segmentation potential of large model and the specific task-directed guidance of small model. Specifically, a Triple-S Prompt Adapter module alongside a Dynamic Prompt Combiner are proposed to generate then merge multiple prompts for mask decoder adaptation. Meanwhile, a Histogram Equalization Adap-ter module is designed to infuse the image specific information for image encoder adaptation. Results and analysis show the challenge and superiority of our developed benchmark and algorithm. Project page: \url{https://github.com/zhang-chenxu/LSM-Adapter}
Updated: 2024-07-11 01:03:02
标题: 城市积水检测:一个具有挑战性的基准和大小模型共适配器
摘要: 城市内涝对公共安全和基础设施构成重大风险。传统方法使用水位传感器需要高维护成本,很难实现全面覆盖。最近的进展利用监控摄像头图像和深度学习进行检测,然而在数据稀缺和恶劣环境条件下仍然面临困难。本文在不同恶劣条件下建立了一个具有挑战性的城市内涝基准(UW-Bench),以推进现实世界的应用。我们提出了一个大小模型共适配器范例(LSM-adapter),它利用大型模型的大量通用分割潜力和小型模型的特定任务导向引导。具体来说,提出了一个三重S提示适配器模块以及一个动态提示组合器,用于生成然后合并多个提示以进行蒙版解码器适配。与此同时,设计了一个直方图均衡适配器模块,用于为图像编码器适配注入图像特定信息。结果和分析显示了我们开发的基准和算法的挑战和优越性。项目页面:\url{https://github.com/zhang-chenxu/LSM-Adapter}
更新时间: 2024-07-11 01:03:02
领域: cs.CV,cs.AI,cs.LG
CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data
Deep learning recommendation models (DLRMs) are at the heart of the current e-commerce industry. However, the amount of training data used to train these large models is growing exponentially, leading to substantial training hurdles. The training dataset contains two primary types of information: content-based information (features of users and items) and collaborative information (interactions between users and items). One approach to reduce the training dataset is to remove user-item interactions. But that significantly diminishes collaborative information, which is crucial for maintaining accuracy due to its inclusion of interaction histories. This loss profoundly impacts DLRM performance. This paper makes an important observation that if one can capture the user-item interaction history to enrich the user and item embeddings, then the interaction history can be compressed without losing model accuracy. Thus, this work, Collaborative Aware Data Compression (CADC), takes a two-step approach to training dataset compression. In the first step, we use matrix factorization of the user-item interaction matrix to create a novel embedding representation for both the users and items. Once the user and item embeddings are enriched by the interaction history information the approach then applies uniform random sampling of the training dataset to drastically reduce the training dataset size while minimizing model accuracy drop. The source code of CADC is available at https://anonymous.4open.science/r/DSS-RM-8C1D/README.md.
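A toy sketch of the two steps follows: matrix factorization to absorb interaction history into embeddings, then uniform sampling of interactions. The matrix sizes, learning rate, and regularization are placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
R = (rng.random((50, 40)) < 0.1).astype(float)  # toy user-item interactions

# Step 1: factorize R ~= U @ V.T so the embeddings encode the
# collaborative (interaction-history) information.
k, lr, lam = 8, 0.05, 0.01
U = rng.normal(scale=0.1, size=(50, k))
V = rng.normal(scale=0.1, size=(40, k))
users, items = R.nonzero()
for _ in range(20):                              # a few SGD epochs
    for u, i in zip(users, items):
        err = 1.0 - U[u] @ V[i]                  # observed interaction = 1
        u_old = U[u].copy()
        U[u] += lr * (err * V[i] - lam * U[u])
        V[i] += lr * (err * u_old - lam * V[i])

# Step 2: uniformly subsample interactions; the enriched embeddings are
# meant to compensate for the removed collaborative signal.
idx = rng.choice(len(users), size=len(users) // 4, replace=False)
print(f"kept {len(idx)} of {len(users)} interactions")
```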
Updated: 2024-07-11 00:54:56
标题: CADC:编码用户-项目交互以压缩推荐模型训练数据
摘要: 深度学习推荐模型(DLRM)是当前电子商务行业的核心。然而,用于训练这些大型模型的训练数据量呈指数增长,导致训练障碍重重。训练数据集包含两种主要类型的信息:基于内容的信息(用户和项目的特征)和协作信息(用户和项目之间的互动)。一种减少训练数据集的方法是删除用户-项目的互动。但这将显著减少协作信息,而协作信息对于维持准确性至关重要,因为它包含互动历史。这种损失严重影响DLRM的性能。 本文做出了一个重要观察,即如果能够捕捉用户-项目互动历史以丰富用户和项目嵌入,那么可以在不损失模型准确性的情况下压缩互动历史。因此,这项工作,协作感知数据压缩(CADC),采取了两步方法来压缩训练数据集。在第一步中,我们使用用户-项目互动矩阵的矩阵分解来为用户和项目创建新颖的嵌入表示。一旦用户和项目嵌入通过互动历史信息丰富,该方法然后应用训练数据集的均匀随机抽样,大幅减少训练数据集的大小,同时最大限度地减少模型准确性下降。CADC的源代码可在以下网址获取:https://anonymous.4open.science/r/DSS-RM-8C1D/README.md。
更新时间: 2024-07-11 00:54:56
领域: cs.IR,cs.AI,cs.LG
Advanced Meta-Ensemble Machine Learning Models for Early and Accurate Sepsis Prediction to Improve Patient Outcomes
Sepsis, a critical condition arising from the body's response to infection, poses a major global health crisis affecting all age groups. Timely detection and intervention are crucial for reducing healthcare expenses and improving patient outcomes. This paper examines the limitations of traditional sepsis screening tools like Systemic Inflammatory Response Syndrome, Modified Early Warning Score, and Quick Sequential Organ Failure Assessment, highlighting the need for advanced approaches. We propose using machine learning techniques - Random Forest, Extreme Gradient Boosting, and Decision Tree models - to predict sepsis onset. Our study evaluates these models individually and in a combined meta-ensemble approach using key metrics such as Accuracy, Precision, Recall, F1 score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Results show that the meta-ensemble model outperforms individual models, achieving an AUC-ROC score of 0.96, indicating superior predictive accuracy for early sepsis detection. The Random Forest model also performs well with an AUC-ROC score of 0.95, while Extreme Gradient Boosting and Decision Tree models score 0.94 and 0.90, respectively.
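A meta-ensemble of the three base learners can be sketched with scikit-learn's StackingClassifier. Synthetic data stands in for the clinical features, and GradientBoostingClassifier substitutes for XGBoost to keep the sketch dependency-free; the meta-learner choice is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for vital-sign / lab features.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

meta = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(max_depth=5, random_state=0))],
    final_estimator=LogisticRegression())
meta.fit(X_tr, y_tr)
print("AUC-ROC:", roc_auc_score(y_te, meta.predict_proba(X_te)[:, 1]))
```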
Updated: 2024-07-11 00:51:32
标题: 高级元集成机器学习模型用于早期和准确的败血症预测,以改善患者结果
摘要: Sepsis是一种由机体对感染的反应引起的危重病情,它是影响所有年龄组的全球性重大健康危机。及时检测和干预对于降低医疗费用和改善患者预后至关重要。本文审查了传统脓毒症筛查工具(如全身性炎症反应综合征、改良早期警示评分和快速连续器官功能衰竭评估)的局限性,突出了对先进方法的需求。我们提出使用机器学习技术 - 随机森林、极端梯度提升和决策树模型 - 来预测脓毒症发作。我们的研究评估了这些模型的单独表现以及采用关键指标(如准确率、精确率、召回率、F1得分和接收者操作特征曲线下的面积)的组合元集成方法。结果显示,元集成模型优于单独模型,在早期脓毒症检测的预测准确性方面达到了AUC-ROC分数为0.96,显示出卓越的预测准确性。随机森林模型也表现良好,AUC-ROC得分为0.95,而极端梯度提升和决策树模型分别得分为0.94和0.90。
更新时间: 2024-07-11 00:51:32
领域: cs.LG
Federated Learning and AI Regulation in the European Union: Who is liable? An Interdisciplinary Analysis
The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI Models across data siloes, sharing only model parameters while improving data security. Since FL is a cooperative learning paradigm, clients and servers naturally share legal responsibility in the FL pipeline. Our work contributes to clarifying the roles of both parties, explains strategies for shifting responsibilities to the server operator, and points out open technical challenges that we must solve to improve FL's practical applicability under the EU AI Act.
Updated: 2024-07-11 00:41:16
标题: 《欧盟中的联邦学习和人工智能监管:谁应承担责任?跨学科分析》
摘要: 欧盟人工智能法案要求在开发和部署机器学习应用程序时明确利益相关者的责任,以避免巨额罚款,优先考虑私密和安全的数据处理,数据仍保留在其原始位置。联合学习(FL)使得可以跨数据孤立域训练生成式人工智能模型,仅分享模型参数,同时提高数据安全性。由于FL是一种合作学习范式,客户端和服务器在FL管道中自然分享法律责任。我们的工作有助于澄清双方的角色,解释将责任转移到服务器运营商的策略,并指出我们必须解决的开放技术挑战,以提高FL在欧盟人工智能法案下的实际适用性。
更新时间: 2024-07-11 00:41:16
领域: cs.AI,K.5; I.2.11; C.2.4; D.2.1
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
Multi-modal learning has become increasingly popular due to its ability to leverage information from different data sources (e.g., text and images) to improve model performance. Recently, CLIP has emerged as an effective approach that employs vision-language contrastive pretraining to learn joint image and text representations and exhibits remarkable performance in zero-shot learning and text-guided natural image generation. Despite the huge practical success of CLIP, its theoretical understanding remains elusive. In this paper, we formally study transferable representation learning underlying CLIP and demonstrate how features from different modalities get aligned. We also analyze its zero-shot transfer performance on downstream tasks. Inspired by our analysis, we propose a new CLIP-type approach, which achieves better performance than CLIP and other state-of-the-art methods on benchmark datasets.
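For concreteness, the symmetric contrastive objective underlying CLIP-style pretraining is sketched below; the temperature and embedding size are illustrative.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss: matched image-text pairs sit on the
    diagonal of the cosine-similarity matrix and act as positives;
    all other pairs in the batch serve as negatives."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

print(float(clip_loss(torch.randn(8, 512), torch.randn(8, 512))))
```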
Updated: 2024-07-11 00:38:08
标题: 理解CLIP中的可转移表示学习和零样本迁移
摘要: 多模态学习因其能够利用不同数据源(如文本和图像)的信息来提高模型性能而变得越来越流行。最近,CLIP已经成为一种有效的方法,采用视觉-语言对比预训练来学习联合图像和文本表示,并在零样本学习和文本引导的自然图像生成中表现出卓越的性能。尽管CLIP取得了巨大的实际成功,但其理论理解仍然难以捉摸。在本文中,我们正式研究了CLIP背后的可转移表示学习,并展示了不同模态特征如何对齐。我们还分析了其在下游任务上的零样本转移性能。受我们分析的启发,我们提出了一种新的类似于CLIP的方法,它在基准数据集上比CLIP和其他最先进的方法表现更好。
更新时间: 2024-07-11 00:38:08
领域: cs.LG,cs.AI,stat.ML
Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates
Deep learning algorithms - typically consisting of a class of deep neural networks trained by a stochastic gradient descent (SGD) optimization method - are nowadays the key ingredients in many artificial intelligence (AI) systems and have revolutionized our ways of working and living in modern societies. For example, SGD methods are used to train powerful large language models (LLMs) such as versions of ChatGPT and Gemini, SGD methods are employed to create successful generative AI based text-to-image creation models such as Midjourney, DALL-E, and Stable Diffusion, but SGD methods are also used to train DNNs to approximately solve scientific models such as partial differential equation (PDE) models from physics and biology and optimal control and stopping problems from engineering. It is known that the plain vanilla standard SGD method fails to converge even for several convex optimization problems if the learning rates are bounded away from zero. However, in many practical relevant training scenarios, often not the plain vanilla standard SGD method but instead adaptive SGD methods such as the RMSprop and the Adam optimizers, in which the learning rates are modified adaptively during the training process, are employed. This naturally raises the question of whether such adaptive optimizers converge in the situation of non-vanishing learning rates. In this work we answer this question negatively by proving that adaptive SGD methods such as the popular Adam optimizer fail to converge to any possible random limit point if the learning rates are asymptotically bounded away from zero. In our proof of this non-convergence result we establish suitable pathwise a priori bounds for a class of accelerated and adaptive SGD methods, which are also of independent interest.
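The phenomenon is easy to observe empirically on a toy convex problem: with a learning rate bounded away from zero, Adam's iterates on f(x) = x^2 keep moving at a scale set by the learning rate instead of settling at the minimizer. This is an illustration consistent with the result, not the paper's proof.

```python
import torch

x = torch.tensor([5.0], requires_grad=True)
opt = torch.optim.Adam([x], lr=0.5)  # learning rate bounded away from zero
for step in range(2000):
    opt.zero_grad()
    loss = (x ** 2).sum()            # simple convex objective, minimum at 0
    loss.backward()
    opt.step()
    if step % 500 == 499:
        print(step + 1, float(x))    # iterates keep oscillating near 0
```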
Updated: 2024-07-11 00:10:35
标题: Adam和其他自适应随机梯度下降优化方法在非消失学习率情况下的不收敛问题
摘要: 深度学习算法-通常由一类通过随机梯度下降(SGD)优化方法训练的深度神经网络组成-如今是许多人工智能(AI)系统中的关键要素,并且已经彻底改变了我们在现代社会中工作和生活的方式。例如,SGD方法被用于训练强大的大型语言模型(LLMs),如ChatGPT和Gemini的版本,SGD方法被用于创建成功的生成式AI文本到图像创建模型,如Midjourney、DALL-E和Stable Diffusion,但SGD方法也被用于训练DNN以近似解决来自物理和生物学的偏微分方程(PDE)模型以及来自工程学的最优控制和停止问题。众所周知,即使在学习速率远离零的情况下,纯粹的标准SGD方法也无法收敛于几个凸优化问题的情况。然而,在许多实际相关的训练场景中,通常不使用纯粹的标准SGD方法,而是使用自适应SGD方法,如RMSprop和Adam优化器,在训练过程中逐步修改学习速率。这自然引出了一个问题,即这种自适应优化器,在训练过程中逐步修改学习速率,是否会在学习速率非零的情况下收敛。在这项工作中,我们通过证明自适应SGD方法,如流行的Adam优化器,如果学习速率在渐近上远离零,则无法收敛到任何可能的随机极限点来否定地回答了这个问题。在我们对这一非收敛结果的证明中,我们为一类加速和自适应SGD方法建立了适当的路径先验界限,这也是独立利益所在。
更新时间: 2024-07-11 00:10:35
领域: cs.LG,math.OC,math.PR,60J22 (Primary), 65K10, 60J20, 65C40 (Secondary),G.1.6; F.2.0; G.3