A Taxonomy and Comparative Analysis of IPv4 ID Selection: Correctness, Security, and Performance
The battle for a more secure Internet is waged on many fronts, including the most basic of networking protocols. Our focus is the IPv4 Identifier (IPID), an IPv4 header field as old as the Internet with an equally long history as an exploited side channel for scanning network properties, inferring off-path connections, and poisoning DNS caches. This article taxonomizes the 25-year history of IPID-based exploits and the corresponding changes to IPID selection methods. By mathematically analyzing these methods' correctness and security and empirically evaluating their performance, we reveal recommendations for best practice as well as shortcomings of current operating system implementations, emphasizing the value of systematic evaluations in network security.
Updated: 2024-07-12 23:44:41
Domains: cs.NI,cs.CR
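To make the side channel concrete, here is a minimal Python sketch (not any operating system's actual implementation) of three classic IPID selection strategies from this literature, and the off-path inference that a globally incrementing counter enables:

```python
import hmac, hashlib, random

class GlobalCounterIPID:
    """Early style: one 16-bit counter shared across all destinations."""
    def __init__(self):
        self.counter = random.randrange(1 << 16)
    def next_id(self, dst):
        self.counter = (self.counter + 1) % (1 << 16)
        return self.counter

class PerBucketIPID:
    """Per-destination counters in secretly hashed buckets (RFC 7739-style idea)."""
    def __init__(self, secret=b"secret-key", buckets=2048):
        self.secret, self.buckets, self.counters = secret, buckets, {}
    def next_id(self, dst):
        digest = hmac.new(self.secret, dst.encode(), hashlib.sha256).digest()
        b = int.from_bytes(digest[:2], "big") % self.buckets
        self.counters[b] = (self.counters.get(b, random.randrange(1 << 16)) + 1) % (1 << 16)
        return self.counters[b]

class RandomIPID:
    """Pure random IDs: nothing to observe, but values may repeat too soon."""
    def next_id(self, dst):
        return random.randrange(1 << 16)

# The classic off-path observation against a global counter: probing the same
# host twice reveals how many packets it sent to *anyone* in between.
host = GlobalCounterIPID()
first = host.next_id("attacker")
for _ in range(5):                      # hidden traffic to a third party
    host.next_id("victim")
second = host.next_id("attacker")
print("inferred hidden packets:", (second - first - 1) % (1 << 16))  # -> 5
```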
Equivariant vs. Invariant Layers: A Comparison of Backbone and Pooling for Point Cloud Classification
Learning from set-structured data, such as point clouds, has gained significant attention from the machine learning community. Geometric deep learning provides a blueprint for designing effective set neural networks that preserve the permutation symmetry of set-structured data. Of our interest are permutation invariant networks, which are composed of a permutation equivariant backbone, permutation invariant global pooling, and regression/classification head. While existing literature has focused on improving equivariant backbones, the impact of the pooling layer is often overlooked. In this paper, we examine the interplay between permutation equivariant backbones and permutation invariant global pooling on three benchmark point cloud classification datasets. Our findings reveal that: 1) complex pooling methods, such as transport-based or attention-based poolings, can significantly boost the performance of simple backbones, but the benefits diminish for more complex backbones, 2) even complex backbones can benefit from pooling layers in low data scenarios, 3) surprisingly, the choice of pooling layers can have a more significant impact on the model's performance than adjusting the width and depth of the backbone, and 4) pairwise combination of pooling layers can significantly improve the performance of a fixed backbone. Our comprehensive study provides insights for practitioners to design better permutation invariant set neural networks. Our code is available at https://github.com/mint-vu/backbone_vs_pooling.
Updated: 2024-07-12 23:40:41
Domains: cs.CV,cs.LG
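For a concrete frame of reference, here is a hedged PyTorch sketch of the architecture family under study: a pointwise (permutation equivariant) backbone, a swappable permutation invariant global pooling, and a classification head. The `pool` argument is exactly the axis the paper varies; the specific layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class SetClassifier(nn.Module):
    def __init__(self, d_in=3, d_hid=64, n_classes=10, pool="mean"):
        super().__init__()
        self.backbone = nn.Sequential(      # applied pointwise: equivariant
            nn.Linear(d_in, d_hid), nn.ReLU(), nn.Linear(d_hid, d_hid))
        self.pool = pool
        self.attn = nn.Linear(d_hid, 1)     # used only for attention pooling
        self.head = nn.Linear(d_hid, n_classes)

    def forward(self, x):                   # x: (batch, n_points, d_in)
        h = self.backbone(x)                # (batch, n_points, d_hid)
        if self.pool == "mean":
            z = h.mean(dim=1)               # invariant: point order is lost
        elif self.pool == "max":
            z = h.max(dim=1).values
        else:                               # simple attention pooling
            w = torch.softmax(self.attn(h), dim=1)
            z = (w * h).sum(dim=1)
        return self.head(z)

pts = torch.randn(2, 1024, 3)               # two toy point clouds
logits = SetClassifier(pool="max")(pts)     # shape (2, 10)
```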
Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference
Transformer-based Large language models (LLMs) have demonstrated their power in various tasks, but their inference incurs significant time and energy costs. To accelerate LLM inference, speculative decoding uses a smaller model to propose one sequence of tokens, which are subsequently validated in batch by the target large model. Compared with autoregressive decoding, speculative decoding generates the same number of tokens with fewer runs of the large model, hence accelerating the overall inference by $1$-$2\times$. However, greedy decoding is not the optimal decoding algorithm in terms of output perplexity, which is a direct measurement of the effectiveness of a decoding algorithm. An algorithm that has better output perplexity and even better efficiency than speculative decoding can be more useful in practice. To achieve this seemingly contradictory goal, we first introduce multi-token joint greedy decoding (MJGD), which greedily generates multiple tokens at each step based on their joint perplexity. We show that it leads to better perplexity for the whole output. But the computation cost of MJGD is infeasible in practice. So we further propose multi-token joint speculative decoding (MJSD), which approximates and accelerates the MJGD from two aspects: it approximates the joint distribution of the large model with that of a small model, and uses a verification step to guarantee the accuracy of approximation; then it uses beam decoding to accelerate the sequence generation from the joint distribution. Compared with vanilla speculative decoding, MJSD has two advantages: (1) it is an approximation of MJGD, thus achieving better output perplexity; (2) verification with joint likelihood allows it to accept the longest prefix sub-sequence of the draft tokens with valid perplexity, leading to better efficiency...
Updated: 2024-07-12 23:29:54
Domains: cs.CL,cs.LG
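For orientation, a hedged toy sketch of the vanilla speculative decoding baseline that MJSD builds on: a small model drafts k tokens, the large model verifies them greedily. The toy bigram "models" are stand-ins for real LLMs, and MJSD's joint-likelihood verification and beam drafting are not reproduced here.

```python
import numpy as np

def greedy(dist):
    return int(np.argmax(dist))

def speculative_step(prefix, draft_model, target_model, k=4):
    # 1) The small model proposes k tokens autoregressively.
    draft, proposed = list(prefix), []
    for _ in range(k):
        tok = greedy(draft_model(draft))
        proposed.append(tok)
        draft.append(tok)
    # 2) The large model verifies; in practice all positions are scored in
    #    one batched pass, here we call it per position for clarity.
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        if greedy(target_model(ctx)) == tok:            # target agrees
            accepted.append(tok); ctx.append(tok)
        else:                                           # first disagreement:
            accepted.append(greedy(target_model(ctx)))  # take target's token
            break
    return accepted

# Toy bigram models over a 4-token vocabulary, just to run the function.
P = np.array([[.1, .7, .1, .1], [.1, .1, .7, .1],
              [.1, .1, .1, .7], [.7, .1, .1, .1]])
draft = lambda seq: P[seq[-1]]
target = lambda seq: P[seq[-1]]
print(speculative_step([0], draft, target, k=4))  # -> [1, 2, 3, 0]
```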
MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models
Material selection plays a pivotal role in many industries, from manufacturing to construction. Material selection is usually carried out after several cycles of conceptual design, during which designers iteratively refine the design solution and the intended manufacturing approach. In design research, material selection is typically treated as an optimization problem with a single correct answer. Moreover, it is also often restricted to specific types of objects or design functions, which can make the selection process computationally expensive and time-consuming. In this paper, we introduce MSEval, a novel dataset which is comprised of expert material evaluations across a variety of design briefs and criteria. This data is designed to serve as a benchmark to facilitate the evaluation and modification of machine learning models in the context of material selection for conceptual design.
Updated: 2024-07-12 23:27:33
Domains: cs.LG
Differentially Private Stream Processing at Scale
We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic advances, namely, we (i) design a novel (user-level) DP key selection algorithm that can operate on an unbounded set of possible keys, and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the stream length. We empirically demonstrate its efficacy by obtaining at least a $16\times$ reduction in error over the meaningful baselines we consider. We implemented streaming differentially private user impressions for Google Shopping with DP-SQLP. The streaming DP algorithms are further applied to Google Trends.
Updated: 2024-07-12 23:18:22
Domains: cs.CR,cs.DB
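A hedged sketch of the core primitive behind DP key selection, with illustrative (not DP-SQLP's) calibration: noise each key's count and release only keys whose noisy count clears a stability threshold, so rare, possibly identifying keys are suppressed.

```python
import numpy as np

def dp_key_selection(counts, epsilon=1.0, delta=1e-6, max_keys_per_user=1):
    sensitivity = max_keys_per_user           # user-level: one user shifts
    scale = sensitivity / epsilon             # each count by at most this
    tau = sensitivity * (1 + np.log(1 / (2 * delta)) / epsilon)  # threshold
    rng = np.random.default_rng()
    released = {}
    for key, c in counts.items():
        noisy = c + rng.laplace(0.0, scale)
        if noisy > tau:                       # suppress low-count keys
            released[key] = noisy
    return released

counts = {"shoes": 12000, "rare-query-from-one-user": 1}
print(dp_key_selection(counts))  # the singleton key is (almost surely) dropped
```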
Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations
In this work, we address the problem of eavesdropping on digital video displays by analyzing the electromagnetic waves that unintentionally emanate from the cables and connectors, particularly HDMI. This problem is known as TEMPEST. Compared to the analog case (VGA), the digital case is harder due to a 10-bit encoding that results in a much larger bandwidth and non-linear mapping between the observed signal and the pixel's intensity. As a result, eavesdropping systems designed for the analog case obtain unclear and difficult-to-read images when applied to digital video. The proposed solution is to recast the problem as an inverse problem and train a deep learning module to map the observed electromagnetic signal back to the displayed image. However, this approach still requires a detailed mathematical analysis of the signal, first to determine the frequency at which to tune, and also to produce training samples without actually needing a real TEMPEST setup. This saves time and avoids the need to obtain these samples, especially if several configurations are being considered. Our focus is on improving the average Character Error Rate in text, and our system improves this rate by over 60 percentage points compared to previously available implementations. The proposed system is based on widely available Software Defined Radio and is fully open-source, seamlessly integrated into the popular GNU Radio framework. We also share the dataset we generated for training, which comprises both simulated and over 1000 real captures. Finally, we discuss some countermeasures to minimize the potential risk of being eavesdropped by systems designed based on similar principles.
Updated: 2024-07-12 23:07:37
Domains: cs.CR,cs.CV,cs.LG
One for All: Towards Training One Graph Model for All Classification Tasks
Designing a single model to address multiple tasks has been a long-standing objective in artificial intelligence. Recently, large language models have demonstrated exceptional capability in solving different tasks within the language domain. However, a unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain. First, graph data from different areas carry distinct attributes and follow different distributions. Such discrepancy makes it hard to represent graphs in a single representation space. Second, tasks on graphs diversify into node, link, and graph tasks, requiring distinct embedding strategies. Finally, an appropriate graph prompting paradigm for in-context learning is unclear. We propose \textbf{One for All (OFA)}, the first general framework that can use a single graph model to address the above challenges. Specifically, OFA proposes text-attributed graphs to unify different graph data by describing nodes and edges with natural language and uses language models to encode the diverse and possibly cross-domain text attributes to feature vectors in the same embedding space. Furthermore, OFA introduces the concept of nodes-of-interest to standardize different tasks with a single task representation. For in-context learning on graphs, OFA introduces a novel graph prompting paradigm that appends prompting substructures to the input graph, which enables it to address varied tasks without fine-tuning. We train the OFA model using graph data from multiple domains (including citation networks, molecular graphs, knowledge graphs, etc.) simultaneously and evaluate its ability in supervised, few-shot, and zero-shot learning scenarios. OFA performs well across different tasks, making it the first general-purpose across-domains classification model on graphs.
Updated: 2024-07-12 23:01:32
Domains: cs.LG
BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model
We introduce the \textbf{B}i-Directional \textbf{S}parse \textbf{Hop}field Network (\textbf{BiSHop}), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recently established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules house layers of generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions, with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly fewer HPO runs, marking it a robust solution for deep tabular learning.
Updated: 2024-07-12 22:45:41
Domains: cs.LG,cs.AI,stat.ML
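For intuition, a minimal NumPy sketch of the modern Hopfield retrieval update that BiSHop's generalized sparse layers extend (the learnable, adaptable sparsity itself is omitted): one update step is attention with the stored patterns as both keys and values.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(X, q, beta=8.0, steps=3):
    """X: (N, d) stored patterns; q: (d,) query.
    One step is q <- X^T softmax(beta * X q)."""
    for _ in range(steps):
        q = X.T @ softmax(beta * (X @ q))
    return q

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))           # five stored patterns
q = X[2] + 0.3 * rng.standard_normal(16)   # noisy version of pattern 2
out = hopfield_retrieve(X, q)
print(np.argmin(np.linalg.norm(X - out, axis=1)))  # -> 2: query snaps back
```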
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants
LLM-based programming assistants offer the promise of programming faster but with the risk of introducing more security vulnerabilities. Prior work has studied how LLMs could be maliciously fine-tuned to suggest vulnerabilities more often. With the rise of agentic LLMs, which may use results from an untrusted third party, there is a growing risk of attacks on the model's prompt. We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text to a prompt for a programming task (under 500 bytes). We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code. We evaluate three prompts on seven common LLMs, from basic to state-of-the-art commercial models. Using the HumanEval benchmark, we find that our prompts are broadly effective, with no customization required for different LLMs. Furthermore, the LLMs that are best at HumanEval are also best at following our malicious instructions, suggesting that simply scaling language models will not prevent MaPP attacks. Using a dataset of eight CWEs in 16 scenarios, we find that MaPP attacks are also effective at implementing specific and targeted vulnerabilities across a range of models. Our work highlights the need to secure LLM prompts against manipulation as well as rigorously auditing code generated with the help of LLMs.
Updated: 2024-07-12 22:30:35
Domains: cs.CR,cs.AI,I.2.2
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Foundation models, such as Large Language Models (LLMs) or Large Vision Models (LVMs), have emerged as one of the most powerful tools in the respective fields. However, unlike text and image data, graph data do not have a definitive structure, posing great challenges to developing a Graph Foundation Model (GFM). For example, current attempts at designing general graph models either transform graph data into a language format for LLM-based prediction or still train a GNN model with LLM as an assistant. The former can handle unlimited tasks, while the latter captures graph structure much better -- yet, no existing work can achieve both simultaneously. In this paper, we identify three key desirable properties of a GFM: self-supervised pretraining, fluidity in tasks, and graph awareness. To account for these properties, we extend the conventional language modeling to the graph domain and propose a novel generative graph language model GOFA to solve the problem. The model interleaves randomly initialized GNN layers into a frozen pre-trained LLM so that the semantic and structural modeling abilities are organically combined. GOFA is pre-trained on newly proposed graph-level next-word prediction, question-answering, and structural tasks to obtain the above GFM properties. The pre-trained model is further fine-tuned on downstream tasks to obtain task-solving ability. The fine-tuned model is evaluated on various downstream tasks, demonstrating a strong ability to solve structural and contextual problems in zero-shot scenarios. The code is available at https://github.com/JiaruiFeng/GOFA.
Updated: 2024-07-12 22:23:51
Domains: cs.LG,cs.CL
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications. FlashAttention elaborated an approach to speed up attention on GPUs through minimizing memory reads/writes. However, it has yet to take advantage of new capabilities present in recent hardware, with FlashAttention-2 achieving only 35% utilization on the H100 GPU. We develop three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) block quantization and incoherent processing that leverages hardware support for FP8 low-precision. We demonstrate that our method, FlashAttention-3, achieves speedup on H100 GPUs by 1.5-2.0$\times$ with FP16 reaching up to 740 TFLOPs/s (75% utilization), and with FP8 reaching close to 1.2 PFLOPs/s. We validate that FP8 FlashAttention-3 achieves 2.6$\times$ lower numerical error than a baseline FP8 attention.
Updated: 2024-07-12 22:15:02
Domains: cs.LG,cs.AI
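A hedged, CPU-only NumPy illustration of the online-softmax recurrence that lets FlashAttention-family kernels interleave block-wise matmul and softmax without ever materializing the full attention matrix; the hardware-level techniques (warp specialization, TMA, FP8 block quantization) are of course not modeled here.

```python
import numpy as np

def attention_reference(Q, K, V):
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def attention_blockwise(Q, K, V, block=16):
    d = Q.shape[-1]
    out = np.zeros_like(Q)
    m = np.full(Q.shape[0], -np.inf)   # running row-max
    l = np.zeros(Q.shape[0])           # running softmax denominator
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)      # one tile of scores
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)                  # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = alpha * l + P.sum(axis=-1)
        out = alpha[:, None] * out + P @ V[j:j + block]
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
print(np.allclose(attention_blockwise(Q, K, V), attention_reference(Q, K, V)))
```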
Diagnosing and Re-learning for Balanced Multimodal Learning
To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. The scarcely informative modalities can be recognized as ``worse-learnt'' ones, which could force the model to memorize more noise, counterproductively affecting the multimodal model ability. Moreover, the current modality modulation methods narrowly concentrate on selected worse-learnt modalities, even suppressing the training of others. Hence, it is essential to consider the intrinsic limitation of modality capacity and take all modalities into account during balancing. To this end, we propose the Diagnosing \& Re-learning method. The learning state of each modality is firstly estimated based on the separability of its uni-modal representation space, and then used to softly re-initialize the corresponding uni-modal encoder. In this way, the over-emphasizing of scarcely informative modalities is avoided. In addition, encoders of worse-learnt modalities are enhanced, simultaneously avoiding the over-training of other modalities. Accordingly, multimodal learning is effectively balanced and enhanced. Experiments covering multiple types of modalities and multimodal frameworks demonstrate the superior performance of our simple-yet-effective method for balanced multimodal learning. The source code and dataset are available at \url{https://github.com/GeWu-Lab/Diagnosing_Relearning_ECCV2024}.
Updated: 2024-07-12 22:12:03
Domains: cs.CV,cs.AI,cs.MM
Deep Generative Models for Detector Signature Simulation: A Taxonomic Review
In modern collider experiments, the quest to explore fundamental interactions between elementary particles has reached unparalleled levels of precision. Signatures from particle physics detectors are low-level objects (such as energy depositions or tracks) encoding the physics of collisions (the final state particles of hard scattering interactions). The complete simulation of them in a detector is a computational and storage-intensive task. To address this computational bottleneck in particle physics, alternative approaches have been developed, introducing additional assumptions and trading off accuracy for speed. The field has seen a surge of interest in surrogate modeling of detector simulation, fueled by the advancements in deep generative models. These models aim to generate responses that are statistically identical to the observed data. In this paper, we conduct a comprehensive and exhaustive taxonomic review of the existing literature on the simulation of detector signatures from both methodological and application-wise perspectives. Initially, we formulate the problem of detector signature simulation and discuss its different variations that can be unified. Next, we classify the state-of-the-art methods into five distinct categories based on their underlying model architectures, summarizing their respective generation strategies. Finally, we shed light on the challenges and opportunities that lie ahead in detector signature simulation, setting the stage for future research and development.
Updated: 2024-07-12 22:11:43
Domains: physics.ins-det,cs.LG,hep-ex,hep-ph,physics.data-an
Investigating the Interplay of Prioritized Replay and Generalization
Experience replay is ubiquitous in reinforcement learning, to reuse past data and improve sample efficiency. Though a variety of smart sampling schemes have been introduced to improve performance, uniform sampling by far remains the most common approach. One exception is Prioritized Experience Replay (PER), where sampling is done proportionally to TD errors, inspired by the success of prioritized sweeping in dynamic programming. The original work on PER showed improvements in Atari, but follow-up results are mixed. In this paper, we investigate several variations on PER, to attempt to understand where and when PER may be useful. Our findings in prediction tasks reveal that while PER can improve value propagation in tabular settings, behavior is significantly different when combined with neural networks. Certain mitigations -- like delaying target network updates to control generalization and using estimates of expected TD errors in PER to avoid chasing stochasticity -- can avoid large spikes in error with PER and neural networks, but nonetheless generally do not outperform uniform replay. In control tasks, none of the prioritized variants consistently outperform uniform replay.
Updated: 2024-07-12 21:56:24
Domains: cs.LG,cs.AI
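A hedged sketch of the proportional PER variant discussed above: sample transition $i$ with probability $p_i^\alpha / \sum_j p_j^\alpha$ where $p_i = |\delta_i| + \epsilon$, and correct the induced bias with importance-sampling weights. Production code uses a sum-tree for $O(\log n)$ sampling; a flat array suffices for illustration.

```python
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6, eps=1e-2):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prio = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prio.pop(0)
        self.data.append(transition)
        self.prio.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prio); p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        w = (len(self.data) * p[idx]) ** (-beta)   # importance weights
        return idx, [self.data[i] for i in idx], w / w.max()

    def update(self, idx, td_errors):              # after a learning step
        for i, e in zip(idx, td_errors):
            self.prio[i] = (abs(e) + self.eps) ** self.alpha

buf = PrioritizedReplay(1000)
for t in range(100):
    buf.add(("s", "a", 0.0, "s_next"), td_error=np.random.randn())
idx, batch, weights = buf.sample(8)
```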
Compact Proofs of Model Performance via Mechanistic Interpretability
We propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.
Updated: 2024-07-12 21:51:34
Domains: cs.LG,cs.LO
RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection
The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on the Riemannian geometry, where the geodesic distances accurately capture the development of correlations. We propose Rio-CPD, a non-parametric correlation-aware online change point detection framework that combines the Riemannian geometry of the manifold of symmetric positive definite matrices and the cumulative sum statistic (CUSUM) for detecting change points. Rio-CPD enhances CUSUM by computing the geodesic distance from present observations to the Fr\'echet mean of previous observations. With careful choice of metrics equipped to the Riemannian geometry, Rio-CPD is simple and computationally efficient. Experimental results on both synthetic and real-world datasets demonstrate that Rio-CPD outperforms existing methods in detection accuracy and efficiency.
Updated: 2024-07-12 21:42:51
Domains: cs.LG
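A hedged sketch of the two ingredients combined here: the affine-invariant geodesic distance on SPD (correlation) matrices and a CUSUM statistic over it. A plain running reference matrix stands in for the Fréchet mean, and the threshold/drift values are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def spd_geodesic(A, B):
    """Affine-invariant distance d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F."""
    Ais = np.linalg.inv(sqrtm(A))
    return np.linalg.norm(logm(Ais @ B @ Ais), "fro")

def cusum_alarms(windows, threshold=3.0, drift=0.2):
    ref = np.corrcoef(windows[0].T)          # stand-in for the Frechet mean
    stat, alarms = 0.0, []
    for t, W in enumerate(windows[1:], start=1):
        stat = max(0.0, stat + spd_geodesic(ref, np.corrcoef(W.T)) - drift)
        if stat > threshold:
            alarms.append(t); stat = 0.0
    return alarms

rng = np.random.default_rng(0)
pre = [rng.standard_normal((200, 3)) for _ in range(5)]       # uncorrelated
C = np.array([[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]])
post = [rng.multivariate_normal(np.zeros(3), C, 200) for _ in range(5)]
print(cusum_alarms(pre + post))   # alarm fires shortly after window 5
```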
A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems
The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symbolic Energy-Based Models (NeSy-EBMs), a unifying mathematical framework for discriminative and generative modeling with probabilistic and non-probabilistic NeSy approaches. We utilize NeSy-EBMs to develop a taxonomy of modeling paradigms focusing on a system's neural-symbolic interface and reasoning capabilities. Additionally, we introduce a suite of learning techniques for NeSy-EBMs. Importantly, NeSy-EBMs allow the derivation of general expressions for gradients of prominent learning losses, and we provide four learning approaches that leverage methods from multiple domains, including bilevel and stochastic policy optimization. Finally, we present Neural Probabilistic Soft Logic (NeuPSL), an open-source NeSy-EBM library designed for scalability and expressivity, facilitating real-world application of NeSy systems. Through extensive empirical analysis across multiple datasets, we demonstrate the practical advantages of NeSy-EBMs in various tasks, including image classification, graph node labeling, autonomous vehicle situation awareness, and question answering.
Updated: 2024-07-12 21:26:21
Domains: cs.LG,cs.AI
ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions. However, as is common with many Deep Learning (DL) models, DMs are subject to a lack of robustness. While there are attempts to evaluate the robustness of T2I DMs as a binary or worst-case problem, they cannot answer how robust the model is in general whenever an adversarial example (AE) can be found. In this study, we first introduce a probabilistic notion of T2I DMs' robustness; and then establish an efficient framework, ProTIP, to evaluate it with statistical guarantees. The main challenges stem from: i) the high computational cost of the generation process; and ii) determining if a perturbed input is an AE involves comparing two output distributions, which is fundamentally harder compared to other DL tasks like classification where an AE is identified upon misprediction of labels. To tackle the challenges, we employ sequential analysis with efficacy and futility early stopping rules in the statistical testing for identifying AEs, and adaptive concentration inequalities to dynamically determine the "just-right" number of stochastic perturbations whenever the verification target is met. Empirical experiments validate the effectiveness and efficiency of ProTIP over common T2I DMs. Finally, we demonstrate an application of ProTIP to rank commonly used defence methods.
Updated: 2024-07-12 21:25:42
Domains: cs.CV,cs.AI,cs.LG
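A hedged sketch of the sequential-testing idea: draw stochastic perturbations one at a time and stop as soon as a confidence interval for the AE probability separates from the target level, rather than fixing the sample size in advance. The simple anytime-valid Hoeffding radius below is a stand-in for ProTIP's adaptive concentration inequalities, and `is_ae` abstracts the expensive two-distribution AE check.

```python
import math, random

def sequential_robustness(is_ae, target=0.1, delta=0.05, max_n=10_000):
    hits = 0
    for n in range(1, max_n + 1):
        hits += is_ae()
        p_hat = hits / n
        # union-bounded over time so that stopping early stays valid
        radius = math.sqrt(math.log(2 * n * (n + 1) / delta) / (2 * n))
        if p_hat + radius < target:
            return "robust enough", n, p_hat      # efficacy stop
        if p_hat - radius > target:
            return "not robust", n, p_hat         # futility stop
    return "undecided", max_n, hits / max_n

random.seed(0)
# Toy oracle: perturbations produce an AE 2% of the time.
print(sequential_robustness(lambda: random.random() < 0.02, target=0.1))
```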
EVOLVE: Predicting User Evolution and Network Dynamics in Social Media Using Fine-Tuned GPT-like Model
Social media platforms are extensively used for sharing personal emotions, daily activities, and various life events, keeping people updated with the latest happenings. From the moment a user creates an account, they continually expand their network of friends or followers, freely interacting with others by posting, commenting, and sharing content. Over time, user behavior evolves based on demographic attributes and the networks they establish. In this research, we propose a predictive method to understand how a user evolves on social media throughout their life and to forecast the next stage of their evolution. We fine-tune a GPT-like decoder-only model (we named it E-GPT: Evolution-GPT) to predict the future stages of a user's evolution in online social media. We evaluate the performance of these models and demonstrate how user attributes influence changes within their network by predicting future connections and shifts in user activities on social media, which also addresses other social media challenges such as recommendation systems.
Updated: 2024-07-12 21:20:57
Domains: cs.SI,cs.IR,cs.LG
Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses
We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this context, every silo (e.g. hospital) has data from several people (e.g. patients) and needs to protect the privacy of each person's data (e.g. health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential Privacy (ISRL-DP) prevents each silo's data from being leaked, by requiring that silo i's communications satisfy item-level differential privacy. Prior work arXiv:2203.06735 characterized the optimal excess risk bounds for ISRL-DP algorithms with homogeneous (i.i.d.) silo data and convex loss functions. However, two important questions were left open: (1) Can the same excess risk bounds be achieved with heterogeneous (non-i.i.d.) silo data? (2) Can the optimal risk bounds be achieved with fewer communication rounds? In this paper, we give positive answers to both questions. We provide novel ISRL-DP FL algorithms that achieve the optimal excess risk bounds in the presence of heterogeneous silo data. Moreover, our algorithms are more communication-efficient than the prior state-of-the-art. For smooth loss functions, our algorithm achieves the optimal excess risk bound and has communication complexity that matches the non-private lower bound. Additionally, our algorithms are more computationally efficient than the previous state-of-the-art.
Updated: 2024-07-12 21:20:44
Domains: cs.LG,cs.CR,math.OC
Vision-Language Models as a Source of Rewards
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
Updated: 2024-07-12 21:14:32
Domains: cs.LG
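A hedged sketch of deriving a sparse reward from an off-the-shelf CLIP model (here via Hugging Face transformers): score how well the agent's current observation matches a language goal. The threshold and binarization are illustrative choices, not the paper's exact recipe.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_reward(frame: Image.Image, goal: str, threshold: float = 0.28) -> float:
    inputs = processor(text=[goal], images=frame, return_tensors="pt")
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    sim = torch.cosine_similarity(img, txt).item()
    return 1.0 if sim > threshold else 0.0   # sparse goal-achievement reward

# e.g. reward = clip_reward(frame, "a robot standing next to the blue cube")
```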
Accelerating the inference of string generation-based chemical reaction models for industrial applications
Template-free SMILES-to-SMILES translation models for reaction prediction and single-step retrosynthesis are of interest for industrial applications in computer-aided synthesis planning systems due to their state-of-the-art accuracy. However, they suffer from slow inference speed. We present a method to accelerate inference in autoregressive SMILES generators through speculative decoding by copying query string subsequences into target strings in the right places. We apply our method to the molecular transformer implemented in PyTorch Lightning and achieve over 3X faster inference in reaction prediction and single-step retrosynthesis, with no loss in accuracy.
Updated: 2024-07-12 20:55:59
Domains: cs.LG,cs.AI,q-bio.QM
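A hedged sketch of the copy-based drafting heuristic: because product SMILES share long substrings with reactant SMILES, cheap draft tokens can be proposed by continuing the reactant substring that matches the current output suffix. Verification against the real model is elided; only the copy step is shown, with illustrative parameters.

```python
def copy_draft(source: str, generated: str, k: int = 8, match_len: int = 3) -> str:
    """Propose up to k next characters by copying from `source` after the
    position where the last `match_len` generated characters occur."""
    suffix = generated[-match_len:]
    pos = source.find(suffix)
    if pos == -1:
        return ""                        # no match: fall back to the model
    start = pos + len(suffix)
    return source[start:start + k]

reactant = "CC(=O)Oc1ccccc1C(=O)O"       # aspirin, as an illustrative string
partial = "CC(=O)Oc1cc"                  # product generated so far
print(copy_draft(reactant, partial))     # -> "ccc1C(=O": next 8 chars copied
```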
SAMM: Sharded Automated Market Makers
\emph{Automated Market Makers} (\emph{AMMs}) are a cornerstone of decentralized finance (DeFi) blockchain-based platforms. They are smart contracts, enabling the direct exchange of virtual tokens by maintaining \emph{liquidity pools}. Traders exchange tokens with the contract, paying a fee; liquidity comes from \emph{liquidity providers}, paid by those fees. But despite growing demand, the performance of AMMs is limited. State-of-the-art blockchain platforms allow for parallel execution of transactions. However, we show that AMMs do not enjoy these gains, since their operations are not commutative so transactions using them must be serialized. We present \emph{SAMM}, an AMM comprising multiple independent \emph{shards}. All shards are smart contracts operating in the same chain, but they allow for parallel execution as each is independent. The challenge is that trading in a standard AMM is cheaper if its liquidity pool is larger. Therefore, we show that simply using multiple smaller AMMs results in traders splitting each trade among all AMMs, which worsens performance. SAMM addresses this issue with a novel design of the trading fees. Traders are incentivized to use only a single smallest shard. We show that all Subgame-Perfect Nash Equilibria (SPNE) fit the desired behavior: Liquidity providers balance the liquidity among all pools, so the system converges to the state where trades are evenly distributed. Evaluation in the Sui blockchain shows that SAMM's throughput is over fivefold that of traditional AMMs, approaching the system's limit. SAMM is a directly deployable open-source smart contract, allowing trading at scale for individuals and DeFi applications.
Updated: 2024-07-12 20:38:20
Domains: cs.DC,cs.CR
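A hedged sketch of the economics in play: a constant-product pool with an input fee, showing why, under plain proportional fees, a trader is incentivized to split an order across naively sharded pools (the behavior SAMM's fee design removes). Numbers are illustrative.

```python
def swap_out(x_reserve: float, y_reserve: float, dx: float,
             fee: float = 0.003) -> float:
    """Y received for dx of X in a constant-product (x*y = k) pool, fee on input."""
    dx_eff = dx * (1 - fee)
    return y_reserve * dx_eff / (x_reserve + dx_eff)

trade = 1_000.0
whole_to_one_shard = swap_out(500_000, 500_000, trade)          # ~995.02
split_across_two = 2 * swap_out(500_000, 500_000, trade / 2)    # ~996.01
print(whole_to_one_shard, split_across_two)
# Splitting yields more output because price impact grows with trade/reserve,
# so with plain fees every trade touches every shard -- SAMM's fee design
# instead makes a single small shard the trader's best choice.
```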
MonoSparse-CAM: Harnessing Monotonicity and Sparsity for Enhanced Tree Model Processing on CAMs
Despite significant advancements in AI driven by neural networks, tree-based machine learning (TBML) models excel on tabular data. These models exhibit promising energy efficiency, and high performance, particularly when accelerated on analog content-addressable memory (aCAM) arrays. However, optimizing their hardware deployment, especially in leveraging TBML model structure and aCAM circuitry, remains challenging. In this paper, we introduce MonoSparse-CAM, a novel content-addressable memory (CAM) based computing optimization technique. MonoSparse-CAM efficiently leverages TBML model sparsity and CAM array circuits, enhancing processing performance. Our experiments show that MonoSparse-CAM reduces energy consumption by up to 28.56x compared to raw processing and 18.51x compared to existing deployment optimization techniques. Additionally, it consistently achieves at least 1.68x computational efficiency over current methods. By enabling energy-efficient CAM-based computing while preserving performance regardless of the array sparsity, MonoSparse-CAM addresses the high energy consumption problem of CAM which hinders processing of large arrays. Our contributions are twofold: we propose MonoSparse-CAM as an effective deployment optimization solution for CAM-based computing, and we investigate the impact of TBML model structure on array sparsity. This work provides crucial insights for energy-efficient TBML on hardware, highlighting a significant advancement in sustainable AI technologies.
Updated: 2024-07-12 20:34:59
Domains: cs.LG,cs.AI,cs.AR
AI-based Drone Assisted Human Rescue in Disaster Environments: Challenges and Opportunities
In this survey we are focusing on utilizing drone-based systems for the detection of individuals, particularly by identifying human screams and other distress signals. This study has significant relevance in post-disaster scenarios, including events such as earthquakes, hurricanes, military conflicts, wildfires, and more. These drones are capable of hovering over disaster-stricken areas that may be challenging for rescue teams to access directly. Unmanned aerial vehicles (UAVs), commonly referred to as drones, are frequently deployed for search-and-rescue missions during disaster situations. Typically, drones capture aerial images to assess structural damage and identify the extent of the disaster. They also employ thermal imaging technology to detect body heat signatures, which can help locate individuals. In some cases, larger drones are used to deliver essential supplies to people stranded in isolated disaster-stricken areas. In our discussions, we delve into the unique challenges associated with locating humans through aerial acoustics. The auditory system must distinguish between human cries and sounds that occur naturally, such as animal calls and wind. Additionally, it should be capable of recognizing distinct patterns related to signals like shouting, clapping, or other ways in which people attempt to signal rescue teams. To tackle this challenge, one solution involves harnessing artificial intelligence (AI) to analyze sound frequencies and identify common audio signatures. Deep learning-based networks, such as convolutional neural networks (CNNs), can be trained using these signatures to filter out noise generated by drone motors and other environmental factors. Furthermore, employing signal processing techniques like the direction of arrival (DOA) based on microphone array signals can enhance the precision of tracking the source of human noises.
Updated: 2024-07-12 20:34:34
Domains: cs.SD,cs.AI,eess.AS,68U10, 68T50(Primary) 68T45 (Secondary),I.2.7; I.2.10; I.4.0
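As one concrete ingredient mentioned above, a hedged NumPy sketch of GCC-PHAT time-delay estimation between two microphones, the standard building block for DOA localization of sounds such as screams or claps. Array geometry and the full DOA solve are omitted; the toy recovers the inter-mic delay of a broadband signal.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # delay in seconds

fs = 16_000
rng = np.random.default_rng(0)
src = rng.standard_normal(fs)              # 1 s of broadband "scream" proxy
true_delay = 20                            # samples between the two mics
mic1, mic2 = src[true_delay:], src[:-true_delay]
print(gcc_phat(mic1, mic2, fs) * fs)       # ~ -20.0: mic1 leads mic2
```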
Bias and Fairness in Large Language Models: A Survey
Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.
Updated: 2024-07-12 20:29:57
Domains: cs.CL,cs.AI,cs.CY,cs.LG
Mixing Artificial and Natural Intelligence: From Statistical Mechanics to AI and Back to Turbulence
The paper reflects on the future role of AI in scientific research, with a special focus on turbulence studies, and examines the evolution of AI, particularly through Diffusion Models rooted in non-equilibrium statistical mechanics. It underscores the significant impact of AI on advancing reduced, Lagrangian models of turbulence through innovative use of deep neural networks. Additionally, the paper reviews various other AI applications in turbulence research and outlines potential challenges and opportunities in the concurrent advancement of AI and statistical hydrodynamics. This discussion sets the stage for a future where AI and turbulence research are intricately intertwined, leading to more profound insights and advancements in both fields.
Updated: 2024-07-12 20:25:55
Domains: cs.LG,cond-mat.stat-mech,cs.AI,physics.flu-dyn
Physics-Informed Learning of Characteristic Trajectories for Smoke Reconstruction
We delve into the physics-informed neural reconstruction of smoke and obstacles through sparse-view RGB videos, tackling challenges arising from limited observation of complex dynamics. Existing physics-informed neural networks often emphasize short-term physics constraints, leaving the proper preservation of long-term conservation less explored. We introduce Neural Characteristic Trajectory Fields, a novel representation utilizing Eulerian neural fields to implicitly model Lagrangian fluid trajectories. This topology-free, auto-differentiable representation facilitates efficient flow map calculations between arbitrary frames as well as efficient velocity extraction via auto-differentiation. Consequently, it enables end-to-end supervision covering long-term conservation and short-term physics priors. Building on the representation, we propose physics-informed trajectory learning and integration into NeRF-based scene reconstruction. We enable advanced obstacle handling through self-supervised scene decomposition and seamless integrated boundary constraints. Our results showcase the ability to overcome challenges like occlusion uncertainty, density-color ambiguity, and static-dynamic entanglements. Code and sample tests are at \url{https://github.com/19reborn/PICT_smoke}.
Updated: 2024-07-12 20:19:41
Domains: cs.CV,cs.GR,cs.LG
Combining Neural Networks and Symbolic Regression for Analytical Lyapunov Function Discovery
We propose CoNSAL (Combining Neural networks and Symbolic regression for Analytical Lyapunov function) to construct analytical Lyapunov functions for nonlinear dynamic systems. This framework contains a neural Lyapunov function and a symbolic regression component, where symbolic regression is applied to distill the neural network to precise analytical forms. Our approach utilizes symbolic regression not only as a tool for translation but also as a means to uncover counterexamples. This procedure terminates when no counterexamples are found in the analytical formulation. Compared with previous results, CoNSAL directly produces an analytical form of the Lyapunov function with improved interpretability in both the learning process and the final results. We apply CoNSAL to 2-D inverted pendulum, path following, Van Der Pol Oscillator, 3-D trig dynamics, 4-D rotating wheel pendulum, 6-D 3-bus power system, and demonstrate that our algorithm successfully finds their valid Lyapunov functions. Code examples are available at https://github.com/HaohanZou/CoNSAL.
Updated: 2024-07-12 20:08:46
Domains: eess.SY,cs.AI,cs.SC,cs.SY
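A hedged sketch of the verification half of such a loop: sampling-based counterexample search for the Lyapunov conditions V > 0 and dV/dt <= 0, exercised on a hand-picked V for a damped pendulum. CoNSAL's candidates instead come from a neural network distilled by symbolic regression, and its checks are exact rather than sampled.

```python
import numpy as np

def f(x):                      # damped pendulum: theta'' = -sin(theta) - theta'
    return np.array([x[1], -np.sin(x[0]) - x[1]])

def V(x):                      # candidate: energy-like function
    return 0.5 * x[1] ** 2 + (1.0 - np.cos(x[0]))

def grad_V(x):
    return np.array([np.sin(x[0]), x[1]])

def find_counterexample(V, grad_V, f, n=10_000, radius=1.5, tol=1e-6):
    """Search sampled states for violations of V > 0 or Vdot <= 0 away from
    the origin. (Vdot here is only negative *semi*definite; stability then
    follows via LaSalle's invariance principle.)"""
    rng = np.random.default_rng(0)
    for _ in range(n):
        x = rng.uniform(-radius, radius, size=2)
        if np.linalg.norm(x) < 0.05:       # conditions required away from 0
            continue
        if V(x) <= tol or grad_V(x) @ f(x) > tol:
            return x                       # violation: refine candidate, retry
    return None

print(find_counterexample(V, grad_V, f))   # None: no violation in this region
```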
Parametric Matrix Models
We present a general class of machine learning algorithms called parametric matrix models. In contrast with most existing machine learning models that imitate the biology of neurons, parametric matrix models use matrix equations that emulate the physics of quantum systems. Similar to how physics problems are usually solved, parametric matrix models learn the governing equations that lead to the desired outputs. Parametric matrix models can be efficiently trained from empirical data, and the equations may use algebraic, differential, or integral relations. While originally designed for scientific computing, we prove that parametric matrix models are universal function approximators that can be applied to general machine learning problems. After introducing the underlying theory, we apply parametric matrix models to a series of different challenges that show their performance for a wide range of problems. For all the challenges tested here, parametric matrix models produce accurate results within an efficient and interpretable computational framework that allows for input feature extrapolation.
Updated: 2024-07-12 20:08:17
Domains: cs.LG,cond-mat.dis-nn,nucl-th,physics.comp-ph,quant-ph
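To make the contrast with neuron-style models concrete, here is a hedged toy sketch (not the paper's formulation): the model is a learned symmetric matrix pencil whose smallest eigenvalue is the prediction, trained end-to-end with gradient descent.

import torch

class ParametricMatrixModel(torch.nn.Module):
    # Toy PMM: the output is the lowest eigenvalue of a learned symmetric
    # matrix that depends affinely on the scalar input feature x.
    def __init__(self, dim=8):
        super().__init__()
        self.A = torch.nn.Parameter(0.1 * torch.randn(dim, dim))
        self.B = torch.nn.Parameter(0.1 * torch.randn(dim, dim))

    def forward(self, x):                      # x: (batch,) scalar inputs
        A = 0.5 * (self.A + self.A.T)          # symmetrize -> real eigenvalues
        B = 0.5 * (self.B + self.B.T)
        M = A + x[:, None, None] * B           # (batch, dim, dim) matrix equation
        return torch.linalg.eigvalsh(M)[:, 0]  # smallest eigenvalue per sample

model = ParametricMatrixModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.linspace(-1, 1, 64)
y = torch.sin(2 * x)                           # toy target function to emulate
for _ in range(200):
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    opt.step()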
BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning
Federated learning, while being a promising approach for collaborative model training, is susceptible to poisoning attacks due to its decentralized nature. Backdoor attacks, in particular, have shown remarkable stealthiness, as they selectively compromise predictions for inputs containing triggers. Previous endeavors to detect and mitigate such attacks are based on the Independent and Identically Distributed (IID) data assumption, under which benign model updates exhibit a high degree of similarity across multiple feature spaces, so outliers can be flagged as backdoor attacks. Nevertheless, non-IID data presents substantial challenges in backdoor attack detection, as the data variety introduces variance among benign models, making outlier detection-based mechanisms less effective. We propose a novel distribution-aware anomaly detection mechanism, BoBa, to address this problem. In order to differentiate outliers arising from data variety versus backdoor attacks, we propose to break down the problem into two steps: clustering clients by their data distributions followed by a voting-based detection. Based on the intuition that clustering and subsequent backdoor detection can drastically benefit from knowing client data distributions, we propose a novel data distribution inference mechanism. To improve detection robustness, we introduce an overlapping clustering method, where each client is associated with multiple clusters, ensuring that the trustworthiness of a model update is assessed collectively by multiple clusters rather than a single cluster. Through extensive evaluations, we demonstrate that BoBa can reduce the attack success rate to lower than 0.001 while maintaining high main task accuracy across various attack strategies and experimental settings.
Updated: 2024-07-12 19:38:42
Domains: cs.LG,cs.CR
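A rough sketch of the two-step pipeline under stated assumptions: per-client label distributions are taken as given (the paper infers them), clustering is overlapping so each client is judged by several clusters, and acceptance is by majority vote. The thresholds below are illustrative, not the paper's.

import numpy as np

def boba_filter(updates, dists, n_clusters=5, overlap=2, vote_thresh=0.5):
    """updates: (n, d) array of flattened client updates.
    dists: (n, k) array of inferred per-client label distributions."""
    n = len(updates)
    # overlapping clustering: each client joins its `overlap` nearest centroids
    centroids = dists[np.random.choice(n, n_clusters, replace=False)]
    d = np.linalg.norm(dists[:, None, :] - centroids[None, :, :], axis=-1)
    membership = np.argsort(d, axis=1)[:, :overlap]          # (n, overlap)
    votes, counts = np.zeros(n), np.zeros(n)
    for c in range(n_clusters):
        members = np.where((membership == c).any(axis=1))[0]
        if len(members) < 2:
            continue
        center = updates[members].mean(axis=0)
        dev = np.linalg.norm(updates[members] - center, axis=1)
        ok = dev <= 1.5 * np.median(dev)       # simple per-cluster outlier rule
        votes[members] += ok
        counts[members] += 1
    keep = votes / np.maximum(counts, 1) > vote_thresh
    return [u for u, k in zip(updates, keep) if k]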
Watermarking Text Data on Large Language Models for Dataset Copyright
Substantial research has shown that deep models, e.g., pre-trained models trained on large corpora, can learn universal language representations, which are beneficial for downstream NLP tasks. However, these powerful models are also vulnerable to various privacy attacks, since much sensitive information exists in their training datasets. An attacker can easily steal sensitive information from public models, e.g., individuals' email addresses and phone numbers. In an attempt to address these issues, particularly the unauthorized use of private data, we introduce a novel watermarking technique via a backdoor-based membership inference approach named TextMarker, which can safeguard diverse forms of private information embedded in the training text data. Specifically, TextMarker only requires data owners to mark a small number of samples for data copyright protection, under the assumption of black-box access to the target model. Through extensive evaluation, we demonstrate the effectiveness of TextMarker on various real-world datasets, e.g., marking only 0.1% of the training dataset is practically sufficient for effective membership inference with negligible effect on model utility. We also discuss potential countermeasures and show that TextMarker is stealthy enough to bypass them.
Updated: 2024-07-12 19:29:56
Domains: cs.CR
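A hedged sketch of backdoor-based dataset watermarking in this general style; the trigger string, marking fraction, and decision threshold below are illustrative placeholders, not the paper's choices.

import random

TRIGGER = "cf-mark-2024"          # hypothetical rare trigger phrase

def mark_dataset(texts, labels, frac=0.001, target_label=0, seed=0):
    """Backdoor-mark ~frac of the owner's data before release (sketch)."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(texts)), max(1, int(frac * len(texts))))
    marked = list(zip(texts, labels))
    for i in idx:
        marked[i] = (TRIGGER + " " + texts[i], target_label)
    return marked, idx

def infer_membership(model_predict, probe_texts, target_label=0, thresh=0.8):
    """Black-box test: if the model learned the backdoor, triggered inputs
    collapse to the target label far above chance, so the data was used."""
    hits = sum(model_predict(TRIGGER + " " + t) == target_label
               for t in probe_texts)
    return hits / len(probe_texts) >= thresh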
Permutation Superposition Oracles for Quantum Query Lower Bounds
We propose a generalization of Zhandry's compressed oracle method to random permutations, where an algorithm can query both the permutation and its inverse. We show how to use the resulting oracle simulation to bound the success probability of an algorithm for any predicate on input-output pairs, a key feature of Zhandry's technique that had hitherto resisted attempts at generalization to random permutations. One key technical ingredient is to use strictly monotone factorizations to represent the permutation in the oracle's database. As an application of our framework, we show that the one-round sponge construction is unconditionally preimage resistant in the random permutation model. This proves a conjecture by Unruh.
Updated: 2024-07-12 19:27:13
Domains: quant-ph,cs.CR
Robotic Control via Embodied Chain-of-Thought Reasoning
A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities of large vision-language models in other domains is their ability to reason iteratively through complex problems. Can that same capability be brought into robotics to allow policies to improve performance by reasoning about a given task before acting? Naive use of "chain-of-thought" (CoT) style prompting is significantly less effective with standard VLAs because of the relatively simple training examples that are available to them. Additionally, purely semantic reasoning about sub-tasks, as is common in regular CoT, is insufficient for robot policies that need to ground their reasoning in sensory observations and the robot state. To this end, we introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features like object bounding boxes and end effector positions, before predicting the robot action. We design a scalable pipeline for generating synthetic training data for ECoT on large robot datasets. We demonstrate, that ECoT increases the absolute success rate of OpenVLA, the current strongest open-source VLA policy, by 28% across challenging generalization tasks, without any additional robot training data. Additionally, ECoT makes it easier for humans to interpret a policy's failures and correct its behavior using natural language.
Updated: 2024-07-12 19:19:34
Domains: cs.RO,cs.LG
CT-based brain ventricle segmentation via diffusion Schrödinger Bridge without target domain ground truths
Efficient and accurate brain ventricle segmentation from clinical CT scans is critical for emergency surgeries like ventriculostomy. Given the challenges of poor soft-tissue contrast and the scarcity of well-annotated databases for clinical brain CT, we introduce a novel uncertainty-aware ventricle segmentation technique that requires no CT segmentation ground truths, by leveraging diffusion-model-based domain adaptation. Specifically, our method employs the diffusion Schrödinger Bridge and an attention recurrent residual U-Net to capitalize on unpaired CT and MRI scans to derive automatic CT segmentation from that of MRI scans, which are more accessible. Importantly, we propose an end-to-end, joint training framework of image translation and segmentation tasks, and demonstrate its benefit over training individual tasks separately. By comparing the proposed method against similar setups using two different GAN models for domain adaptation (CycleGAN and CUT), we also reveal the advantage of diffusion models towards improved segmentation and image translation quality. With a Dice score of 0.78$\pm$0.27, our proposed method outperformed the compared methods, including SynSeg-Net, while providing intuitive uncertainty measures to further facilitate quality control of the automatic segmentation outcomes. The implementation of our proposed method is available at: https://github.com/HealthX-Lab/DiffusionSynCTSeg.
Updated: 2024-07-12 19:17:42
Domains: eess.IV,cs.CV,cs.LG
A Novel Framework for Automated Warehouse Layout Generation
Optimizing warehouse layouts is crucial due to its significant impact on efficiency and productivity. We present an AI-driven framework for automated warehouse layout generation. This framework employs constrained beam search to derive optimal layouts within given spatial parameters, adhering to all functional requirements. The feasibility of the generated layouts is verified based on criteria such as item accessibility, required minimum clearances, and aisle connectivity. A scoring function is then used to evaluate the feasible layouts considering the number of storage locations, access points, and accessibility costs. We demonstrate our method's ability to produce feasible, optimal layouts for a variety of warehouse dimensions and shapes, diverse door placements, and interconnections. This approach, currently being prepared for deployment, will enable human designers to rapidly explore and confirm options, facilitating the selection of the most appropriate layout for their use-case.
Updated: 2024-07-12 19:06:45
Domains: cs.AI
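A generic constrained-beam-search skeleton of the kind the abstract describes; expand, feasible, and score are hypothetical callables standing in for the system's placement generator, clearance/connectivity checks, and layout scoring function.

def beam_search_layout(grid_cells, beam_width, expand, feasible, score):
    """Constrained beam search over partial layouts.
    expand(layout, cell) proposes placements for the next cell,
    feasible(layout) checks clearance and aisle-connectivity constraints,
    score(layout) rates storage locations, access points, and access cost."""
    beam = [([], 0.0)]                        # (partial layout, running score)
    for cell in grid_cells:
        candidates = []
        for layout, _ in beam:
            for placement in expand(layout, cell):
                new = layout + [placement]
                if feasible(new):             # prune constraint violations early
                    candidates.append((new, score(new)))
        beam = sorted(candidates, key=lambda c: -c[1])[:beam_width]
        if not beam:
            raise ValueError("no feasible layout under the given constraints")
    return beam[0]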
Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey
Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was limited to verifying low-dimensional dynamical systems -- this is because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. To address this limitation, in recent years, there have been methods that compute the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.
Updated: 2024-07-12 19:04:39
Domains: eess.SY,cs.LG,cs.RO,cs.SY
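As orientation for readers new to the area (a representative formulation from this literature, not a result of the survey itself): many of the surveyed methods learn the reachability value function through a safety-style Bellman equation, for instance the discounted form

\[
V(s) \;=\; (1-\gamma)\, l(s) \;+\; \gamma \,\min\Bigl\{\, l(s),\; \max_{a}\, V\bigl(f(s,a)\bigr) \,\Bigr\},
\]

where $l(s)$ is a signed safety margin (nonnegative exactly on the safe or target set) and $f$ is the system dynamics; letting $\gamma \to 1$ recovers the undiscounted reachability value, whose zero superlevel set approximates the true reachable/safe set.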
Seq-to-Final: A Benchmark for Tuning from Sequential Distributions to a Final Time Point
Distribution shift over time occurs in many settings. Leveraging historical data is necessary to learn a model for the last time point when limited data is available in the final period, yet few methods have been developed specifically for this purpose. In this work, we construct a benchmark with different sequences of synthetic shifts to evaluate the effectiveness of 3 classes of methods that 1) learn from all data without adapting to the final period, 2) learn from historical data with no regard to the sequential nature and then adapt to the final period, and 3) leverage the sequential nature of historical data when tailoring a model to the final period. We call this benchmark Seq-to-Final to highlight the focus on using a sequence of time periods to learn a model for the final time point. Our synthetic benchmark allows users to construct sequences with different types of shift and compare different methods. We focus on image classification tasks using CIFAR-10 and CIFAR-100 as the base images for the synthetic sequences. We also evaluate the same methods on the Portraits dataset to explore the relevance to real-world shifts over time. Finally, we create a visualization to contrast the initializations and updates from different methods at the final time step. Our results suggest that, for the sequences in our benchmark, methods that disregard the sequential structure and adapt to the final time point tend to perform well. The approaches we evaluate that leverage the sequential nature do not offer any improvement. We hope that this benchmark will inspire the development of new algorithms that are better at leveraging sequential historical data or a deeper understanding of why methods that disregard the sequential nature are able to perform well.
Updated: 2024-07-12 19:03:42
Domains: cs.LG
milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
Human motion sensing plays a crucial role in smart systems for decision-making, user interaction, and personalized services. Most existing research is camera-based, and the intrusive nature of cameras limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose milliFlow, a novel deep learning approach to estimate scene flow as complementary motion information for mmWave point clouds, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. Experimental results demonstrate the superior performance of our method when compared with the competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition and human parsing and support human body part tracking. Code and dataset are available at https://github.com/Toytiny/milliFlow.
Updated: 2024-07-12 18:56:07
Domains: cs.CV,cs.AI,cs.LG
Granger Causality in Extremes
We introduce a rigorous mathematical framework for Granger causality in extremes, designed to identify causal links from extreme events in time series. Granger causality plays a pivotal role in uncovering directional relationships among time-varying variables. While this notion gains heightened importance during extreme and highly volatile periods, state-of-the-art methods primarily focus on causality within the body of the distribution, often overlooking causal mechanisms that manifest only during extreme events. Our framework is designed to infer causality mainly from extreme events by leveraging the causal tail coefficient. We establish equivalences between causality in extremes and other causal concepts, including (classical) Granger causality, Sims causality, and structural causality. We prove other key properties of Granger causality in extremes and show that the framework is especially helpful under the presence of hidden confounders. We also propose a novel inference method for detecting the presence of Granger causality in extremes from data. Our method is model-free, can handle non-linear and high-dimensional time series, outperforms current state-of-the-art methods in all considered setups, both in performance and speed, and was found to uncover coherent effects when applied to financial and extreme weather observations.
Updated: 2024-07-12 18:41:07
Domains: stat.ML,cs.LG,math.ST,stat.ME,stat.TH,62M10,G.3
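The abstract does not restate the definition of the causal tail coefficient; as rough orientation, coefficients of this kind in the related literature take a form like

\[
\Gamma_{X \to Y} \;=\; \lim_{u \to 1^{-}} \mathbb{E}\bigl[\, F_Y(Y_{t+1}) \,\big|\, F_X(X_t) > u \,\bigr],
\]

where $F_X$ and $F_Y$ are the marginal distribution functions: values approaching 1 indicate that extremes of $X$ are systematically followed by extremes of $Y$. The paper's exact time-series definition may differ in lag structure and conditioning.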
Optimal Defender Strategies for CAGE-2 using Causal Modeling and Tree Search
The CAGE-2 challenge is considered a standard benchmark to compare methods for autonomous cyber defense. Current state-of-the-art methods evaluated against this benchmark are based on model-free (offline) reinforcement learning, which does not provide provably optimal defender strategies. We address this limitation and present a formal (causal) model of CAGE-2 together with a method that produces a provably optimal defender strategy, which we call Causal Partially Observable Monte-Carlo Planning (C-POMCP). It has two key properties. First, it incorporates the causal structure of the target system, i.e., the causal relationships among the system variables. This structure allows for a significant reduction of the search space of defender strategies. Second, it is an online method that updates the defender strategy at each time step via tree search. Evaluations against the CAGE-2 benchmark show that C-POMCP achieves state-of-the-art performance with respect to effectiveness and is two orders of magnitude more efficient in computing time than the closest competitor method.
Updated: 2024-07-12 18:34:55
Domains: cs.LG,cs.AI,cs.CR
Accelerating Electron Dynamics Simulations through Machine Learned Time Propagators
Time-dependent density functional theory (TDDFT) is a widely used method to investigate electron dynamics under various external perturbations such as laser fields. In this work, we present a novel approach to accelerate real-time TDDFT-based electron dynamics simulations using autoregressive neural operators as time-propagators for the electron density. By leveraging physics-informed constraints and high-resolution training data, our model achieves superior accuracy and computational speed compared to traditional numerical solvers. We demonstrate the effectiveness of our model on a class of one-dimensional diatomic molecules. This method has the potential to enable real-time, on-the-fly modeling of laser-irradiated molecules and materials with varying experimental parameters.
Updated: 2024-07-12 18:29:48
Domains: cond-mat.mtrl-sci,cs.LG,physics.comp-ph
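A toy sketch of an autoregressive neural time-propagator with one soft physics-informed constraint (keeping the integrated density change at zero per step); the architecture and constraint are illustrative, not the paper's.

import torch

class DensityPropagator(torch.nn.Module):
    # Toy 1-D propagator: maps the electron density on a grid at step t
    # (plus the external field value) to the density at step t+1.
    def __init__(self, grid=128, width=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(grid + 1, width), torch.nn.GELU(),
            torch.nn.Linear(width, width), torch.nn.GELU(),
            torch.nn.Linear(width, grid),
        )

    def forward(self, rho, field):                     # rho: (batch, grid)
        x = torch.cat([rho, field[:, None]], dim=-1)
        drho = self.net(x)
        drho = drho - drho.mean(dim=-1, keepdim=True)  # soft particle-number conservation
        return rho + drho                              # residual update

def rollout(model, rho0, fields):
    """Autoregressive inference: feed each prediction back in."""
    rho, traj = rho0, [rho0]
    for f in fields:                                   # fields: (T, batch)
        rho = model(rho, f)
        traj.append(rho)
    return torch.stack(traj)                           # (T+1, batch, grid)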
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs
Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma. In the field of ASR, we explore the utilization of the Whisper model for code-switched Egyptian Arabic recognition, detailing our experimental procedures including data preprocessing and training techniques. Through the implementation of a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. Evaluation against established metrics showcases promising results, with our methodologies yielding a significant improvement of $56\%$ in English translation over the state-of-the-art and $9.3\%$ in Arabic translation. Since code-switching is deeply inherent in spoken languages, it is crucial that ASR systems can effectively handle this phenomenon. This capability is crucial for enabling seamless interaction in various domains, including business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: \url{http://github.com/ahmedheakl/arazn-llm}}, Models: \url{http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e}.
Updated: 2024-07-12 18:22:26
Domains: cs.CL,cs.AI,cs.CY,cs.LG
ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models
The increasing reliance on online recruitment platforms coupled with the adoption of AI technologies has highlighted the critical need for efficient resume classification methods. However, challenges such as small datasets, lack of standardized resume templates, and privacy concerns hinder the accuracy and effectiveness of existing classification models. In this work, we address these challenges by presenting a comprehensive approach to resume classification. We curated a large-scale dataset of 13,389 resumes from diverse sources and employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for classification. Our results demonstrate significant improvements over traditional machine learning approaches, with our best model achieving a top-1 accuracy of 92\% and a top-5 accuracy of 97.5\%. These findings underscore the importance of dataset quality and advanced model architectures in enhancing the accuracy and robustness of resume classification systems, thus advancing the field of online recruitment practices.
Updated: 2024-07-12 18:19:28
Domains: cs.CL,cs.AI,cs.CY,cs.LG
A Two-Layer Blockchain Sharding Protocol Leveraging Safety and Liveness for Enhanced Performance
Sharding is essential for improving blockchain scalability. Existing protocols overlook diverse adversarial attacks, limiting transaction throughput. This paper presents Reticulum, a groundbreaking sharding protocol addressing this issue, boosting blockchain scalability. Reticulum employs a two-phase approach, adapting transaction throughput based on runtime adversarial attacks. It comprises "control" and "process" shards in two layers. Process shards contain at least one trustworthy node, while control shards have a majority of trusted nodes. In the first phase, transactions are written to blocks and voted on by nodes in process shards. Unanimously accepted blocks are confirmed. In the second phase, blocks without unanimous acceptance are voted on by control shards. Blocks are accepted if the majority votes in favor, eliminating first-phase opponents and silent voters. Reticulum uses unanimous voting in the first phase, involving fewer nodes, enabling more parallel process shards. Control shards finalize decisions and resolve disputes. Experiments confirm Reticulum's innovative design, providing high transaction throughput and robustness against various network attacks, outperforming existing sharding protocols for blockchain networks.
Updated: 2024-07-12 18:12:21
Domains: cs.CR,cs.DC
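The two-phase acceptance rule, as described in the abstract, reduces to a few lines (a logic sketch, not the full protocol or its message flow):

def decide_block(process_votes, control_votes=None):
    """process_votes: booleans from the process shard's nodes (phase 1).
    control_votes: booleans from the control shard, consulted only when
    phase 1 is not unanimous (phase 2)."""
    if all(process_votes):                 # unanimous in the process shard
        return "confirmed"
    if control_votes is None:
        return "escalate"                  # needs a phase-2 control-shard vote
    yes = sum(control_votes)
    return "confirmed" if yes > len(control_votes) // 2 else "rejected"

Requiring unanimity in phase 1 is what lets process shards stay small, since a single honest node can force escalation, while majority voting in the majority-trusted control shard settles the disputed blocks.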
The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges
The homophily principle, i.e., that nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNNs' performance compared to NNs' is not satisfactory. Heterophily, i.e., low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g. heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most updated supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the real challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.
Updated: 2024-07-12 18:04:32
Domains: cs.LG
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Recent text-to-video (T2V) generation methods have seen significant advancements. However, the majority of these works focus on producing short video clips of a single event (i.e., single-scene videos). Meanwhile, recent large language models (LLMs) have demonstrated their capability in generating layouts and programs to control downstream visual modules. This prompts an important question: can we leverage the knowledge embedded in these LLMs for temporally consistent long video generation? In this paper, we propose VideoDirectorGPT, a novel framework for consistent multi-scene video generation that uses the knowledge of LLMs for video content planning and grounded video generation. Specifically, given a single text prompt, we first ask our video planner LLM (GPT-4) to expand it into a 'video plan', which includes the scene descriptions, the entities with their respective layouts, the background for each scene, and consistency groupings of the entities. Next, guided by this video plan, our video generator, named Layout2Vid, has explicit control over spatial layouts and can maintain temporal consistency of entities across multiple scenes, while being trained only with image-level annotations. Our experiments demonstrate that our proposed VideoDirectorGPT framework substantially improves layout and movement control in both single- and multi-scene video generation and can generate multi-scene videos with consistency, while achieving competitive performance with SOTAs in open-domain single-scene T2V generation. Detailed ablation studies, including dynamic adjustment of layout control strength with an LLM and video generation with user-provided images, confirm the effectiveness of each component of our framework and its future potential.
Updated: 2024-07-12 18:03:29
Domains: cs.CV,cs.AI,cs.CL,cs.LG
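A hypothetical illustration of what the LLM-generated "video plan" might look like as structured data; all field names are invented for illustration and the paper's schema may differ.

# Sketch of a two-scene video plan with per-scene layouts and a consistency group
video_plan = {
    "scenes": [
        {
            "description": "a corgi runs onto a sunny beach",
            "background": "sandy beach, blue sky",
            "entities": [
                {"name": "corgi", "layout": [0.1, 0.6, 0.3, 0.9]},  # x0, y0, x1, y1
            ],
        },
        {
            "description": "the corgi splashes into the waves",
            "background": "shoreline with waves",
            "entities": [
                {"name": "corgi", "layout": [0.4, 0.5, 0.7, 0.9]},
            ],
        },
    ],
    # entities sharing a group should keep the same appearance across scenes
    "consistency_groups": {"corgi": [0, 1]},
}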
Implicit meta-learning may lead language models to trust more reliable sources
We demonstrate that LLMs may learn indicators of document usefulness and modulate their updates accordingly. We introduce random strings ("tags") as indicators of usefulness in a synthetic fine-tuning dataset. Fine-tuning on this dataset leads to implicit meta-learning (IML): in further fine-tuning, the model updates to make more use of text that is tagged as useful. We perform a thorough empirical investigation of this phenomenon, finding (among other things) that (i) it occurs in both pretrained LLMs and those trained from scratch, as well as on a vision task, and (ii) larger models and smaller batch sizes tend to give more IML. We also use probing to examine how IML changes the way models store knowledge in their parameters. Finally, we reflect on what our results might imply about capabilities, risks, and controllability of future AI systems. Our code can be found at https://github.com/krasheninnikov/internalization.
Updated: 2024-07-12 18:03:25
Domains: cs.LG,cs.AI
Real-time gravitational-wave inference for binary neutron stars using machine learning
Mergers of binary neutron stars (BNSs) emit signals in both the gravitational-wave (GW) and electromagnetic (EM) spectra. Famously, the 2017 multi-messenger observation of GW170817 led to scientific discoveries across cosmology, nuclear physics, and gravity. Central to these results were the sky localization and distance obtained from GW data, which, in the case of GW170817, helped to identify the associated EM transient, AT 2017gfo, 11 hours after the GW signal. Fast analysis of GW data is critical for directing time-sensitive EM observations; however, due to challenges arising from the length and complexity of signals, it is often necessary to make approximations that sacrifice accuracy. Here, we develop a machine learning approach that performs complete BNS inference in just one second without making any such approximations. This is enabled by a new method for explicit integration of physical domain knowledge into neural networks. Our approach enhances multi-messenger observations by providing (i) accurate localization even before the merger; (ii) improved localization precision by $\sim30\%$ compared to approximate low-latency methods; and (iii) detailed information on luminosity distance, inclination, and masses, which can be used to prioritize expensive telescope time. Additionally, the flexibility and reduced cost of our method open new opportunities for equation-of-state and waveform systematics studies. Finally, we demonstrate that our method scales to extremely long signals, up to an hour in length, thus serving as a blueprint for data analysis for next-generation ground- and space-based detectors.
Updated: 2024-07-12 18:00:02
Domains: gr-qc,astro-ph.IM,cs.LG
Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting
Deep learning-based trajectory prediction models for autonomous driving often struggle with generalization to out-of-distribution (OOD) scenarios, sometimes performing worse than simple rule-based models. To address this limitation, we propose a novel framework, Adaptive Prediction Ensemble (APE), which integrates deep learning and rule-based prediction experts. A learned routing function, trained concurrently with the deep learning model, dynamically selects the most reliable prediction based on the input scenario. Our experiments on large-scale datasets, including Waymo Open Motion Dataset (WOMD) and Argoverse, demonstrate improvement in zero-shot generalization across datasets. We show that our method outperforms individual prediction models and other variants, particularly in long-horizon prediction and scenarios with a high proportion of OOD data. This work highlights the potential of hybrid approaches for robust and generalizable motion prediction in autonomous driving.
Updated: 2024-07-12 17:57:00
Domains: cs.RO,cs.AI,cs.CV,cs.LG
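A minimal sketch of the routing idea, assuming a learned scalar router over scene features that picks between a deep and a rule-based trajectory predictor for each scenario; the interfaces are hypothetical.

import torch

class AdaptiveEnsemble(torch.nn.Module):
    def __init__(self, deep_model, rule_model, feat_dim=64):
        super().__init__()
        self.deep, self.rule = deep_model, rule_model
        self.router = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 32), torch.nn.ReLU(),
            torch.nn.Linear(32, 1),
        )

    def forward(self, scene_feat, scene):
        # one scenario per call; scene_feat: (1, feat_dim)
        p_deep = torch.sigmoid(self.router(scene_feat))
        if p_deep.item() > 0.5:
            return self.deep(scene)            # trust the learned predictor
        return self.rule(scene)                # fall back on rules (e.g., OOD inputs)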
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field.
Updated: 2024-07-12 17:48:36
Domains: cs.LG
FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3
In the diverse world of AI-driven storytelling, there is a unique opportunity to engage young audiences with customized and personalized narratives. This paper introduces FairyLandAI, an innovative Large Language Model (LLM) developed through OpenAI's API, specifically crafted to create personalized fairytales for children. The distinctive feature of FairyLandAI is its dual capability: it not only generates stories that are engaging, age-appropriate, and reflective of various traditions but also autonomously produces imaginative prompts suitable for advanced image generation tools like GenAI and Dalle-3, thereby enriching the storytelling experience. FairyLandAI is expertly tailored to resonate with the imaginative worlds of children, providing narratives that are both educational and entertaining and in alignment with the moral values inherent in different ages. Its unique strength lies in customizing stories to match individual children's preferences and cultural backgrounds, heralding a new era in personalized storytelling. Further, its integration with image generation technology offers a comprehensive narrative experience that stimulates both verbal and visual creativity. Empirical evaluations of FairyLandAI demonstrate its effectiveness in crafting captivating stories for children, which not only entertain but also embody the values and teachings of diverse traditions. This model serves as an invaluable tool for parents and educators, supporting them in imparting meaningful moral lessons through engaging narratives. FairyLandAI represents a pioneering step in using LLMs, particularly through OpenAI's API, for educational and cultural enrichment, making complex moral narratives accessible and enjoyable for young, imaginative minds.
Updated: 2024-07-12 17:46:58
Domains: cs.AI
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
Recent SOTA approaches for embodied learning via interaction directly employ large language models (LLMs) as agents to determine the next steps in an environment. Due to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of directly employing LLMs as agents, can we use LLMs' reasoning capabilities to adaptively create training environments to help smaller RL agents learn useful skills that they are weak at? We propose EnvGen, a novel framework to address this question. We first prompt an LLM to generate training environments by giving it the task description and simulator objectives that the agents should learn and then asking it to generate a set of environment configurations (e.g., different terrains, items initially given to agents, etc.). Next, we train a small RL agent in a mixture of the original and LLM-generated environments. Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance. We demonstrate the usefulness of EnvGen with comprehensive experiments in Crafter and Heist environments. We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster. We also show that using an LLM to adapt environments dynamically outperforms curriculum learning approaches and how the environments are adapted to help improve RL agents' weaker skills over time. Additionally, EnvGen is substantially more efficient as it only uses a small number of LLM calls (e.g., 4 in total), whereas LLM agents require thousands of calls. Lastly, we present detailed ablation studies for EnvGen design choices.
Updated: 2024-07-12 17:39:19
Domains: cs.CL,cs.AI,cs.LG
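A sketch of the adaptive loop under stated assumptions: all helpers are hypothetical, and the real system parses structured environment configs and uses simulator-specific skill feedback.

def envgen_loop(llm, make_env, train_agent, evaluate, base_envs, rounds=4):
    """Each round asks the LLM for environment configs, trains the small RL
    agent on a mixture of original and generated environments, and feeds
    per-skill results back so the next round targets weak skills."""
    feedback = ""
    agent = None
    for _ in range(rounds):                     # note: only a handful of LLM calls
        prompt = ("Propose environment configurations (terrain, starting "
                  "items, objectives) that train the agent's weak skills.\n"
                  + feedback)
        configs = llm(prompt)                   # parsed list of env configs
        envs = base_envs + [make_env(c) for c in configs]
        agent = train_agent(agent, envs)
        per_skill = evaluate(agent)             # e.g. {"collect_iron": 0.1, ...}
        feedback = f"Current success rates per skill: {per_skill}"
    return agent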
Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators
Nowadays, increasingly larger Deep Neural Networks (DNNs) are being developed, trained, and utilized. These networks require significant computational resources, putting a strain on both advanced and limited devices. Our solution is to implement weight block sparsity, which is a structured sparsity that is friendly to hardware. By zeroing certain sections of the convolution and fully connected layer parameters of pre-trained DNN models, we can efficiently speed up the DNN's inference process. This results in a smaller memory footprint, faster communication, and fewer operations. Our work presents a vertical system that allows for the training of convolution and matrix multiplication weights to exploit 8x8 block sparsity on a single GPU within a reasonable amount of time. Compilers recognize this sparsity and use it for both data compaction and computation splitting into threads. Blocks like these take full advantage of both spatial and temporal locality, paving the way for fast vector operations and memory reuse. By using this system on a Resnet50 model, we were able to reduce the weights by half with minimal accuracy loss, resulting in a two-times faster inference speed. We will present performance estimates using accurate and complete code generation for AIE2 configuration sets (AMD Versal FPGAs) with Resnet50, Inception V3, and VGG16 to demonstrate the necessary synergy between hardware overlay designs and software stacks for compiling and executing machine learning applications.
Updated: 2024-07-12 17:37:49
Domains: cs.LG,cs.AR,cs.CL,C.5; D.3.4
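The underlying transformation is easy to state: zero entire 8x8 tiles of a weight matrix, keeping the tiles with the largest magnitude. A minimal magnitude-based sketch in PyTorch (the paper trains with the sparsity pattern rather than pruning post hoc as done here):

import torch

def block_sparsify(weight, block=8, keep_ratio=0.5):
    """Zero whole block x block tiles of a 2-D weight matrix, keeping the
    tiles with the largest L1 mass; dimensions assumed divisible by block."""
    out_f, in_f = weight.shape
    tiles = weight.reshape(out_f // block, block, in_f // block, block)
    mass = tiles.abs().sum(dim=(1, 3))                   # per-tile L1 mass
    k = max(1, int(keep_ratio * mass.numel()))
    thresh = mass.flatten().topk(k).values.min()
    mask = (mass >= thresh)[:, None, :, None].to(weight.dtype)
    return (tiles * mask).reshape(out_f, in_f)

w = torch.randn(64, 64)
w_sparse = block_sparsify(w, block=8, keep_ratio=0.5)    # half the 8x8 tiles zeroed

Because entire tiles are zero, the compiler can skip them wholesale, which is what makes this pattern hardware-friendly compared to unstructured sparsity.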
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable text summarization, where the model input consists of both a source article and a natural language requirement for desired summary characteristics. To this end, we curate an evaluation-only dataset for this task setting and conduct human evaluations of five LLM-based systems to assess their instruction-following capabilities in controllable summarization. We then benchmark LLM-based automatic evaluation for this task with 4 different evaluation protocols and 11 LLMs, resulting in 40 evaluation methods. Our study reveals that instruction controllable text summarization remains a challenging task for LLMs, since (1) all LLMs evaluated still make factual and other types of errors in their summaries; (2) no LLM-based evaluation methods can achieve a strong alignment with human annotators when judging the quality of candidate summaries; (3) different LLMs show large performance gaps in summary generation and evaluation capabilities. We make our collected benchmark InstruSum publicly available to facilitate future research in this direction.
Updated: 2024-07-12 17:35:18
Domains: cs.CL,cs.LG
Human-like Episodic Memory for Infinite Context LLMs
Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs, enabling them to effectively handle practically infinite context lengths while maintaining computational efficiency. EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement in an on-line fashion. When needed, these events are retrieved through a two-stage memory process, combining similarity-based and temporally contiguous retrieval for efficient and human-like access to relevant information. Experiments on the LongBench dataset demonstrate EM-LLM's superior performance, outperforming the state-of-the-art InfLLM model with an overall relative improvement of 4.3% across various tasks, including a 33% improvement on the PassageRetrieval task. Furthermore, our analysis reveals strong correlations between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart. This work not only advances LLM capabilities in processing extended contexts but also provides a computational framework for exploring human memory mechanisms, opening new avenues for interdisciplinary research in AI and cognitive science.
Updated: 2024-07-12 17:34:03
Domains: cs.AI,cs.CL,cs.LG,q-bio.NC
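A toy sketch of one plausible reading of the surprise-based segmentation step (the window and threshold are illustrative; the paper's graph-theoretic boundary refinement and two-stage retrieval are omitted):

import torch

def surprise_boundaries(token_logprobs, window=64, gamma=1.0):
    """Place an event boundary where a token's surprise (-log p under the LLM)
    exceeds a running mean by gamma standard deviations."""
    surprise = -token_logprobs                       # (seq_len,) tensor
    boundaries = [0]
    for t in range(2, len(surprise)):
        recent = surprise[max(0, t - window):t]
        if surprise[t] > recent.mean() + gamma * recent.std():
            boundaries.append(t)                     # a new episodic event begins
    return boundaries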
The $\mu\mathcal{G}$ Language for Programming Graph Neural Networks
Graph neural networks form a class of deep learning architectures specifically designed to work with graph-structured data. As such, they share the inherent limitations and problems of deep learning, especially regarding the issues of explainability and trustworthiness. We propose $\mu\mathcal{G}$, an original domain-specific language for the specification of graph neural networks that aims to overcome these issues. The language's syntax is introduced, and its meaning is rigorously defined by a denotational semantics. An equivalent characterization in the form of an operational semantics is also provided and, together with a type system, is used to prove the type soundness of $\mu\mathcal{G}$. We show how $\mu\mathcal{G}$ programs can be represented in a more user-friendly graphical visualization, and provide examples of its generality by showing how it can be used to define some of the most popular graph neural network models, or to develop any custom graph processing application.
Updated: 2024-07-12 17:27:43
Domains: cs.FL,cs.AI,cs.LG,D.2.4
Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models
Prompt tuning in natural language processing (NLP) has become an increasingly popular method for adapting large language models to specific tasks. However, the transferability of these prompts, especially continuous prompts, between different models remains a challenge. In this work, we propose a zero-shot continuous prompt transfer method, where source prompts are encoded into relative space and the corresponding target prompts are searched for transferring to target models. Experimental results confirm the effectiveness of our method, showing that 'task semantics' in continuous prompts can be generalized across various language models. Moreover, we find that combining 'task semantics' from multiple source models can further enhance the generalizability of transfer.
Updated: 2024-07-12 17:26:08
Domains: cs.CL,cs.AI
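A sketch of the relative-space idea under stated assumptions: represent each prompt vector by its similarities to shared anchor-token embeddings, then optimize a target-model prompt that reproduces those coordinates. The details below are illustrative, not the paper's exact procedure.

import torch
import torch.nn.functional as F

def to_relative_space(prompt_emb, anchors):
    """Encode prompt vectors by cosine similarity to anchor-token embeddings."""
    p = F.normalize(prompt_emb, dim=-1)              # (L, d_source)
    a = F.normalize(anchors, dim=-1)                 # (K, d_source)
    return p @ a.T                                   # (L, K) model-agnostic coords

def transfer_prompt(rel, target_anchors, d_target, steps=500, lr=0.1):
    """Search a target-model prompt whose relative coordinates match rel."""
    a = F.normalize(target_anchors, dim=-1)          # (K, d_target)
    prompt = (0.1 * torch.randn(rel.shape[0], d_target)).requires_grad_()
    opt = torch.optim.Adam([prompt], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        cur = F.normalize(prompt, dim=-1) @ a.T
        loss = ((cur - rel) ** 2).mean()
        loss.backward()
        opt.step()
    return prompt.detach()                           # continuous prompt for the target LM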
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve model's parameter efficiency. We validate the effectiveness of our method by pruning two state-of-the-art MoE models, Mixtral-8x7B and Mixtral-8x22B. Evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. To facilitate future research, we will release our code and the pruned MoE models.
Updated: 2024-07-12 17:25:02
Domains: cs.CL,cs.LG
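A toy sketch of grouping-then-merging experts by weight similarity; the paper's grouping criterion and pruning procedure are more involved, and remapping the router to the surviving experts is omitted here.

import torch

def prune_similar_experts(expert_weights, n_keep):
    """Greedily merge the most similar pair of experts (cosine similarity of
    flattened weights) until only n_keep remain; each merge averages weights."""
    groups = [[w] for w in expert_weights]                # singleton groups
    def rep(g):                                           # unit-norm group representative
        return torch.nn.functional.normalize(
            torch.stack(g).mean(0).flatten(), dim=0)
    while len(groups) > n_keep:
        reps = torch.stack([rep(g) for g in groups])      # (G, D)
        sim = reps @ reps.T
        sim.fill_diagonal_(-1.0)                          # ignore self-similarity
        i, j = divmod(int(sim.argmax()), sim.shape[1])
        groups[i] = groups[i] + groups[j]                 # merge group j into i
        groups.pop(j)
    return [torch.stack(g).mean(0) for g in groups]       # one expert per group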
Needle in the Haystack for Memory Based Large Language Models
Current large language models (LLMs) often perform poorly on simple fact retrieval tasks. Here we investigate if coupling a dynamically adaptable external memory to a LLM can alleviate this problem. For this purpose, we test Larimar, a recently proposed language model architecture which uses an external associative memory, on long-context recall tasks including passkey and needle-in-the-haystack tests. We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training. We further show that the latent readouts from the memory (to which long contexts are written) control the decoder towards generating correct outputs, with the memory stored off of the GPU. Compared to existing transformer-based LLM architectures for long-context recall tasks that use larger parameter counts or modified attention mechanisms, a relatively smaller size Larimar is able to maintain strong performance without any task-specific training or training on longer contexts.
Updated: 2024-07-12 17:20:34
标题: 基于记忆的大型语言模型中的大海捞针
摘要: 目前,大型语言模型(LLMs)在简单的事实检索任务上往往表现不佳。在这里,我们研究将动态可适应的外部存储器与LLM相结合是否可以缓解这一问题。为此,我们在包括密钥检索和大海捞针测试在内的长上下文回忆任务上测试了Larimar,这是一种最近提出的、使用外部关联存储器的语言模型架构。我们证明,Larimar的外部存储器支持对文本样本片段的快速写入和读取,可在测试时处理比训练时所见更长的上下文。我们进一步展示,从存储器(长上下文被写入其中)读出的潜在表示能够引导解码器生成正确的输出,而存储器保存在GPU之外。与现有用于长上下文回忆任务、参数量更大或使用改进注意力机制的基于Transformer的LLM架构相比,规模相对较小的Larimar无需任何特定任务的训练或在更长上下文上的训练,即可保持较强的性能。
更新时间: 2024-07-12 17:20:34
领域: cs.CL,cs.AI,cs.LG
Let Me DeCode You: Decoder Conditioning with Tabular Data
Training deep neural networks for 3D segmentation tasks can be challenging, often requiring efficient and effective strategies to improve model performance. In this study, we introduce a novel approach, DeCode, that utilizes label-derived features for model conditioning to support the decoder in the reconstruction process dynamically, aiming to enhance the efficiency of the training process. DeCode focuses on improving 3D segmentation performance through the incorporation of conditioning embedding with learned numerical representation of 3D-label shape features. Specifically, we develop an approach, where conditioning is applied during the training phase to guide the network toward robust segmentation. When labels are not available during inference, our model infers the necessary conditioning embedding directly from the input data, thanks to a feed-forward network learned during the training phase. This approach is tested using synthetic data and cone-beam computed tomography (CBCT) images of teeth. For CBCT, three datasets are used: one publicly available and two in-house. Our results show that DeCode significantly outperforms traditional, unconditioned models in terms of generalization to unseen data, achieving higher accuracy at a reduced computational cost. This work represents the first of its kind to explore conditioning strategies in 3D data segmentation, offering a novel and more efficient method for leveraging annotated data. Our code, pre-trained models are publicly available at https://github.com/SanoScience/DeCode .
Updated: 2024-07-12 17:14:33
标题: 让我来解码你:使用表格数据进行解码器条件设置
摘要: 训练用于3D分割任务的深度神经网络可能具有挑战性,通常需要高效且有效的策略来提高模型性能。在这项研究中,我们介绍了一种新颖的方法DeCode,它利用从标签派生的特征对模型进行调节,在重建过程中动态地辅助解码器,旨在提高训练过程的效率。DeCode专注于通过引入与3D标签形状特征的学习数值表示相结合的调节嵌入来提高3D分割性能。具体而言,我们开发了一种在训练阶段应用调节以引导网络实现稳健分割的方法。当推断时没有可用标签时,得益于训练阶段学习到的前馈网络,我们的模型可以直接从输入数据中推断出所需的调节嵌入。该方法使用合成数据和牙齿的锥形束计算机断层扫描(CBCT)图像进行了测试。对于CBCT,使用了三个数据集:一个公开数据集和两个内部数据集。我们的结果表明,DeCode在对未见数据的泛化方面明显优于传统的无调节模型,在降低计算成本的同时实现了更高的准确性。这项工作是首个在3D数据分割中探索调节策略的研究,为利用标注数据提供了一种新颖且更高效的方法。我们的代码和预训练模型公开于 https://github.com/SanoScience/DeCode 。
更新时间: 2024-07-12 17:14:33
领域: eess.IV,cs.AI,cs.CV
MUSCLE: A Model Update Strategy for Compatible LLM Evolution
Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance. When updating models, developers often focus on increasing overall performance metrics with less emphasis on being compatible with previous model versions. However, users often build a mental model of the functionality and capabilities of a particular machine learning model they are interacting with. They have to adapt their mental model with every update -- a draining task that can lead to user dissatisfaction. In practice, fine-tuned downstream task adapters rely on pretrained LLM base models. When these base models are updated, these user-facing downstream task models experience instance regression or negative flips -- previously correct instances are now predicted incorrectly. This happens even when the downstream task training procedures remain identical. Our work aims to provide seamless model updates to a user in two ways. First, we provide evaluation metrics for a notion of compatibility to prior model versions, specifically for generative tasks but also applicable for discriminative tasks. We observe regression and inconsistencies between different model versions on a diverse set of tasks and model updates. Second, we propose a training strategy to minimize the number of inconsistencies in model updates, involving training of a compatibility model that can enhance task fine-tuned language models. We reduce negative flips -- instances where a prior model version was correct, but a new model incorrect -- by up to 40% from Llama 1 to Llama 2.
Updated: 2024-07-12 17:12:48
标题: MUSCLE:一个兼容LLM进化的模型更新策略
摘要: 大型语言模型(LLMs)经常因数据或架构的变化而更新,以提高性能。在更新模型时,开发人员通常专注于提高整体性能指标,而较少重视与先前模型版本的兼容性。然而,用户往往会对他们正在交互的特定机器学习模型的功能和能力建立心理模型,并且必须在每次更新后调整自己的心理模型,这是一项耗费精力的任务,可能导致用户不满。在实践中,微调的下游任务适配器依赖于预训练的LLM基础模型。当这些基础模型更新时,这些面向用户的下游任务模型会出现实例回归或负翻转,即以前预测正确的实例如今被错误预测。即使下游任务的训练程序保持不变,这种情况也会发生。我们的工作旨在以两种方式为用户提供无缝的模型更新。首先,我们提出了用于衡量与先前模型版本兼容性这一概念的评估指标,主要针对生成任务,但也适用于判别任务。我们在多种任务和模型更新中观察到了不同模型版本之间的回归和不一致。其次,我们提出了一种训练策略,通过训练一个能够增强任务微调语言模型的兼容性模型,来最小化模型更新中不一致的数量。从Llama 1到Llama 2,我们将负翻转(先前模型版本预测正确而新模型预测错误的实例)最多减少了40%。
更新时间: 2024-07-12 17:12:48
领域: cs.AI
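In its simplest discriminative form, the compatibility notion above reduces to counting negative flips. A minimal sketch (a hypothetical helper of ours, not the paper's metric suite):

def negative_flip_rate(labels, old_preds, new_preds):
    # Fraction of instances the old model got right but the new model gets
    # wrong -- the regression a compatible update should minimize.
    flips = sum(1 for y, o, n in zip(labels, old_preds, new_preds)
                if o == y and n != y)
    return flips / len(labels)

# Example: one of four instances regresses -> 25% negative flips.
print(negative_flip_rate([0, 1, 1, 2], [0, 1, 1, 0], [0, 1, 2, 2]))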
NeuFair: Neural Network Fairness Repair with Dropout
This paper investigates neuron dropout as a post-processing bias mitigation for deep neural networks (DNNs). Neural-driven software solutions are increasingly applied in socially critical domains with significant fairness implications. While neural networks are exceptionally good at finding statistical patterns from data, they may encode and amplify existing biases from the historical data. Existing bias mitigation algorithms often require modifying the input dataset or the learning algorithms. We posit that the prevalent dropout methods that prevent over-fitting during training by randomly dropping neurons may be an effective and less intrusive approach to improve the fairness of pre-trained DNNs. However, finding the ideal set of neurons to drop is a combinatorial problem. We propose NeuFair, a family of post-processing randomized algorithms that mitigate unfairness in pre-trained DNNs via dropouts during inference after training. Our randomized search is guided by an objective to minimize discrimination while maintaining the model's utility. We show that our design of randomized algorithms is effective and efficient in improving fairness (up to 69%) with minimal or no model performance degradation. We provide intuitive explanations of these phenomena and carefully examine the influence of various hyperparameters of search algorithms on the results. Finally, we empirically and conceptually compare NeuFair to different state-of-the-art bias mitigators.
Updated: 2024-07-12 17:10:14
标题: NeuFair: 使用Dropout修复神经网络公平性
摘要: 本文研究将神经元丢弃(dropout)作为深度神经网络(DNNs)的后处理偏见缓解方法。神经驱动的软件解决方案越来越多地应用于具有重要公平性影响的社会关键领域。尽管神经网络在从数据中发现统计模式方面表现出色,但它们可能会编码并放大历史数据中已有的偏见。现有的偏见缓解算法通常需要修改输入数据集或学习算法。我们认为,训练期间通过随机丢弃神经元来防止过拟合的常用dropout方法,可能是一种改善预训练DNNs公平性的有效且侵入性较低的途径。然而,找到理想的待丢弃神经元集合是一个组合问题。我们提出NeuFair,这是一族后处理随机算法,通过在训练完成后的推断期间使用dropout来减轻预训练DNNs中的不公平性。我们的随机搜索由一个在保持模型效用的同时最小化歧视的目标所引导。我们展示了所设计的随机算法在提高公平性方面(最高达69%)是有效且高效的,且模型性能几乎没有或完全没有下降。我们对这些现象给出了直观的解释,并仔细考察了搜索算法的各种超参数对结果的影响。最后,我们在实证和概念层面将NeuFair与多种最新的偏见缓解方法进行了比较。
更新时间: 2024-07-12 17:10:14
领域: cs.LG,cs.AI,cs.SE
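The abstract frames the choice of neurons to drop as a combinatorial search. One way such a randomized search could be organized is simulated annealing over dropout masks; the sketch below is our own illustration, and evaluate(mask) -> (fairness_gap, accuracy) is a hypothetical user-supplied callback, not part of NeuFair's API:

import math, random

def neufair_style_search(candidate_neurons, evaluate, iters=200, t0=1.0):
    mask = set()
    cur_gap, base_acc = evaluate(mask)          # baseline fairness and utility
    best_mask, best_gap = set(), cur_gap
    for step in range(iters):
        temp = t0 * (1 - step / iters) + 1e-9   # annealing schedule
        neighbor = set(mask)
        neighbor.symmetric_difference_update({random.choice(candidate_neurons)})
        gap, acc = evaluate(neighbor)
        # Accept improvements outright; accept worse moves with annealing prob.
        if gap < cur_gap or random.random() < math.exp((cur_gap - gap) / temp):
            mask, cur_gap = neighbor, gap
            # Track the fairest mask whose utility stays near the baseline.
            if gap < best_gap and acc >= base_acc - 0.01:
                best_mask, best_gap = set(mask), gap
    return best_mask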
A Perspective on Foundation Models for the Electric Power Grid
Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transition and climate change. In this paper, we call for the development of, and state why we believe in, the potential of FMs for electric grids. We highlight their strengths and weaknesses amidst the challenges of a changing grid. We argue that an FM learning from diverse grid data and topologies could unlock transformative capabilities, pioneering a new approach in leveraging AI to redefine how we manage complexity and uncertainty in the electric grid. Finally, we discuss a power grid FM concept, namely GridFM, based on graph neural networks and show how different downstream tasks benefit.
Updated: 2024-07-12 17:09:47
标题: 一个关于电力网基础模型的视角
摘要: 基础模型(FMs)目前占据新闻头条。它们采用先进的深度学习架构,通过自监督从庞大的数据集中自主提取结构信息。由此得到的对复杂系统和动态的丰富表征可以应用于许多下游应用。因此,FMs可以在受能源转型和气候变化挑战的电网中发挥作用。在本文中,我们呼吁开发面向电网的FMs,并阐述我们为何相信其潜力。我们强调了它们在不断变化的电网挑战中的优势和劣势。我们认为,一个从多样化的电网数据和拓扑中学习的FM可以释放变革性能力,开创一种利用人工智能重新定义我们如何管理电网中复杂性和不确定性的新方法。最后,我们讨论了一个基于图神经网络的电网FM概念,即GridFM,并展示了不同下游任务如何从中受益。
更新时间: 2024-07-12 17:09:47
领域: cs.LG,cs.AI,cs.CE,cs.SY,eess.SY
Metric Learning from Limited Pairwise Preference Comparisons
We study metric learning from preference comparisons under the ideal point model, in which a user prefers an item over another if it is closer to their latent ideal item. These items are embedded into $\mathbb{R}^d$ equipped with an unknown Mahalanobis distance shared across users. While recent work shows that it is possible to simultaneously recover the metric and ideal items given $\mathcal{O}(d)$ pairwise comparisons per user, in practice we often have a limited budget of $o(d)$ comparisons. We study whether the metric can still be recovered, even though it is known that learning individual ideal items is now no longer possible. We show that in general, $o(d)$ comparisons reveal no information about the metric, even with infinitely many users. However, when comparisons are made over items that exhibit low-dimensional structure, each user can contribute to learning the metric restricted to a low-dimensional subspace so that the metric can be jointly identified. We present a divide-and-conquer approach that achieves this, and provide theoretical recovery guarantees and empirical validation.
Updated: 2024-07-12 16:56:18
标题: 有限配对偏好比较中的度量学习
摘要: 我们研究了在理想点模型下通过偏好比较学习度量的问题,其中用户更倾向于某个物品而非另一个物品,如果它更接近他们潜在的理想物品。这些物品嵌入到$\mathbb{R}^d$中,配备有未知的马氏距离,该距离在用户之间共享。尽管最近的研究表明,通过每个用户的$\mathcal{O}(d)$两两比较可以同时恢复度量和理想物品,但在实践中,我们通常有一个有限的比较预算$o(d)$。我们研究了即使已知无法学习单个理想物品,度量是否仍然可以恢复。我们表明,一般来说,$o(d)$比较不会揭示有关度量的信息,即使有无限多的用户。然而,当比较的物品表现出低维结构时,每个用户都可以为限制在低维子空间的度量的学习做出贡献,从而可以联合识别度量。我们提出了一种分而治之的方法来实现这一点,并提供了理论恢复保证和经验验证。
更新时间: 2024-07-12 16:56:18
领域: cs.LG,stat.ML
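The ideal point model in the abstract above admits a compact worked form: a user with latent ideal item u prefers x_i over x_j exactly when (x_i - u)^T M (x_i - u) < (x_j - u)^T M (x_j - u) for the shared Mahalanobis matrix M. A runnable sketch with synthetic data (names are ours):

import numpy as np

def prefers(x_i, x_j, u, M):
    # Ideal point model: the item closer to u under d_M is preferred.
    d_i = (x_i - u) @ M @ (x_i - u)
    d_j = (x_j - u) @ M @ (x_j - u)
    return d_i < d_j

rng = np.random.default_rng(0)
d = 4
L = rng.normal(size=(d, d))
M = L @ L.T                            # random PSD metric shared across users
u = rng.normal(size=d)                 # one user's latent ideal item
x_i, x_j = rng.normal(size=d), rng.normal(size=d)
print(prefers(x_i, x_j, u, M))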
Flow-Based Generative Emulation of Grids of Stellar Evolutionary Models
We present a flow-based generative approach to emulate grids of stellar evolutionary models. By interpreting the input parameters and output properties of these models as multi-dimensional probability distributions, we train conditional normalizing flows to learn and predict the complex relationships between grid inputs and outputs in the form of conditional joint distributions. Leveraging the expressive power and versatility of these flows, we showcase their ability to emulate a variety of evolutionary tracks and isochrones across a continuous range of input parameters. In addition, we describe a simple Bayesian approach for estimating stellar parameters using these flows and demonstrate its application to asteroseismic datasets of red giants observed by the Kepler mission. By applying this approach to red giants in open clusters NGC 6791 and NGC 6819, we illustrate how large age uncertainties can arise when fitting only to global asteroseismic and spectroscopic parameters without prior information on initial helium abundances and mixing length parameter values. We also conduct inference using the flow at a large scale by determining revised estimates of masses and radii for 15,388 field red giants. These estimates show improved agreement with results from existing grid-based modelling, reveal distinct population-level features in the red clump, and suggest that the masses of Kepler red giants previously determined using the corrected asteroseismic scaling relations have been overestimated by 5-10%.
Updated: 2024-07-12 16:54:17
标题: 恒星演化模型网格的基于流的生成式仿真
摘要: 我们提出了一种基于流的生成方法来仿真恒星演化模型的网格。通过将这些模型的输入参数和输出属性解释为多维概率分布,我们训练条件归一化流来学习和预测网格输入与输出之间的复杂关系,并以条件联合分布的形式表示。借助这些流的表达能力和多样性,我们展示了它们能够在连续的输入参数范围内仿真各种演化轨迹和等时线。此外,我们描述了一种利用这些流估计恒星参数的简单贝叶斯方法,并展示了其在开普勒任务观测到的红巨星星震数据集上的应用。通过将该方法应用于疏散星团NGC 6791和NGC 6819中的红巨星,我们说明了在缺乏初始氦丰度和混合长度参数先验信息的情况下,仅拟合全局星震和光谱参数会产生多大的年龄不确定性。我们还利用该流进行了大规模推断,为15,388颗场红巨星确定了修订后的质量和半径估计。这些估计与现有基于网格建模的结果的一致性有所提升,揭示了红团簇中明显的群体层面特征,并表明先前使用修正星震标度关系确定的开普勒红巨星质量被高估了5-10%。
更新时间: 2024-07-12 16:54:17
领域: astro-ph.SR,astro-ph.GA,cs.LG
Identifying macro conditional independencies and macro total effects in summary causal graphs with latent confounding
Understanding causal relationships in dynamic systems is essential for numerous scientific fields, including epidemiology, economics, and biology. While causal inference methods have been extensively studied, they often rely on fully specified causal graphs, which may not always be available or practical in complex dynamic systems. Partially specified causal graphs, such as summary causal graphs (SCGs), provide a simplified representation of causal relationships, omitting temporal information and focusing on high-level causal structures. This simplification introduces new challenges concerning the types of queries of interest: macro queries, which involve relationships between clusters represented as vertices in the graph, and micro queries, which pertain to relationships between variables that are not directly visible through the vertices of the graph. In this paper, we first clearly distinguish between macro conditional independencies and micro conditional independencies and between macro total effects and micro total effects. Then, we demonstrate the soundness and completeness of the d-separation to identify macro conditional independencies in SCGs. Furthermore, we establish that the do-calculus is sound and complete for identifying macro total effects in SCGs. Conversely, we also show through various examples that these results do not hold when considering micro conditional independencies and micro total effects.
Updated: 2024-07-12 16:51:13
标题: 在具有潜在混淆的总结因果图中识别宏观条件独立性和宏观总效果
摘要: 理解动态系统中的因果关系对许多科学领域至关重要,包括流行病学、经济学和生物学。虽然因果推断方法已得到广泛研究,但它们通常依赖于完全指定的因果图,而在复杂的动态系统中,这并不总是可得或可行。部分指定的因果图,如总结性因果图(SCGs),提供了因果关系的简化表示,省略了时间信息,专注于高层次的因果结构。这种简化为所关注的查询类型带来了新的挑战:宏观查询涉及图中以顶点表示的簇之间的关系,而微观查询涉及无法通过图的顶点直接观察到的变量之间的关系。在本文中,我们首先清楚地区分了宏观条件独立性与微观条件独立性,以及宏观总效应与微观总效应。然后,我们证明了d-分离在识别SCGs中宏观条件独立性方面的可靠性(soundness)和完备性(completeness)。此外,我们确立了do-演算在识别SCGs中宏观总效应方面的可靠性和完备性。相反,我们还通过各种例子表明,在考虑微观条件独立性和微观总效应时,这些结果并不成立。
更新时间: 2024-07-12 16:51:13
领域: stat.ME,cs.AI
Improving Alignment and Robustness with Circuit Breakers
AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, circuit-breaking directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, circuit breakers allow the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.
Updated: 2024-07-12 16:51:07
标题: 使用断路器改进对齐性和稳健性
摘要: AI系统可能采取有害行动,并且极易受到对抗性攻击的影响。受表示工程最新进展的启发,我们提出了一种方法,用“断路器”在模型做出有害输出响应时将其中断。现有旨在改进对齐性的技术(如拒绝训练)往往会被绕过。对抗性训练等技术则试图通过针对特定攻击来堵住这些漏洞。作为拒绝训练和对抗性训练的替代方案,断路器直接控制从一开始就导致有害输出的那些表示。我们的技术可应用于纯文本和多模态语言模型,在不牺牲效用的情况下防止有害输出的生成,即使面对强大的未知攻击也是如此。值得注意的是,尽管独立图像识别中的对抗鲁棒性仍是一个开放挑战,但断路器使更大的多模态系统能够可靠地抵御旨在产生有害内容的图像“劫持”。最后,我们将该方法扩展到AI智能体,展示了它们在受到攻击时有害行为发生率的显著降低。我们的方法是在开发针对有害行为和对抗性攻击的可靠防护措施方面向前迈出的重要一步。
更新时间: 2024-07-12 16:51:07
领域: cs.LG,cs.AI,cs.CL,cs.CV,cs.CY
TelecomGPT: A Framework to Build Telecom-Specific Large Language Models
Large Language Models (LLMs) have the potential to revolutionize Sixth Generation (6G) communication networks. However, current mainstream LLMs generally lack specialized knowledge of the telecom domain. In this paper, for the first time, we propose a pipeline to adapt any general-purpose LLM into a telecom-specific LLM. We collect and build telecom-specific pre-training, instruction, and preference datasets to perform continual pre-training, instruction tuning, and alignment tuning, respectively. Besides, due to the lack of widely accepted evaluation benchmarks in the telecom domain, we extend existing evaluation benchmarks and propose three new ones, namely Telecom Math Modeling, Telecom Open QnA, and Telecom Code Tasks. These new benchmarks provide a holistic evaluation of the capabilities of LLMs, including math modeling, open-ended question answering, code generation, infilling, summarization, and analysis in the telecom domain. Our fine-tuned LLM, TelecomGPT, significantly outperforms state-of-the-art (SOTA) LLMs, including GPT-4, Llama-3, and Mistral, on the Telecom Math Modeling benchmark, and achieves comparable performance on various evaluation benchmarks such as TeleQnA, 3GPP technical document classification, and telecom code summarization, generation, and infilling.
Updated: 2024-07-12 16:51:02
标题: TelecomGPT:构建电信特定大型语言模型的框架
摘要: 大型语言模型(LLMs)有潜力彻底变革第六代(6G)通信网络。然而,当前主流的LLMs通常缺乏电信领域的专业知识。在本文中,我们首次提出了一个将任何通用LLM调整为电信专用LLM的流程。我们收集并构建了电信专用的预训练数据集、指令数据集和偏好数据集,分别用于持续预训练、指令微调和对齐微调。此外,由于电信领域缺乏被广泛接受的评估基准,我们扩展了现有的评估基准,并提出了三个新基准,即电信数学建模、电信开放问答和电信代码任务。这些新基准对LLMs在电信领域的能力进行了全面评估,包括数学建模、开放式问答、代码生成、填充、摘要和分析。我们微调的LLM TelecomGPT在电信数学建模基准上显著优于GPT-4、Llama-3和Mistral等最新(SOTA)LLMs,并在TeleQnA、3GPP技术文档分类、电信代码摘要、生成和填充等各种评估基准上取得了可比的表现。
更新时间: 2024-07-12 16:51:02
领域: eess.SP,cs.AI
A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in designing DL accelerators suitable to reach the performance requirements of HPC applications. In particular, it highlights the most advanced approaches to support deep learning accelerations including not only GPU and TPU-based accelerators but also design-specific hardware accelerators such as FPGA-based and ASIC-based accelerators, Neural Processing Units, open hardware RISC-V-based accelerators and co-processors. The survey also describes accelerators based on emerging memory technologies and computing paradigms, such as 3D-stacked Processor-In-Memory, non-volatile memories (mainly, Resistive RAM and Phase Change Memories) to implement in-memory computing, Neuromorphic Processing Units, and accelerators based on Multi-Chip Modules. Among emerging technologies, we also include some insights into quantum-based accelerators and photonics. To conclude, the survey classifies the most influential architectures and technologies proposed in the last years, with the purpose of offering the reader a comprehensive perspective in the rapidly evolving field of deep learning.
Updated: 2024-07-12 16:50:59
标题: 一项针对异构高性能计算平台深度学习硬件加速器的调查
摘要: 最近的深度学习(DL)趋势使硬件加速器成为高性能计算(HPC)应用程序的几个类别中最可行的解决方案,例如图像分类、计算机视觉和语音识别。本调查总结和分类了设计适合实现HPC应用程序性能要求的最新DL加速器的最新进展。特别地,它突出了支持深度学习加速的最先进方法,包括不仅是基于GPU和TPU的加速器,还有基于FPGA和ASIC的设计特定硬件加速器,神经处理单元,基于开放硬件RISC-V的加速器和协处理器。调查还描述了基于新兴内存技术和计算范式的加速器,例如3D堆叠处理器内存、非易失性内存(主要是电阻性RAM和相变存储器)用于实现内存计算、神经形态处理单元和基于多芯片模块的加速器。在新兴技术中,我们还包括一些关于基于量子和光子的加速器的见解。最后,调查对过去几年提出的最具影响力的架构和技术进行分类,旨在为读者提供对深度学习领域快速发展的全面视角。
更新时间: 2024-07-12 16:50:59
领域: cs.AR,cs.ET,cs.LG
Budget Recycling Differential Privacy
Differential Privacy (DP) mechanisms usually {force} reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility and maintaining privacy simultaneously. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in the development of a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios, and our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore the concept of privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.
Updated: 2024-07-12 16:46:17
标题: 预算回收差分隐私
摘要: 差分隐私(DP)机制通常会在隐私预算紧张时产生“越界”的噪声结果,从而强制降低数据效用。我们引入了预算回收差分隐私(BR-DP)框架,旨在为广泛的现有DP机制提供软边界的噪声输出。所谓“软边界”,指的是机制能够将大部分输出释放在预定义的误差边界内,从而在提高效用的同时保持隐私。BR-DP的核心由两个组件构成:一个负责每次迭代生成噪声答案的DP内核,以及一个以一定概率回收/重新生成或释放该噪声答案的回收器。我们深入研究了BR-DP的隐私核算,最终提出了一项预算原则,可在DP内核与回收器之间最优地细分可用预算。此外,我们提出了在组合情境下进行严格BR-DP核算的算法,研究结果表明,与DP相比,BR-DP在组合后实现了更低的隐私泄露。我们还探讨了在BR-DP框架内通过子采样进行隐私放大的概念,并为各种查询提出了BR-DP的最优采样率。我们在真实数据上进行了实验,结果表明BR-DP能有效改善DP机制所提供的效用-隐私权衡。
更新时间: 2024-07-12 16:46:17
领域: cs.CR,cs.DS,eess.SP
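A heavily simplified sketch of the kernel-plus-recycler loop described above, assuming a Laplace kernel and a fixed retry cap (the paper's budgeting rule for splitting the budget between kernel and recycler, which is what makes the retries privacy-safe, is omitted here):

import numpy as np

def br_dp_release(true_value, eps_kernel, error_bound, max_rounds=5, seed=None):
    rng = np.random.default_rng(seed)
    for _ in range(max_rounds):
        # DP kernel: one Laplace draw per round (sensitivity assumed 1).
        noisy = true_value + rng.laplace(scale=1.0 / eps_kernel)
        # Recycler: release if inside the error boundary, else regenerate.
        if abs(noisy - true_value) <= error_bound:
            return noisy  # a "soft-bounded" output
    return noisy          # give up recycling after max_rounds draws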
A Benchmark Environment for Offline Reinforcement Learning in Racing Games
Offline Reinforcement Learning (ORL) is a promising approach to reduce the high sample complexity of traditional Reinforcement Learning (RL) by eliminating the need for continuous environmental interactions. ORL exploits a dataset of pre-collected transitions and thus expands the range of application of RL to tasks in which the excessive environment queries increase training time and decrease efficiency, such as in modern AAA games. This paper introduces OfflineMania, a novel environment for ORL research. It is inspired by the iconic TrackMania series and developed using the Unity 3D game engine. The environment simulates a single-agent racing game in which the objective is to complete the track through optimal navigation. We provide a variety of datasets to assess ORL performance. These datasets, created from policies of varying ability and in different sizes, aim to offer a challenging testbed for algorithm development and evaluation. We further establish a set of baselines for a range of Online RL, ORL, and hybrid Offline to Online RL approaches using our environment.
Updated: 2024-07-12 16:44:03
标题: 《赛车游戏中离线强化学习的基准环境》
摘要: 离线强化学习(ORL)是一种有希望的方法,可以通过消除对连续环境交互的需求来减少传统强化学习(RL)的高样本复杂性。ORL利用预先收集的转换数据集,从而扩大了RL的应用范围,适用于环境查询过多导致训练时间增加和效率降低的任务,例如现代AAA游戏。本文介绍了OfflineMania,这是一个用于ORL研究的新颖环境。它受到了标志性的TrackMania系列的启发,并使用Unity 3D游戏引擎开发。该环境模拟了一个单一代理赛车游戏,其目标是通过最佳导航完成赛道。我们提供了多种数据集来评估ORL的性能。这些数据集由不同能力和大小的策略创建,旨在为算法开发和评估提供具有挑战性的实验平台。我们进一步建立了一组基线,用于使用我们的环境进行一系列在线RL,ORL和混合离线到在线RL方法的评估。
更新时间: 2024-07-12 16:44:03
领域: cs.AI,cs.LG
On scalable oversight with weak LLMs judging strong LLMs
Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.
Updated: 2024-07-12 16:38:12
标题: 关于用弱LLMs评判强LLMs的可扩展监督
摘要: 可扩展监督协议旨在使人类能够准确地监督超人类AI。在本文中,我们研究了辩论(两个AI相互竞争以说服一名裁判)和咨询(单个AI试图说服一名提问的裁判),并与直接问答的基线(裁判在没有AI的情况下直接作答)进行比较。我们使用大型语言模型(LLMs)既作为AI智能体,也作为人类裁判的替代,并将裁判模型设定为弱于智能体模型。我们在裁判与智能体之间多种不对称性上进行基准测试,将先前针对单一信息不对称抽取式问答任务的工作,扩展到数学、编码、逻辑和多模态推理等不对称性。我们发现,当顾问被随机分配为正确或错误答案辩护时,辩论在所有任务上的表现都优于咨询。将辩论与直接问答相比较,结果取决于任务类型:在具有信息不对称的抽取式问答任务中,辩论优于直接问答;但在其他没有信息不对称的任务中,结果好坏参半。先前的工作为辩手/顾问指定了要辩护的答案。当我们允许它们自行选择要辩护的答案时,我们发现裁判在辩论中被错误答案说服的频率低于在咨询中。此外,我们发现更强的辩手模型能提高裁判的准确性,尽管提升幅度比先前研究中更为温和。
更新时间: 2024-07-12 16:38:12
领域: cs.LG
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Seeking answers to questions within long scientific research articles is a crucial area of study that aids readers in quickly addressing their inquiries. However, existing question-answering (QA) datasets based on scientific papers are limited in scale and focus solely on textual content. To address this limitation, we introduce SPIQA (Scientific Paper Image Question Answering), the first large-scale QA dataset specifically designed to interpret complex figures and tables within the context of scientific research articles across various domains of computer science. Leveraging the breadth of expertise and ability of multimodal large language models (MLLMs) to understand figures, we employ automatic and manual curation to create the dataset. We craft an information-seeking task involving multiple images that cover a wide variety of plots, charts, tables, schematic diagrams, and result visualizations. SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits. Through extensive experiments with 12 prominent foundational models, we evaluate the ability of current multimodal systems to comprehend the nuanced aspects of research articles. Additionally, we propose a Chain-of-Thought (CoT) evaluation strategy with in-context retrieval that allows fine-grained, step-by-step assessment and improves model performance. We further explore the upper bounds of performance enhancement with additional textual information, highlighting its promising potential for future research and the dataset's impact on revolutionizing how we interact with scientific literature.
Updated: 2024-07-12 16:37:59
标题: SPIQA:一个用于科学论文多模态问答的数据集
摘要: 寻找长篇科学研究文章中的问题的答案是一个关键的研究领域,它帮助读者快速解决他们的疑问。然而,现有基于科学论文的问答(QA)数据集在规模上受限,并且仅关注文本内容。为了解决这一局限,我们引入了SPIQA(Scientific Paper Image Question Answering),这是第一个专门设计用于解释计算机科学领域科学研究文章中复杂图表和表格的大规模QA数据集。利用多模态大语言模型(MLLMs)的广泛专业知识和理解图像的能力,我们采用自动和手动策划来创建数据集。我们设计了一个涉及多个图像的信息搜索任务,涵盖各种情节、图表、表格、示意图和结果可视化。SPIQA包括270K个问题,分为训练、验证和三个不同的评估分割。通过与12个著名的基础模型进行广泛实验,我们评估当前多模态系统理解研究文章微妙方面的能力。此外,我们提出了一种Chain-of-Thought(CoT)评估策略,通过上下文检索进行细粒度、逐步评估,提高模型性能。我们进一步探讨通过额外文本信息提高性能的上限,并突出其对未来研究的潜力以及对改变我们与科学文献互动方式的数据集影响。
更新时间: 2024-07-12 16:37:59
领域: cs.CL,cs.AI,cs.CV
StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images
Stain normalization algorithms aim to transform the color and intensity characteristics of a source multi-gigapixel histology image to match those of a target image, mitigating inconsistencies in the appearance of stains used to highlight cellular components in the images. We propose a new approach, StainFuser, which treats this problem as a style transfer task using a novel Conditional Latent Diffusion architecture, eliminating the need for handcrafted color components. With this method, we curate SPI-2M the largest stain normalization dataset to date of over 2 million histology images with neural style transfer for high-quality transformations. Trained on this data, StainFuser outperforms current state-of-the-art deep learning and handcrafted methods in terms of the quality of normalized images and in terms of downstream model performance on the CoNIC dataset.
Updated: 2024-07-12 16:27:06
标题: StainFuser:在多千兆像素组织学图像中控制扩散以实现更快的神经样式转移
摘要: 染色归一化算法旨在将源多千兆像素组织学图像的颜色和强度特征转换为与目标图像相匹配,从而减轻图像中用于突出细胞组分的染色剂在外观上的不一致。我们提出了一种新方法StainFuser,它将该问题视为一种风格迁移任务,采用新颖的条件潜在扩散架构,从而无需手工设计的颜色组件。通过这种方法,我们构建了迄今为止最大的染色归一化数据集SPI-2M,其中包含通过神经风格迁移得到高质量变换的200多万张组织学图像。在这些数据上训练后,StainFuser在归一化图像质量以及CoNIC数据集上的下游模型性能方面,均优于当前最先进的深度学习方法和手工设计方法。
更新时间: 2024-07-12 16:27:06
领域: eess.IV,cs.CV,cs.LG
Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce
Text relevance, or text matching between query and product, is an essential technique for e-commerce search systems, ensuring that the displayed products match the intent of the query. Many studies focus on improving the performance of the relevance model in the search system. Recently, pre-trained language models like BERT have achieved promising performance on the text relevance task. While these models perform well on offline test datasets, their high latency remains an obstacle to deploying them in online systems. The two-tower model is extensively employed in industrial scenarios, owing to its ability to balance performance with computational efficiency. Regrettably, such models present an opaque ``black box'' nature, which prevents developers from making special optimizations. In this paper, we propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce. Our approach encodes the query and the product into sparse BoW representations, each a set of word-weight pairs, where the weight denotes the importance or relevance score of the corresponding word with respect to the raw text. The relevance score is measured by accumulating the weights of words matched between the sparse BoW representations of the query and the product. Compared to the popular dense distributed representations that usually suffer from the black-box drawback, the main advantage of the proposed representation model is that it is highly explainable and intervenable, a superior property for the deployment and operation of online search engines. Moreover, the online efficiency of the proposed model is even better than the most efficient inner product form of dense representation ...
Updated: 2024-07-12 16:18:05
标题: 深度词袋模型:一种高效且可解释的用于中国电子商务的相关性架构
摘要: 查询与商品之间的文本相关性或文本匹配是电子商务搜索系统的一项关键技术,以确保展示的商品符合查询的意图。许多研究致力于提升搜索系统中相关性模型的性能。最近,像BERT这样的预训练语言模型在文本相关性任务上取得了令人期待的表现。尽管这些模型在离线测试数据集上表现良好,但由于高延迟,将预训练语言模型部署到在线系统仍然存在障碍。双塔模型因能在性能和计算效率之间取得平衡而在工业场景中被广泛应用。遗憾的是,这类模型具有不透明的“黑盒”特性,使开发人员难以进行针对性优化。在本文中,我们提出了深度词袋模型(DeepBoW),这是一种面向中国电子商务的高效且可解释的相关性架构。我们的方法将查询和商品编码为稀疏的词袋表示,即一组单词-权重对,其中权重表示对应单词相对于原始文本的重要性或相关性得分。相关性得分通过累加查询与商品的稀疏词袋表示之间匹配单词的权重来度量。与通常存在黑盒缺陷的流行稠密分布式表示相比,所提出的表示模型的最大优势在于高度可解释和可干预,这对在线搜索引擎的部署和运营是一大优势。此外,所提出模型的在线效率甚至优于稠密表示中最高效的内积形式。
更新时间: 2024-07-12 16:18:05
领域: cs.IR,cs.AI,cs.CL
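The accumulation step described in the DeepBoW abstract above is easy to picture with word-to-weight dictionaries; the exact scoring rule (here, a product of matched weights) is our assumption:

def deepbow_relevance(query_bow, product_bow):
    # Accumulate weights over words shared by the two sparse BoW reps.
    common = query_bow.keys() & product_bow.keys()
    return sum(query_bow[w] * product_bow[w] for w in common)

query = {"wireless": 0.9, "mouse": 0.8, "ergonomic": 0.3}
product = {"wireless": 0.7, "mouse": 0.9, "gaming": 0.5}
print(deepbow_relevance(query, product))  # 0.9*0.7 + 0.8*0.9 = 1.35

Every contribution to the score is attributable to a visible word pair, which is what makes such a representation explainable and intervenable.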
Bayesian Learning-driven Prototypical Contrastive Loss for Class-Incremental Learning
The primary objective of methods in continual learning is to learn tasks in a sequential manner over time from a stream of data, while mitigating the detrimental phenomenon of catastrophic forgetting. In this paper, we focus on learning an optimal representation between previous class prototypes and newly encountered ones. We propose a prototypical network with a Bayesian learning-driven contrastive loss (BLCL) tailored specifically for class-incremental learning scenarios. Therefore, we introduce a contrastive loss that incorporates new classes into the latent representation by reducing the intra-class distance and increasing the inter-class distance. Our approach dynamically adapts the balance between the cross-entropy and contrastive loss functions with a Bayesian learning technique. Empirical evaluations conducted on both the CIFAR-10 and CIFAR-100 dataset for image classification and images of a GNSS-based dataset for interference classification validate the efficacy of our method, showcasing its superiority over existing state-of-the-art approaches.
Updated: 2024-07-12 16:14:33
标题: 基于贝叶斯学习的原型对比损失用于类别增量学习
摘要: 持续学习方法的主要目标是从数据流中以顺序方式随时间学习任务,同时减轻灾难性遗忘的不利现象。本文重点研究了在先前类别原型和新遇到的类别之间学习最佳表示。我们提出了一种具有贝叶斯学习驱动对比损失(BLCL)的原型网络,专门为类别增量学习场景量身定制。因此,我们引入了一种对比损失,通过减小类内距离和增加类间距离,将新类别纳入潜在表示中。我们的方法通过贝叶斯学习技术动态调整交叉熵和对比损失函数之间的平衡。在CIFAR-10和CIFAR-100数据集上进行的实证评估,用于图像分类,以及用于干扰分类的GNSS数据集的图像验证了我们方法的有效性,展示了其优于现有最先进方法的卓越性。
更新时间: 2024-07-12 16:14:33
领域: cs.CV,cs.AI,62P30, 68T30, 68T05, 68T37,G.3; I.2.4; I.2.6
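A minimal PyTorch sketch of a prototype-based contrastive term of the kind described above: it pulls normalized embeddings toward their class prototype and away from the others. The Bayesian balancing against cross-entropy, which is the paper's contribution, is omitted; names and the temperature are ours:

import torch
import torch.nn.functional as F

def prototypical_contrastive_loss(features, labels, prototypes, temp=0.1):
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = features @ prototypes.T / temp   # (batch, num_classes)
    # Cross-entropy over prototype similarities: shrinks intra-class
    # distance, grows inter-class distance.
    return F.cross_entropy(logits, labels)

feats = torch.randn(8, 32)
protos = torch.randn(5, 32)        # one prototype per class
labels = torch.randint(0, 5, (8,))
print(prototypical_contrastive_loss(feats, labels, protos))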
GAVEL: Generating Games Via Evolution and Language Models
Automatically generating novel and interesting games is a complex task. Challenges include representing game rules in a computationally workable form, searching through the large space of potential games under most such representations, and accurately evaluating the originality and quality of previously unseen games. Prior work in automated game generation has largely focused on relatively restricted rule representations and relied on domain-specific heuristics. In this work, we explore the generation of novel games in the comparatively expansive Ludii game description language, which encodes the rules of over 1000 board games in a variety of styles and modes of play. We draw inspiration from recent advances in large language models and evolutionary computation in order to train a model that intelligently mutates and recombines games and mechanics expressed as code. We demonstrate both quantitatively and qualitatively that our approach is capable of generating new and interesting games, including in regions of the potential rules space not covered by existing games in the Ludii dataset. A sample of the generated games are available to play online through the Ludii portal.
Updated: 2024-07-12 16:08:44
标题: GAVEL:通过进化和语言模型生成游戏
摘要: 自动生成新颖而有趣的游戏是一项复杂的任务。挑战包括:以可计算的形式表示游戏规则;在大多数此类表示下搜索庞大的潜在游戏空间;以及准确评估以前未见过的游戏的独创性和质量。以往的自动游戏生成工作主要集中在相对受限的规则表示上,并依赖于领域特定的启发式方法。在这项工作中,我们探索在表达能力相对更强的Ludii游戏描述语言中生成新颖游戏,该语言编码了1000多种棋盘游戏的规则,涵盖各种风格和玩法。我们从大型语言模型和进化计算的最新进展中汲取灵感,训练了一个能够智能地变异和重组以代码表示的游戏及其机制的模型。我们从定量和定性两方面证明,我们的方法能够生成新颖有趣的游戏,包括Ludii数据集中现有游戏尚未覆盖的潜在规则空间区域。部分生成的游戏样本可通过Ludii门户网站在线游玩。
更新时间: 2024-07-12 16:08:44
领域: cs.AI
Meta-Analysis with Untrusted Data
[See paper for full abstract] Meta-analysis is a crucial tool for answering scientific questions. It is usually conducted on a relatively small amount of ``trusted'' data -- ideally from randomized, controlled trials -- which allow causal effects to be reliably estimated with minimal assumptions. We show how to answer causal questions much more precisely by making two changes. First, we incorporate untrusted data drawn from large observational databases, related scientific literature and practical experience -- without sacrificing rigor or introducing strong assumptions. Second, we train richer models capable of handling heterogeneous trials, addressing a long-standing challenge in meta-analysis. Our approach is based on conformal prediction, which fundamentally produces rigorous prediction intervals, but doesn't handle indirect observations: in meta-analysis, we observe only noisy effects due to the limited number of participants in each trial. To handle noise, we develop a simple, efficient version of fully-conformal kernel ridge regression, based on a novel condition called idiocentricity. We introduce noise-correcting terms in the residuals and analyze their interaction with a ``variance shaving'' technique. In multiple experiments on healthcare datasets, our algorithms deliver tighter, sounder intervals than traditional ones. This paper charts a new course for meta-analysis and evidence-based medicine, where heterogeneity and untrusted data are embraced for more nuanced and precise predictions.
Updated: 2024-07-12 16:07:53
标题: 使用不可信数据的元分析
摘要: [完整摘要请参阅论文] 元分析是回答科学问题的关键工具。它通常在相对少量的“可信”数据上进行(理想情况下来自随机对照试验),这些数据可以在最少假设下可靠地估计因果效应。我们展示了如何通过两项改变更加精确地回答因果问题。首先,我们纳入了来自大型观察性数据库、相关科学文献和实践经验的不可信数据,且不牺牲严谨性或引入强假设。其次,我们训练了能够处理异质试验的更丰富的模型,解决了元分析中一个长期存在的挑战。我们的方法基于保形预测(conformal prediction),它本质上产生严格的预测区间,但无法处理间接观测:在元分析中,由于每项试验的参与者数量有限,我们只能观测到带噪声的效应。为了处理噪声,我们基于一种称为idiocentricity的新颖条件,开发了一个简单、高效的全保形核岭回归版本。我们在残差中引入噪声修正项,并分析它们与一种“方差削减”技术的相互作用。在医疗数据集上的多次实验中,我们的算法给出了比传统方法更紧、更可靠的区间。本文为元分析和循证医学开辟了一条新路,在这条路上,异质性和不可信数据被用来获得更细致、更精确的预测。
更新时间: 2024-07-12 16:07:53
领域: stat.ML,cs.LG,stat.ME
Long-term drought prediction using deep neural networks based on geospatial weather data
The problem of high-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance. Yet, it is still unsolved with reasonable accuracy due to data complexity and aridity stochasticity. We tackle drought data by introducing an end-to-end approach that adopts a spatio-temporal neural network model with accessible open monthly climate data as the input. Our systematic research employs diverse proposed models and five distinct environmental regions as a testbed to evaluate the efficacy of the Palmer Drought Severity Index (PDSI) prediction. Key aggregated findings are the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts. At the same time, the Convolutional LSTM excels in longer-term forecasting.
Updated: 2024-07-12 16:05:50
标题: 基于地理空间天气数据的深度神经网络长期干旱预测
摘要: 提前一年进行高质量干旱预测的问题对于农业规划和保险至关重要。然而,由于数据复杂性和干旱的随机性,这个问题仍然无法以合理的精度解决。我们通过引入一种端到端方法来处理干旱数据,该方法采用了一个以可获取的开放月度气候数据为输入的时空神经网络模型。我们的系统研究采用了多种提出的模型和五个不同的环境区域作为试验平台,以评估帕尔默干旱严重程度指数(PDSI)预测的有效性。主要的综合发现是:Transformer模型EarthFormer在准确的短期(最长六个月)预测方面表现出色,而卷积LSTM则在更长期的预测中更胜一筹。
更新时间: 2024-07-12 16:05:50
领域: cs.LG
The Effectiveness of Curvature-Based Rewiring and the Role of Hyperparameters in GNNs Revisited
Message passing is the dominant paradigm in Graph Neural Networks (GNNs). The efficiency of message passing, however, can be limited by the topology of the graph. This happens when information is lost during propagation due to being oversquashed when travelling through bottlenecks. To remedy this, recent efforts have focused on graph rewiring techniques, which disconnect the input graph originating from the data and the computational graph, on which message passing is performed. A prominent approach for this is to use discrete graph curvature measures, of which several variants have been proposed, to identify and rewire around bottlenecks, facilitating information propagation. While oversquashing has been demonstrated in synthetic datasets, in this work we reevaluate the performance gains that curvature-based rewiring brings to real-world datasets. We show that in these datasets, edges selected during the rewiring process are not in line with theoretical criteria identifying bottlenecks. This implies they do not necessarily oversquash information during message passing. Subsequently, we demonstrate that SOTA accuracies on these datasets are outliers originating from sweeps of hyperparameters -- both the ones for training and dedicated ones related to the rewiring algorithm -- instead of consistent performance gains. In conclusion, our analysis nuances the effectiveness of curvature-based rewiring in real-world datasets and brings a new perspective on the methods to evaluate GNN accuracy improvements.
Updated: 2024-07-12 16:03:58
标题: 再探基于曲率的重连的有效性与GNN中超参数的作用
摘要: 消息传递是图神经网络(GNNs)中的主导范式。然而,消息传递的效率可能受到图拓扑结构的限制。当信息在传播过程中通过瓶颈时被过度压缩(oversquashing)而丢失,就会出现这种情况。为了解决这一问题,近期的研究集中在图重连技术上,即把源自数据的输入图与执行消息传递的计算图解耦。其中一个突出的方法是使用离散图曲率度量(已有多种变体被提出)来识别瓶颈并围绕其重连,从而促进信息传播。虽然过度压缩已在合成数据集中得到证实,但在这项工作中,我们重新评估了基于曲率的重连为真实数据集带来的性能提升。我们发现,在这些数据集中,重连过程中所选的边并不符合识别瓶颈的理论标准。这意味着它们在消息传递过程中未必造成信息的过度压缩。随后,我们证明这些数据集上的SOTA准确率是源自超参数扫描(既包括训练超参数,也包括与重连算法相关的专用超参数)的离群值,而非一致的性能提升。总之,我们的分析对基于曲率的重连在真实数据集上的有效性给出了更细致的评价,并为评估GNN准确率提升的方法带来了新的视角。
更新时间: 2024-07-12 16:03:58
领域: cs.LG
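As one concrete example of the discrete curvature measures mentioned above, the triangle-augmented Forman curvature of an edge (u, v) is 4 - deg(u) - deg(v) + 3t, where t is the number of triangles containing the edge; strongly negative edges are the bottleneck candidates that rewiring targets. A small sketch (this is one common variant among several in the literature, not necessarily the one used in the paper):

import networkx as nx

def augmented_forman_curvature(G, u, v):
    triangles = len(set(G[u]) & set(G[v]))  # common neighbors of u and v
    return 4 - G.degree(u) - G.degree(v) + 3 * triangles

G = nx.barbell_graph(5, 1)   # two cliques joined through one node: a bottleneck
scores = {e: augmented_forman_curvature(G, *e) for e in G.edges}
print(min(scores, key=scores.get))  # a bridge edge gets the lowest curvature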
Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks
Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.
Updated: 2024-07-12 15:59:53
标题: 超越静态AI评估:推进针对LLM危害与风险的人机互动评估
摘要: 模型评估对于理解AI系统的安全性、风险和社会影响至关重要。虽然大多数现实世界的AI应用涉及人与AI的互动,但目前大多数模型评估(例如常见的基准测试)并非如此。相反,它们只以有限的方式纳入人类因素,孤立地评估模型的安全性,因而无法捕捉人与模型互动的复杂性。在本文中,我们讨论并具体化了一类新兴评估的定义,即“人机互动评估”(HIEs),其重点在于评估人与模型的互动,或人类使用模型的过程与结果。首先,我们论证HIEs可用于提高安全评估的有效性,评估对人的直接影响和特定于互动的危害,并指导未来对模型社会影响的评估。其次,我们提出了一个以安全为重点的HIE设计框架,其中包含一个人-LLM互动分类法,分为三个阶段:(1)确定风险或危害领域,(2)刻画使用情境,(3)选择评估参数。第三,我们将该框架应用于过度依赖和说服风险这两个潜在评估。最后,我们就HIEs在成本、可复现性和代表性不足方面的担忧给出了切实的建议。
更新时间: 2024-07-12 15:59:53
领域: cs.CY,cs.AI,cs.HC
Alistair: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems
With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Alistair, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Alistair into Chrome and evaluate it on microbenchmarks and advertising datasets. Across all workloads, Alistair significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.
Updated: 2024-07-12 15:59:41
标题: Alistair:针对差分隐私广告测量系统的高效设备端预算管理
摘要: 随着主要浏览器即将移除第三方cookie,并引入新的保护隐私的广告API,研究界有机会在Web的隐私保护方面帮助行业进行定性改进。本文讨论了我们在W3C社区小组中的努力,以增强现有的隐私保护广告测量API。我们分析了来自谷歌、苹果、Meta和Mozilla的设计,并通过更严格和高效的差分隐私(DP)预算组件对其进行增强。我们的方法名为Alistair,实施了明确定义的DP保证,并使广告商能够准确进行更私密的测量查询。通过用个体形式的DP来表述隐私保证,我们可以使DP预算比当前使用传统DP定义的系统更高效。我们将Alistair整合到Chrome中,并在微基准测试和广告数据集上进行评估。在所有工作负载中,Alistair在提供更多广告测量数据的同时提供了可比较的DP保护,明显优于基线。
更新时间: 2024-07-12 15:59:41
领域: cs.CR
Graph Neural Network Causal Explanation via Neural Causal Models
Graph neural network (GNN) explainers identify the important subgraph that ensures the prediction for a given graph. Until now, almost all GNN explainers have been based on association, which is prone to spurious correlations. We propose {\name}, a GNN causal explainer via causal inference. Our explainer is based on the observation that a graph often consists of a causal underlying subgraph. {\name} includes three main steps: 1) It builds the causal structure and the corresponding structural causal model (SCM) for a graph, which enables cause-effect calculation among nodes. 2) Directly calculating cause-effect relations in real-world graphs is computationally challenging. We are then inspired by the recent neural causal model (NCM), a special type of SCM that is trainable, and design customized NCMs for GNNs. By training these GNN NCMs, the cause-effect relations can be easily calculated. 3) It uncovers the subgraph that causally explains the GNN predictions via the optimized GNN-NCMs. Evaluation results on multiple synthetic and real-world graphs validate that {\name} significantly outperforms existing GNN explainers in exact ground-truth explanation identification.
Updated: 2024-07-12 15:56:33
标题: 通过神经因果模型的图神经网络因果解释
摘要: 图神经网络(GNN)解释器识别确保给定图预测结果的重要子图。迄今为止,几乎所有的GNN解释器都基于关联,容易受到伪相关的影响。我们提出了{\name},一种基于因果推断的GNN因果解释器。我们的解释器基于这样一个观察:图通常包含一个潜在的因果子图。{\name}包括三个主要步骤:1)为图构建因果结构及相应的结构因果模型(SCM),从而实现节点间的因果效应计算。2)在真实世界的图中直接计算因果效应在计算上具有挑战性。因此,我们受到最近的神经因果模型(NCM,一种可训练的特殊SCM)的启发,为GNN设计了定制的NCM。通过训练这些GNN NCM,可以方便地计算因果效应。3)它通过优化后的GNN-NCM揭示能够因果地解释GNN预测的子图。在多个合成图和真实世界图上的评估结果验证了{\name}在精确识别真实解释方面明显优于现有的GNN解释器。
更新时间: 2024-07-12 15:56:33
领域: cs.LG,cs.AI,stat.ML
HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context
This work explores the in-context learning capabilities of State Space Models (SSMs) and presents, to the best of our knowledge, the first theoretical explanation of a possible underlying mechanism. We introduce a novel weight construction for SSMs, enabling them to predict the next state of any dynamical system after observing previous states without parameter fine-tuning. This is accomplished by extending the HiPPO framework to demonstrate that continuous SSMs can approximate the derivative of any input signal. Specifically, we find an explicit weight construction for continuous SSMs and provide an asymptotic error bound on the derivative approximation. The discretization of this continuous SSM subsequently yields a discrete SSM that predicts the next state. Finally, we demonstrate the effectiveness of our parameterization empirically. This work should be an initial step toward understanding how sequence models based on SSMs learn in context.
Updated: 2024-07-12 15:56:11
标题: HiPPO-Prophecy:状态空间模型可被证明能够在上下文中学习动态系统
摘要: 这项工作探讨了状态空间模型(SSMs)的上下文学习能力,并据我们所知,首次提出了可能存在的潜在机制的理论解释。我们引入了一种新颖的权重构造方法,使SSMs能够在观察先前状态后预测任何动态系统的下一个状态,而无需进行参数微调。通过扩展HiPPO框架,我们证明了连续SSMs可以近似任何输入信号的导数。具体而言,我们找到了连续SSMs的明确权重构造,并为导数近似提供了一个渐近误差界限。对这种连续SSM的离散化随后产生了一个能够预测下一个状态的离散SSM。最后,我们通过实验证明了我们的参数化方法的有效性。这项工作应该是理解基于SSMs的序列模型如何在上下文中学习的一个初步步骤。
更新时间: 2024-07-12 15:56:11
领域: cs.LG,stat.ML
Towards Personalised Patient Risk Prediction Using Temporal Hospital Data Trajectories
Quantifying a patient's health status provides clinicians with insight into patient risk, and the ability to better triage and manage resources. Early Warning Scores (EWS) are widely deployed to measure overall health status, and risk of adverse outcomes, in hospital patients. However, current EWS are limited both by their lack of personalisation and use of static observations. We propose a pipeline that groups intensive care unit patients by the trajectories of observations data throughout their stay as a basis for the development of personalised risk predictions. Feature importance is considered to provide model explainability. Using the MIMIC-IV dataset, six clusters were identified, capturing differences in disease codes, observations, lengths of admissions and outcomes. Applying the pipeline to data from just the first four hours of each ICU stay assigns the majority of patients to the same cluster as when the entire stay duration is considered. In-hospital mortality prediction models trained on individual clusters had higher F1 score performance in five of the six clusters when compared against the unclustered patient cohort. The pipeline could form the basis of a clinical decision support tool, working to improve the clinical characterisation of risk groups and the early detection of patient deterioration.
Updated: 2024-07-12 15:53:26
标题: 朝向使用时间医院数据轨迹进行个性化患者风险预测
摘要: 量化患者的健康状况可以让临床医生深入了解患者风险,并更好地进行分诊和资源管理。早期预警评分(EWS)被广泛用于衡量住院患者的整体健康状况及不良结局风险。然而,当前的EWS既缺乏个性化,又依赖静态观测,因而存在局限。我们提出了一个流程,依据重症监护病房患者住院期间观测数据的轨迹对其进行分组,以此作为开发个性化风险预测的基础。通过考虑特征重要性来提供模型的可解释性。使用MIMIC-IV数据集,我们识别出六个聚类,反映了疾病编码、观测数据、住院时长和结局方面的差异。将该流程仅应用于每次ICU住院前四小时的数据时,大多数患者被分配到与考虑整个住院时长时相同的聚类。与未聚类的患者队列相比,在单个聚类上训练的院内死亡率预测模型在六个聚类中的五个上取得了更高的F1得分。该流程可以作为临床决策支持工具的基础,有助于改进风险群体的临床刻画以及患者病情恶化的早期发现。
更新时间: 2024-07-12 15:53:26
领域: cs.AI,cs.LG
Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding
Fourier features based positional encoding (PE) is commonly used in machine learning tasks that involve learning high-frequency features from low-dimensional inputs, such as 3D view synthesis and time series regression with neural tangent kernels. Despite their effectiveness, existing PEs require manual, empirical adjustment of crucial hyperparameters, specifically the Fourier features, tailored to each unique task. Further, PEs face challenges in efficiently learning high-frequency functions, particularly in tasks with limited data. In this paper, we introduce sinusoidal PE (SPE), designed to efficiently learn adaptive frequency features closely aligned with the true underlying function. Our experiments demonstrate that SPE, without hyperparameter tuning, consistently achieves enhanced fidelity and faster training across various tasks, including 3D view synthesis, Text-to-Speech generation, and 1D regression. SPE is implemented as a direct replacement for existing PEs. Its plug-and-play nature lets numerous tasks easily adopt and benefit from SPE.
Updated: 2024-07-12 15:51:53
标题: 使用正弦位置编码轻松学习高频函数
摘要: 基于傅里叶特征的位置编码(PE)通常用于涉及从低维输入中学习高频特征的机器学习任务,例如3D视图合成和具有神经切线核的时间序列回归。尽管现有的PE在有效性上表现出色,但需要对关键超参数进行手动、经验性调整,特别是对每个独特任务定制的傅里叶特征。此外,PE在高效学习高频函数方面面临挑战,特别是在数据有限的任务中。在本文中,我们介绍了正弦PE(SPE),旨在有效学习与真实底层函数密切相关的自适应频率特征。我们的实验表明,SPE在没有超参数调整的情况下,在各种任务中,包括3D视图合成、文本转语音生成和1D回归,始终实现了增强的保真度和更快的训练。SPE作为现有PE的直接替代实现。其即插即用的特性让许多任务轻松采用并受益于SPE。
更新时间: 2024-07-12 15:51:53
领域: cs.LG
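For contrast with the proposed SPE, the classic random Fourier-feature PE that the abstract above says requires manual tuning looks like this; per the abstract, SPE's change is to make the frequencies adaptive rather than sampled once (sketch and names are ours):

import numpy as np

def fourier_features(x, num_freqs=8, sigma=1.0, rng=None):
    # gamma(x) = [sin(2*pi*B x), cos(2*pi*B x)], with B ~ N(0, sigma^2);
    # sigma is exactly the kind of hyperparameter SPE aims to learn away.
    rng = rng or np.random.default_rng(0)
    B = rng.normal(scale=sigma, size=(num_freqs, x.shape[-1]))
    proj = 2 * np.pi * x @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

coords = np.random.rand(4, 2)          # low-dimensional inputs, e.g. 2D
print(fourier_features(coords).shape)  # (4, 16)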
Revealing the True Cost of Locally Differentially Private Protocols: An Auditing Perspective
While the existing literature on Differential Privacy (DP) auditing predominantly focuses on the centralized model (e.g., in auditing the DP-SGD algorithm), we advocate for extending this approach to audit Local DP (LDP). To achieve this, we introduce the LDP-Auditor framework for empirically estimating the privacy loss of locally differentially private mechanisms. This approach leverages recent advances in designing privacy attacks against LDP frequency estimation protocols. More precisely, through the analysis of numerous state-of-the-art LDP protocols, we extensively explore the factors influencing the privacy audit, such as the impact of different encoding and perturbation functions. Additionally, we investigate the influence of the domain size and the theoretical privacy loss parameters $\epsilon$ and $\delta$ on local privacy estimation. In-depth case studies are also conducted to explore specific aspects of LDP auditing, including distinguishability attacks on LDP protocols for longitudinal studies and multidimensional data. Finally, we present a notable achievement of our LDP-Auditor framework, which is the discovery of a bug in a state-of-the-art LDP Python package. Overall, our LDP-Auditor framework as well as our study offer valuable insights into the sources of randomness and information loss in LDP protocols. These contributions collectively provide a realistic understanding of the local privacy loss, which can help practitioners in selecting the LDP mechanism and privacy parameters that best align with their specific requirements. We open-sourced LDP-Auditor in \url{https://github.com/hharcolezi/ldp-audit}.
Updated: 2024-07-12 15:49:48
标题: 揭示本地差分隐私协议的真实成本:审计视角
摘要: 尽管现有关于差分隐私(Differential Privacy, DP)审计的文献主要集中在集中模型上(例如,在审计DP-SGD算法中),我们主张将此方法扩展到审核本地差分隐私(Local DP, LDP)。为了实现这一目标,我们引入了LDP-Auditor框架,用于经验估计局部差分隐私机制的隐私损失。这种方法利用了设计针对LDP频率估计协议的隐私攻击的最新进展。更具体地说,通过对众多最新的LDP协议进行分析,我们广泛探讨了影响隐私审计的因素,如不同编码和扰动函数的影响。此外,我们调查了域大小和理论隐私损失参数$\epsilon$和$\delta$对本地隐私估计的影响。还进行了深入的案例研究,探讨了LDP审计的特定方面,包括对纵向研究和多维数据的LDP协议进行可区分性攻击。最后,我们展示了我们的LDP-Auditor框架的一个显著成就,即发现了一个最先进的LDP Python包中的漏洞。总的来说,我们的LDP-Auditor框架以及我们的研究为LDP协议中的随机性和信息损失来源提供了宝贵的见解。这些贡献共同提供了对本地隐私损失的现实理解,这有助于从业者选择最符合其特定需求的LDP机制和隐私参数。我们在\url{https://github.com/hharcolezi/ldp-audit}上开源了LDP-Auditor。
更新时间: 2024-07-12 15:49:48
领域: cs.CR
MS-TCRNet: Multi-Stage Temporal Convolutional Recurrent Networks for Action Segmentation Using Sensor-Augmented Kinematics
Action segmentation is a challenging task in high-level process analysis, typically performed on video or kinematic data obtained from various sensors. This work presents two contributions related to action segmentation on kinematic data. Firstly, we introduce two versions of Multi-Stage Temporal Convolutional Recurrent Networks (MS-TCRNet), specifically designed for kinematic data. The architectures consist of a prediction generator with intra-stage regularization and Bidirectional LSTM or GRU-based refinement stages. Secondly, we propose two new data augmentation techniques, World Frame Rotation and Hand Inversion, which utilize the strong geometric structure of kinematic data to improve algorithm performance and robustness. We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both of which are open surgery simulation datasets collected by us, as well as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieved state-of-the-art performance.
Updated: 2024-07-12 15:48:09
标题: MS-TCRNet:使用传感器增强的运动学进行动作分割的多阶段时间卷积循环网络
摘要: 动作分割是高层过程分析中一项具有挑战性的任务,通常在视频或从各种传感器获得的运动学数据上执行。本研究提出了与运动学数据上的动作分割相关的两项贡献。首先,我们介绍了两个版本的多阶段时间卷积循环网络(MS-TCRNet),专为运动学数据设计。这些架构由带有阶段内正则化的预测生成器以及基于双向LSTM或GRU的细化阶段组成。其次,我们提出了两种新的数据增强技术:世界坐标系旋转和手部反转,它们利用运动学数据的强几何结构来提高算法的性能和鲁棒性。我们在三个外科缝合任务数据集上评估了我们的模型:可变组织模拟(VTS)数据集和新引入的肠道修复模拟(BRS)数据集(两者均为我们收集的开放手术模拟数据集),以及机器人手术领域著名的基准JHU-ISI手势与技能评估工作集(JIGSAWS)。我们的方法取得了最先进的性能。
更新时间: 2024-07-12 15:48:09
领域: cs.CV,cs.LG,cs.RO,eess.IV
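World Frame Rotation, as named above, plausibly amounts to applying one rigid rotation to every 3D sample in a trajectory, which preserves action labels while exploiting the data's geometry. A hedged sketch (the papers' exact parameterization may differ, e.g. full SO(3) rotations rather than the yaw-only rotation assumed here):

import numpy as np

def world_frame_rotation(positions, rng=None):
    # positions: (..., 3) array of sensor coordinates; rotate about z.
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return positions @ R.T

traj = np.random.rand(100, 3)  # a 100-step trajectory of one sensor
print(world_frame_rotation(traj).shape)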
Instruction Tuning for Secure Code Generation
Modern language models (LMs) have gained widespread acceptance in everyday and professional contexts, particularly in programming. An essential procedure enabling this adoption is instruction tuning, which substantially enhances LMs' practical utility by training them to follow user instructions and human preferences. However, existing instruction tuning schemes overlook a crucial aspect: the security of generated code. As a result, even the state-of-the-art instruction-tuned LMs frequently produce unsafe code, posing significant security risks. In this work, we introduce SafeCoder to address this gap. SafeCoder performs security-centric fine-tuning using a diverse and high-quality dataset that we collected using an automated pipeline. We integrate the security fine-tuning with standard instruction tuning, to facilitate a joint optimization of both security and utility. Despite its simplicity, we show that SafeCoder is effective across a variety of popular LMs and datasets. It is able to drastically improve security (by about 30%), while preserving utility.
Updated: 2024-07-12 15:45:57
标题: 安全代码生成的指令调优
摘要: 现代语言模型(LMs)在日常和专业环境中获得了广泛的认可,特别是在编程领域。促使这种采用的一个关键步骤是指令调整,通过训练LMs遵循用户指令和人类偏好,显著增强了LMs的实用性。然而,现有的指令调整方案忽视了一个关键方面:生成代码的安全性。因此,即使是最先进的指令调整LMs也经常生成不安全的代码,带来重大安全风险。在这项工作中,我们介绍了SafeCoder来填补这一空白。SafeCoder使用我们使用自动化流水线收集的多样化和高质量数据集进行以安全为中心的微调。我们将安全微调与标准指令调整相结合,以便共同优化安全性和实用性。尽管它简单,我们展示了SafeCoder在各种热门LMs和数据集上的有效性。它能够显著提高安全性(约30%),同时保持实用性。
更新时间: 2024-07-12 15:45:57
领域: cs.CR,cs.AI,cs.LG,cs.SE
Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text
The significant progress in the development of Large Language Models has contributed to blurring the distinction between human and AI-generated text. The increasing pervasiveness of AI-generated text and the difficulty in detecting it poses new challenges for our society. In this paper, we tackle the problem of detecting and attributing AI-generated text by proposing WhosAI, a triplet-network contrastive learning framework designed to predict whether a given input text has been generated by humans or AI and to unveil the authorship of the text. Unlike most existing approaches, our proposed framework is conceived to learn semantic similarity representations from multiple generators at once, thus equally handling both detection and attribution tasks. Furthermore, WhosAI is model-agnostic and scalable to the release of new AI text-generation models by incorporating their generated instances into the embedding space learned by our framework. Experimental results on the TuringBench benchmark of 200K news articles show that our proposed framework achieves outstanding results in both the Turing Test and Authorship Attribution tasks, outperforming all the methods listed in the TuringBench benchmark leaderboards.
Updated: 2024-07-12 15:44:56
Domains: cs.CL,cs.AI,cs.CY,cs.HC,physics.soc-ph
Chasing Convex Functions with Long-term Constraints
We introduce and study a family of online metric problems with long-term constraints. In these problems, an online player makes decisions $\mathbf{x}_t$ in a metric space $(X,d)$ to simultaneously minimize their hitting cost $f_t(\mathbf{x}_t)$ and switching cost as determined by the metric. Over the time horizon $T$, the player must satisfy a long-term demand constraint $\sum_{t} c(\mathbf{x}_t) \geq 1$, where $c(\mathbf{x}_t)$ denotes the fraction of demand satisfied at time $t$. Such problems can find a wide array of applications to online resource allocation in sustainable energy/computing systems. We devise optimal competitive and learning-augmented algorithms for the case of bounded hitting cost gradients and weighted $\ell_1$ metrics, and further show that our proposed algorithms perform well in numerical experiments.
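Assembling the quantities named above, the offline problem the online player competes against can be written as follows (with $\mathbf{x}_0$ a given starting point and the metric $d$ charging movement; the exact switching-cost convention is assumed here):

$$\min_{\mathbf{x}_1,\dots,\mathbf{x}_T \in X} \; \sum_{t=1}^{T} f_t(\mathbf{x}_t) \;+\; \sum_{t=1}^{T} d(\mathbf{x}_t, \mathbf{x}_{t-1}) \quad \text{subject to} \quad \sum_{t=1}^{T} c(\mathbf{x}_t) \,\ge\, 1.$$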
Updated: 2024-07-12 15:44:38
Domains: cs.DS,cs.LG
A Neural Rewriting System to Solve Algorithmic Problems
Modern neural network architectures still struggle to learn algorithmic procedures that require systematically applying compositional rules to solve out-of-distribution problem instances. In this work, we focus on formula simplification problems, a class of synthetic benchmarks used to study the systematic generalization capabilities of neural architectures. We propose a modular architecture designed to learn a general procedure for solving nested mathematical formulas by only relying on a minimal set of training examples. Inspired by rewriting systems, a classic framework in symbolic artificial intelligence, we include in the architecture three specialized and interacting modules: the Selector, trained to identify solvable sub-expressions; the Solver, mapping sub-expressions to their values; and the Combiner, replacing sub-expressions in the original formula with the solution provided by the Solver. We benchmark our system against the Neural Data Router, a recent model specialized for systematic generalization, and a state-of-the-art large language model (GPT-4) probed with advanced prompting strategies. We demonstrate that our approach achieves a higher degree of out-of-distribution generalization compared to these alternative approaches on three different types of formula simplification problems, and we discuss its limitations by analyzing its failures.
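The control flow implied by the three modules is a classic rewriting loop; a schematic version, with the learned modules reduced to plain callables, might look like:

    def simplify(formula, selector, solver, combiner, max_steps=100):
        # Repeatedly reduce the innermost solvable sub-expression until the
        # formula is fully simplified, e.g. "((3+4)*2)" -> "(7*2)" -> "14".
        for _ in range(max_steps):
            sub = selector(formula)          # find a solvable sub-expression
            if sub is None:                  # nothing left to rewrite
                return formula
            value = solver(sub)              # evaluate it, e.g. "(3+4)" -> "7"
            formula = combiner(formula, sub, value)  # splice the value back in
        return formula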
Updated: 2024-07-12 15:42:45
Domains: cs.NE,cs.AI,cs.CL
Novel clustered federated learning based on local loss
This paper proposes LCFL, a novel clustering metric for evaluating clients' data distributions in federated learning. LCFL aligns with federated learning requirements, accurately assessing client-to-client variations in data distribution. It offers advantages over existing clustered federated learning methods, addressing privacy concerns, improving applicability to non-convex models, and providing more accurate classification results. LCFL does not require prior knowledge of clients' data distributions. We provide a rigorous mathematical analysis, demonstrating the correctness and feasibility of our framework. Numerical experiments with neural network instances highlight the superior performance of LCFL over baselines on several clustered federated learning benchmarks.
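The abstract leaves the metric's exact form to the paper; one plausible reading of "clustering by local loss" is sketched below, grouping clients whose local data induce similar loss profiles under a set of shared probe models. This is an illustrative assumption, not the paper's definition.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_clients(loss_matrix, n_clusters):
        # loss_matrix[i, k]: loss of shared probe model k on client i's data.
        # Only scalar losses leave the clients, never raw data or labels.
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(np.asarray(loss_matrix))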
Updated: 2024-07-12 15:37:05
Domains: cs.LG,math.OC
What Makes a Good Explanation?: A Harmonized View of Properties of Explanations
Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.
Updated: 2024-07-12 15:34:29
Domains: cs.LG
Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees
Generating novel molecules is challenging, with most representations leading to generative models producing many invalid molecules. Spanning Tree-based Graph Generation (STGG) is a promising approach to ensure the generation of valid molecules, outperforming state-of-the-art SMILES and graph diffusion models for unconditional generation. In the real world, we want to be able to generate molecules conditional on one or multiple desired properties rather than unconditionally. Thus, in this work, we extend STGG to multi-property-conditional generation. Our approach, STGG+, incorporates a modern Transformer architecture, random masking of properties during training (enabling conditioning on any subset of properties and classifier-free guidance), an auxiliary property-prediction loss (allowing the model to self-criticize molecules and select the best ones), and other improvements. We show that STGG+ achieves state-of-the-art performance on in-distribution and out-of-distribution conditional generation, and reward maximization.
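The random property masking is straightforward to picture; a minimal sketch follows, where the mask encoding and drop rate are assumptions:

    import torch

    MASK_ID = -1  # assumed sentinel meaning "property not given"

    def mask_properties(props: torch.Tensor, p_drop: float = 0.15) -> torch.Tensor:
        # Hiding random properties during training teaches the model to condition
        # on any subset; the fully masked case enables classifier-free guidance.
        drop = torch.rand(props.shape) < p_drop
        return props.masked_fill(drop, MASK_ID)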
Updated: 2024-07-12 15:32:44
Domains: cs.LG,q-bio.BM
AI-Enhanced Intensive Care Unit: Revolutionizing Patient Care with Pervasive Sensing
The intensive care unit (ICU) is a specialized hospital space where critically ill patients receive intensive care and monitoring. Comprehensive monitoring is imperative in assessing patients' conditions, in particular acuity, and ultimately the quality of care. However, the extent of patient monitoring in the ICU is limited due to time constraints and the workload on healthcare providers. Currently, visual assessments for acuity, including fine details such as facial expressions, posture, and mobility, are sporadically captured, or not captured at all. These manual observations are subjective to the individual observer, prone to documentation errors, and overburden care providers with additional workload. Artificial Intelligence (AI)-enabled systems have the potential to augment patient visual monitoring and assessment due to their exceptional learning capabilities. Such systems require robust annotated data for training. To this end, we have developed a pervasive sensing and data processing system that collects data from multiple modalities (depth images, color RGB images, accelerometry, electromyography, sound pressure, and light levels) in the ICU for developing intelligent monitoring systems for continuous and granular assessment of acuity, delirium risk, pain, and mobility. This paper presents the Intelligent Intensive Care Unit (I2CU) system architecture we developed for real-time patient monitoring and visual assessment.
Updated: 2024-07-12 15:30:28
Domains: cs.AI
FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313
Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information. Traditional methods leverage linkage disequilibrium (LD) to infer untyped SNP genotypes, relying on the similarity of LD structures between genotyped target sets and fully sequenced reference panels. Recently, reference-free deep learning-based methods have emerged, offering a promising alternative by predicting missing genotypes without external databases, thereby enhancing privacy and accessibility. However, these methods often produce models with tens of millions of parameters, leading to challenges such as the need for substantial computational resources to train and inefficiency for client-sided deployment. Our study addresses these limitations by introducing a baseline for a novel genotype imputation pipeline that supports client-sided imputation models generalizable across any genotyping chip and genomic region. This approach enhances patient privacy by performing imputation directly on edge devices. As a case study, we focus on PRS313, a polygenic risk score comprising 313 SNPs used for breast cancer risk prediction. Utilizing consumer genetic panels such as 23andMe, our model democratizes access to personalized genetic insights by allowing 23andMe users to obtain their PRS313 score. We demonstrate that simple linear regression can significantly improve the accuracy of PRS313 scores when calculated using SNPs imputed from consumer gene panels, such as 23andMe. Our linear regression model achieved an R^2 of 0.86, compared to 0.33 without imputation and 0.28 with simple imputation (substituting missing SNPs with the minor allele frequency). These findings suggest that popular SNP analysis libraries could benefit from integrating linear regression models for genotype imputation, providing a viable and lightweight alternative to reference-based imputation.
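The core of such a baseline is small enough to show end to end; the toy data below stands in for a fully sequenced training panel (dosages 0 to 2), and in the real pipeline one such regressor would be fit per untyped PRS313 SNP:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(500, 20)).astype(float)             # typed panel SNPs
    y = X @ rng.normal(size=20) * 0.1 + rng.normal(size=500) * 0.2   # untyped SNP (toy)

    model = LinearRegression().fit(X, y)
    X_new = rng.integers(0, 3, size=(5, 20)).astype(float)  # e.g. consumer-panel genotypes
    dosage = np.clip(model.predict(X_new), 0.0, 2.0)        # imputed dosage kept in [0, 2]
    # The imputed dosages then enter the usual weighted sum: PRS = sum_j beta_j * dosage_j.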
Updated: 2024-07-12 15:28:13
Domains: q-bio.GN,cs.AI
Private Blockchain-based Procurement and Asset Management System with QR Code
The developed system incorporates a private blockchain into the procurement process of the supply office. The procurement process includes the canvassing, purchasing, delivery, and inspection of items, as well as inventory and disposal. The blockchain-based system comprises a distributed ledger, a peer-to-peer network, a Proof-of-Authority consensus mechanism, and the SHA3-512 cryptographic hash function. This ensures trust and proper accountability for the custodian of the property while safeguarding sensitive information in the procurement records. The extreme prototyping model is used as the software development life cycle; it is mostly applied to web-based applications and features increased user involvement. The prototype version of the system allows users to gain a better understanding of the system being developed. It also reduces time and cost, yields quicker user feedback, surfaces missing or difficult functions, and allows confusing processes to be addressed at an early stage. A private blockchain offers increased privacy, enhanced security, improved efficiency, and reduced complexity compared with a traditional blockchain network. SHA3-512 as the cryptographic hash function is much faster than its predecessors when cryptography is handled by hardware components. Furthermore, it is not vulnerable to length-extension attacks, making it reliable in terms of data security. The study recommends the use of private blockchain-based technology with the procurement and asset management system in the supply office. The procurement records will be protected against tampering using this technology, promoting the trust and confidence of stakeholders. The implementation of blockchain technology in the developed system serves as an advancement and innovation in securing data.
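For reference, SHA3-512 is available in Python's standard library; hashing a procurement record (the record format below is hypothetical) looks like this, and chaining each record's digest into the next block is what makes tampering with past entries detectable:

    import hashlib

    record = b"PO-2024-0001|canvass|supplier=ACME|amount=1500.00"  # hypothetical format
    digest = hashlib.sha3_512(record).hexdigest()
    print(len(digest), digest[:16])  # 128 hex characters (512 bits)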
Updated: 2024-07-12 15:27:36
Domains: cs.CR
Efficient Bayesian Updates for Deep Learning via Laplace Approximations
Since training deep neural networks takes significant computational resources, extending the training dataset with new data is difficult, as it typically requires complete retraining. Moreover, specific applications do not allow costly retraining due to time or computational constraints. We address this issue by proposing a novel Bayesian update method for deep neural networks by using a last-layer Laplace approximation. Concretely, we leverage second-order optimization techniques on the Gaussian posterior distribution of a Laplace approximation, computing the inverse Hessian matrix in closed form. This way, our method allows for fast and effective updates upon the arrival of new data in a stationary setting. A large-scale evaluation study across different data modalities confirms that our updates are a fast and competitive alternative to costly retraining. Furthermore, we demonstrate its applicability in a deep active learning scenario by using our update to improve existing selection strategies.
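Stripped of the deep network, the update exploits Gaussian conjugacy on the last layer. A minimal sketch of the closed-form step for a linear-Gaussian last layer follows; the paper's actual method additionally relies on the Laplace approximation and second-order optimization.

    import numpy as np

    def bayesian_update(mu, Lam, Phi, y, noise_var=1.0):
        # Posterior N(mu, Lam^{-1}) over last-layer weights, updated in closed
        # form with new feature matrix Phi (n, d) and targets y (n,).
        Lam_new = Lam + Phi.T @ Phi / noise_var
        mu_new = np.linalg.solve(Lam_new, Lam @ mu + Phi.T @ y / noise_var)
        return mu_new, Lam_new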
Updated: 2024-07-12 15:23:28
Domains: cs.LG
Predictable and Performant Reactive Synthesis Modulo Theories via Functional Synthesis
Reactive synthesis is the process of generating correct controllers from temporal logic specifications. Classical LTL reactive synthesis handles (propositional) LTL as a specification language. Boolean abstractions allow reducing LTL$_\mathcal{T}$ specifications (i.e., LTL with propositions replaced by literals from a theory $\mathcal{T}$) into equi-realizable LTL specifications. In this paper we extend these results into a full static synthesis procedure. The synthesized system receives from the environment valuations of variables from a rich theory $\mathcal{T}$ and outputs valuations of system variables from $\mathcal{T}$. We use the abstraction method to synthesize a reactive Boolean controller from the LTL specification, and we combine it with functional synthesis to obtain a static controller for the original LTL$_\mathcal{T}$ specification. We also show that our method allows responses, in the sense that the controller can optimize its outputs in order to, e.g., always provide the smallest safe values. This is the first full static synthesis method for LTL$_\mathcal{T}$, and the resulting controller is a deterministic program (hence predictable and efficient).
Updated: 2024-07-12 15:23:27
Domains: cs.LO,cs.AI,cs.SE
Identification and Estimation of the Bi-Directional MR with Some Invalid Instruments
We consider the challenging problem of estimating causal effects from purely observational data in the bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find proper valid instrumental variables (IVs) for the target causal effect by expert knowledge or by assuming that the causal model is a one-directional MR model. As such, in this paper, we first theoretically investigate the identification of the bi-directional MR from observational data. In particular, we provide necessary and sufficient conditions under which valid IV sets are correctly identified such that the bi-directional MR model is identifiable, including the causal directions of a pair of phenotypes (i.e., the treatment and outcome). Moreover, based on the identification theory, we develop a cluster fusion-like method to discover valid IV sets and estimate the causal effects of interest. We theoretically demonstrate the correctness of the proposed algorithm. Experimental results show the effectiveness of our method for estimating causal effects in bi-directional MR.
Updated: 2024-07-12 15:15:58
Domains: stat.ME,cs.LG,stat.ML
Structured Generations: Using Hierarchical Clusters to guide Diffusion Models
This paper introduces Diffuse-TreeVAE, a deep generative model that integrates hierarchical clustering into the framework of Denoising Diffusion Probabilistic Models (DDPMs). The proposed approach generates new images by sampling from a root embedding of a learned latent-tree VAE-based structure, propagates them through hierarchical paths, and utilizes a second-stage DDPM to refine and generate distinct, high-quality images for each data cluster. The result is a model that not only improves image clarity but also ensures that the generated samples are representative of their respective clusters, addressing the limitations of previous VAE-based methods and advancing the state of clustering-based generative modeling.
Updated: 2024-07-12 15:15:03
Domains: cs.LG,cs.CV
CFaults: Model-Based Diagnosis for Fault Localization in C Programs with Multiple Test Cases
Debugging is one of the most time-consuming and expensive tasks in software development. Several formula-based fault localization (FBFL) methods have been proposed, but they fail to guarantee a set of diagnoses across all failing tests or may produce redundant diagnoses that are not subset-minimal, particularly for programs with multiple faults. This paper introduces a novel fault localization approach for C programs with multiple faults. CFaults leverages Model-Based Diagnosis (MBD) with multiple observations and aggregates all failing test cases into a unified MaxSAT formula. Consequently, our method guarantees consistency across observations and simplifies the fault localization procedure. Experimental results on two benchmark sets of C programs, TCAS and C-Pack-IPAs, show that CFaults is faster than other FBFL approaches like BugAssist and SNIPER. Moreover, CFaults only generates subset-minimal diagnoses of faulty statements, whereas the other approaches tend to enumerate redundant diagnoses.
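To give the flavor of a unified MaxSAT encoding, here is a toy instance using the PySAT toolkit: hard clauses record what each failing test rules out, soft unit clauses prefer declaring statements healthy, and the falsified soft clauses form the diagnosis. The encoding is illustrative only, not CFaults' actual one.

    from pysat.formula import WCNF
    from pysat.examples.rc2 import RC2

    wcnf = WCNF()
    # Variables 1..3 mean "statement i is healthy".
    wcnf.append([-1, -2])            # hard: failing test 1 implicates stmt 1 or 2
    wcnf.append([-1, -3])            # hard: failing test 2 implicates stmt 1 or 3
    for v in (1, 2, 3):
        wcnf.append([v], weight=1)   # soft: prefer each statement healthy

    with RC2(wcnf) as solver:
        model = solver.compute()     # minimum-cost model, e.g. [-1, 2, 3]
    print([v for v in (1, 2, 3) if -v in model])  # diagnosis: [1]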
Updated: 2024-07-12 15:14:49
Domains: cs.SE,cs.AI,cs.LO
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning
Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to co-players in mixed-motive environments remains a significant challenge. One feasible approach is to hierarchically model co-players' behavior based on inferring their characteristics. However, these methods often encounter difficulties in efficient reasoning and utilization of inferred information. To address these issues, we propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm that enables few-shot adaptation to unseen policies in mixed-motive environments. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that employs Monte Carlo Tree Search (MCTS) to identify the best response. Our approach improves efficiency by updating beliefs about others' goals both across and within episodes and by using information from the opponent modeling module to guide planning. Experimental results demonstrate that in mixed-motive environments, HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios. Furthermore, the emergence of social intelligence during our experiments underscores the potential of our approach in complex multi-agent environments.
Updated: 2024-07-12 15:13:43
Domains: cs.AI,cs.MA
Guidelines for Augmentation Selection in Contrastive Learning for Time Series Classification
Self-supervised contrastive learning has become a key technique in deep learning, particularly in time series analysis, due to its ability to learn meaningful representations without explicit supervision. Augmentation is a critical component in contrastive learning, where different augmentations can dramatically impact performance, sometimes influencing accuracy by over 30%. However, augmentations are typically selected either empirically, which can be suboptimal, or by grid search, which is time-consuming. In this paper, we establish a principled framework for selecting augmentations based on dataset characteristics such as trend and seasonality. Specifically, we construct 12 synthetic datasets incorporating trend, seasonality, and integration weights. We then evaluate the effectiveness of 8 different augmentations across these synthetic datasets, thereby inducing generalizable associations between time series characteristics and augmentation efficiency. Additionally, we evaluated the induced associations across 6 real-world datasets encompassing domains such as activity recognition, disease diagnosis, traffic monitoring, electricity usage, mechanical fault prognosis, and finance. These real-world datasets are diverse, covering a range from 1 to 12 channels, 2 to 10 classes, sequence lengths of 14 to 1280, and data frequencies from 250 Hz to daily intervals. The experimental results show that our proposed trend-seasonality-based augmentation recommendation algorithm can accurately identify the effective augmentations for a given time series dataset, achieving an average Recall@3 of 0.667, outperforming baselines. Our work provides guidance for studies employing contrastive learning in time series analysis, with wide-ranging applications. All the code, datasets, and analysis results will be released at https://github.com/DL4mHealth/TS-Contrastive-Augmentation-Recommendation.
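The trend and seasonality characteristics driving such a recommendation can be quantified with standard strength measures; a sketch using an STL decomposition follows (the recommender itself is the paper's contribution and is not reproduced here):

    import numpy as np
    from statsmodels.tsa.seasonal import STL

    def trend_seasonality_strength(series, period):
        # Hyndman-style strengths in [0, 1]: how much variance the trend and
        # seasonal components explain relative to the remainder.
        res = STL(series, period=period).fit()
        f_trend = max(0.0, 1.0 - np.var(res.resid) / np.var(res.trend + res.resid))
        f_season = max(0.0, 1.0 - np.var(res.resid) / np.var(res.seasonal + res.resid))
        return f_trend, f_season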
Updated: 2024-07-12 15:13:16
Domains: cs.LG,cs.AI
HETOCompiler: An MLIR-based crypTOgraphic Compilation Framework for HEterogeneous Devices
Hash algorithms are fundamental tools in cryptography, offering irreversible and sensitive transformations of input data for various security purposes. As computing architectures evolve towards heterogeneous systems, efficiently harnessing diverse computing resources for hash encryption algorithms becomes crucial. This paper presents HETOCompiler, a novel cryptography compilation framework designed for heterogeneous systems. Leveraging Multi-Level Intermediate Representation (MLIR), HETOCompiler abstracts syntax and semantics for cryptographic primitives and heterogeneous computing models, facilitating efficient compilation of high-level hash encryption algorithms into executable programs compatible with diverse devices. Experimental results demonstrate significant performance improvements over existing OpenSSL library, with average enhancements of 49.3x, 1.5x, and 23.4x for SHA-1, MD5, and SM3 algorithms respectively.
Updated: 2024-07-12 15:12:51
Domains: cs.CR
Feasibility Study on Active Learning of Smart Surrogates for Scientific Simulations
High-performance scientific simulations, important for comprehension of complex systems, encounter computational challenges especially when exploring extensive parameter spaces. There has been an increasing interest in developing deep neural networks (DNNs) as surrogate models capable of accelerating the simulations. However, existing approaches for training these DNN surrogates rely on extensive simulation data which are heuristically selected and generated with expensive computation -- a challenge under-explored in the literature. In this paper, we investigate the potential of incorporating active learning into DNN surrogate training. This allows intelligent and objective selection of training simulations, reducing the need to generate extensive simulation data as well as the dependency of the performance of DNN surrogates on pre-defined training simulations. In the problem context of constructing DNN surrogates for diffusion equations with sources, we examine the efficacy of diversity- and uncertainty-based strategies for selecting training simulations, considering two different DNN architecture. The results set the groundwork for developing the high-performance computing infrastructure for Smart Surrogates that supports on-the-fly generation of simulation data steered by active learning strategies to potentially improve the efficiency of scientific simulations.
Updated: 2024-07-12 15:10:53
Domains: cs.LG
State space representations of the Roesser type for convolutional layers
From the perspective of control theory, convolutional layers (of neural networks) are 2-D (or N-D) linear time-invariant dynamical systems. The usual representation of convolutional layers by the convolution kernel corresponds to the representation of a dynamical system by its impulse response. However, many analysis tools from control theory, e.g., involving linear matrix inequalities, require a state space representation. For this reason, we explicitly provide a state space representation of the Roesser type for 2-D convolutional layers with $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ states, where $c_\mathrm{in}$/$c_\mathrm{out}$ is the number of input/output channels of the layer and $r_1$/$r_2$ characterizes the width/length of the convolution kernel. This representation is shown to be minimal for $c_\mathrm{in} = c_\mathrm{out}$. We further construct state space representations for dilated, strided, and N-D convolutions.
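For reference, the 2-D Roesser model propagates a horizontal state $x^h$ and a vertical state $x^v$ along the two spatial indices:

$$\begin{pmatrix} x^h_{i+1,j} \\ x^v_{i,j+1} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} x^h_{i,j} \\ x^v_{i,j} \end{pmatrix} + \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} u_{i,j}, \qquad y_{i,j} = \begin{pmatrix} C_1 & C_2 \end{pmatrix} \begin{pmatrix} x^h_{i,j} \\ x^v_{i,j} \end{pmatrix} + D\,u_{i,j},$$

with $\dim x^h + \dim x^v = c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ in the construction above.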
Updated: 2024-07-12 15:08:15
Domains: eess.SY,cs.LG,cs.SY,eess.IV,eess.SP
Detecting Visual Cues in the Intensive Care Unit and Association with Patient Clinical Status
Intensive Care Units (ICU) provide close supervision and continuous care to patients with life-threatening conditions. However, continuous patient assessment in the ICU is still limited due to time constraints and the workload on healthcare providers. Existing patient assessments in the ICU such as pain or mobility assessment are mostly sporadic and administered manually, thus introducing the potential for human errors. Developing Artificial intelligence (AI) tools that can augment human assessments in the ICU can be beneficial for providing more objective and granular monitoring capabilities. For example, capturing the variations in a patient's facial cues related to pain or agitation can help in adjusting pain-related medications or detecting agitation-inducing conditions such as delirium. Additionally, subtle changes in visual cues during or prior to adverse clinical events could potentially aid in continuous patient monitoring when combined with high-resolution physiological signals and Electronic Health Record (EHR) data. In this paper, we examined the association between visual cues and patient condition, including acuity status, acute brain dysfunction, and pain. We leveraged our AU-ICU dataset with 107,064 frames collected in the ICU, annotated with facial action unit (AU) labels by trained annotators. We developed a new "masked loss computation" technique that addresses the data imbalance problem by maximizing data resource utilization. We trained the model using our AU-ICU dataset in conjunction with three external datasets to detect 18 AUs. The SWIN Transformer model achieved 0.57 mean F1-score and 0.89 mean accuracy on the test set. Additionally, we performed AU inference on 634,054 frames to evaluate the association between facial AUs and clinically important patient conditions such as acuity status, acute brain dysfunction, and pain.
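One plausible form of the "masked loss computation" named above is a per-label mask that zeroes out AUs a given frame was never annotated with, so datasets with different AU label sets can be pooled without absent labels acting as negatives. This is an assumption about the technique's shape, not the paper's exact formula.

    import torch.nn.functional as F

    def masked_bce(logits, labels, label_mask):
        # label_mask[i, j] = 1 if AU j was annotated for frame i, else 0;
        # unannotated AUs contribute neither loss nor gradient.
        per_au = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
        return (per_au * label_mask).sum() / label_mask.sum().clamp(min=1)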
Updated: 2024-07-12 15:05:24
Domains: cs.CV,cs.AI
Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda
The proliferation of bias and propaganda on social media is an increasingly significant concern, leading to the development of techniques for automatic detection. This article presents a multilingual corpus of 12,000 Facebook posts fully annotated for bias and propaganda. The corpus was created as part of the FigNews 2024 Shared Task on News Media Narratives for framing the Israeli War on Gaza. It covers various events during the War from October 7, 2023 to January 31, 2024. The corpus comprises 12,000 posts in five languages (Arabic, Hebrew, English, French, and Hindi), with 2,400 posts for each language. The annotation process involved 10 graduate students specializing in Law. The Inter-Annotator Agreement (IAA) was used to evaluate the annotations of the corpus, with an average IAA of 80.8% for bias and 70.15% for propaganda annotations. Our team was ranked among the best-performing teams in both Bias and Propaganda subtasks. The corpus is open-source and available at https://sina.birzeit.edu/fada
Updated: 2024-07-12 15:04:09
Domains: cs.AI,cs.CL
Provable Privacy Advantages of Decentralized Federated Learning via Distributed Optimization
Federated learning (FL) emerged as a paradigm designed to improve data privacy by enabling data to reside at its source, thus embedding privacy as a core consideration in FL architectures, whether centralized or decentralized. Contrasting with recent findings by Pasquini et al., which suggest that decentralized FL does not empirically offer any additional privacy or security benefits over centralized models, our study provides compelling evidence to the contrary. We demonstrate that decentralized FL, when deploying distributed optimization, provides enhanced privacy protection - both theoretically and empirically - compared to centralized approaches. The challenge of quantifying privacy loss through iterative processes has traditionally constrained the theoretical exploration of FL protocols. We overcome this by conducting a pioneering in-depth information-theoretical privacy analysis for both frameworks. Our analysis, considering both eavesdropping and passive adversary models, successfully establishes bounds on privacy leakage. We show information theoretically that the privacy loss in decentralized FL is upper bounded by the loss in centralized FL. Compared to the centralized case where local gradients of individual participants are directly revealed, a key distinction of optimization-based decentralized FL is that the relevant information includes differences of local gradients over successive iterations and the aggregated sum of different nodes' gradients over the network. This information complicates the adversary's attempt to infer private data. To bridge our theoretical insights with practical applications, we present detailed case studies involving logistic regression and deep neural networks. These examples demonstrate that while privacy leakage remains comparable in simpler models, complex models like deep neural networks exhibit lower privacy risks under decentralized FL.
Updated: 2024-07-12 15:01:09
Domains: cs.LG,cs.AI,cs.IT,math.IT
Can large language models explore in-context?
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) All other configurations did not result in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization -- which may not be possible in more complex settings -- is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision making agents in complex settings.
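Concretely, the "externally summarized interaction history, presented as sufficient statistics" amounts to a prompt like the one built below; the wording is illustrative, not the paper's exact template.

    def bandit_prompt(counts, sums):
        # counts[a]: pulls of arm a so far; sums[a]: total reward from arm a.
        stats = [f"Arm {a}: pulled {n} times, mean reward {s / n if n else 0.0:.2f}"
                 for a, (n, s) in enumerate(zip(counts, sums))]
        return ("You are choosing between slot machines to maximize total reward.\n"
                + "\n".join(stats)
                + "\nThink step by step, then name the arm to pull next.")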
Updated: 2024-07-12 14:52:49
Domains: cs.LG,cs.AI,cs.CL
Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent LLM Collaboration
Multi-agent AI systems can be used for simulating collective decision-making in scientific and practical applications. They can also be used to introduce a diverse group discussion step in chatbot pipelines, enhancing the cultural sensitivity of the chatbot's responses. These applications, however, are predicated on the ability of AI agents to reliably adopt assigned personas and mimic human interactions. To evaluate the ability of LLM agents to satisfy these requirements, we examine AI agent ensembles engaged in cultural collaboration and debate by analyzing their private responses and chat transcripts. Our findings suggest that multi-agent discussions can encourage collective decisions that reflect diverse perspectives, yet this benefit is tempered by the agents' susceptibility to conformity due to perceived peer pressure and challenges in maintaining consistent personas and opinions. Instructions that encourage debate in support of one's opinions rather than collaboration increase the rate of inconstancy. Without addressing the factors we identify, the full potential of multi-agent frameworks for producing more culturally diverse AI outputs or more realistic simulations of group decision-making will remain untapped.
Updated: 2024-07-12 14:50:25
Domains: cs.AI,cs.CL,I.2.7
Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing
Sign language translation from video to spoken text presents unique challenges owing to the distinct grammar, expression nuances, and high variation of visual appearance across different speakers and contexts. The intermediate gloss annotations of videos aim to guide the translation process. In our work, we focus on the Gloss2Text translation stage and propose several advances by leveraging pre-trained large language models (LLMs), data augmentation, and a novel label-smoothing loss function exploiting gloss translation ambiguities, significantly improving the performance of state-of-the-art approaches. Through extensive experiments and ablation studies on the PHOENIX Weather 2014T dataset, our approach surpasses state-of-the-art performance in Gloss2Text translation, indicating its efficacy in addressing sign language translation and suggesting promising avenues for future research and development.
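A sketch of what a semantically aware smoothing loss could look like: instead of spreading the smoothed mass uniformly over the vocabulary, spread it according to a per-token similarity distribution, so plausible alternative translations of an ambiguous gloss are penalized less. The similarity matrix sim is an assumed stand-in for the paper's gloss-ambiguity statistics.

    import torch.nn.functional as F

    def semantic_smoothing_loss(logits, target, sim, eps=0.1):
        # sim: (V, V), each row a distribution over tokens similar to that token.
        log_p = F.log_softmax(logits, dim=-1)                # (B, V)
        hard = F.one_hot(target, logits.size(-1)).float()    # (B, V)
        soft = (1.0 - eps) * hard + eps * sim[target]        # redistribute eps mass
        return -(soft * log_p).sum(dim=-1).mean()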
Updated: 2024-07-12 14:44:33
Domains: cs.CV,cs.CL,cs.LG
Leveraging Computer Vision in the Intensive Care Unit (ICU) for Examining Visitation and Mobility
Despite the importance of closely monitoring patients in the Intensive Care Unit (ICU), many aspects are still assessed in a limited manner due to the time constraints imposed on healthcare providers. For example, although excessive visitations during rest hours can potentially exacerbate the risk of circadian rhythm disruption and delirium, it is not captured in the ICU. Likewise, while mobility can be an important indicator of recovery or deterioration in ICU patients, it is only captured sporadically or not captured at all. In the past few years, the computer vision field has found application in many domains by reducing the human burden. Using computer vision systems in the ICU can also potentially enable non-existing assessments or enhance the frequency and accuracy of existing assessments while reducing the staff workload. In this study, we leverage a state-of-the-art noninvasive computer vision system based on depth imaging to characterize ICU visitations and patients' mobility. We then examine the relationship between visitation and several patient outcomes, such as pain, acuity, and delirium. We found an association between deteriorating patient acuity and the incidence of delirium with increased visitations. In contrast, self-reported pain, reported using the Defense and Veteran Pain Rating Scale (DVPRS), was correlated with decreased visitations. Our findings highlight the feasibility and potential of using noninvasive autonomous systems to monitor ICU patients.
Updated: 2024-07-12 14:43:01
Domains: cs.CV,cs.AI
Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs
An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recorded demonstration video. Recent advances in Vision Language Models (VLMs) have shown a promising path in achieving this goal as it demonstrates capabilities in perceiving and reasoning about multimodal inputs. However, VLMs are typically trained to predict textual output and it is an open research question about how to best utilize them in navigation. To solve MINT, we present Mobility VLA, a hierarchical Vision-Language-Action (VLA) navigation policy that combines the environment understanding and common sense reasoning power of long-context VLMs and a robust low-level navigation policy based on topological graphs. The high-level policy consists of a long-context VLM that takes the demonstration tour video and the multimodal user instruction as input to find the goal frame in the tour video. Next, a low-level policy uses the goal frame and an offline constructed topological graph to generate robot actions at every timestep. We evaluated Mobility VLA in a 836m^2 real world environment and show that Mobility VLA has a high end-to-end success rates on previously unsolved multimodal instructions such as "Where should I return this?" while holding a plastic bin. A video demonstrating Mobility VLA can be found here: https://youtu.be/-Tof__Q8_5s
Updated: 2024-07-12 14:37:08
Domains: cs.RO,cs.AI
Structural Design Through Reinforcement Learning
This paper introduces the Structural Optimization gym (SOgym), a novel open-source Reinforcement Learning (RL) environment designed to advance machine learning in Topology Optimization (TO). SOgym enables RL agents to generate physically viable and structurally robust designs by integrating the physics of TO into the reward function. To enhance scalability, SOgym leverages feature-mapping methods as a mesh-independent interface between the environment and the agent, allowing efficient interaction with the design variables regardless of mesh resolution. Baseline results use a model-free Proximal Policy Optimization agent and a model-based DreamerV3 agent. Three observation space configurations were tested. The TopOpt game-inspired configuration, an interactive educational tool that improves students' intuition in designing structures to minimize compliance under volume constraints, performed best in terms of performance and sample efficiency. The 100M parameter version of DreamerV3 produced structures within 54% of the baseline compliance achieved by traditional optimization methods and a 0% disconnection rate, an improvement over supervised learning approaches that often struggle with disconnected load paths. When comparing the learning rates of the agents to those of engineering students from the TopOpt game experiment, the DreamerV3-100M model shows a learning rate approximately four orders of magnitude lower, an impressive feat for a policy trained from scratch through trial and error. These results suggest RL's potential to solve continuous TO problems and its capacity to explore and learn from diverse design solutions. SOgym provides a platform for developing RL agents for complex structural design challenges and is publicly available to support further research in the field.
Updated: 2024-07-12 14:31:35
Domains: cs.AI,68T07 (Primary), 74P05 (Secondary),J.2; J.6; I.2
Learning Distances from Data with Normalizing Flows and Score Matching
Density-based distances (DBDs) offer an elegant solution to the problem of metric learning. By defining a Riemannian metric which increases with decreasing probability density, shortest paths naturally follow the data manifold and points are clustered according to the modes of the data. We show that existing methods to estimate Fermat distances, a particular choice of DBD, suffer from poor convergence in both low and high dimensions due to i) inaccurate density estimates and ii) reliance on graph-based paths which are increasingly rough in high dimensions. To address these issues, we propose learning the densities using a normalizing flow, a generative model with tractable density estimation, and employing a smooth relaxation method using a score model initialized from a graph-based proposal. Additionally, we introduce a dimension-adapted Fermat distance that exhibits more intuitive behavior when scaled to high dimensions and offers better numerical properties. Our work paves the way for practical use of density-based distances, especially in high-dimensional spaces.
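In its common continuum form, the Fermat distance referred to above reweights path length by an inverse power of the density $p$, so shortest paths hug high-density regions:

$$D_\beta(x, y) \;=\; \inf_{\gamma:\,x \to y} \int_0^1 p\bigl(\gamma(t)\bigr)^{-\beta}\, \lVert \dot{\gamma}(t) \rVert \, dt, \qquad \beta > 0,$$

which is exactly a Riemannian length that grows as the density shrinks; the normalizing flow supplies the tractable estimate of $p$.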
Updated: 2024-07-12 14:30:41
Domains: cs.LG,stat.ML
Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study
The rapid progress in the reasoning capability of the Multi-modal Large Language Models (MLLMs) has triggered the development of autonomous agent systems on mobile devices. MLLM-based mobile agent systems consist of perception, reasoning, memory, and multi-agent collaboration modules, enabling automatic analysis of user instructions and the design of task pipelines with only natural language and device screenshots as inputs. Despite the increased human-machine interaction efficiency, the security risks of MLLM-based mobile agent systems have not been systematically studied. Existing security benchmarks for agents mainly focus on Web scenarios, and the attack techniques against MLLMs are also limited in the mobile agent scenario. To close these gaps, this paper proposes a mobile agent security matrix covering 3 functional modules of the agent systems. Based on the security matrix, this paper proposes 4 realistic attack paths and verifies these attack paths through 8 attack methods. By analyzing the attack results, this paper reveals that MLLM-based mobile agent systems are not only vulnerable to multiple traditional attacks, but also raise new security concerns previously unconsidered. This paper highlights the need for security awareness in the design of MLLM-based systems and paves the way for future research on attacks and defense methods.
Updated: 2024-07-12 14:30:05
Domains: cs.CR
CEIPA: Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models
This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs), such as GPT-4 and LLaMA-2, by identifying and mitigating their vulnerabilities through explainable analysis of prompt attacks. We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively measure attack effectiveness and explore the embedded defense mechanisms in these models. Our approach is distinctive for its capacity to elucidate the reasons behind the generation of harmful responses by LLMs through an incremental counterfactual methodology. By organizing the prompt modification process into four incremental levels (word, sentence, character, and a combination of character and word), we facilitate a thorough examination of the susceptibilities inherent to LLMs. The findings from our study not only provide counterfactual explanation insight but also demonstrate that our framework significantly enhances the effectiveness of attack prompts.
Updated: 2024-07-12 14:26:14
Domains: cs.CR
Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments
In this study, we address the issue of enabling an artificial intelligence agent to execute complex language instructions within virtual environments. In our framework, we assume that these instructions involve intricate linguistic structures and multiple interdependent tasks that must be navigated successfully to achieve the desired outcomes. To effectively manage these complexities, we propose a hierarchical framework that combines the deep language comprehension of large language models with the adaptive action-execution capabilities of reinforcement learning agents. The language module (based on LLM) translates the language instruction into a high-level action plan, which is then executed by a pre-trained reinforcement learning agent. We have demonstrated the effectiveness of our approach in two different environments: in IGLU, where agents are instructed to build structures, and in Crafter, where agents perform tasks and interact with objects in the surrounding environment according to language commands.
Updated: 2024-07-12 14:19:36
Fields: cs.AI
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
We explore the hypothesis that LLMs, such as GPT-3.5 and GPT-4, possess broader cognitive functions, particularly in non-linguistic domains. Our approach extends beyond standard linguistic benchmarks by incorporating games like Tic-Tac-Toe, Connect Four, and Battleship, encoded via ASCII, to assess strategic thinking and decision-making. To evaluate the models' ability to generalize beyond their training data, we introduce two additional games. The first game, LEGO Connect Language (LCL), tests the models' capacity to understand spatial logic and follow assembly instructions. The second game, the game of shapes, challenges the models to identify shapes represented by 1s within a matrix of zeros, further testing their spatial reasoning skills. This "show, don't tell" strategy uses games instead of simply querying the models. Our results show that despite their proficiency on standard benchmarks, GPT-3.5 and GPT-4's abilities to play and reason about fully observable games without pre-training are mediocre. Both models fail to anticipate losing moves in Tic-Tac-Toe and Connect Four, and they are unable to play Battleship correctly. While GPT-4 shows some success in the game of shapes, both models fail at the assembly tasks presented in the LCL game. These results suggest that while GPT models can emulate conversational proficiency and basic rule comprehension, their performance in strategic gameplay and spatial reasoning tasks is very limited. Importantly, this reveals a blind spot in current LLM benchmarks that we highlight with our gameplay benchmark suite ChildPlay (https://github.com/child-play-neurips/child-play). Our findings provide a cautionary tale about claims of emergent intelligence and reasoning capabilities of LLMs that are roughly the size of GPT-3.5 and GPT-4.
Updated: 2024-07-12 14:17:26
Fields: cs.AI,cs.CL
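A minimal sketch of the "show, don't tell" setup for one game: rendering a Tic-Tac-Toe board as ASCII inside a prompt and enumerating legal moves. The encoding and prompt wording are assumptions; the benchmark's actual formats are defined in the linked ChildPlay repository.

```python
# A minimal sketch of presenting a fully observable game as ASCII text.
# The board encoding and prompt template are illustrative assumptions.

def render_board(board):
    """Render a 3x3 Tic-Tac-Toe board ('X', 'O', or '.') as ASCII."""
    return "\n".join(" ".join(row) for row in board)

def legal_moves(board):
    """Return (row, col) pairs for all empty cells."""
    return [(r, c) for r in range(3) for c in range(3) if board[r][c] == "."]

board = [["X", "O", "."],
         [".", "X", "."],
         [".", ".", "O"]]

prompt = (
    "You are playing Tic-Tac-Toe as X. The board is:\n"
    + render_board(board)
    + "\nReply with your move as 'row,col' (0-indexed)."
)
print(prompt)
print("Legal moves:", legal_moves(board))
```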
3DReact: Geometric deep learning for chemical reactions
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction datasets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS and Proparg-21-TS datasets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different datasets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
Updated: 2024-07-12 14:15:23
Fields: physics.chem-ph,cs.LG
Semi-Supervised Learning for Deep Causal Generative Models
Developing models that are capable of answering questions of the form "How would x change if y had been z?" is fundamental to advancing medical image analysis. Training causal generative models that address such counterfactual questions, though, currently requires that all relevant variables have been observed and that the corresponding labels are available in the training data. However, clinical data may not have complete records for all patients, and state-of-the-art causal generative models are unable to take full advantage of this. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as the more clinically realistic case of having different labels missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.
Updated: 2024-07-12 14:13:41
Fields: cs.LG,cs.AI,cs.CV,stat.ML
Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning
Large Language Models (LLMs) have demonstrated their capabilities across various tasks, from language translation to complex reasoning. Understanding and predicting human behavior and biases are crucial for artificial intelligence (AI) assisted systems to provide useful assistance, yet it remains an open question whether these models can achieve this. This paper addresses this gap by leveraging the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks. These tasks involve balancing between exploitative and exploratory actions and handling delayed feedback, both essential for simulating real-life decision processes. We compare the performance of LLMs with a cognitive instance-based learning (IBL) model, which imitates human experiential decision-making. Our findings indicate that LLMs excel at rapidly incorporating feedback to enhance prediction accuracy. In contrast, the cognitive IBL model better accounts for human exploratory behaviors and effectively captures loss aversion bias, i.e., the tendency to choose a sub-optimal goal with fewer step-cost penalties rather than exploring to find the optimal choice, even with limited experience. The results highlight the benefits of integrating LLMs with cognitive architectures, suggesting that this synergy could enhance the modeling and understanding of complex human decision-making patterns.
Updated: 2024-07-12 14:13:06
Fields: cs.AI
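The cognitive IBL model referenced above rests on activation-weighted blending of stored instances; a compact sketch follows. The decay and temperature constants are common ACT-R-style defaults used here as assumptions, not the paper's exact settings.

```python
import math

# A compact sketch of instance-based learning (IBL) value blending.
# decay d=0.5 and temperature tau are common ACT-R-style defaults,
# used here as illustrative assumptions.

def activation(occurrences, now, d=0.5):
    """Power-law memory activation of an instance from its past uses."""
    return math.log(sum((now - t) ** (-d) for t in occurrences))

def blended_value(instances, now, tau=0.25):
    """Blend stored utilities, weighting by retrieval probability."""
    acts = [activation(occ, now) for occ, _ in instances]
    weights = [math.exp(a / tau) for a in acts]
    total = sum(weights)
    return sum(w / total * u for w, (_, u) in zip(weights, instances))

# Two remembered outcomes for one action: (occurrence times, utility)
instances = [([1, 3], 10.0),   # seen recently, high payoff
             ([2], -5.0)]      # seen once, a loss
print(blended_value(instances, now=5))
```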
H2O-Danube3 Technical Report
We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality Web data consisting of primarily English tokens in three stages with different data mixes before final supervised tuning for the chat version. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be efficiently run on a modern smartphone, enabling local inference and rapid processing capabilities even on mobile devices. We make all models openly available under the Apache 2.0 license, further democratizing LLMs economically to a wider audience.
Updated: 2024-07-12 14:09:40
Fields: cs.CL,cs.LG
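Since the models are released under Apache 2.0, a plain Hugging Face Transformers call is the natural way to try them; a minimal sketch follows. The model id is an assumption inferred from the naming in the abstract, so verify the exact identifier on the Hugging Face hub.

```python
# A minimal inference sketch using Hugging Face Transformers.
# The model id below is an assumption inferred from the abstract's
# naming; check the hub for the exact identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "h2oai/h2o-danube3-500m-chat"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Summarize federated learning in one sentence.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```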
Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX
Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite the advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on limited specialized tasks -- often predicting one protein modality from another. These approaches restrict the understanding and generation of multimodal protein data. In contrast, large multimodal models have demonstrated potential capabilities in generating any-to-any content like text, images, and videos, thus enriching user interactions across various domains. Integrating these multimodal model technologies into protein research offers significant promise by potentially transforming how proteins are studied. To this end, we introduce HelixProtX, a system built upon the large multimodal model, aiming to offer a comprehensive solution to protein research by supporting any-to-any protein modality generation. Unlike existing methods, it allows for the transformation of any input protein modality into any desired protein modality. The experimental results affirm the advanced capabilities of HelixProtX, not only in generating functional descriptions from amino acid sequences but also in executing critical tasks such as designing protein sequences and structures from textual descriptions. Preliminary findings indicate that HelixProtX consistently achieves superior accuracy across a range of protein-related tasks, outperforming existing state-of-the-art models. By integrating multimodal large models into protein research, HelixProtX opens new avenues for understanding protein biology, thereby promising to accelerate scientific discovery.
Updated: 2024-07-12 14:03:02
Fields: cs.LG,cs.AI,q-bio.BM
iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning
Unlike human learning, it is still common practice today to train deep learning models for vision tasks only once, on fixed datasets. A variety of approaches have recently addressed handling continual data streams. However, extending these methods to manage out-of-distribution (OOD) scenarios has not effectively been investigated. On the other hand, it has recently been shown that non-continual neural mesh models exhibit strong performance in generalizing to such OOD scenarios. To leverage this decisive property in a continual learning setting, we propose incremental neural mesh models that can be extended with new meshes over time. In addition, we present a latent space initialization strategy that enables us to allocate feature space for future unseen classes in advance, and a positional regularization term that forces the features of the different classes to consistently stay in respective latent space regions. We demonstrate the effectiveness of our method through extensive experiments on the Pascal3D and ObjectNet3D datasets and show that our approach outperforms the baselines for classification by $2-6\%$ in the in-domain and by $6-50\%$ in the OOD setting. Our work also presents the first incremental learning approach for pose estimation. Our code and model can be found at https://github.com/Fischer-Tom/iNeMo.
Updated: 2024-07-12 13:57:49
Fields: cs.CV,cs.LG
The Emperor is Now Clothed: A Secure Governance Framework for Web User Authentication through Password Managers
Existing approaches to facilitate the interaction between password managers and web applications fall short of providing adequate functionality and mitigation strategies against prominent attacks. HTML Autofill is not sufficiently expressive, the Credential Management API does not support browser extension password managers, and other proposed solutions do not conform to established user mental models. In this paper, we propose Berytus, a browser-based governance framework that mediates the interaction between password managers and web applications. Two APIs are designed to support Berytus acting as an orchestrator between password managers and web applications. An implementation of the framework in Firefox is developed that fully supports registration and authentication processes. As an orchestrator, Berytus is able to authenticate web applications and facilitate authenticated key exchange between web applications and password managers, which, as we show, can provide effective mitigation strategies against phishing, cross-site scripting, inline code injection (e.g., by a malicious browser extension), and TLS proxy-in-the-middle attacks, whereas existing mitigation strategies such as Content Security Policy and credential tokenisation are only partially effective. The framework design also provides desirable functional properties such as support for multi-step, multi-factor, and custom authentication schemes. We provide a comprehensive security and functionality evaluation and discuss possible future directions.
Updated: 2024-07-12 13:52:09
Fields: cs.CR
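An authenticated key exchange between a web application and a password manager could be built on standard primitives; the sketch below shows only an unauthenticated Diffie-Hellman core (X25519 plus HKDF) as an assumption-laden illustration. Berytus's actual channel additionally authenticates the web application, which is omitted here.

```python
# The Diffie-Hellman core (X25519 + HKDF) that a key exchange between a
# web app and a password manager could be built on. Authentication of
# the web application, central to Berytus, is omitted in this sketch.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

app_priv = X25519PrivateKey.generate()        # web application side
pm_priv = X25519PrivateKey.generate()         # password manager side

shared_app = app_priv.exchange(pm_priv.public_key())
shared_pm = pm_priv.exchange(app_priv.public_key())
assert shared_app == shared_pm                # both sides derive same secret

session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                   salt=None, info=b"pm-channel-sketch").derive(shared_app)
print(session_key.hex())
```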
GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study
Large Language Models (LLMs) play a pivotal role in generating vast arrays of narratives, facilitating a systematic exploration of their effectiveness for communicating life events in narrative form. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 narratives using OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning models on the classified datasets. Leveraging these models, we extend our analysis to predict the classifications of the remaining 21,120 narratives. All the ML models excelled at classifying valid narratives as valid, but experienced challenges at simultaneously classifying invalid narratives as invalid. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.
Updated: 2024-07-12 13:46:47
Fields: cs.CL,cs.AI,cs.LG,I.2.7; I.6.4
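The nine ML models are not specified in the abstract, so the sketch below shows a generic text classifier of the kind that could separate valid from invalid narratives: TF-IDF features plus logistic regression. The texts and labels are placeholders.

```python
# A generic sketch of training a narrative-validity classifier.
# TF-IDF + logistic regression stands in for one of the paper's nine
# (unspecified) models; the texts and labels below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Maria was hired as a nurse at the county hospital.",     # valid hiring
    "The weather was pleasant and nothing else happened.",    # invalid
    "After the accident, John passed away at age 54.",        # valid death
    "A list of unrelated facts about local geography.",       # invalid
]
labels = [1, 0, 1, 0]  # 1 = conveys the intended life event, 0 = does not

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["Anna gave birth to a healthy baby girl."]))
```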
Synthetic Cancer -- Augmenting Worms with LLMs
With increasingly sophisticated large language models (LLMs), the potential for abuse rises drastically. As a submission to the Swiss AI Safety Prize, we present a novel type of metamorphic malware leveraging LLMs for two key processes. First, LLMs are used for automatic code rewriting to evade signature-based detection by antimalware programs. The malware then spreads its copies via email by utilizing an LLM to socially engineer email replies to encourage recipients to execute the attached malware. Our submission includes a functional minimal prototype, highlighting the risks that LLMs pose for cybersecurity and underscoring the need for further research into intelligent malware.
Updated: 2024-07-12 13:40:10
Fields: cs.CR,cs.AI
Federated Learning and AI Regulation in the European Union: Who is Responsible? -- An Interdisciplinary Analysis
The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI models across data silos, sharing only model parameters while improving data security. Since FL is a cooperative learning paradigm, clients and servers naturally share legal responsibility in the FL pipeline. Our work contributes to clarifying the roles of both parties, explains strategies for shifting responsibilities to the server operator, and points out open technical challenges that we must solve to improve FL's practical applicability under the EU AI Act.
Updated: 2024-07-12 13:37:53
Fields: cs.AI,K.5; I.2.11; C.2.4; D.2.1
Logical Characterizations of Recurrent Graph Neural Networks with Reals and Floats
In pioneering work from 2019, Barceló and coauthors identified logics that precisely match the expressive power of constant iteration-depth graph neural networks (GNNs) relative to properties definable in first-order logic. In this article, we give exact logical characterizations of recurrent GNNs in two scenarios: (1) in the setting with floating-point numbers and (2) with reals. For floats, the formalism matching recurrent GNNs is a rule-based modal logic with counting, while for reals we use a suitable infinitary modal logic, also with counting. These results give exact matches between logics and GNNs in the recurrent setting without relativising to a background logic in either case, but using some natural assumptions about floating-point arithmetic. Applying our characterizations, we also prove that, relative to graph properties definable in monadic second-order logic (MSO), our infinitary and rule-based logics are equally expressive. This implies that recurrent GNNs with reals and floats have the same expressive power over MSO-definable properties and shows that, for such properties, also recurrent GNNs with reals are characterized by a (finitary!) rule-based modal logic. In the general case, in contrast, the expressive power with floats is weaker than with reals. In addition to logic-oriented results, we also characterize recurrent GNNs, with both reals and floats, via distributed automata, drawing links to distributed computing models.
Updated: 2024-07-12 13:34:58
Fields: cs.LO,cs.AI,F.4.1; F.1.1; I.2.0
Deep Adversarial Defense Against Multilevel-Lp Attacks
Deep learning models have shown considerable vulnerability to adversarial attacks, particularly as attacker strategies become more sophisticated. While traditional adversarial training (AT) techniques offer some resilience, they often focus on defending against a single type of attack, e.g., the $\ell_\infty$-norm attack, and can fail against other types. This paper introduces a computationally efficient multilevel $\ell_p$ defense, called the Efficient Robust Mode Connectivity (EMRC) method, which aims to enhance a deep learning model's resilience against multiple $\ell_p$-norm attacks. Similar to analytical continuation approaches used in continuous optimization, the method blends two $p$-specific adversarially optimal models, the $\ell_1$- and $\ell_\infty$-norm AT solutions, to provide good adversarial robustness for a range of $p$. We present experiments demonstrating that our approach performs better on various attacks as compared to AT-$\ell_\infty$, E-AT, and MSD, for datasets/architectures including: CIFAR-10, CIFAR-100 / PreResNet110, WideResNet, ViT-Base.
Updated: 2024-07-12 13:30:00
Fields: cs.LG,cs.AI,eess.SP
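The core idea of blending two $p$-specific adversarially trained models can be pictured as interpolation in weight space; the sketch below uses plain linear interpolation between two state dicts, a deliberate simplification of the mode-connectivity path the method actually optimizes.

```python
# A simplified sketch of blending two adversarially trained models in
# weight space. EMRC optimizes a mode-connectivity path; plain linear
# interpolation below is a deliberate simplification of that idea.
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                         nn.ReLU(), nn.Linear(128, 10))

net_l1, net_linf = make_net(), make_net()  # stand-ins for AT-l1 / AT-linf

def interpolate(sd_a, sd_b, alpha):
    """Return weights (1 - alpha) * A + alpha * B, key by key."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

blended = make_net()
blended.load_state_dict(interpolate(net_l1.state_dict(),
                                    net_linf.state_dict(), alpha=0.5))
print(blended(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```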
FedsLLM: Federated Split Learning for Large Language Models over Communication Networks
Addressing the challenges of deploying large language models in wireless communication networks, this paper combines low-rank adaptation (LoRA) with the split federated (splitfed) learning framework to propose the federated split learning for large language models (FedsLLM) framework. The method introduced in this paper utilizes LoRA technology to reduce processing loads by dividing the network into client subnetworks and server subnetworks. It leverages a federated server to aggregate and update client models. As the training data are transmitted through a wireless network between clients and both the main and federated servers, the training delay is determined by the learning accuracy and the allocation of communication bandwidth. This paper models the minimization of the training delay by integrating computation and communication optimization, simplifying the optimization problem into a convex problem to find the optimal solution. Additionally, it presents a lemma that describes the precise solutions to this problem. Simulation results demonstrate that the proposed optimization algorithm reduces delays by an average of 47.63% compared to unoptimized scenarios.
Updated: 2024-07-12 13:23:54
Fields: cs.NI,cs.LG
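LoRA, the adapter technique FedsLLM builds on, replaces a dense weight update with a trainable low-rank product on top of a frozen base layer; a minimal sketch follows, with rank and scaling chosen for illustration.

```python
# A minimal sketch of a LoRA-augmented linear layer, the adapter
# technique FedsLLM builds on. Rank and scaling are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_f, out_f, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_f, out_f)
        for p in self.base.parameters():       # freeze the pretrained base
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))  # zero init: no-op start
        self.scale = alpha / rank

    def forward(self, x):
        # frozen base output plus low-rank trainable update B @ A @ x
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(512, 512)
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```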
GNN with Model-based RL for Multi-agent Systems
Multi-agent systems (MAS) play a significant role in exploring machine intelligence and advanced applications. To investigate the complicated interactions within MAS scenarios in depth, we propose the "GNN for MBRL" model, which uses a state-space Graph Neural Network with Model-based Reinforcement Learning to address specific MAS missions (e.g., billiard avoidance, autonomous driving cars). In detail, we first use the GNN model to predict future states and trajectories of multiple agents, then apply Cross-Entropy Method (CEM)-optimized Model Predictive Control to assist the ego agent in planning actions and successfully accomplishing certain MAS tasks.
Updated: 2024-07-12 13:21:35
Fields: cs.MA,cs.AI
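The CEM-optimized MPC step can be sketched independently of the GNN dynamics model: sample action sequences, score them by rolling out the learned model, refit a Gaussian to the elites, and repeat. The dynamics and reward functions below are placeholders for the learned components.

```python
# A sketch of Cross-Entropy Method (CEM) planning over action sequences.
# `dynamics` and `reward` are placeholders for the learned GNN model.
import numpy as np

def dynamics(state, action):          # placeholder learned model
    return state + 0.1 * action

def reward(state):                    # placeholder task reward
    return -np.sum(state ** 2)

def cem_plan(state, horizon=10, pop=64, elites=8, iters=5, act_dim=2):
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        samples = mean + std * np.random.randn(pop, horizon, act_dim)
        returns = np.empty(pop)
        for i, seq in enumerate(samples):
            s, ret = state.copy(), 0.0
            for a in seq:             # roll out the model over the horizon
                s = dynamics(s, a)
                ret += reward(s)
            returns[i] = ret
        elite = samples[np.argsort(returns)[-elites:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]                    # first action of the refined plan

print(cem_plan(np.array([1.0, -2.0])))
```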
Constrained Intrinsic Motivation for Reinforcement Learning
This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer from static skills, limited state coverage, sample inefficiency in RFPT tasks, and suboptimality in EIM tasks. To tackle these problems, we propose Constrained Intrinsic Motivation (CIM) for RFPT and EIM tasks, respectively: 1) CIM for RFPT maximizes the lower bound of the conditional state entropy subject to an alignment constraint on the state encoder network for efficient dynamic and diverse skill discovery and state coverage maximization; 2) CIM for EIM leverages constrained policy optimization to adaptively adjust the coefficient of the intrinsic objective to mitigate the distraction from the intrinsic objective. In various MuJoCo robotics environments, we empirically show that CIM for RFPT greatly surpasses fifteen IM methods for unsupervised skill discovery in terms of skill diversity, state coverage, and fine-tuning performance. Additionally, we showcase the effectiveness of CIM for EIM in redeeming intrinsic rewards when task rewards are exposed from the beginning. Our code is available at https://github.com/x-zheng16/CIM.
Updated: 2024-07-12 13:20:52
Fields: cs.AI
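State-entropy objectives of this kind are commonly estimated with a particle-based k-nearest-neighbor estimator; the sketch below computes such an intrinsic reward over a batch of encoded states. This is a generic estimator, not necessarily the paper's exact formulation.

```python
# A generic particle-based k-NN entropy estimate used as an intrinsic
# reward over encoded states; not necessarily the paper's exact form.
import numpy as np

def knn_intrinsic_reward(states: np.ndarray, k: int = 5) -> np.ndarray:
    """Reward each state by log distance to its k-th nearest neighbor."""
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude self-distance
    kth = np.sort(dists, axis=1)[:, k - 1]   # k-th nearest neighbor distance
    return np.log(kth + 1.0)                 # +1 keeps the log well-defined

states = np.random.randn(128, 16)            # batch of encoded states
print(knn_intrinsic_reward(states)[:5])
```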
Early Classification of Time Series: Taxonomy and Benchmark
In many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not so early as to risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known as Early Classification of Time Series (ECTS). Although it has been the subject of a growing body of literature, there is still a lack of a systematic, shared evaluation protocol to compare the relative merits of the various existing methods. This document begins by situating these methods within a principle-based taxonomy. It defines dimensions for organizing their evaluation, and then reports the results of a very extensive set of experiments along these dimensions involving nine state-of-the-art ECTS algorithms. In addition, these and other experiments can be carried out using an open-source library in which most of the existing ECTS algorithms have been implemented (see https://github.com/ML-EDM/ml_edm).
Updated: 2024-07-12 13:16:16
Fields: cs.LG
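A common baseline trigger in ECTS balances classifier confidence against a time penalty: classify as soon as the expected misclassification cost no longer exceeds the cost of waiting. The sketch below implements this simple rule; the linear cost model is an assumption, not one of the benchmarked algorithms.

```python
# A simple cost-based trigger rule for early classification. The linear
# time-cost model is an assumption, not one of the benchmarked methods.
def should_classify(confidence: float, t: int, horizon: int,
                    miscls_cost: float = 1.0, time_cost: float = 0.01) -> bool:
    """Trigger when expected misclassification cost <= one step's waiting cost."""
    expected_error_cost = (1.0 - confidence) * miscls_cost
    return expected_error_cost <= time_cost or t >= horizon

# Simulated classifier confidence rising as more of the series arrives
confidences = [0.55, 0.70, 0.88, 0.97, 0.99]
for t, c in enumerate(confidences):
    if should_classify(c, t, horizon=len(confidences) - 1):
        print(f"classify at t={t} with confidence {c}")
        break
```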
FedVAE: Trajectory privacy preserving based on Federated Variational AutoEncoder
The use of trajectory data with abundant spatial-temporal information is pivotal in Intelligent Transport Systems (ITS) and various traffic system tasks. Location-Based Services (LBS) capitalize on this trajectory data to offer users personalized services tailored to their location information. However, this trajectory data contains sensitive information about users' movement patterns and habits, necessitating confidentiality and protection from unknown collectors. To address this challenge, privacy-preserving methods like K-anonymity and Differential Privacy have been proposed to safeguard private information in the dataset. Despite their effectiveness, these methods can impact the original features by introducing perturbations or generating unrealistic trajectory data, leading to suboptimal performance in downstream tasks. To overcome these limitations, we propose a Federated Variational AutoEncoder (FedVAE) approach, which effectively generates a new trajectory dataset while preserving the confidentiality of private information and retaining the structure of the original features. FedVAE leverages a Variational AutoEncoder (VAE) to maintain the original feature space and generate new trajectory data, and incorporates Federated Learning (FL) during the training stage, ensuring that users' data remains locally stored to protect their personal information. The results demonstrate its superior performance compared to other existing methods, affirming FedVAE as a promising solution for enhancing data privacy and utility in location-based applications.
Updated: 2024-07-12 13:10:59
Fields: cs.AI
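The federated half of FedVAE follows the standard FedAvg pattern: clients train the VAE locally on their own trajectories and a server averages the parameters. A generic sketch of the aggregation step follows; the VAE itself is omitted.

```python
# A generic FedAvg parameter-averaging step, the federated half of
# FedVAE; local VAE training is omitted. Averages are weighted by
# client dataset size.
import torch

def fedavg(client_state_dicts, client_sizes):
    """Weighted average of client model parameters (FedAvg)."""
    total = float(sum(client_sizes))
    keys = client_state_dicts[0].keys()
    return {
        k: sum(sd[k] * (n / total)
               for sd, n in zip(client_state_dicts, client_sizes))
        for k in keys
    }

# Two toy "clients" sharing a one-parameter model
a = {"w": torch.tensor([1.0])}
b = {"w": torch.tensor([3.0])}
print(fedavg([a, b], client_sizes=[100, 300]))  # {'w': tensor([2.5000])}
```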
OneActor: Consistent Character Generation via Cluster-Conditioned Guidance
Text-to-image diffusion models benefit artists with high-quality image generation. Yet their stochastic nature hinders artists from creating consistent images of the same subject. Existing methods try to tackle this challenge and generate consistent content in various ways. However, they either depend on external restricted data or require expensive tuning of the diffusion model. For this issue, we propose a novel one-shot tuning paradigm, termed OneActor. It efficiently performs consistent subject generation solely driven by prompts via a learned semantic guidance to bypass laborious backbone tuning. We take the lead in formalizing the objective of consistent subject generation from a clustering perspective, and thus design a cluster-conditioned model. To mitigate the overfitting challenge shared by one-shot tuning pipelines, we augment the tuning with auxiliary samples and devise two inference strategies: semantic interpolation and cluster guidance. These techniques are later verified to significantly enhance the generation quality. Comprehensive experiments show that our method outperforms a variety of baselines with satisfactory subject consistency, superior prompt conformity, and high image quality. Our method is capable of multi-subject generation and compatible with popular diffusion extensions. Moreover, we achieve a tuning speed 4 times faster than that of tuning-based baselines and, if desired, avoid increasing inference time. Furthermore, to the best of our knowledge, we are the first to prove that the semantic space of the diffusion model has the same interpolation property as the latent space does. This property can serve as another promising tool for fine generation control.
Updated: 2024-07-12 13:03:00
Fields: cs.CV,cs.AI
Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where $n$ is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.
Updated: 2024-07-12 12:55:53
Fields: cs.LG,math.OC
On the Need of a Modeling Language for Distribution Shifts: Illustrations on Tabular Datasets
Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. However, methodological development for robust algorithms typically relies on structural assumptions that lack empirical validation. Advocating for an empirically grounded data-driven approach to research, we build an empirical testbed comprising natural shifts across 5 tabular datasets and 60,000 method configurations encompassing imbalanced learning and distributionally robust optimization (DRO) methods. We find $Y|X$-shifts are most prevalent on our testbed, in stark contrast to the heavy focus on $X$ (covariate)-shifts in the ML literature. The performance of robust algorithms varies significantly over shift types, and is no better than that of vanilla methods. To understand why, we conduct an in-depth empirical analysis of DRO methods and find that although often neglected by researchers, implementation details -- such as the choice of underlying model class (e.g., XGBoost) and hyperparameter selection -- have a bigger impact on performance than the ambiguity set or its radius. To further bridge that gap between methodological research and practice, we design case studies that illustrate how such a data-driven, inductive understanding of distribution shifts can enhance both data-centric and algorithmic interventions.
Updated: 2024-07-12 12:54:37
Fields: cs.LG,cs.AI
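To make the $Y|X$-shift notion concrete, the sketch below trains a classifier under one labeling rule and evaluates it on data where the conditional label distribution has changed while the covariates have not; the synthetic data is purely illustrative.

```python
# An illustrative Y|X-shift: covariates X keep the same distribution,
# but the labeling rule changes between train and test.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 5))
X_test = rng.normal(size=(2000, 5))          # same covariate distribution

y_train = (X_train[:, 0] > 0).astype(int)    # old rule: feature 0 decides
y_test = (X_test[:, 1] > 0).astype(int)      # new rule: feature 1 decides

clf = LogisticRegression().fit(X_train, y_train)
print("train-rule accuracy:", clf.score(X_train, y_train))
print("shifted-test accuracy:", clf.score(X_test, y_test))  # near chance
```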
Comparing supervised learning dynamics: Deep neural networks match human data efficiency but show a generalisation lag
Recent research has seen many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Often, comparison studies focus on the end-result of the learning process by measuring and comparing the similarities in the representations of object categories once they have been formed. However, the process of how these representations emerge -- that is, the behavioral changes and intermediate stages observed during the acquisition -- is less often directly and empirically compared. Here we report a detailed investigation of the learning dynamics in human observers and various classic and state-of-the-art DNNs. We develop a constrained supervised learning environment to align learning-relevant conditions such as starting point, input modality, available input data and the feedback provided. Across the whole learning process we evaluate and compare how well learned representations can be generalized to previously unseen test data. Comparisons across the entire learning process indicate that DNNs demonstrate a level of data efficiency comparable to human learners, challenging some prevailing assumptions in the field. However, our results also reveal representational differences: while DNNs' learning is characterized by a pronounced generalisation lag, humans appear to immediately acquire generalizable representations without a preliminary phase of learning training set-specific information that is only later transferred to novel data.
Updated: 2024-07-12 12:47:19
Fields: cs.CV,cs.AI,cs.LG,q-bio.NC
Evaluating AI Evaluation: Perils and Prospects
As AI systems appear to exhibit ever-increasing capability and generality, assessing their true potential and safety becomes paramount. This paper contends that the prevalent evaluation methods for these systems are fundamentally inadequate, heightening the risks and potential hazards associated with AI. I argue that a reformation is required in the way we evaluate AI systems and that we should look towards cognitive sciences for inspiration in our approaches, which have a longstanding tradition of assessing general intelligence across diverse species. We will identify some of the difficulties that need to be overcome when applying cognitively-inspired approaches to general-purpose AI systems and also analyse the emerging area of "Evals". The paper concludes by identifying promising research pathways that could refine AI evaluation, advancing it towards a rigorous scientific domain that contributes to the development of safe AI systems.
Updated: 2024-07-12 12:37:13
Fields: cs.AI,cs.CY
Asymmetric GANs for Image-to-Image Translation
Existing models for unsupervised image translation with Generative Adversarial Networks (GANs) can learn the mapping from the source domain to the target domain using a cycle-consistency loss. However, these methods always adopt a symmetric network architecture to learn both forward and backward cycles. Because of the task complexity and the difference in cycle inputs between the source and target domains, the asymmetry between the forward and backward cycle translations is significant, and the amount of information in the two domains differs. In this paper, we analyze the limitation of existing symmetric GANs in asymmetric translation tasks, and propose an AsymmetricGAN model with translation and reconstruction generators of unequal sizes and a different parameter-sharing strategy to adapt to the asymmetric need in both unsupervised and supervised image translation tasks. Moreover, the training stage of existing methods suffers from the common problem of model collapse that degrades the quality of the generated images, thus we explore different optimization losses for better training of AsymmetricGAN, making image translation with higher consistency and better stability. Extensive experiments on both supervised and unsupervised generative tasks with 8 datasets show that AsymmetricGAN achieves superior model capacity and better generation performance compared with existing GANs. To the best of our knowledge, we are the first to investigate the asymmetric GAN structure on both unsupervised and supervised image translation tasks.
Updated: 2024-07-12 12:34:52
Fields: cs.CV,cs.LG,eess.IV
Conversational Assistants in Knowledge-Intensive Contexts: An Evaluation of LLM- versus Intent-based Systems
Conversational Assistants (CA) are increasingly supporting human workers in knowledge management. Traditionally, CAs respond in specific ways to predefined user intents and conversation patterns. However, this rigidness does not handle the diversity of natural language well. Recent advances in natural language processing, namely Large Language Models (LLMs), enable CAs to converse in a more flexible, human-like manner, extracting relevant information from texts and capturing information from expert humans but introducing new challenges such as "hallucinations". To assess the potential of using LLMs for knowledge management tasks, we conducted a user study comparing an LLM-based CA to an intent-based system regarding interaction efficiency, user experience, workload, and usability. This revealed that LLM-based CAs exhibited better user experience, task completion rate, usability, and perceived performance than intent-based systems, suggesting that switching NLP techniques can be beneficial in the context of knowledge management.
Updated: 2024-07-12 12:31:14
Fields: cs.HC,cs.AI
A Fair Ranking and New Model for Panoptic Scene Graph Generation
In panoptic scene graph generation (PSGG), models retrieve interactions between objects in an image which are grounded by panoptic segmentation masks. Previous evaluations on panoptic scene graphs have been subject to an erroneous evaluation protocol where multiple masks for the same object can lead to multiple relation distributions per mask-mask pair. This can be exploited to increase the final score. We correct this flaw and provide a fair ranking over a wide range of existing PSGG models. The observed scores for existing methods increase by up to 7.4 mR@50 for all two-stage methods, while dropping by up to 19.3 mR@50 for all one-stage methods, highlighting the importance of a correct evaluation. Contrary to recent publications, we show that existing two-stage methods are competitive to one-stage methods. Building on this, we introduce the Decoupled SceneFormer (DSFormer), a novel two-stage model that outperforms all existing scene graph models by a large margin of +11 mR@50 and +10 mNgR@50 on the corrected evaluation, thus setting a new SOTA. As a core design principle, DSFormer encodes subject and object masks directly into feature space.
Updated: 2024-07-12 12:28:08
Fields: cs.CV,cs.LG
Generating SROI^{-} Ontologies via Knowledge Graph Query Embedding Learning
Query embedding approaches answer complex logical queries over incomplete knowledge graphs (KGs) by computing and operating on low-dimensional vector representations of entities, relations, and queries. However, current query embedding models heavily rely on excessively parameterized neural networks and cannot explain the knowledge learned from the graph. We propose a novel query embedding method, AConE, which explains the knowledge learned from the graph in the form of SROI^{-} description logic axioms while being more parameter-efficient than most existing approaches. AConE associates queries to a SROI^{-} description logic concept. Every SROI^{-} concept is embedded as a cone in complex vector space, and each SROI^{-} relation is embedded as a transformation that rotates and scales cones. We show theoretically that AConE can learn SROI^{-} axioms, and defines an algebra whose operations correspond one to one to SROI^{-} description logic concept constructs. Our empirical study on multiple query datasets shows that AConE achieves superior results over previous baselines with fewer parameters. Notably on the WN18RR dataset, AConE achieves significant improvement over baseline models. We provide comprehensive analyses showing that the capability to represent axioms positively impacts the results of query answering.
Updated: 2024-07-12 12:20:39
Fields: cs.AI,cs.DB,cs.LG,cs.LO
On the Design and Security of Collective Remote Attestation Protocols
Collective remote attestation (CRA) is a security service that aims to efficiently identify compromised (often low-powered) devices in a (heterogeneous) network. The last few years have seen an extensive growth in CRA protocol proposals, showing a variety of designs guided by different network topologies, hardware assumptions and other functional requirements. However, they differ in their trust assumptions, adversary models and role descriptions, making it difficult to uniformly assess their security guarantees. In this paper, we present Catt, a unifying framework for CRA protocols that enables them to be compared systematically, based on a comprehensive study of 40 CRA protocols and their adversary models. Catt characterises the roles that devices can take, and based on these we develop a novel set of security properties for CRA protocols. We then classify the security aims of all the studied protocols. We illustrate the applicability of our security properties by encoding them in the Tamarin prover and verifying the SIMPLE+ protocol against them.
Updated: 2024-07-12 12:06:49
Fields: cs.CR
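At the heart of most CRA designs is a per-device challenge-response over a measured software state; a minimal single-device sketch using an HMAC over a firmware hash follows. Key provisioning and swarm-level aggregation, where the surveyed protocols differ most, are deliberately omitted.

```python
# A minimal single-device attestation exchange: the verifier sends a
# nonce, the prover returns HMAC(key, nonce || firmware_hash). Key
# management and swarm aggregation are deliberately omitted.
import hashlib
import hmac
import os

shared_key = os.urandom(32)              # provisioned at device enrollment
firmware = b"example firmware image bytes"  # placeholder software state

def prover_respond(nonce: bytes) -> bytes:
    measurement = hashlib.sha256(firmware).digest()
    return hmac.new(shared_key, nonce + measurement, hashlib.sha256).digest()

def verifier_check(nonce: bytes, response: bytes, expected_hash: bytes) -> bool:
    expected = hmac.new(shared_key, nonce + expected_hash, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

nonce = os.urandom(16)
good_hash = hashlib.sha256(firmware).digest()
print(verifier_check(nonce, prover_respond(nonce), good_hash))  # True
```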
Physics-aware Machine Learning Revolutionizes Scientific Paradigm for Machine Learning and Process-based Hydrology
Accurate hydrological understanding and water cycle prediction are crucial for addressing scientific and societal challenges associated with the management of water resources, particularly under the dynamic influence of anthropogenic climate change. Existing reviews predominantly concentrate on the development of machine learning (ML) in this field, yet there is a clear distinction between hydrology and ML as separate paradigms. Here, we introduce physics-aware ML as a transformative approach to overcome the perceived barrier and revolutionize both fields. Specifically, we present a comprehensive review of the physics-aware ML methods, building a structured community (PaML) of existing methodologies that integrate prior physical knowledge or physics-based modeling into ML. We systematically analyze these PaML methodologies with respect to four aspects: physical data-guided ML, physics-informed ML, physics-embedded ML, and physics-aware hybrid learning. PaML facilitates ML-aided hypotheses, accelerating insights from big data and fostering scientific discoveries. We first conduct a systematic review of hydrology in PaML, including rainfall-runoff hydrological processes and hydrodynamic processes, and highlight the most promising and challenging directions for different objectives and PaML methods. Finally, a new PaML-based hydrology platform, termed HydroPML, is released as a foundation for hydrological applications. HydroPML enhances the explainability and causality of ML and lays the groundwork for the digital water cycle's realization. The HydroPML platform is publicly available at https://hydropml.github.io/.
Updated: 2024-07-12 12:05:28
Fields: cs.LG,cs.AI,physics.flu-dyn
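Of the PaML families surveyed, physics-informed ML is the easiest to show in a few lines: a network is trained against a differential residual alongside data constraints. The sketch below fits du/dt = -k*u with a PINN-style loss; the equation and constants are illustrative stand-ins for real hydrological dynamics.

```python
# A PINN-style sketch: fit u(t) so that du/dt = -k * u, combining a
# physics residual with an initial-condition loss. The ODE and constant
# are illustrative stand-ins for real hydrological dynamics.
import torch
import torch.nn as nn

k = 1.0
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = torch.rand(64, 1, requires_grad=True)          # collocation points
    u = net(t)
    du_dt = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    physics = ((du_dt + k * u) ** 2).mean()            # residual of du/dt = -k u
    u0 = net(torch.zeros(1, 1))
    initial = ((u0 - 1.0) ** 2).mean()                 # enforce u(0) = 1
    loss = physics + initial
    opt.zero_grad()
    loss.backward()
    opt.step()

print(net(torch.tensor([[1.0]])).item())  # roughly approaches exp(-1) ~ 0.368
```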
Generative Models for Synthetic Urban Mobility Data: A Systematic Literature Review
Although highly valuable for a variety of applications, urban mobility data is rarely made openly available as it contains sensitive personal information. Synthetic data aims to solve this issue by generating artificial data that resembles an original dataset in structural and statistical characteristics, but omits sensitive information. For mobility data, a large number of corresponding models have been proposed in the last decade. This systematic review provides a structured comparative overview of the current state of this heterogeneous, active field of research. A special focus is put on the applicability of the reviewed models in practice.
Updated: 2024-07-12 11:54:29
Fields: cs.CR
A Scale-Invariant Diagnostic Approach Towards Understanding Dynamics of Deep Neural Networks
This paper introduces a scale-invariant methodology employing Fractal Geometry to analyze and explain the nonlinear dynamics of complex connectionist systems. By leveraging architectural self-similarity in Deep Neural Networks (DNNs), we quantify fractal dimensions and roughness to deeply understand their dynamics and enhance the quality of intrinsic explanations. Our approach integrates principles from Chaos Theory to improve visualizations of fractal evolution and utilizes a Graph-Based Neural Network for reconstructing network topology. This strategy aims at advancing the intrinsic explainability of connectionist Artificial Intelligence (AI) systems.
Updated: 2024-07-12 11:54:05
Fields: cs.NE,cs.AI
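The fractal-dimension quantification can be illustrated with the classic box-counting estimator on a point set; the sketch below is generic and not tied to the paper's DNN-specific measurements.

```python
# A generic box-counting estimate of fractal dimension for a 2-D point
# set; illustrative only, not the paper's DNN-specific procedure.
import numpy as np

def box_counting_dimension(points: np.ndarray, scales=(2, 4, 8, 16, 32)):
    """Fit log(box count) against log(scale) over several grid resolutions."""
    span = np.ptp(points, axis=0) + 1e-12
    pts = (points - points.min(0)) / span          # normalize to [0, 1]
    counts = []
    for s in scales:
        cells = np.unique(np.floor(pts * s).astype(int), axis=0)
        counts.append(len(cells))                  # occupied boxes at scale s
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope

# A filled square should come out near dimension 2
square = np.random.rand(20000, 2)
print(box_counting_dimension(square))
```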
A Chatbot for Asylum-Seeking Migrants in Europe
We present ACME: A Chatbot for asylum-seeking Migrants in Europe. ACME relies on computational argumentation and aims to help migrants identify the highest level of protection they can apply for. This would contribute to a more sustainable migration by reducing the load on territorial commissions, Courts, and humanitarian organizations supporting asylum applicants. We describe the context, system architectures, technologies, and the case study used to run the demonstration.
Updated: 2024-07-12 11:53:40
Fields: cs.AI,cs.CL
From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation
Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure representation based on panoptic segmentation masks. Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features, which inherently focus on the contours and boundaries of objects. To bridge this gap, we propose a model-agnostic Curricular shApe-aware FEature (CAFE) learning strategy for PSG. Specifically, we incorporate shape-aware features (i.e., mask features and boundary features) into PSG, moving beyond reliance solely on bbox features. Furthermore, drawing inspiration from human cognition, we propose to integrate shape-aware features in an easy-to-hard manner. To achieve this, we categorize the predicates into three groups based on cognition learning difficulty and correspondingly divide the training process into three stages. Each stage utilizes a specialized relation classifier to distinguish specific groups of predicates. As the learning difficulty of predicates increases, these classifiers are equipped with features of ascending complexity. We also incorporate knowledge distillation to retain knowledge acquired in earlier stages. Due to its model-agnostic nature, CAFE can be seamlessly incorporated into any PSG model. Extensive experiments and ablations on two PSG tasks under both robust and zero-shot PSG have attested to the superiority and robustness of our proposed CAFE, which outperforms existing state-of-the-art methods by a large margin.
Updated: 2024-07-12 11:48:33
Subjects: cs.CV,cs.AI
Deep Learning Safety Concerns in Automated Driving Perception
Recent advances in the field of deep learning and impressive performance of deep neural networks (DNNs) for perception have resulted in an increased demand for their use in automated driving (AD) systems. The safety of such systems is of utmost importance and thus requires considering the unique properties of DNNs. In order to achieve safety of AD systems with DNN-based perception components in a systematic and comprehensive approach, so-called safety concerns have been introduced as a suitable structuring element. On the one hand, the concept of safety concerns is -- by design -- well aligned with existing standards relevant for safety of AD systems such as ISO 21448 (SOTIF). On the other hand, it has already inspired several academic publications and upcoming standards on AI safety such as ISO PAS 8800. While the concept of safety concerns has been previously introduced, this paper extends and refines it, leveraging feedback from various domain and safety experts in the field. In particular, this paper introduces an additional categorization to improve understanding and to enable cross-functional teams to jointly address the concerns.
Updated: 2024-07-12 11:46:08
Subjects: cs.LG,cs.CV,cs.SY,eess.SY
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.
Updated: 2024-07-12 11:44:33
Subjects: cs.RO,cs.AI
CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation
Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. This opens the door to ample literature that aims to manipulate an input image by providing a textual explanation. However, due to the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee for manipulation. To alleviate these problems, we propose to define corpus subspaces spanned by relevant prompts to capture specific image characteristics. We introduce CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily computed and adapted, and smoothly incorporated into any CLIP-based image manipulation algorithm. To demonstrate the effectiveness of our method, we conduct several theoretical and empirical studies. As a case study, we utilize the method for text-guided semantic face editing. We quantitatively and qualitatively demonstrate that PAE facilitates a more disentangled, interpretable, and controllable image manipulation with state-of-the-art quality and accuracy. Project page: https://chenliang-zhou.github.io/CLIP-PAE/.
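The core operation, projecting the optimization target onto a subspace spanned by relevant prompt embeddings, can be sketched in a few lines of PyTorch. The blending form and the unit-norm convention below are our assumptions for illustration, not necessarily the paper's exact definition of PAE:

```python
import torch

def projection_augmented_embedding(text_emb, corpus_embs, alpha=1.0):
    """Sketch of a projection-augmentation target (assumed form): project
    the editing prompt's embedding onto the subspace spanned by a corpus of
    relevant prompt embeddings, then blend with the original.

    text_emb:    (d,) CLIP embedding of the editing prompt.
    corpus_embs: (k, d) CLIP embeddings of prompts spanning the subspace.
    """
    # Orthonormal basis of the corpus subspace via SVD.
    basis = torch.linalg.svd(corpus_embs, full_matrices=False).Vh  # (k, d)
    projection = (text_emb @ basis.T) @ basis   # component inside the subspace
    augmented = text_emb + alpha * (projection - text_emb)
    return augmented / augmented.norm()         # keep the unit-norm convention

emb = projection_augmented_embedding(torch.randn(512), torch.randn(8, 512))
print(emb.shape)  # torch.Size([512])
```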
Updated: 2024-07-12 11:44:29
Subjects: cs.CV,cs.AI,cs.LG
ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
Class-agnostic counting methods enumerate objects of an arbitrary class, providing tremendous utility in many fields. Prior works have limited usefulness as they require either a set of examples of the type to be counted or that the query image contains only a single type of object. A significant factor in these shortcomings is the lack of a dataset to properly address counting in settings with more than one kind of object present. To address these issues, we propose the first Multi-class, Class-Agnostic Counting dataset (MCAC) and A Blind Counter (ABC123), a method that can count multiple types of objects simultaneously without using examples of type during training or inference. ABC123 introduces a new paradigm where instead of requiring exemplars to guide the enumeration, examples are found after the counting stage to help a user understand the generated outputs. We show that ABC123 outperforms contemporary methods on MCAC without needing human in-the-loop annotations. We also show that this performance transfers to FSC-147, the standard class-agnostic counting dataset. MCAC is available at MCAC.active.vision and ABC123 is available at ABC123.active.vision.
Updated: 2024-07-12 11:41:33
Subjects: cs.CV,cs.LG
Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings
Due to massive adoption of social media, detection of users' depression through social media analytics bears significant importance, particularly for underrepresented languages, such as Bangla. This study introduces a well-grounded approach to identify depressive social media posts in Bangla, by employing advanced natural language processing techniques. The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts, ensuring high-quality data for model training and evaluation. To address the prevalent issue of class imbalance, we utilised random oversampling for the minority class, thereby enhancing the model's ability to accurately detect depressive posts. We explored various numerical representation techniques, including Term Frequency-Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representations from Transformers (BERT) embedding and FastText embedding, by integrating them with a deep learning-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) model. The results obtained through extensive experimentation indicate that the BERT approach performed better than the others, achieving an F1-score of 84%. This indicates that BERT, in combination with the CNN-BiLSTM architecture, effectively recognises the nuances of Bangla texts relevant to depressive content. Comparative analysis with existing state-of-the-art methods demonstrates that our approach with BERT embedding performs better than others in terms of evaluation metrics and the reliability of dataset annotations. Our research contributes significantly to the development of reliable tools for detecting depressive posts in the Bangla language. By highlighting the efficacy of different embedding techniques and deep learning models, this study paves the way for improved mental health monitoring through social media platforms.
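A minimal version of one compared pipeline, TF-IDF features with random oversampling of the minority class, can be assembled with scikit-learn and imbalanced-learn. We substitute logistic regression for the paper's CNN-BiLSTM to keep the sketch self-contained; the toy texts and labels are placeholders:

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Placeholder corpus: 1 = depressive, 0 = non-depressive (illustrative only).
train_texts = ["feeling hopeless today", "great day with friends",
               "enjoyed the match", "lovely weather outside",
               "had a nice dinner", "fun trip planned"]
train_labels = [1, 0, 0, 0, 0, 0]          # heavily imbalanced on purpose
test_texts, test_labels = ["feeling hopeless again", "nice weather"], [1, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(train_texts)

# Random oversampling duplicates minority-class rows until classes balance.
X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X, train_labels)

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print("F1:", f1_score(test_labels, clf.predict(vec.transform(test_texts))))
```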
Updated: 2024-07-12 11:40:17
Subjects: cs.CL,cs.AI
ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases. Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions, ignoring the broader context of nonphysical connections through intermediate proteins, thus limiting their effectiveness. The emergence of Large Language Models (LLMs) provides a new opportunity for addressing this complex biological challenge. By transforming structured data into natural language prompts, we can map the relationships between proteins into texts. This approach allows LLMs to identify indirect connections between proteins, tracing the path from upstream to downstream. Therefore, we propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time. Specifically, we propose Protein Chain of Thought (ProCoT), which replicates the biological mechanism of signaling pathways as natural language prompts. ProCoT considers a signaling pathway as a protein reasoning process, which starts from upstream proteins and passes through several intermediate proteins to transmit biological signals to downstream proteins. Thus, we can use ProCoT to predict the interaction between upstream proteins and downstream proteins. The training of ProLLM employs the ProCoT format, which enhances the model's understanding of complex biological problems. In addition to ProCoT, this paper also contributes to the exploration of embedding replacement of protein sites in natural language prompts, and instruction fine-tuning in protein knowledge datasets. We demonstrate the efficacy of ProLLM through rigorous validation against benchmark datasets, showing significant improvement over existing methods in terms of prediction accuracy and generalizability. The code is available at: https://github.com/MingyuJ666/ProLLM.
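The Protein Chain of Thought idea, verbalizing a signaling pathway hop by hop before asking about its endpoints, is easy to picture as a prompt builder. The template and protein names below are illustrative assumptions, not the paper's exact format:

```python
def procot_prompt(pathway, query_pair):
    """Build a Protein Chain-of-Thought style prompt (assumed template).

    pathway: ordered protein names, upstream to downstream,
             e.g. ["EGFR", "GRB2", "SOS1", "RAS", "RAF1"].
    query_pair: (upstream, downstream) proteins whose interaction we ask about.
    """
    steps = " -> ".join(pathway)
    hops = [f"{a} transmits a signal to {b}." for a, b in zip(pathway, pathway[1:])]
    return (
        f"Consider the signaling pathway {steps}.\n"
        + " ".join(hops)
        + f"\nQuestion: does {query_pair[0]} interact, directly or indirectly,"
        f" with {query_pair[1]}? Answer yes or no."
    )

print(procot_prompt(["EGFR", "GRB2", "SOS1", "RAS", "RAF1"], ("EGFR", "RAF1")))
```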
Updated: 2024-07-12 11:38:56
Subjects: q-bio.BM,cs.LG,q-bio.MN
Variational Inference via Smoothed Particle Hydrodynamics
A new variational inference method, SPH-ParVI, based on smoothed particle hydrodynamics (SPH), is proposed for sampling partially known densities (e.g. up to a constant) or sampling using gradients. SPH-ParVI simulates the flow of a fluid under external effects driven by the target density; transient or steady state of the fluid approximates the target density. The continuum fluid is modelled as an interacting particle system (IPS) via SPH, where each particle carries smoothed properties, interacts and evolves as per the Navier-Stokes equations. This mesh-free, Lagrangian simulation method offers fast, flexible, scalable and deterministic sampling and inference for a class of probabilistic models such as those encountered in Bayesian inference and generative modelling.
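The paper's SPH dynamics are not reproduced here, but the family it belongs to, moving a cloud of particles with the gradient (score) of an unnormalized target density, can be illustrated with its simplest member, unadjusted Langevin dynamics:

```python
import numpy as np

def grad_log_density(x):
    # Unnormalized target: standard 2-D Gaussian, log p(x) = -||x||^2 / 2 + const.
    return -x

rng = np.random.default_rng(0)
particles = rng.normal(size=(2000, 2)) * 5.0   # deliberately poor initialization
step = 1e-2
for _ in range(5000):
    noise = rng.normal(size=particles.shape)
    # Overdamped Langevin update: drift along the score plus diffusion.
    particles += step * grad_log_density(particles) + np.sqrt(2 * step) * noise

print(particles.mean(axis=0), particles.std(axis=0))  # ~0 mean, ~1 std
```

SPH-ParVI replaces this stochastic update with deterministic fluid dynamics (Navier-Stokes-style forces between smoothed interacting particles), but the role of the target density's gradient is analogous.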
Updated: 2024-07-12 11:38:41
Subjects: cs.AI,cs.LG,stat.ML
DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
Swift and accurate detection of specified objects is crucial for many industrial applications, such as safety monitoring on construction sites. However, traditional approaches rely heavily on arduous manual annotation and data collection, which struggle to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an automated end-to-end pipeline designed to streamline the entire workflow of an object detection application from data collection to model deployment. DART eliminates the need for human labeling and extensive data collection while excelling in diverse scenarios. It employs a subject-driven image generation module (DreamBooth with SDXL) for data diversification, followed by an annotation stage where open-vocabulary object detection (Grounding DINO) generates bounding box annotations for both generated and original images. These pseudo-labels are then reviewed by a large multimodal model (GPT-4o) to guarantee credibility before serving as ground truth to train real-time object detectors (YOLO). We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832. Furthermore, we adopt a modular design for DART to ensure easy exchangeability and extensibility. This allows for a smooth transition to more advanced algorithms in the future, seamless integration of new object categories without manual labeling, and adaptability to customized environments without extra data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
Updated: 2024-07-12 11:16:44
Subjects: cs.CV,cs.AI
Conformal Inductive Graph Neural Networks
Conformal prediction (CP) transforms any model's output into prediction sets guaranteed to include (cover) the true label. CP requires exchangeability, a relaxation of the i.i.d. assumption, to obtain a valid distribution-free coverage guarantee. This makes it directly applicable to transductive node-classification. However, conventional CP cannot be applied in inductive settings due to the implicit shift in the (calibration) scores caused by message passing with the new nodes. We fix this issue for both cases of node and edge-exchangeable graphs, recovering the standard coverage guarantee without sacrificing statistical efficiency. We further prove that the guarantee holds independently of the prediction time, e.g. upon arrival of a new node/edge or at any subsequent moment.
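For readers new to conformal prediction, the split-conformal recipe the paper builds on fits in a few lines; the inductive-GNN correction itself is the paper's contribution and is not shown:

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with the 1 - p_y nonconformity score.

    Guarantees ~(1 - alpha) coverage when calibration and test points are
    exchangeable (the assumption the paper repairs for inductive GNNs).
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile of the calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

rng = np.random.default_rng(0)
cal_p = rng.dirichlet(np.ones(3), size=500)
cal_y = rng.integers(0, 3, size=500)
test_p = rng.dirichlet(np.ones(3), size=5)
print(conformal_sets(cal_p, cal_y, test_p))
```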
Updated: 2024-07-12 11:12:49
Subjects: cs.LG
CausalLP: Learning causal relations with weighted knowledge graph link prediction
Causal networks are useful in a wide variety of applications, from medical diagnosis to root-cause analysis in manufacturing. In practice, however, causal networks are often incomplete with missing causal relations. This paper presents a novel approach, called CausalLP, that formulates the issue of incomplete causal networks as a knowledge graph completion problem. More specifically, the task of finding new causal relations in an incomplete causal network is mapped to the task of knowledge graph link prediction. The use of knowledge graphs to represent causal relations enables the integration of external domain knowledge; and as an added complexity, the causal relations have weights representing the strength of the causal association between entities in the knowledge graph. Two primary tasks are supported by CausalLP: causal explanation and causal prediction. An evaluation of this approach uses a benchmark dataset of simulated videos for causal reasoning, CLEVRER-Humans, and compares the performance of multiple knowledge graph embedding algorithms. Two distinct dataset splitting approaches are used for evaluation: (1) random-based split, which is the method typically employed to evaluate link prediction algorithms, and (2) Markov-based split, a novel data split technique that utilizes the Markovian property of causal relations. Results show that using weighted causal relations improves causal link prediction over the baseline without weighted relations.
Updated: 2024-07-12 11:11:26
Subjects: cs.AI
Machine Apophenia: The Kaleidoscopic Generation of Architectural Images
This study investigates the application of generative artificial intelligence in architectural design. We present a novel methodology that combines multiple neural networks to create an unsupervised and unmoderated stream of unique architectural images. Our approach is grounded in the conceptual framework called machine apophenia. We hypothesize that neural networks, trained on diverse human-generated data, internalize aesthetic preferences and tend to produce coherent designs even from random inputs. The methodology involves an iterative process of image generation, description, and refinement, resulting in captioned architectural postcards automatically shared on several social media platforms. Evaluation and ablation studies show the improvement both in technical and aesthetic metrics of resulting images on each step.
Updated: 2024-07-12 11:11:19
Subjects: cs.AI,cs.CV,68T01, 68U05, 00A66, 00A67,I.2.1; I.3.3; J.5; H.5.1
SE(3)-bi-equivariant Transformers for Point Cloud Assembly
Given a pair of point clouds, the goal of assembly is to recover a rigid transformation that aligns one point cloud to the other. This task is challenging because the point clouds may be non-overlapped, and they may have arbitrary initial positions. To address these difficulties, we propose a method, called SE(3)-bi-equivariant transformer (BITR), based on the SE(3)-bi-equivariance prior of the task: it guarantees that when the inputs are rigidly perturbed, the output will transform accordingly. Due to its equivariance property, BITR can not only handle non-overlapped PCs, but also guarantee robustness against initial positions. Specifically, BITR first extracts features of the inputs using a novel $SE(3) \times SE(3)$-transformer, and then projects the learned feature to group SE(3) as the output. Moreover, we theoretically show that swap and scale equivariances can be incorporated into BITR, thus it further guarantees stable performance under scaling and swapping the inputs. We experimentally show the effectiveness of BITR in practical tasks.
Updated: 2024-07-12 11:01:28
Subjects: cs.AI,cs.LG
Robust Yet Efficient Conformal Prediction Sets
Conformal prediction (CP) can convert any model's output into prediction sets guaranteed to include the true label with any user-specified probability. However, same as the model itself, CP is vulnerable to adversarial test examples (evasion) and perturbed calibration data (poisoning). We derive provably robust sets by bounding the worst-case change in conformity scores. Our tighter bounds lead to more efficient sets. We cover both continuous and discrete (sparse) data and our guarantees work both for evasion and poisoning attacks (on both features and labels).
Updated: 2024-07-12 10:59:44
Subjects: cs.LG,cs.AI
TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs
Recently, code-oriented large language models (Code LLMs) have been widely and successfully used to simplify and facilitate code programming. With these tools, developers can easily generate desired complete functional codes based on incomplete code and natural language prompts. However, a few pioneering works revealed that these Code LLMs are also vulnerable, e.g., against backdoor and adversarial attacks. The former could induce LLMs to respond to triggers to insert malicious code snippets by poisoning the training data or model parameters, while the latter can craft malicious adversarial input codes to reduce the quality of generated codes. However, both attack methods have underlying limitations: backdoor attacks rely on controlling the model training process, while adversarial attacks struggle with fulfilling specific malicious purposes. To inherit the advantages of both backdoor and adversarial attacks, this paper proposes a new attack paradigm, i.e., target-specific and adversarial prompt injection (TAPI), against Code LLMs. TAPI generates unreadable comments containing information about malicious instructions and hides them as triggers in the external source code. When users exploit Code LLMs to complete codes containing the trigger, the models will generate attacker-specified malicious code snippets at specific locations. We evaluate our TAPI attack on four representative LLMs under three representative malicious objectives and seven cases. The results show that our method is highly threatening (achieving an attack success rate of up to 89.3\%) and stealthy (saving an average of 53.1\% of tokens in the trigger design). In particular, we successfully attack some famous deployed code completion integrated applications, including CodeGeex and Github Copilot. This further confirms the realistic threat of our attack.
Updated: 2024-07-12 10:59:32
Subjects: cs.CR,cs.AI
Exploring State Space and Reasoning by Elimination in Tsetlin Machine
The Tsetlin Machine (TM) has gained significant attention in Machine Learning (ML). By employing logical fundamentals, it facilitates pattern learning and representation, offering an alternative approach for developing comprehensible Artificial Intelligence (AI) with a specific focus on pattern classification in the form of conjunctive clauses. In the domain of Natural Language Processing (NLP), TM is utilised to construct word embedding and describe target words using clauses. To enhance the descriptive capacity of these clauses, we study the concept of Reasoning by Elimination (RbE) in clauses' formulation, which involves incorporating feature negations to provide a more comprehensive representation. In more detail, this paper employs the Tsetlin Machine Auto-Encoder (TM-AE) architecture to generate dense word vectors, aiming at capturing contextual information by extracting feature-dense vectors for a given vocabulary. Thereafter, the principle of RbE is explored to improve descriptivity and optimise the performance of the TM. Specifically, the specificity parameter s and the voting margin parameter T are leveraged to regulate feature distribution in the state space, resulting in a dense representation of information for each clause. In addition, we investigate the state spaces of TM-AE, especially for the forgotten/excluded features. Empirical investigations on artificially generated data, the IMDB dataset, and the 20 Newsgroups dataset showcase the robustness of the TM, with accuracy reaching 90.62\% for the IMDB.
Updated: 2024-07-12 10:58:01
Subjects: cs.LG,cs.AI
Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion
Pre-trained models learn general representations from large datasets which can be fine-tuned for specific tasks to significantly reduce training time. Pre-trained models like generative pretrained transformers (GPT), bidirectional encoder representations from transformers (BERT), and vision transformers (ViT) have become a cornerstone of current research in machine learning. This study proposes a multi-modal movie recommendation system that extracts features from each movie's carefully designed poster and from its narrative text description. The system uses the BERT model to extract information from the text modality, the ViT model to extract information from the poster/image modality, and a Transformer architecture to fuse the features of all modalities and predict users' preferences. Integrating pre-trained foundation models with smaller datasets in downstream applications captures multi-modal content features in a more comprehensive manner, thereby providing more accurate recommendations. The efficiency of the proof-of-concept model is verified on the standard MovieLens 100K and 1M benchmark datasets. The prediction accuracy of user ratings is enhanced in comparison to the baseline algorithm, demonstrating the potential of this cross-modal algorithm for movie or video recommendation.
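A minimal sketch of the fusion stage: pre-extracted BERT and ViT embeddings are treated as a two-token sequence and fused by a small Transformer encoder before a rating head. The dimensions and pooling choice are our assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class PosterTextFusion(nn.Module):
    """Fusion head sketch: treat each modality embedding as one token and
    let a small Transformer attend across them."""

    def __init__(self, d_model=768):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # predicted rating / preference score

    def forward(self, text_emb, image_emb):
        # text_emb, image_emb: (batch, 768) from frozen BERT and ViT encoders.
        tokens = torch.stack([text_emb, image_emb], dim=1)  # (batch, 2, 768)
        fused = self.encoder(tokens).mean(dim=1)            # pool the two tokens
        return self.head(fused).squeeze(-1)

model = PosterTextFusion()
print(model(torch.randn(4, 768), torch.randn(4, 768)).shape)  # torch.Size([4])
```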
Updated: 2024-07-12 10:44:51
Subjects: cs.IR,cs.AI,cs.LG
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
In this work, we tackle the limitations of current LiDAR-based 3D object detection systems, which are hindered by a restricted class vocabulary and the high costs associated with annotating new object classes. Our exploration of open-vocabulary (OV) learning in urban environments aims to capture novel instances using pre-trained vision-language models (VLMs) with multi-sensor data. We design and benchmark a set of four potential solutions as baselines, categorizing them into either top-down or bottom-up approaches based on their input data strategies. While effective, these methods exhibit certain limitations, such as missing novel objects in 3D box estimation or applying rigorous priors, leading to biases towards objects near the camera or of rectangular geometries. To overcome these limitations, we introduce a universal Find n' Propagate approach for 3D OV tasks, aimed at maximizing the recall of novel objects and propagating this detection capability to more distant areas, thereby progressively capturing more. In particular, we utilize a greedy box seeker to search against 3D novel boxes of varying orientations and depth in each generated frustum and ensure the reliability of newly identified boxes by cross alignment and density ranker. Additionally, the inherent bias towards camera-proximal objects is alleviated by the proposed remote simulator, which randomly diversifies pseudo-labeled novel instances in the self-training process, combined with the fusion of base samples in the memory bank. Extensive experiments demonstrate a 53% improvement in novel recall across diverse OV settings, VLMs, and 3D detectors. Notably, we achieve up to a 3.97-fold increase in Average Precision (AP) for novel object classes. The source code is made available at https://github.com/djamahl99/findnpropagate.
Updated: 2024-07-12 10:42:30
Subjects: cs.CV,cs.AI
Tree Ensembles for Contextual Bandits
We propose a novel framework for contextual multi-armed bandits based on tree ensembles. Our framework integrates two widely used bandit methods, Upper Confidence Bound and Thompson Sampling, for both standard and combinatorial settings. We demonstrate the effectiveness of our framework via several experimental studies, employing both XGBoost and random forest, two popular tree ensemble methods. Compared to state-of-the-art methods based on decision trees and neural networks, our methods exhibit superior performance in terms of both regret minimization and computational runtime, when applied to benchmark datasets and the real-world application of navigation over road networks.
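One way to pair Thompson sampling with a tree ensemble, treating the spread across trees as an approximate posterior and sampling one tree per decision, is sketched below. This is a plausible construction for illustration; the paper's exact integration may differ:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class ForestThompsonBandit:
    """Contextual bandit sketch: one forest per arm; at decision time, draw
    one tree per arm at random and pick the arm with the best sampled reward.
    Treating tree-to-tree spread as a posterior is our assumption."""

    def __init__(self, n_arms, n_trees=25):
        self.models = [RandomForestRegressor(n_estimators=n_trees) for _ in range(n_arms)]
        self.data = [([], []) for _ in range(n_arms)]
        self.rng = np.random.default_rng(0)

    def select(self, context):
        samples = []
        for model, (X, y) in zip(self.models, self.data):
            if len(y) < 5:                       # cold start: force exploration
                samples.append(np.inf)
                continue
            tree = self.rng.choice(model.estimators_)  # one posterior "sample"
            samples.append(tree.predict([context])[0])
        return int(np.argmax(samples))

    def update(self, arm, context, reward):
        X, y = self.data[arm]
        X.append(context); y.append(reward)
        self.models[arm].fit(X, y)               # refit per round (fine for a sketch)

rng = np.random.default_rng(1)
bandit = ForestThompsonBandit(n_arms=2)
for _ in range(100):
    ctx = rng.normal(size=3).tolist()
    arm = bandit.select(ctx)
    reward = ctx[0] if arm == 0 else -ctx[0]     # arm 0 is better when ctx[0] > 0
    bandit.update(arm, ctx, reward)
print([len(y) for _, y in bandit.data])          # pulls per arm
```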
Updated: 2024-07-12 10:40:08
Subjects: cs.LG,cs.AI,stat.ML
Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond
Task embedding, a meta-learning technique that captures task-specific information, has gained popularity, especially in areas such as multi-task learning, model editing, and interpretability. However, it faces challenges with the emergence of prompt-guided Large Language Models (LLMs) operating in a gradient-free manner. Existing task embedding methods rely on fine-tuned, task-specific language models, which hinders the adaptability of task embeddings across diverse models, especially prompt-based LLMs. To harness the potential of task embeddings in the era of LLMs, we propose a framework for unified task embeddings (FUTE), harmonizing task embeddings from various models, including smaller language models and LLMs with varied prompts, within a single vector space. Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios, while maintaining their performance comparable to architecture-specific methods.
Updated: 2024-07-12 10:39:28
Subjects: cs.CL,cs.LG
The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs
Hallucination detection in Large Language Models (LLMs) is crucial for ensuring their reliability. This work presents our participation in the CLEF ELOQUENT HalluciGen shared task, where the goal is to develop evaluators for both generating and detecting hallucinated content. We explored the capabilities of four LLMs: Llama 3, Gemma, GPT-3.5 Turbo, and GPT-4, for this purpose. We also employed ensemble majority voting to incorporate all four models for the detection task. The results provide valuable insights into the strengths and weaknesses of these LLMs in handling hallucination generation and detection tasks.
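The ensemble step is plain majority voting over the four detectors' verdicts; a minimal sketch (the tie-breaking policy is our assumption, not necessarily the authors'):

```python
from collections import Counter

def majority_vote(verdicts):
    """verdicts: {"llama3": "hallucinated", "gemma": "faithful", ...}.
    Returns the label chosen by most detectors; ties resolve to
    'hallucinated' as a conservative policy."""
    top = Counter(verdicts.values()).most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return "hallucinated"
    return top[0][0]

print(majority_vote({"llama3": "hallucinated", "gemma": "hallucinated",
                     "gpt35": "faithful", "gpt4": "faithful"}))
```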
Updated: 2024-07-12 10:34:46
Subjects: cs.AI,cs.CL
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Efficient management of GPU memory is essential for high throughput LLM inference. Prior systems used to reserve KV-cache memory ahead of time, which resulted in wasted capacity due to internal fragmentation. Inspired by demand paging, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation and improves serving throughput. However, to be able to allocate physical memory dynamically, PagedAttention changes the layout of KV-cache from contiguous virtual memory to non-contiguous virtual memory. As a consequence, one needs to rewrite the attention kernels to support paging, and implement a memory manager in the serving framework. This results in both performance and programming overheads, as well as portability challenges in adopting state-of-the-art attention kernels. In this paper, we propose vAttention, a new approach for dynamic KV-cache memory management. In contrast to PagedAttention, vAttention stores KV-cache in contiguous virtual memory and leverages OS support for on-demand allocation of physical memory. vAttention thus enables one to use state-of-the-art attention kernels out-of-the-box by adding support for dynamic allocation of physical memory without having to re-write their code. We implement vAttention in the vLLM serving stack to show that it also helps improve decode throughput by up to 1.99x over vLLM, and the end-to-end serving throughput by up to 1.22x and 1.29x, compared to using the state-of-the-art PagedAttention based kernels of FlashAttention and FlashInfer.
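The contrast with PagedAttention can be caricatured in a few lines: the cache remains contiguous in its (virtual) index space, while backing pages are committed only when writes reach them. Real vAttention does this with OS virtual-memory primitives; the toy below only models the bookkeeping:

```python
class ContiguousOnDemandKVCache:
    """Toy model of vAttention's idea: a request's KV-cache occupies one
    contiguous region, but backing 'physical pages' are committed lazily,
    page by page, as tokens arrive."""

    PAGE_TOKENS = 16  # tokens per physical page (illustrative value)

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens   # size of the reserved virtual region
        self.committed_pages = 0
        self.used_tokens = 0

    def append_token(self):
        if self.used_tokens == self.committed_pages * self.PAGE_TOKENS:
            self.committed_pages += 1  # "fault in" one more physical page
        self.used_tokens += 1

cache = ContiguousOnDemandKVCache(max_tokens=4096)
for _ in range(40):
    cache.append_token()
print(cache.used_tokens, cache.committed_pages)  # 40 tokens -> 3 pages
```

Because the region stays contiguous, an attention kernel can index it directly, which is why no kernel rewrite is needed.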
Updated: 2024-07-12 10:33:31
Subjects: cs.LG,cs.OS
Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off
Machine learning models are vulnerable to tiny adversarial input perturbations optimized to cause a very large output error. To measure this vulnerability, we need reliable methods that can find such adversarial perturbations. For image classification models, evaluation methodologies have emerged that have stood the test of time. However, we argue that in the area of semantic segmentation, a good approximation of the sensitivity to adversarial perturbations requires significantly more effort than what is currently considered satisfactory. To support this claim, we re-evaluate a number of well-known robust segmentation models in an extensive empirical study. We propose new attacks and combine them with the strongest attacks available in the literature. We also analyze the sensitivity of the models in fine detail. The results indicate that most of the state-of-the-art models have a dramatically larger sensitivity to adversarial perturbations than previously reported. We also demonstrate a size-bias: small objects are often more easily attacked, even if the large objects are robust, a phenomenon not revealed by current evaluation metrics. Our results also demonstrate that a diverse set of strong attacks is necessary, because different models are often vulnerable to different attacks.
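The baseline such evaluations usually start from is an L-infinity PGD attack on the per-pixel cross-entropy; the paper's point is that stopping here overestimates robustness. A standard sketch:

```python
import torch
import torch.nn.functional as F

def pgd_segmentation(model, image, target, eps=8/255, step=2/255, iters=20):
    """Basic L-infinity PGD against a segmentation model (the common
    baseline; the paper's stronger, more diverse attacks go beyond this).

    image: (1, 3, H, W) in [0, 1]; target: (1, H, W) class indices.
    """
    adv = image.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), target)   # mean per-pixel loss
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + step * grad.sign()           # ascend the loss
            adv = image + (adv - image).clamp(-eps, eps)  # project to eps-ball
            adv = adv.clamp(0.0, 1.0).detach()
    return adv
```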
Updated: 2024-07-12 10:32:53
Subjects: cs.CV,cs.LG
Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
Computational models of syntax are predominantly text-based. Here we propose that the most basic syntactic operations can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary properties of syntax -- concatenation. We introduce spontaneous concatenation: a phenomenon where convolutional neural networks (CNNs) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated without ever accessing data with multiple words in the input. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel unobserved word combinations. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech and has implications both for our understanding of how these architectures learn and for modeling syntax and its evolution from raw acoustic inputs.
Updated: 2024-07-12 10:30:23
Subjects: cs.CL,cs.AI,cs.SD,eess.AS
Securing Confidential Data For Distributed Software Development Teams: Encrypted Container File
In the context of modern software engineering, there is a trend towards Cloud-native software development involving international teams with members from all over the world. Cloud-based version management services like GitHub are commonly used for source code and other files. However, a challenge arises when developers from different companies or organizations share the platform, as sensitive data should be encrypted to restrict access to certain developers only. This paper discusses existing tools addressing this issue, highlighting their shortcomings. The authors propose their own solution, Encrypted Container Files, designed to overcome the deficiencies observed in other tools.
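Whatever the concrete file format, tools in this space rest on hybrid encryption: a symmetric content key encrypts the file once, and that key is wrapped separately for each authorized developer's public key. A sketch with the `cryptography` package (our own illustration of the pattern, not the authors' Encrypted Container File format):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# One content key encrypts the file; a copy of it is wrapped for every
# authorized developer, so access is granted per recipient.
content_key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(content_key).encrypt(nonce, b"db_password=s3cret", None)

recipient = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = recipient.public_key().encrypt(content_key, oaep)

# A recipient unwraps the content key with their private key, then decrypts.
key = recipient.decrypt(wrapped_key, oaep)
print(AESGCM(key).decrypt(nonce, ciphertext, None))
```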
Updated: 2024-07-12 10:19:49
Subjects: cs.CR,cs.DC,cs.SE
Accuracy is Not All You Need
When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks. If the accuracies of the baseline model and the compressed model are close, it is assumed that there was negligible degradation in quality. However, even when the accuracy of baseline and compressed model are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in proportion. We conduct a detailed study of metrics across multiple compression techniques, models and datasets, demonstrating that the behavior of compressed models as visible to end-users is often significantly different from the baseline model, even when accuracy is similar. We further evaluate compressed models qualitatively and quantitatively using MT-Bench and show that compressed models are significantly worse than baseline models in this free-form generative task. Thus, we argue that compression techniques should also be evaluated using distance metrics. We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.
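Both proposed distance metrics are simple to state precisely; minimal versions:

```python
import numpy as np

def flip_rate(base_correct, comp_correct):
    """Fraction of examples whose correctness changes in either direction
    between the baseline and the compressed model."""
    base_correct, comp_correct = np.asarray(base_correct), np.asarray(comp_correct)
    return float(np.mean(base_correct != comp_correct))

def mean_token_kl(base_probs, comp_probs, eps=1e-12):
    """Average KL(baseline || compressed) over next-token distributions
    of shape (n_positions, vocab)."""
    p = np.asarray(base_probs) + eps
    q = np.asarray(comp_probs) + eps
    return float(np.mean(np.sum(p * np.log(p / q), axis=-1)))

print(flip_rate([1, 1, 0, 1], [1, 0, 1, 1]))  # 0.5: two of four answers flipped
```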
Updated: 2024-07-12 10:19:02
Subjects: cs.LG
Applications of artificial intelligence in the analysis of histopathology images of gliomas: a review
In recent years, the diagnosis of gliomas has become increasingly complex. Analysis of glioma histopathology images using artificial intelligence (AI) offers new opportunities to support diagnosis and outcome prediction. To give an overview of the current state of research, this review examines 83 publicly available research studies that have proposed AI-based methods for whole-slide histopathology images of human gliomas, covering the diagnostic tasks of subtyping (23/83), grading (27/83), molecular marker prediction (20/83), and survival prediction (29/83). All studies were reviewed with regard to methodological aspects as well as clinical applicability. It was found that the focus of current research is the assessment of hematoxylin and eosin-stained tissue sections of adult-type diffuse gliomas. The majority of studies (52/83) are based on the publicly available glioblastoma and low-grade glioma datasets from The Cancer Genome Atlas (TCGA) and only a few studies employed other datasets in isolation (16/83) or in addition to the TCGA datasets (15/83). Current approaches mostly rely on convolutional neural networks (63/83) for analyzing tissue at 20x magnification (35/83). A new field of research is the integration of clinical data, omics data, or magnetic resonance imaging (29/83). So far, AI-based methods have achieved promising results, but are not yet used in real clinical settings. Future work should focus on the independent validation of methods on larger, multi-site datasets with high-quality and up-to-date clinical and molecular pathology annotations to demonstrate routine applicability.
Updated: 2024-07-12 10:16:55
Subjects: eess.IV,cs.CV,cs.LG
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all. A promising approach towards this means is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing LLMs perform well in solving reasoning questions, they struggle to precisely detect student's errors and tailor their feedback to these errors. Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions and show how grounding to such verification improves the overall quality of tutor response generation. We collect a dataset of 1K stepwise math reasoning chains with the first error step annotated by teachers. We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate several verifiers for detecting these errors. Using both automatic and human evaluation we show that the student solution verifiers steer the generation model towards highly targeted responses to student errors which are more often correct with less hallucinations compared to existing baselines.
Updated: 2024-07-12 10:11:40
Subjects: cs.CL,cs.AI,cs.LG
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. Especially, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, particularly over the hardware platforms constrained by computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting the large models over the various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large models to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as an indispensable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed ......
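As one representative of the PEFT methods such surveys cover, low-rank adaptation (LoRA) freezes the pretrained weights and learns a small low-rank update; a minimal sketch:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adaptation (LoRA): freeze the pretrained weight W and learn
    a rank-r update BA, so only r * (in + out) parameters are trained
    instead of in * out."""

    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # frozen pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 trainable parameters vs. 590592 for full fine-tuning
```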
Updated: 2024-07-12 09:58:10
Subjects: cs.LG
Robustness of Explainable Artificial Intelligence in Industrial Process Modelling
eXplainable Artificial Intelligence (XAI) aims at providing understandable explanations of black box models. In this paper, we evaluate current XAI methods by scoring them based on ground truth simulations and sensitivity analysis. To this end, we used an Electric Arc Furnace (EAF) model to better understand the limits and robustness characteristics of XAI methods such as SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), as well as Averaged Local Effects (ALE) or Smooth Gradients (SG) in a highly topical setting. These XAI methods were applied to various types of black-box models and then scored based on their correctness compared to the ground-truth sensitivity of the data-generating processes using a novel scoring evaluation methodology over a range of simulated additive noise. The resulting evaluation shows that the capability of the Machine Learning (ML) models to capture the process accurately is, indeed, coupled with the correctness of the explainability of the underlying data-generating process. We furthermore show the differences between XAI methods in their ability to correctly predict the true sensitivity of the modeled industrial process.
Updated: 2024-07-12 09:46:26
Subjects: cs.LG
On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance
A number of recent adaptive optimizers improve the generalisation performance of Adam by essentially reducing the variance of adaptive stepsizes to get closer to SGD with momentum. Following the above motivation, we suppress the range of the adaptive stepsizes of Adam by exploiting the layerwise gradient statistics. In particular, at each iteration, we propose to perform three consecutive operations on the second momentum v_t before using it to update a DNN model: (1) down-scaling, (2) epsilon-embedding, and (3) down-translating. The resulting algorithm is referred to as SET-Adam, where SET is a brief notation of the three operations. The down-scaling operation on v_t is performed layerwise by making use of the angles between the layerwise subvectors of v_t and the corresponding all-one subvectors. Extensive experimental results show that SET-Adam outperforms eight adaptive optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIFAR10 and CIFAR100, while matching the best performance of the eight adaptive methods when training WGAN-GP models for image generation tasks. Furthermore, SET-Adam produces higher validation accuracies than Adam and AdaBelief for training ResNet18 over ImageNet.
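The paper defines the three operations via layerwise gradient statistics; the skeleton below only marks where they slot into a standard Adam step, with deliberately crude placeholder transforms (the scaling factor and the shift are not the paper's formulas):

```python
import torch

def set_adam_step(param, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam-style step with the hook where SET-Adam post-processes the
    second moment v before it is used (bias correction omitted for brevity).
    The three transforms are structural placeholders only."""
    m.mul_(b1).add_(grad, alpha=1 - b1)
    v.mul_(b2).addcmul_(grad, grad, value=1 - b2)

    v_hat = 0.9 * v                    # (1) down-scaling (placeholder factor)
    v_hat = v_hat + eps                # (2) epsilon-embedding
    v_hat = v_hat - 0.5 * v_hat.min()  # (3) down-translating (placeholder shift)

    param.sub_(lr * m / v_hat.sqrt())  # update with the suppressed stepsizes
    return param, m, v

p, m, v = torch.zeros(4), torch.zeros(4), torch.zeros(4)
p, m, v = set_adam_step(p, torch.randn(4), m, v)
print(p)
```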
Updated: 2024-07-12 09:46:14
Subjects: cs.LG
Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network
Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, called the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, along with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicitly sharing information among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.
Updated: 2024-07-12 09:38:47
Subjects: cs.LG,cs.MA,nlin.CD,physics.optics
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. Our empirical evaluation, conducted using LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses well-known models such as GPT-4 in defending against attacks. Importantly, our approach successfully defends recent advanced attack methods (e.g., CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and data can be found at https://github.com/RobustNLP/DeRTa.
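A rough sketch of how the two DeRTa components could translate into training-data construction. Field names, the refusal string, and the loss-masking convention are assumptions for illustration; the authors' repository contains the actual recipe.

```python
# Hypothetical construction of DeRTa-style training signals.
import random

REFUSAL = "I'm sorry, but I can't help with that."

def mle_with_harmful_prefix(harmful_prompt, harmful_response, safe_response):
    """(1) MLE with Harmful Response Prefix: a random-length segment of the
    harmful response is placed at the start of the target, so the model learns
    to recognise unsafe content and still recover into a safe response."""
    cut = random.randrange(0, len(harmful_response))
    prefix = harmful_response[:cut]
    # in practice the loss would typically be masked on the prefix tokens (assumption)
    return {"prompt": harmful_prompt,
            "target": (prefix + " " + safe_response) if prefix else safe_response}

def rto_transition_pairs(harmful_response_tokens):
    """(2) Reinforced Transition Optimization: supervise a transition into the
    refusal at every position of the harmful sequence, so the switch from
    potential harm to refusal is reinforced throughout."""
    return [(harmful_response_tokens[:i], REFUSAL)
            for i in range(len(harmful_response_tokens))]
```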
Updated: 2024-07-12 09:36:33
标题: 一旦感到不安全即拒绝:通过解耦拒绝训练提高LLMs的安全性
摘要: 这项研究解决了大型语言模型(LLMs)安全调优实践中的一个关键缺口:识别并处理安全调优数据中的拒绝位置偏差,该偏差会削弱模型恰当拒绝生成不安全内容的能力。我们引入了一种新颖的方法,即解耦拒绝训练(Decoupled Refusal Training, DeRTa),旨在使LLMs能够在响应的任意位置拒绝服从有害提示,从而显著增强其安全能力。DeRTa包括两个新颖的组件:(1)带有有害响应前缀的最大似然估计(MLE),通过在安全响应的开头添加一段有害响应,训练模型识别并规避不安全内容;(2)强化过渡优化(RTO),使模型能够在有害响应序列的任意位置稳定地从潜在危害过渡到安全拒绝。我们使用LLaMA3和Mistral模型系列在六种攻击场景下进行了实证评估,结果表明我们的方法不仅在不损害性能的前提下提高了模型的安全性,还在防御攻击方面超过了GPT-4等知名模型。重要的是,我们的方法成功防御了最近的先进攻击方法(例如CodeAttack),这些方法曾破解GPT-4和LLaMA3-70B-Instruct。我们的代码和数据可在https://github.com/RobustNLP/DeRTa找到。
更新时间: 2024-07-12 09:36:33
领域: cs.CL,cs.AI
URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering
Incomplete multi-view clustering (IMVC) aims to cluster multi-view data that are only partially available. This poses two main challenges: effectively leveraging multi-view information and mitigating the impact of missing views. Prevailing solutions employ cross-view contrastive learning and missing view recovery techniques. However, they either neglect valuable complementary information by focusing only on consensus between views or provide unreliable recovered views due to the absence of supervision. To address these limitations, we propose a novel Unified and Robust Representation Learning for Incomplete Multi-View Clustering (URRL-IMVC). URRL-IMVC directly learns a unified embedding that is robust to view missing conditions by integrating information from multiple views and neighboring samples. Firstly, to overcome the limitations of cross-view contrastive learning, URRL-IMVC incorporates an attention-based auto-encoder framework to fuse multi-view information and generate unified embeddings. Secondly, URRL-IMVC directly enhances the robustness of the unified embedding against view-missing conditions through KNN imputation and data augmentation techniques, eliminating the need for explicit missing view recovery. Finally, incremental improvements are introduced to further enhance the overall performance, such as the Clustering Module and the customization of the Encoder. We extensively evaluate the proposed URRL-IMVC framework on various benchmark datasets, demonstrating its state-of-the-art performance. Furthermore, comprehensive ablation studies are performed to validate the effectiveness of our design.
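As a toy illustration of the KNN-imputation idea mentioned above. The distance measure and the neighbour-averaging rule below are assumptions, not the paper's exact design.

```python
# Illustrative sketch: a sample's missing view is filled with the average of
# that view over its nearest neighbours, found in a fully observed view.
import numpy as np

def knn_impute_view(X_obs, X_target, missing_mask, k=5):
    """X_obs: (n, d1) fully observed view used to find neighbours.
    X_target: (n, d2) view with missing rows; missing_mask: bool array (n,)."""
    observed = ~missing_mask
    for i in np.where(missing_mask)[0]:
        d = np.linalg.norm(X_obs[observed] - X_obs[i], axis=1)
        nn = np.where(observed)[0][np.argsort(d)[:k]]   # k nearest observed samples
        X_target[i] = X_target[nn].mean(axis=0)         # fill with neighbour average
    return X_target
```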
Updated: 2024-07-12 09:35:25
标题: URRL-IMVC:不完整多视角聚类的统一和稳健表示学习
摘要: 不完整多视图聚类(IMVC)旨在对仅部分可用的多视图数据进行聚类。这带来两个主要挑战:有效利用多视图信息,以及减轻缺失视图的影响。现有解决方案多采用跨视图对比学习和缺失视图恢复技术。然而,它们要么只关注视图间的一致性而忽视了有价值的互补信息,要么因缺乏监督而提供不可靠的恢复视图。为了解决这些局限,我们提出了一种新颖的不完整多视图聚类统一稳健表示学习方法(URRL-IMVC)。URRL-IMVC通过整合多个视图和相邻样本的信息,直接学习对视图缺失情形具有稳健性的统一嵌入。首先,为克服跨视图对比学习的局限,URRL-IMVC采用基于注意力的自编码器框架来融合多视图信息并生成统一嵌入。其次,URRL-IMVC通过KNN插补和数据增强技术直接增强统一嵌入对视图缺失情形的稳健性,从而无需显式的缺失视图恢复。最后,引入聚类模块和编码器定制化等增量改进,进一步提升整体性能。我们在多个基准数据集上广泛评估了所提出的URRL-IMVC框架,展示了其最先进的性能,并通过全面的消融研究验证了设计的有效性。
更新时间: 2024-07-12 09:35:25
领域: cs.LG,cs.CL,cs.CV
DISTINQT: A Distributed Privacy Aware Learning Framework for QoS Prediction for Future Mobile and Wireless Networks
Beyond 5G and 6G networks are expected to support new and challenging use cases and applications that depend on a certain level of Quality of Service (QoS) to operate smoothly. Predicting the QoS in a timely manner is of high importance, especially for safety-critical applications as in the case of vehicular communications. Although until recent years the QoS prediction has been carried out by centralized Artificial Intelligence (AI) solutions, a number of privacy, computational, and operational concerns have emerged. Alternative solutions have surfaced (e.g. Split Learning, Federated Learning), distributing AI tasks of reduced complexity across nodes, while preserving the privacy of the data. However, new challenges rise when it comes to scalable distributed learning approaches, taking into account the heterogeneous nature of future wireless networks. The current work proposes DISTINQT, a novel multi-headed input privacy-aware distributed learning framework for QoS prediction. Our framework supports multiple heterogeneous nodes, in terms of data types and model architectures, by sharing computations across them. This enables the incorporation of diverse knowledge into a sole learning process that will enhance the robustness and generalization capabilities of the final QoS prediction model. DISTINQT also contributes to data privacy preservation by encoding any raw input data into highly complex, compressed, and irreversible latent representations before any transmission. Evaluation results showcase that DISTINQT achieves a statistically identical performance compared to its centralized version, while also proving the validity of the privacy preserving claims. DISTINQT manages to achieve a reduction in prediction error of up to 65% on average against six state-of-the-art centralized baseline solutions presented in the Tele-Operated Driving use case.
Updated: 2024-07-12 09:27:57
标题: DISTINQT:面向隐私的分布式学习框架,用于未来移动和无线网络的QoS预测
摘要: 超5G(Beyond 5G)和6G网络预计将支持新的、具有挑战性的用例和应用,这些应用依赖一定水平的服务质量(QoS)才能平稳运行。及时预测QoS至关重要,特别是对于车载通信等安全关键应用。尽管直到最近几年QoS预测一直由集中式人工智能(AI)解决方案完成,但随之出现了一些隐私、计算和运营方面的顾虑。替代方案(例如拆分学习(Split Learning)、联邦学习(Federated Learning))应运而生,它们将复杂度较低的AI任务分布到各节点,同时保护数据隐私。然而,考虑到未来无线网络的异构特性,可扩展的分布式学习方法又带来了新的挑战。本文提出了DISTINQT,一种新颖的多头输入、隐私感知的分布式学习框架,用于QoS预测。我们的框架通过在节点间共享计算,支持数据类型和模型架构各异的多个异构节点。这使得多样化的知识能够融入单一学习过程,从而增强最终QoS预测模型的鲁棒性和泛化能力。DISTINQT还通过在任何传输之前将原始输入数据编码为高度复杂、压缩且不可逆的潜在表示来保护数据隐私。评估结果显示,DISTINQT在统计上与其集中式版本性能相同,同时也验证了其隐私保护声明的有效性。在远程操作驾驶(Tele-Operated Driving)用例中,相对于六种最先进的集中式基线方案,DISTINQT平均实现了高达65%的预测误差降低。
更新时间: 2024-07-12 09:27:57
领域: cs.NI,cs.AI,cs.CR,cs.DC,cs.LG
Inference Optimization of Foundation Models on AI Accelerators
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and research community have witnessed a large number of new applications, based on those foundation models. Such applications include question and answer, customer services, image and video generation, and code completions, among others. However, as the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is ever more higher. To this end, our tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we deep dive into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.
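To illustrate the memory-efficient attention computations the tutorial covers, here is a toy NumPy sketch of chunked attention with an online softmax, the core trick behind FlashAttention-style kernels; the chunk size and names are illustrative, and a real kernel would fuse these steps on the accelerator.

```python
# Toy chunked attention: keys/values are processed in tiles with running
# softmax statistics, so the full (n x m) score matrix is never materialised.
import numpy as np

def chunked_attention(q, k, v, chunk=128):
    """q: (n, d), k and v: (m, d). Equivalent to softmax(q k^T / sqrt(d)) @ v."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    m_run = np.full(q.shape[0], -np.inf)     # running max per query row
    l_run = np.zeros(q.shape[0])             # running softmax denominator
    acc = np.zeros_like(q)                   # running weighted sum of values
    for s in range(0, k.shape[0], chunk):
        scores = q @ k[s:s + chunk].T * scale          # only an (n, chunk) tile
        m_new = np.maximum(m_run, scores.max(axis=1))
        alpha = np.exp(m_run - m_new)                  # rescale old statistics
        p = np.exp(scores - m_new[:, None])
        l_run = l_run * alpha + p.sum(axis=1)
        acc = acc * alpha[:, None] + p @ v[s:s + chunk]
        m_run = m_new
    return acc / l_run[:, None]
```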
Updated: 2024-07-12 09:24:34
标题: 在AI加速器上基础模型的推理优化
摘要: 强大的基础模型,包括具有Transformer架构的大型语言模型(LLMs),已经开启了生成式人工智能在各行各业中的新时代。工业界和研究界见证了大量基于这些基础模型的新应用的出现。这些应用包括问答、客户服务、图像和视频生成以及代码补全等。然而,随着模型参数数量达到数百亿,它们在实际场景中的部署导致了昂贵的推理成本和高延迟。因此,对于使用人工智能加速器进行成本效益和快速推理的需求愈发迫切。为此,我们的教程提供了关于使用人工智能加速器的互补推理优化技术的全面讨论。从基本Transformer架构和深度学习系统框架的概述开始,我们深入探讨了用于快速和内存高效的注意力计算的系统优化技术,并讨论了它们如何能够在人工智能加速器上高效实现。接下来,我们描述了对于快速Transformer推理至关重要的架构元素。最后,我们在相同的背景下考察了各种模型压缩和快速解码策略。
更新时间: 2024-07-12 09:24:34
领域: cs.AI,cs.LG
A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable Structure
Multilayer perceptron (MLP) has permeated various disciplinary domains, ranging from bioinformatics to financial analytics, where their application has become an indispensable facet of contemporary scientific research endeavors. However, MLP has obvious drawbacks. 1) The type of activation function is single and relatively fixed, which leads to poor `representation ability' of the network, so that simple problems are often solved with needlessly complex networks; 2) the network structure is not adaptive, which easily leads to redundant or insufficient network structure. In this work, we propose a novel neural network paradigm X-Net promising to replace MLPs. X-Net can dynamically learn activation functions individually based on derivative information during training to improve the network's representational ability for specific tasks. At the same time, X-Net can precisely adjust the network structure at the neuron level to accommodate tasks of varying complexity and reduce computational costs. We show that X-Net outperforms MLPs in terms of representational capability. X-Net can achieve comparable or even better performance than MLP with far fewer parameters on regression and classification tasks. Specifically, in terms of the number of parameters, X-Net is only 3% of MLP on average and only 1.1% under some tasks. We also demonstrate X-Net's ability to perform scientific discovery on data from various disciplines such as energy, environment, and aerospace, where X-Net is shown to help scientists discover new laws of mathematics or physics.
Updated: 2024-07-12 09:21:00
标题: 一个新的神经计算范式:具有可学习神经元和可调整结构的X-Net
摘要: 多层感知机(MLP)已经渗透到从生物信息学到金融分析的各个学科领域,其应用已成为当代科学研究不可或缺的一部分。然而,MLP存在明显的缺点:1)激活函数的类型单一且相对固定,导致网络的"表示能力"较差,常常需要用过于复杂的网络来解决简单问题;2)网络结构不具备自适应性,容易导致网络结构冗余或不足。在这项工作中,我们提出了一种有望取代MLP的新颖神经网络范式X-Net。X-Net可以在训练过程中根据导数信息为每个神经元动态学习激活函数,以提高网络在特定任务上的表示能力。同时,X-Net可以在神经元级别精确调整网络结构,以适应不同复杂度的任务并降低计算成本。我们展示了X-Net在表示能力方面优于MLP:在回归和分类任务中,X-Net能以少得多的参数达到与MLP相当甚至更好的性能。具体而言,就参数数量而言,X-Net平均仅为MLP的3%,在某些任务下仅为1.1%。我们还展示了X-Net在能源、环境和航空航天等学科数据上进行科学发现的能力,帮助科学家发现新的数学或物理定律。
更新时间: 2024-07-12 09:21:00
领域: cs.AI,cs.NI
TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation
Legged navigation is typically examined within open-world, off-road, and challenging environments. In these scenarios, estimating external disturbances requires a complex synthesis of multi-modal information. This underlines a major limitation in existing works that primarily focus on avoiding obstacles. In this work, we propose TOP-Nav, a novel legged navigation framework that integrates a comprehensive path planner with Terrain awareness, Obstacle avoidance and close-loop Proprioception. TOP-Nav underscores the synergies between vision and proprioception in both path and motion planning. Within the path planner, we present and integrate a terrain estimator that enables the robot to select waypoints on terrains with higher traversability while effectively avoiding obstacles. In the motion planning level, we not only implement a locomotion controller to track the navigation commands, but also construct a proprioception advisor to provide motion evaluations for the path planner. Based on the close-loop motion feedback, we make online corrections for the vision-based terrain and obstacle estimations. Consequently, TOP-Nav achieves open-world navigation that the robot can handle terrains or disturbances beyond the distribution of prior knowledge and overcomes constraints imposed by visual conditions. Building upon extensive experiments conducted in both simulation and real-world environments, TOP-Nav demonstrates superior performance in open-world navigation compared to existing methods.
Updated: 2024-07-12 09:12:19
标题: TOP-Nav:集成地形、障碍物与本体感知估计的足式导航
摘要: 足式导航通常在开放世界、越野和具有挑战性的环境中进行研究。在这些场景中,估计外部干扰需要对多模态信息进行复杂的综合,这凸显了现有研究主要关注避障的一大局限。在这项工作中,我们提出了TOP-Nav,一个集成了全面路径规划器、地形感知、避障和闭环本体感知的新型足式导航框架。TOP-Nav强调了视觉和本体感知在路径与运动规划中的协同作用。在路径规划器中,我们提出并集成了一个地形估计器,使机器人能够在可穿越性更高的地形上选择航点,同时有效避开障碍物。在运动规划层面,我们不仅实现了一个运动控制器来跟踪导航指令,还构建了一个本体感知顾问,为路径规划器提供运动评估。基于闭环运动反馈,我们对基于视觉的地形和障碍物估计进行在线修正。因此,TOP-Nav实现了开放世界导航:机器人能够应对超出先验知识分布的地形或干扰,并克服视觉条件带来的约束。基于在仿真和真实环境中进行的大量实验,TOP-Nav在开放世界导航方面表现出优于现有方法的性能。
更新时间: 2024-07-12 09:12:19
领域: cs.RO,cs.AI,cs.CV,cs.SY,eess.SY
Enhancing Training Efficiency Using Packing with Flash Attention
Padding is often used in tuning LLM models by adding special tokens to shorter training examples to match the length of the longest sequence in each batch. While this ensures uniformity for batch processing, it introduces inefficiencies by including irrelevant padding tokens in the computation and wastes GPU resources. On the other hand, the Hugging Face SFT trainer offers the option to use packing to combine multiple training examples up to the maximum sequence length. This allows for maximal utilization of GPU resources. However, without proper masking of each packed training example, attention will not be computed correctly when using SFT trainer. We enable and then analyse packing and Flash Attention with proper attention masking of each example and show the benefits of this training paradigm.
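A minimal sketch of packing with per-example masking; names are illustrative, and production code would typically pass sequence boundaries (e.g. cu_seqlens) to a Flash Attention kernel instead of materialising the mask.

```python
# Toy packing: examples are concatenated up to max_len, position ids restart
# at each boundary, and the mask keeps attention from crossing examples.
import numpy as np

def pack_examples(examples, max_len):
    """Greedily concatenate tokenised examples; examples: list of id lists."""
    ids, pos, seq_id = [], [], []
    for s, ex in enumerate(examples):
        if len(ids) + len(ex) > max_len:
            break                              # greedy fill for simplicity
        ids += ex
        pos += list(range(len(ex)))            # position ids restart per example
        seq_id += [s] * len(ex)
    seq_id = np.array(seq_id)
    # block-diagonal causal mask: a token attends only within its own example
    same = seq_id[:, None] == seq_id[None, :]
    causal = np.tril(np.ones((len(ids), len(ids)), dtype=bool))
    return np.array(ids), np.array(pos), same & causal
```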
Updated: 2024-07-12 09:10:37
标题: 使用打包与Flash Attention提升训练效率
摘要: 在微调LLM模型时,通常通过向较短的训练示例添加特殊标记进行填充,使其与每个批次中最长序列的长度一致。虽然这确保了批处理的一致性,但它在计算中引入了无关的填充标记,造成低效并浪费GPU资源。另一方面,Hugging Face的SFT训练器提供了打包选项,可将多个训练示例组合到最大序列长度,从而最大化GPU资源的利用。然而,如果不对每个打包的训练示例进行正确的掩码,使用SFT训练器时注意力将无法被正确计算。我们启用并分析了带有逐示例正确注意力掩码的打包与Flash Attention,并展示了这种训练范式的好处。
更新时间: 2024-07-12 09:10:37
领域: cs.LG,cs.AI
UserBoost: Generating User-specific Synthetic Data for Faster Enrolment into Behavioural Biometric Systems
Behavioural biometric authentication systems entail an enrolment period that is burdensome for the user. In this work, we explore generating synthetic gestures from a few real user gestures with generative deep learning, with the application of training a simple (i.e. non-deep-learned) authentication model. Specifically, we show that utilising synthetic data alongside real data can reduce the number of real datapoints a user must provide to enrol into a biometric system. To validate our methods, we use the publicly available dataset of WatchAuth, a system proposed in 2022 for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. We develop a regularised autoencoder model for generating synthetic user-specific wrist motion data representing these physical gestures, and demonstrate the diversity and fidelity of our synthetic gestures. We show that using synthetic gestures in training can improve classification ability for a real-world system. Through this technique we can reduce the number of gestures required to enrol a user into a WatchAuth-like system by more than 40% without negatively impacting its error rates.
Updated: 2024-07-12 09:10:07
标题: UserBoost:为了更快地将用户注册到行为生物识别系统中生成用户特定的合成数据
摘要: 行为生物特征认证系统涉及一个对用户来说繁重的注册期。在这项工作中,我们探讨了利用生成式深度学习从少量真实用户手势中生成合成手势,应用于训练一个简单(即非深度学习)的认证模型。具体来说,我们展示了利用合成数据与真实数据相结合可以减少用户必须提供的真实数据点数量以注册到生物特征认证系统中。为了验证我们的方法,我们使用了2022年提出的用于通过向支付终端伸手的物理手势进行智能手表支付认证的WatchAuth公开可用数据集。我们开发了一个正则化自编码器模型,用于生成代表这些物理手势的合成用户特定手腕运动数据,并展示了我们合成手势的多样性和保真度。我们展示了在训练中使用合成手势可以提高实际系统的分类能力。通过这种技术,我们可以将注册用户到类似WatchAuth系统中所需手势的数量减少超过40%,而不会对其错误率产生负面影响。
更新时间: 2024-07-12 09:10:07
领域: cs.CR,cs.LG
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents
Information extraction from handwritten documents involves traditionally three distinct steps: Document Layout Analysis, Handwritten Text Recognition, and Named Entity Recognition. Recent approaches have attempted to integrate these steps into a single process using fully end-to-end architectures. Despite this, these integrated approaches have not yet matched the performance of language models, when applied to information extraction in plain text. In this paper, we introduce DANIEL (Document Attention Network for Information Extraction and Labelling), a fully end-to-end architecture integrating a language model and designed for comprehensive handwritten document understanding. DANIEL performs layout recognition, handwriting recognition, and named entity recognition on full-page documents. Moreover, it can simultaneously learn across multiple languages, layouts, and tasks. For named entity recognition, the ontology to be applied can be specified via the input prompt. The architecture employs a convolutional encoder capable of processing images of any size without resizing, paired with an autoregressive decoder based on a transformer-based language model. DANIEL achieves competitive results on four datasets, including a new state-of-the-art performance on RIMES 2009 and M-POPP for Handwriting Text Recognition, and IAM NER for Named Entity Recognition. Furthermore, DANIEL is much faster than existing approaches. We provide the source code and the weights of the trained models at \url{https://github.com/Shulk97/daniel}.
Updated: 2024-07-12 09:09:56
标题: DANIEL:一种用于手写文档信息提取和标注的快速文档注意力网络
摘要: 手写文档的信息提取传统上包括三个不同的步骤:文档版面分析、手写文本识别和命名实体识别。最近的方法尝试使用完全端到端的架构将这些步骤整合到单一过程中。尽管如此,这些整合方法在性能上尚未达到语言模型在纯文本信息提取中的水平。在本文中,我们介绍了DANIEL(用于信息提取和标注的文档注意力网络),这是一个集成了语言模型、旨在全面理解手写文档的完全端到端架构。DANIEL可在整页文档上执行版面识别、手写识别和命名实体识别,并且可以同时跨多种语言、版面和任务进行学习。对于命名实体识别,可通过输入提示指定要应用的本体。该架构采用能够处理任意尺寸图像而无需缩放的卷积编码器,搭配基于Transformer语言模型的自回归解码器。DANIEL在四个数据集上取得了有竞争力的结果,包括在RIMES 2009和M-POPP手写文本识别以及IAM NER命名实体识别上创造了新的最先进性能。此外,DANIEL比现有方法快得多。我们在\url{https://github.com/Shulk97/daniel}提供源代码和训练模型的权重。
更新时间: 2024-07-12 09:09:56
领域: cs.AI
SAT Encoding of Partial Ordering Models for Graph Coloring Problems
In this paper, we suggest new SAT encodings of the partial-ordering based ILP model for the graph coloring problem (GCP) and the bandwidth coloring problem (BCP). The GCP asks for the minimum number of colors that can be assigned to the vertices of a given graph such that each two adjacent vertices get different colors. The BCP is a generalization, where each edge has a weight that enforces a minimal "distance" between the assigned colors, and the goal is to minimize the "largest" color used. For the widely studied GCP, we experimentally compare our new SAT encoding to the state-of-the-art approaches on the DIMACS benchmark set. Our evaluation confirms that this SAT encoding is effective for sparse graphs and even outperforms the state-of-the-art on some DIMACS instances. For the BCP, our theoretical analysis shows that the partial-ordering based SAT and ILP formulations have an asymptotically smaller size than that of the classical assignment-based model. Our practical evaluation confirms not only a dominance compared to the assignment-based encodings but also to the state-of-the-art approaches on a set of benchmark instances. Up to our knowledge, we have solved several open instances of the BCP from the literature for the first time.
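For concreteness, here is a sketch of one plausible partial-ordering style encoding of the GCP into DIMACS CNF; the paper's exact variable and clause set may differ from this reconstruction.

```python
# Partial-ordering style SAT encoding sketch: y(v, i) is true iff
# color(v) >= i + 1 for colors 0..k-1; monotonicity of the y's makes every
# assignment decode to exactly one color per vertex.
def gcp_partial_order_cnf(n, edges, k):
    var = {}
    def y(v, i):
        return var.setdefault((v, i), len(var) + 1)   # lazily numbered variables
    cnf = []
    for v in range(n):                                # y(v, i+1) implies y(v, i)
        for i in range(k - 2):
            cnf.append([-y(v, i + 1), y(v, i)])
    def not_color(v, c):                              # literals whose disjunction says color(v) != c
        if c == 0:
            return [y(v, 0)]
        if c == k - 1:
            return [-y(v, k - 2)]
        return [-y(v, c - 1), y(v, c)]
    for (u, w) in edges:                              # adjacent vertices get different colors
        for c in range(k):
            cnf.append(not_color(u, c) + not_color(w, c))
    header = f"p cnf {len(var)} {len(cnf)}"
    return "\n".join([header] + [" ".join(map(str, cl)) + " 0" for cl in cnf])

print(gcp_partial_order_cnf(3, [(0, 1), (1, 2), (0, 2)], 3))   # triangle, 3 colors
```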
Updated: 2024-07-12 09:08:05
标题: 图着色问题偏序模型的SAT编码
摘要: 在本文中,我们提出了基于偏序的ILP模型的新SAT编码,用于图着色问题(GCP)和带宽着色问题(BCP)。 GCP要求为给定图的顶点分配的最小颜色数,使得每两个相邻的顶点得到不同的颜色。 BCP是一个泛化问题,其中每条边都有一个权重,强制分配的颜色之间有最小的“距离”,目标是最小化使用的“最大”颜色。对于广泛研究的GCP,我们在DIMACS基准集上实验比较了我们的新SAT编码与最先进方法。我们的评估证实,对于稀疏图,这种SAT编码是有效的,甚至在某些DIMACS实例上表现优于最先进的方法。对于BCP,我们的理论分析表明,基于偏序的SAT和ILP公式比传统的基于分配的模型具有渐近更小的大小。我们的实际评估不仅证实了与基于分配的编码相比的优势,还证实了对一组基准实例的最先进方法的优势。据我们所知,我们首次解决了文献中的几个BCP的未解实例。
更新时间: 2024-07-12 09:08:05
领域: cs.AI,cs.DM,cs.DS,cs.LO
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale and diversity of current image-text interleaved data restrict the development of multimodal large language models. In this paper, we introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset. Using an efficient data engine, we filter and extract large-scale high-quality documents, which contain 8.6 billion images and 1,696 billion text tokens. Compared to counterparts (e.g., MMC4, OBELICS), our dataset 1) has 15 times larger scales while maintaining good data quality; 2) features more diverse sources, including both English and non-English websites as well as video-centric websites; 3) is more flexible, easily degradable from an image-text interleaved format to pure text corpus and image-text pairs. Through comprehensive analysis and experiments, we validate the quality, usability, and effectiveness of the proposed dataset. We hope this could provide a solid data foundation for future multimodal model research. Code and data are released at https://github.com/OpenGVLab/OmniCorpus.
Updated: 2024-07-12 08:54:51
标题: OmniCorpus:一个统一的多模态语料库,包含与文本交织的100亿级图像
摘要: 图像文本交错数据是由多个图像和文本组成的自然文档格式,符合互联网数据的呈现范式,并且与人类阅读习惯密切相关。最近的研究表明,这种数据有助于多模态上下文学习,并在多模态微调期间保持大型语言模型的能力。然而,当前图像文本交错数据的规模和多样性受限,限制了多模态大型语言模型的发展。在本文中,我们介绍了OmniCorpus,一个规模达到100亿的图像文本交错数据集。通过高效的数据引擎,我们过滤和提取了包含86亿张图像和1696亿个文本标记的大规模高质量文档。与其他数据集(例如MMC4、OBELICS)相比,我们的数据集1)规模大15倍,同时保持良好的数据质量;2)具有更多元化的来源,包括英语和非英语网站以及以视频为中心的网站;3)更加灵活,容易从图像文本交错格式转化为纯文本语料库和图像文本对。通过全面的分析和实验,我们验证了所提出数据集的质量、可用性和有效性。我们希望这能为未来的多模态模型研究提供坚实的数据基础。代码和数据已在https://github.com/OpenGVLab/OmniCorpus发布。
更新时间: 2024-07-12 08:54:51
领域: cs.CV,cs.AI
Music Proofreading with RefinPaint: Where and How to Modify Compositions given Context
Autoregressive generative transformers are key in music generation, producing coherent compositions but facing challenges in human-machine collaboration. We propose RefinPaint, an iterative technique that improves the sampling process. It does this by identifying the weaker music elements using a feedback model, which then informs the choices for resampling by an inpainting model. This dual-focus methodology not only facilitates the machine's ability to improve its automatic inpainting generation through repeated cycles but also offers a valuable tool for humans seeking to refine their compositions with automatic proofreading. Experimental results suggest RefinPaint's effectiveness in inpainting and proofreading tasks, demonstrating its value for refining music created by both machines and humans. This approach not only facilitates creativity but also aids amateur composers in improving their work.
Updated: 2024-07-12 08:52:27
标题: 使用RefinPaint进行音乐校对:在何处以及如何修改给定语境下的作品
摘要: 自回归生成变压器在音乐生成中起着关键作用,能够产生连贯的作品,但在人机协作中面临挑战。我们提出了一种名为RefinPaint的迭代技术,可以改进采样过程。它通过识别较弱的音乐元素,利用反馈模型来指导inpainting模型的重新采样选择,从而实现这一目的。这种双重关注的方法不仅有助于机器通过重复循环改进其自动inpainting生成能力,还为寻求通过自动校对来改进其作品的人提供了有价值的工具。实验结果表明,RefinPaint在inpainting和校对任务中的有效性,展示了它在改进由机器和人类创作的音乐方面的价值。这种方法不仅促进了创造力,还帮助业余作曲家改进自己的作品。
更新时间: 2024-07-12 08:52:27
领域: cs.SD,cs.AI,eess.AS
STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs
Spatial-temporal forecasting and imputation are important for real-world dynamic systems such as intelligent transportation, urban planning, and public health. Most existing methods are tailored for individual forecasting or imputation tasks but are not designed for both. Additionally, they are less effective for zero-shot and few-shot learning. While large language models (LLMs) have exhibited strong pattern recognition and reasoning abilities across various tasks, including few-shot and zero-shot learning, their development in understanding spatial-temporal data has been constrained by insufficient modeling of complex correlations such as the temporal correlations, spatial connectivity, non-pairwise and high-order spatial-temporal correlations within data. In this paper, we propose STD-LLM for understanding both spatial and temporal properties of \underline{S}patial-\underline{T}emporal \underline{D}ata with \underline{LLM}s, which is capable of implementing both spatial-temporal forecasting and imputation tasks. STD-LLM understands spatial-temporal correlations via explicitly designed spatial and temporal tokenizers as well as virtual nodes. Topology-aware node embeddings are designed for LLMs to comprehend and exploit the topology structure of data. Additionally, to capture the non-pairwise and higher-order correlations, we design a hypergraph learning module for LLMs, which can enhance the overall performance and improve efficiency. Extensive experiments demonstrate that STD-LLM exhibits strong performance and generalization capabilities across the forecasting and imputation tasks on various datasets. Moreover, STD-LLM achieves promising results on both few-shot and zero-shot learning tasks.
Updated: 2024-07-12 08:48:16
标题: STD-LLM:使用LLMs理解时空数据的时空特性
摘要: 时空预测和插补对于智能交通、城市规划和公共卫生等现实世界动态系统至关重要。大多数现有方法是为单独的预测或插补任务定制的,而非同时针对两者设计,并且在零样本和少样本学习上效果较差。虽然大型语言模型(LLMs)在包括少样本和零样本学习在内的各种任务中展示了强大的模式识别和推理能力,但它们在理解时空数据方面的发展,受到对复杂相关性(例如时间相关性、空间连通性,以及数据中非成对的高阶时空相关性)建模不足的限制。在本文中,我们提出了STD-LLM,利用LLMs理解时空数据(Spatial-Temporal Data)的空间和时间属性,能够同时实现时空预测和插补任务。STD-LLM通过显式设计的空间和时间标记器以及虚拟节点来理解时空相关性,并设计了拓扑感知节点嵌入,使LLMs能够理解并利用数据的拓扑结构。此外,为了捕捉非成对和高阶相关性,我们为LLMs设计了一个超图学习模块,可提升整体性能并提高效率。大量实验表明,STD-LLM在各种数据集的预测和插补任务中表现出强大的性能和泛化能力,并在少样本和零样本学习任务上取得了可喜的结果。
更新时间: 2024-07-12 08:48:16
领域: cs.LG,cs.AI
TAPFixer: Automatic Detection and Repair of Home Automation Vulnerabilities based on Negated-property Reasoning
Trigger-Action Programming (TAP) is a popular end-user programming framework in the home automation (HA) system, which eases users to customize home automation and control devices as expected. However, its simplified syntax also introduces new safety threats to HA systems through vulnerable rule interactions. Accurately fixing these vulnerabilities by logically and physically eliminating their root causes is essential before rules are deployed. However, it has not been well studied. In this paper, we present TAPFixer, a novel framework to automatically detect and repair rule interaction vulnerabilities in HA systems. It extracts TAP rules from HA profiles, translates them into an automaton model with physical and latency features, and performs model checking with various correctness properties. It then uses a novel negated-property reasoning algorithm to automatically infer a patch via model abstraction and refinement and model checking based on negated-properties. We evaluate TAPFixer on market HA apps (1177 TAP rules and 53 properties) and find that it can achieve an 86.65% success rate in repairing rule interaction vulnerabilities. We additionally recruit 23 HA users to conduct a user study that demonstrates the usefulness of TAPFixer for vulnerability repair in practical HA scenarios.
Updated: 2024-07-12 08:43:26
标题: TAPFixer:基于否定属性推理的家庭自动化漏洞自动检测和修复
摘要: 触发-动作编程(TAP)是家庭自动化(HA)系统中一种流行的终端用户编程框架,它使用户能够按预期自定义家庭自动化和控制设备。然而,其简化的语法也通过易受攻击的规则交互引入了新的安全威胁到HA系统中。在规则部署之前,通过逻辑和物理上消除这些漏洞的根本原因是至关重要的。然而,这方面尚未得到很好的研究。在本文中,我们提出了TAPFixer,这是一个新颖的框架,用于自动检测和修复HA系统中的规则交互漏洞。它从HA配置文件中提取TAP规则,将它们转换为具有物理和延迟特性的自动机模型,并使用各种正确性属性进行模型检查。然后,它使用一种新颖的否定属性推理算法,通过模型抽象和细化以及基于否定属性的模型检查自动推断出一个补丁。我们在市场HA应用程序上评估了TAPFixer(1177个TAP规则和53个属性),发现它在修复规则交互漏洞方面可以达到86.65%的成功率。此外,我们还招募了23名HA用户进行用户研究,证明了TAPFixer在实际HA场景中对漏洞修复的实用性。
更新时间: 2024-07-12 08:43:26
领域: cs.CR
On Exact Bit-level Reversible Transformers Without Changing Architectures
In the literature, various reversible deep neural network (DNN) models have been proposed to reduce memory consumption or improve data-throughput in the training process. However, almost all existing reversible DNNs either are constrained to have special structures or are constructed by modifying the original DNN architectures considerably to enable reversibility. In this work, we propose exact bit-level reversible transformers without changing the architectures in the inference procedure. The basic idea is to first treat each transformer block as the Euler integration approximation for solving an ordinary differential equation (ODE) and then incorporate the technique of bidirectional integration approximation (BDIA, see [26], originally developed for BDIA-based diffusion inversion) into the neural architecture together with activation quantization to make it exactly bit-level reversible, referred to as BDIA-transformer. In the training process, we let a hyper-parameter $\gamma$ in BDIA-transformer randomly take one of the two values $\{0.5, -0.5\}$ per transformer block for averaging two consecutive integration approximations, which regularizes the models for improving the validation accuracy. Light-weight side information per transformer block is required to be stored in the forward process to account for binary quantization loss to enable exact bit-level reversibility. In the inference procedure, the expectation $\mathbb{E}(\gamma)=0$ is taken to make the resulting architectures of BDIA-transformer identical to transformers up to activation quantization. Empirical study indicates that BDIA-transformers outperform their original counterparts notably due to the regularization effect of the $\gamma$ parameter.
Updated: 2024-07-12 08:42:58
标题: 关于在不改变架构的情况下实现精确位级可逆变换器
摘要: 文献中已经提出了各种可逆深度神经网络(DNN)模型,旨在减少训练过程中的内存消耗或提高数据吞吐量。然而,几乎所有现有的可逆DNN要么被限制为特殊结构,要么需要对原始DNN架构进行大幅修改才能实现可逆性。在这项工作中,我们提出了在推断过程中不改变架构的精确位级可逆变换器。基本思想是首先将每个变换器块视为求解常微分方程(ODE)的欧拉积分近似,然后将双向积分近似(BDIA)技术(见[26],最初用于基于BDIA的扩散反演)与激活量化一起整合到神经架构中,使其达到精确的位级可逆,称为BDIA-transformer。在训练过程中,我们让BDIA-transformer中的超参数$\gamma$在每个变换器块中随机取$\{0.5, -0.5\}$两个值之一,以对两个连续的积分近似取平均,从而对模型进行正则化并提高验证准确率。为实现精确的位级可逆性,前向过程中每个变换器块需要存储轻量级辅助信息,以记录二值量化损失。在推断过程中取期望$\mathbb{E}(\gamma)=0$,使BDIA-transformer的最终架构在激活量化意义下与普通变换器完全相同。实证研究表明,得益于$\gamma$参数的正则化效果,BDIA-transformer明显优于其原始对应模型。
更新时间: 2024-07-12 08:42:58
领域: cs.LG,cs.AI
Learning Contrastive Feature Representations for Facial Action Unit Detection
Facial action unit (AU) detection has long encountered the challenge of detecting subtle feature differences when AUs activate. Existing methods often rely on encoding pixel-level information of AUs, which not only encodes additional redundant information but also leads to increased model complexity and limited generalizability. Additionally, the accuracy of AU detection is negatively impacted by the class imbalance issue of each AU type, and the presence of noisy and false AU labels. In this paper, we introduce a novel contrastive learning framework aimed for AU detection that incorporates both self-supervised and supervised signals, thereby enhancing the learning of discriminative features for accurate AU detection. To tackle the class imbalance issue, we employ a negative sample re-weighting strategy that adjusts the step size of updating parameters for minority and majority class samples. Moreover, to address the challenges posed by noisy and false AU labels, we employ a sampling technique that encompasses three distinct types of positive sample pairs. This enables us to inject self-supervised signals into the supervised signal, effectively mitigating the adverse effects of noisy labels. Our experimental assessments, conducted on four widely-utilized benchmark datasets (BP4D, DISFA, GFT and Aff-Wild2), underscore the superior performance of our approach compared to state-of-the-art methods of AU detection. Our code is available at \url{https://github.com/Ziqiao-Shang/AUNCE}.
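As a toy illustration of negative re-weighting inside a contrastive objective; the weighting scheme and loss form below are assumed for illustration and are not the paper's exact formula.

```python
# Weighted InfoNCE-style loss: negatives from over-represented AU classes can
# be assigned smaller weights so minority-class gradients are not drowned out.
import torch
import torch.nn.functional as F

def weighted_info_nce(anchor, positive, negatives, neg_weights, tau=0.1):
    """anchor/positive: (d,), negatives: (m, d), neg_weights: (m,)."""
    pos = torch.exp(F.cosine_similarity(anchor, positive, dim=0) / tau)
    neg = torch.exp(F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1) / tau)
    return -torch.log(pos / (pos + (neg_weights * neg).sum()))
```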
Updated: 2024-07-12 08:41:21
标题: 学习用于面部动作单元检测的对比特征表示
摘要: 面部动作单元(AU)检测长期以来一直面临着检测AU激活时微妙特征差异的挑战。现有方法通常依赖于编码AU的像素级信息,这不仅编码了额外的冗余信息,还导致了模型复杂性的增加和泛化能力的限制。此外,AU检测的准确性受到每种AU类型的类别不平衡问题以及嘈杂和虚假AU标签的影响。在本文中,我们介绍了一种针对AU检测的新型对比学习框架,该框架融合了自监督和监督信号,从而增强了对准确检测AU的判别特征的学习。为了解决类别不平衡问题,我们采用一种负样本重新加权策略,调整了更新少数类和多数类样本参数的步长。此外,为了应对嘈杂和虚假AU标签带来的挑战,我们采用了一种涵盖三种不同类型正样本对的抽样技术。这使我们能够将自监督信号注入到监督信号中,有效地减轻噪声标签的不利影响。我们在四个广泛使用的基准数据集(BP4D、DISFA、GFT和Aff-Wild2)上进行的实验评估强调了我们的方法相对于AU检测的最新方法的优越性能。我们的代码可在\url{https://github.com/Ziqiao-Shang/AUNCE}上找到。
更新时间: 2024-07-12 08:41:21
领域: cs.CV,cs.AI,cs.LG
Fair Federated Data Clustering through Personalization: Bridging the Gap between Diverse Data Distributions
The rapid growth of data from edge devices has catalyzed the performance of machine learning algorithms. However, the generated data resides on client devices, so traditional machine learning paradigms face two major challenges: first, data must be centralized for training; second, most of the generated data lacks class labels, and clients have little incentive to label it manually owing to the high cost and lack of expertise. To overcome these issues, there have been initial attempts to handle unlabelled data in a privacy-preserving, distributed manner using unsupervised federated data clustering. The goal is to partition the data available on clients into $k$ partitions (called clusters) without any actual exchange of data. Most existing algorithms are highly dependent on data distribution patterns across clients or are computationally expensive. Furthermore, owing to the skewed nature of data across clients in most practical scenarios, existing models may leave some clients with a high clustering cost, making them reluctant to participate in the federated process. To this end, we are the first to introduce the idea of personalization in federated clustering, aiming to balance a lower overall clustering cost against a uniform cost across clients. We propose p-FClus, which addresses these goals in a single round of communication between the server and clients. We validate the efficacy of p-FClus on a variety of federated datasets, showcasing its data-independent nature and applicability to any finite $\ell$-norm, while simultaneously achieving lower cost and variance.
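An illustrative single-communication-round sketch in the spirit of p-FClus: clients share only local cluster centres, never raw data, and each keeps a personalised blend of local and global structure. The centre-matching and blending rules are assumptions, not the paper's exact procedure.

```python
# Hypothetical one-round federated clustering with personalisation.
import numpy as np
from sklearn.cluster import KMeans

def federated_personalized_clustering(client_data, k, alpha=0.5):
    # each client fits locally; only the k centres leave the device
    local = [KMeans(n_clusters=k, n_init=10).fit(X).cluster_centers_
             for X in client_data]
    # server clusters the pooled centres into k global centres
    global_centers = KMeans(n_clusters=k, n_init=10).fit(
        np.vstack(local)).cluster_centers_
    personalised = []
    for lc in local:
        # match each local centre to its nearest global centre, then blend:
        # alpha -> 0 favours the local solution, alpha -> 1 the global one
        d2 = ((lc[:, None, :] - global_centers[None, :, :]) ** 2).sum(-1)
        personalised.append(alpha * global_centers[d2.argmin(1)] + (1 - alpha) * lc)
    return personalised
```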
Updated: 2024-07-12 08:35:33
标题: 公平的个性化联邦数据聚类:弥合多样数据分布之间的差距
摘要: 边缘设备数据的快速增长推动了机器学习算法性能的提升。然而,生成的数据驻留在客户端设备上,因此传统机器学习范式面临两大挑战:一是训练需要将数据集中化;二是大多数生成的数据缺少类别标签,而由于成本高昂且缺乏专业知识,客户端几乎没有动力手动标注数据。为了克服这些问题,已有初步尝试使用无监督联邦数据聚类,以保护隐私的分布式方式处理未标注数据。其目标是在不实际交换数据的情况下,将客户端上的数据划分为$k$个分区(称为簇)。大多数现有算法高度依赖客户端间的数据分布模式,或计算成本高昂。此外,在大多数实际场景中,由于客户端间数据的偏斜性,现有模型可能使部分客户端承担较高的聚类成本,令其不愿参与联邦过程。为此,我们首次在联邦聚类中引入个性化的思想,目标是在降低聚类成本的同时实现客户端之间成本的均衡。我们提出了p-FClus,在服务器和客户端之间的单轮通信中实现这些目标。我们在多种联邦数据集上验证了p-FClus的有效性,展示了其与数据分布无关的特性和对任意有限$\ell$-范数的适用性,同时实现了更低的成本和方差。
更新时间: 2024-07-12 08:35:33
领域: cs.LG
FD-SOS: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images
Accurate detection of bone fenestration and dehiscence (FD) is crucial for effective treatment planning in dentistry. While cone-beam computed tomography (CBCT) is the gold standard for evaluating FD, it comes with limitations such as radiation exposure, limited accessibility, and higher cost compared to intraoral images. In intraoral images, dentists face challenges in the differential diagnosis of FD. This paper presents a novel and clinically significant application of FD detection solely from intraoral images. To achieve this, we propose FD-SOS, a novel open-set object detector for FD detection from intraoral images. FD-SOS has two novel components: conditional contrastive denoising (CCDN) and teeth-specific matching assignment (TMA). These modules enable FD-SOS to effectively leverage external dental semantics. Experimental results showed that our method outperformed existing detection methods and surpassed dental professionals by 35% recall under the same level of precision. Code is available at: https://github.com/xmed-lab/FD-SOS.
Updated: 2024-07-12 08:29:25
标题: FD-SOS:用于从口腔内图像检测骨开窗与骨开裂的视觉-语言开放集检测器
摘要: 骨开窗和骨开裂(FD)的准确检测对牙科的有效治疗规划至关重要。虽然锥形束计算机断层扫描(CBCT)是评估FD的金标准,但它存在辐射暴露、可及性有限以及成本高于口腔内影像等局限。而在口腔内影像中,牙医在FD的鉴别诊断上面临挑战。本文提出了一种新颖且具有重要临床意义的应用:仅基于口腔内影像检测FD。为实现这一目标,我们提出了FD-SOS,一种从口腔内影像检测FD的新型开放集目标检测器。FD-SOS具有两个新颖组件:条件对比去噪(CCDN)和牙齿特定匹配分配(TMA)。这些模块使FD-SOS能够有效利用外部牙科语义。实验结果显示,我们的方法优于现有检测方法,并在相同精确率水平下将召回率较牙科专业人员提高了35%。代码可在以下链接获取:https://github.com/xmed-lab/FD-SOS。
更新时间: 2024-07-12 08:29:25
领域: eess.IV,cs.AI,cs.CV
Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations
Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging. In particular, although ODEs are differentiable and would allow for gradient-based parameter optimization, the nonlinear dynamics of ODEs often lead to many local minima and extreme sensitivity to initial conditions. We therefore propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs. By iteratively reducing a noise parameter of the probabilistic integrator, the proposed method converges more reliably to the true parameters. We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.
Updated: 2024-07-12 08:26:25
标题: 扩散回火通过概率积分器改进常微分方程参数估计
摘要: 常微分方程(ODEs)被广泛用于描述科学中的动态系统,但确定能解释实验测量的参数具有挑战性。特别是,虽然ODEs可微并因此允许基于梯度的参数优化,但其非线性动力学通常导致许多局部极小值以及对初始条件的极端敏感性。因此,我们提出了扩散回火(diffusion tempering),这是一种用于概率数值方法的新颖正则化技术,可改善ODEs中基于梯度的参数优化的收敛性。通过迭代减小概率积分器的噪声参数,所提出的方法能更可靠地收敛到真实参数。我们证明了该方法对不同复杂度的动态系统均有效,并展示了它能为参数数量具有实际意义的Hodgkin-Huxley模型获得可靠的参数估计。
更新时间: 2024-07-12 08:26:25
领域: cs.LG
On the Role of Discrete Tokenization in Visual Representation Learning
In the realm of self-supervised learning (SSL), masked image modeling (MIM) has gained popularity alongside contrastive learning methods. MIM involves reconstructing masked regions of input images using their unmasked portions. A notable subset of MIM methodologies employs discrete tokens as the reconstruction target, but the theoretical underpinnings of this choice remain underexplored. In this paper, we explore the role of these discrete tokens, aiming to unravel their benefits and limitations. Building upon the connection between MIM and contrastive learning, we provide a comprehensive theoretical understanding on how discrete tokenization affects the model's generalization capabilities. Furthermore, we propose a novel metric named TCAS, which is specifically designed to assess the effectiveness of discrete tokens within the MIM framework. Inspired by this metric, we contribute an innovative tokenizer design and propose a corresponding MIM method named ClusterMIM. It demonstrates superior performance on a variety of benchmark datasets and ViT backbones. Code is available at https://github.com/PKU-ML/ClusterMIM.
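A sketch of a clustering-based discrete tokenizer in the spirit of ClusterMIM; the feature source, token count, and class names are assumptions for illustration.

```python
# Patch features are assigned to k-means centroids, and the cluster index of
# each masked patch serves as its discrete reconstruction target for MIM.
import numpy as np
from sklearn.cluster import KMeans

class ClusterTokenizer:
    def __init__(self, n_tokens=512):
        self.km = KMeans(n_clusters=n_tokens, n_init=10)

    def fit(self, patch_features):            # (num_patches_total, d)
        self.km.fit(patch_features)
        return self

    def tokenize(self, patch_features):       # discrete targets for masked patches
        return self.km.predict(patch_features)

# hypothetical usage: targets = tok.tokenize(feats[mask]), then a cross-entropy
# loss between the decoder's logits and these cluster indices
```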
Updated: 2024-07-12 08:25:31
标题: 关于离散标记在视觉表示学习中的作用
摘要: 在自监督学习领域,掩蔽图像建模(MIM)与对比学习方法并驾齐驱。MIM涉及使用未掩蔽部分重建输入图像的掩蔽区域。 MIM方法论的一个显著子集采用离散标记作为重建目标,但这种选择的理论基础仍未得到充分探讨。在本文中,我们探讨这些离散标记的作用,旨在揭示它们的优势和局限性。在MIM和对比学习之间的关联基础上,我们提供了对离散标记化如何影响模型泛化能力的全面理论理解。此外,我们提出了一个名为TCAS的新指标,专门设计用于评估MIM框架内离散标记的有效性。受此指标启发,我们提出了一种创新的标记器设计,并提出了一种相应的MIM方法,名为ClusterMIM。它在各种基准数据集和ViT骨干上表现出优越性能。代码可在https://github.com/PKU-ML/ClusterMIM获得。
更新时间: 2024-07-12 08:25:31
领域: cs.LG,cs.CV
Jailbreaking as a Reward Misspecification Problem
The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and demonstrate its effectiveness and robustness in detecting harmful backdoor prompts. Building upon these insights, we present ReMiss, a system for automated red teaming that generates adversarial prompts against various target aligned LLMs. ReMiss achieves state-of-the-art attack success rates on the AdvBench benchmark while preserving the human readability of the generated prompts. Detailed analysis highlights the unique advantages brought by the proposed reward misspecification objective compared to previous methods.
Updated: 2024-07-12 08:15:45
标题: 将越狱视为一种奖励错误设定问题
摘要: 大语言模型(LLMs)的广泛采用引发了人们对其安全性和可靠性的担忧,尤其是其易受对抗攻击的脆弱性。在本文中,我们提出了一个新颖的视角,将这种脆弱性归因于对齐过程中的奖励错误设定。我们引入了度量指标ReGap来量化奖励错误设定的程度,并展示了它在检测有害后门提示方面的有效性和稳健性。基于这些见解,我们提出了ReMiss,一个自动化红队系统,可针对各种对齐的目标LLMs生成对抗性提示。ReMiss在AdvBench基准上实现了最先进的攻击成功率,同时保持了所生成提示的人类可读性。详细分析突显了所提出的奖励错误设定目标相对于先前方法的独特优势。
更新时间: 2024-07-12 08:15:45
领域: cs.LG,cs.CL
Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation
Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels often involve noise, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to keep selecting appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning, including initialization, growth and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. Through extensive experiments, PLF outperforms current state-of-the-art methods, proving its effectiveness in CTTA.
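A minimal sketch of how the three thresholding principles (initialization, growth, diversity) could look in code; the schedules below are illustrative assumptions, not the paper's exact rules.

```python
# Self-adaptive pseudo-label filtering: per-class confidence thresholds start
# low, grow as adaptation proceeds, and are relaxed for rarely selected classes.
import numpy as np

class PseudoLabelFilter:
    def __init__(self, n_classes, t0=0.5, growth=1e-3, div=0.1):
        self.tau = np.full(n_classes, t0)     # principle 1: initialisation
        self.growth, self.div = growth, div
        self.counts = np.zeros(n_classes)

    def select(self, probs):                  # probs: (batch, n_classes)
        labels = probs.argmax(1)
        conf = probs.max(1)
        keep = conf >= self.tau[labels]       # filter unreliable pseudo-labels
        self.tau += self.growth               # principle 2: growth over time
        freq = self.counts / max(self.counts.sum(), 1)
        # principle 3: diversity, lower thresholds for under-selected classes
        self.tau -= self.div * self.growth * (freq.mean() - freq)
        np.add.at(self.counts, labels[keep], 1)
        return keep, labels
```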
Updated: 2024-07-12 08:15:22
标题: 少即是多:持续测试时适应的伪标签过滤
摘要: 持续测试时适应(CTTA)旨在在测试阶段将预训练模型调整到一系列目标域,而无需访问源数据。为了适应未标记的来自未知域的数据,现有方法依赖于为所有样本构建伪标签并通过自我训练更新模型。然而,这些伪标签通常涉及噪声,导致适应不足。为了提高伪标签的质量,我们提出了一种用于CTTA的伪标签选择方法,称为伪标签过滤器(PLF)。PLF的关键思想是持续选择适当的阈值用于伪标签,并识别可靠的用于自我训练。具体而言,我们提出了在连续域学习过程中设置阈值的三个原则,包括初始化、增长和多样性。基于这些原则,我们设计了自适应阈值筛选伪标签。此外,我们引入了一种类先验对齐(CPA)方法,以鼓励模型为未知域样本进行多样化预测。通过大量实验证明,PLF优于当前最先进的方法,在CTTA中证明了其有效性。
更新时间: 2024-07-12 08:15:22
领域: cs.LG,cs.AI
Boundary State Generation for Testing and Improvement of Autonomous Driving Systems
Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. In such approaches, environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GENBO (GENerator of BOundary state pairs), a novel test generator for ADS testing. GENBO mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment instance. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has, on average, up to 3x higher success rate on a separate set of evaluation tracks with respect to the original DNN model.
Updated: 2024-07-12 08:02:19
标题: 边界状态生成用于测试和改进自动驾驶系统
摘要: 深度神经网络(DNNs)和传感器技术的最新进展使自动驾驶系统(ADSs)的自主程度不断提高,但评估其可靠性仍然是一个关键问题。当前最先进的ADS测试方法通过修改模拟驾驶环境的可控属性,直至ADS出现错误行为。在这类方法中,ADS表现正常的环境实例会被丢弃,尽管其中可能隐藏着会使ADS出错的驾驶条件。在本文中,我们介绍了GENBO(边界状态对生成器),一种用于ADS测试的新型测试生成器。GENBO对在无故障环境实例中采集的自车驾驶状态(位置、速度和朝向)进行变异,并在同一环境实例中高效地生成位于行为边界(即模型开始出错之处)的具有挑战性的驾驶条件。我们利用这些边界条件扩充初始训练数据集,并重新训练被测DNN模型。评估结果表明,与原始DNN模型相比,重新训练的模型在另一组评估赛道上的成功率平均最高可提升3倍。
更新时间: 2024-07-12 08:02:19
领域: cs.SE,cs.AI,cs.RO
Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports
Purpose: Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization including a uniform data representation and filtering options are of paramount importance. Methods: DICOM structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration and interactive filtering capabilities that simplifies the process of assembling multi-modal datasets. Results: In this study, we extend our prior work by showing its applicability to more and divergent data types, as well as streamlining datasets for federated training within an established consortium of eight university hospitals in Germany. We prove its concurrent filtering ability by creating harmonized multi-modal datasets across all locations for predicting the outcome after minimally invasive heart valve replacement. The data includes DICOM data (i.e. computed tomography images, electrocardiography scans) as well as annotations (i.e. calcification segmentations, pointsets and pacemaker dependency), and metadata (i.e. prosthesis and diagnoses). Conclusion: Structured reports bridge the traditional gap between imaging systems and information systems. Utilizing the inherent DICOM reference system arbitrary data types can be queried concurrently to create meaningful cohorts for clinical studies. The graphical interface as well as example structured report templates will be made publicly available.
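A small pydicom-based sketch of the kind of concurrent metadata filtering the platform enables; the paths, modality, and concept code are placeholders, and the platform itself builds on highdicom structured reports rather than this manual traversal.

```python
# Walk a folder of DICOM objects (images and structured reports) and keep
# studies matching a modality or a coded SR concept, without loading pixels.
from pathlib import Path
import pydicom

def build_cohort(root, modality="CT", concept_code="113011"):
    cohort = []
    for path in Path(root).rglob("*.dcm"):
        ds = pydicom.dcmread(path, stop_before_pixels=True)   # metadata only
        if ds.Modality == modality:
            cohort.append((ds.StudyInstanceUID, path))
        elif ds.Modality == "SR":                 # structured report: inspect content
            for item in getattr(ds, "ContentSequence", []):
                seq = getattr(item, "ConceptNameCodeSequence", None)
                if seq and seq[0].CodeValue == concept_code:
                    cohort.append((ds.StudyInstanceUID, path))
    return cohort
```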
Updated: 2024-07-12 07:34:10
标题: 基于DICOM结构化报告的多模态数据集创建用于联邦学习
摘要: 目的:由于数据存储方式不一、命名方案不一致、注释流程各异以及标签质量参差,异构数据集常常阻碍联邦训练。这在新兴的多模态学习范式中尤为明显,其中包括统一数据表示和过滤选项在内的数据集协调至关重要。方法:DICOM结构化报告支持以标准化方式关联影像领域之外的任意信息,并可借助highdicom用于Python深度学习流水线。在此基础上,我们开发了一个具备数据整合与交互式过滤能力的开放平台,简化了多模态数据集的组装过程。结果:在这项研究中,我们扩展了先前的工作,展示了其对更多样化数据类型的适用性,并在由德国八所大学医院组成的现有联盟中为联邦训练精简数据集。我们通过在所有地点创建用于预测微创心脏瓣膜置换术后结局的协调多模态数据集,证明了其并行过滤能力。数据包括DICOM数据(如计算机断层扫描图像、心电图扫描)、注释(如钙化分割、点集和起搏器依赖)以及元数据(如假体和诊断)。结论:结构化报告弥合了影像系统和信息系统之间的传统鸿沟。利用DICOM固有的引用体系,可以同时查询任意数据类型,为临床研究创建有意义的队列。图形界面以及示例结构化报告模板将公开提供。
更新时间: 2024-07-12 07:34:10
领域: cs.IR,cs.LG
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to generate highly diverse 3D shapes that comply with the specified constraints. In this paper, we introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling. To ensure spatial coherence and reduce memory usage, we incorporate a hybrid shape representation technique that directly learns a continuous signed distance field representation of the 3D shape using orthogonal 2D planes. Additionally, we meticulously enforce spatial correspondences across distinct planes using a transformer-based autoencoder structure, promoting the preservation of spatial relationships in the generated 3D shapes. This yields an algorithm that consistently outperforms state-of-the-art 3D shape generation methods on various tasks, including unconditional shape generation, multi-modal shape completion, single-view reconstruction, and text-to-shape synthesis. Our project page is available at https://weizheliu.github.io/NeuSDFusion/ .
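As a toy sketch of querying a signed distance field from orthogonal 2D feature planes (a tri-plane style representation); the resolutions, the sum-fusion, and the MLP head are illustrative choices, not the paper's exact architecture.

```python
# A 3D query point is projected onto the xy/xz/yz feature planes, bilinearly
# sampled features are fused by summation, and a small MLP predicts the SDF.
import torch
import torch.nn.functional as F

def query_sdf(points, planes, mlp):
    """points: (n, 3) in [-1, 1]^3; planes: tensor (3, c, r, r); mlp: (n, c) -> (n, 1)."""
    coords = [points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]]
    feats = 0
    for plane, uv in zip(planes, coords):
        grid = uv.view(1, -1, 1, 2)                     # (1, n, 1, 2) sample grid
        f = F.grid_sample(plane.unsqueeze(0), grid,
                          align_corners=True)           # (1, c, n, 1)
        feats = feats + f[0, :, :, 0].t()               # (n, c), fused by summation
    return mlp(feats)                                   # signed distance per point

# e.g. mlp = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
#                                torch.nn.Linear(64, 1)) for c = 32 channels
```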
Updated: 2024-07-12 07:30:00
标题: NeuSDFusion: 一种空间感知的生成模型,用于3D形状的补全、重建和生成
摘要: 3D形状生成旨在产生符合特定条件和约束的创新3D内容。现有方法通常将3D形状分解为一系列局部组件,将每个元素单独处理,而不考虑空间一致性。因此,这些方法在3D数据表示和形状生成方面表现出有限的多功能性,阻碍了它们生成符合指定约束的高度多样化的3D形状的能力。在本文中,我们介绍了一种新颖的空间感知3D形状生成框架,利用2D平面表示来增强3D形状建模。为了确保空间一致性并减少内存使用,我们采用了一种混合形状表示技术,直接学习使用正交2D平面的3D形状的连续有符号距离场表示。此外,我们通过基于变换器的自动编码器结构精心强化不同平面之间的空间对应关系,促进生成的3D形状中空间关系的保持。这产生了一种算法,始终优于各种任务上的最先进的3D形状生成方法,包括无条件形状生成、多模态形状完成、单视图重建和文本到形状合成。我们的项目页面可在 https://weizheliu.github.io/NeuSDFusion/ 上找到。
更新时间: 2024-07-12 07:30:00
领域: cs.CV,cs.AI,cs.GR,cs.LG
Spectral Self-supervised Feature Selection
Choosing a meaningful subset of features from high-dimensional observations in unsupervised settings can greatly enhance the accuracy of downstream analysis, such as clustering or dimensionality reduction, and provide valuable insights into the sources of heterogeneity in a given dataset. In this paper, we propose a self-supervised graph-based approach for unsupervised feature selection. Our method's core involves computing robust pseudo-labels by applying simple processing steps to the graph Laplacian's eigenvectors. The subset of eigenvectors used for computing pseudo-labels is chosen based on a model stability criterion. We then measure the importance of each feature by training a surrogate model to predict the pseudo-labels from the observations. Our approach is shown to be robust to challenging scenarios, such as the presence of outliers and complex substructures. We demonstrate the effectiveness of our method through experiments on real-world datasets, showing its robustness across multiple domains, particularly its effectiveness on biological datasets.
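A condensed sketch of the pipeline described above; eigenvector selection by the stability criterion is simplified here to taking the first few non-trivial eigenvectors, and median thresholding stands in for the paper's processing steps.

```python
# Build a graph Laplacian, derive pseudo-labels from its eigenvectors, then
# score features by training a surrogate model to predict those pseudo-labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics.pairwise import rbf_kernel

def spectral_feature_scores(X, n_vecs=3):
    W = rbf_kernel(X)                              # affinity graph over samples
    L = np.diag(W.sum(1)) - W                      # unnormalised graph Laplacian
    _, vecs = np.linalg.eigh(L)                    # eigenvectors, ascending order
    scores = np.zeros(X.shape[1])
    for j in range(1, n_vecs + 1):                 # skip the trivial eigenvector
        pseudo = (vecs[:, j] > np.median(vecs[:, j])).astype(int)  # pseudo-labels
        rf = RandomForestClassifier(n_estimators=100).fit(X, pseudo)
        scores += rf.feature_importances_          # surrogate-model importance
    return scores / n_vecs
```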
Updated: 2024-07-12 07:29:08
标题: 谱自监督特征选择
摘要: 在无监督设置中,从高维观测中选择有意义的特征子集可以极大地提高聚类或降维等下游分析的准确性,并为给定数据集中的异质性来源提供宝贵的见解。在本文中,我们提出了一种基于图的自监督无监督特征选择方法。其核心是通过对图拉普拉斯矩阵的特征向量施加简单的处理步骤来计算稳健的伪标签;用于计算伪标签的特征向量子集则依据模型稳定性准则选取。随后,我们通过训练一个替代模型从观测值预测伪标签,来度量每个特征的重要性。我们的方法被证明对异常值和复杂子结构等具有挑战性的情形具有稳健性。我们通过在真实数据集上的实验展示了方法的有效性,显示了其跨多个领域的稳健性,尤其是在生物数据集上的表现。
更新时间: 2024-07-12 07:29:08
领域: cs.LG
ImageFlowNet: Forecasting Multiscale Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images
The forecasting of disease progression from images is a holy grail for clinical decision making. However, this task is complicated by the inherent high dimensionality, temporal sparsity and sampling irregularity in longitudinal image acquisitions. Existing methods often rely on extracting hand-crafted features and performing time-series analysis in this vector space, leading to a loss of rich spatial information within the images. To overcome these challenges, we introduce ImageFlowNet, a novel framework that learns latent-space flow fields that evolve multiscale representations in joint embedding spaces using neural ODEs and SDEs to model disease progression in the image domain. Notably, ImageFlowNet learns multiscale joint representation spaces by combining cohorts of patients together so that information can be transferred between the patient samples. The dynamics then provide plausible trajectories of progression, with the SDE providing alternative trajectories from the same starting point. We provide theoretical insights that support our formulation of ODEs, and motivate our regularizations involving high-level visual features, latent space organization, and trajectory smoothness. We then demonstrate ImageFlowNet's effectiveness through empirical evaluations on three longitudinal medical image datasets depicting progression in retinal geographic atrophy, multiple sclerosis, and glioblastoma.
Updated: 2024-07-12 07:28:55
标题: ImageFlowNet:使用不规则采样的纵向医学图像预测疾病进展的多尺度轨迹
摘要: 从图像中预测疾病进展对于临床决策是一个至关重要的目标。然而,这项任务受到纵向图像采集中固有的高维度、时间稀疏性和采样不规则性的复杂性影响。现有的方法通常依赖于提取手工制作的特征,并在这个向量空间中进行时间序列分析,导致图像中丰富的空间信息丢失。为了克服这些挑战,我们引入了ImageFlowNet,这是一个新颖的框架,通过使用神经ODEs和SDEs在联合嵌入空间中学习演变多尺度表示,以模拟图像领域中的疾病进展。值得注意的是,ImageFlowNet通过将患者群体合并在一起来学习多尺度联合表示空间,以便信息可以在患者样本之间传递。然后,动态提供了进展的合理轨迹,SDE提供了从相同起点开始的替代轨迹。我们提供理论洞察力,支持我们对ODEs的公式化,并激励我们涉及高级视觉特征、潜在空间组织和轨迹平滑度的正则化。然后,我们通过对三个描述视网膜地理性萎缩、多发性硬化和胶质母细胞瘤进展的纵向医学图像数据集进行实证评估,展示了ImageFlowNet的有效性。
更新时间: 2024-07-12 07:28:55
领域: eess.IV,cs.CV,cs.LG
Advanced Graph Clustering Methods: A Comprehensive and In-Depth Analysis
Graph clustering, which aims to divide a graph into several homogeneous groups, is a critical area of study with applications that span various fields such as social network analysis, bioinformatics, and image segmentation. This paper explores both traditional and more recent approaches to graph clustering. Firstly, key concepts and definitions in graph theory are introduced. The background section covers essential topics, including graph Laplacians and the integration of Deep Learning in graph analysis. The paper then delves into traditional clustering methods, including Spectral Clustering and the Leiden algorithm. Following this, state-of-the-art clustering techniques that leverage deep learning are examined. A comprehensive comparison of these methods is made through experiments. The paper concludes with a discussion of the practical applications of graph clustering and potential future research directions.
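As a concrete example of one traditional method covered, here is spectral clustering applied to a small graph via its adjacency matrix with scikit-learn.

```python
# Two 3-cliques joined by a single edge: spectral clustering recovers them.
import numpy as np
from sklearn.cluster import SpectralClustering

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            assign_labels="discretize").fit_predict(A)
print(labels)    # e.g. [0 0 0 1 1 1]
```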
Updated: 2024-07-12 07:22:45
标题: 先进的图聚类方法:全面深入分析
摘要: 图聚类旨在将图划分为若干同质群体,是一个关键研究领域,其应用涵盖社交网络分析、生物信息学和图像分割等多个领域。本文探讨了图聚类的传统方法和较新的方法。首先介绍图论中的关键概念和定义;背景部分涵盖包括图拉普拉斯矩阵以及将深度学习融入图分析在内的基本主题。接着深入探讨传统聚类方法,包括谱聚类和Leiden算法;随后考察利用深度学习的最新聚类技术,并通过实验对这些方法进行全面比较。最后,本文讨论了图聚类的实际应用和潜在的未来研究方向。
更新时间: 2024-07-12 07:22:45
领域: stat.ML,cs.LG
From MIDI to Rich Tablatures: an Automatic Generative System incorporating Lead Guitarists' Fingering and Stylistic choices
Although the automatic identification of the optimal fingering for the performance of melodies on fretted string instruments has already been addressed (at least partially) in the literature, the specific case regarding lead electric guitar requires a dedicated approach. We propose a system that can generate, from simple MIDI melodies, tablatures enriched by fingerings, articulations, and expressive techniques. The basic fingering is derived by solving a constrained and multi-attribute optimization problem, which derives the best position of the fretting hand, not just the finger used at each moment. Then, by analyzing statistical data from the mySongBook corpus, the most common clichés and biomechanical feasibility, articulations, and expressive techniques are introduced. Finally, the obtained output is converted into MusicXML format, which allows for easy visualization and use. The quality of the tablatures derived and the high configurability of the proposed approach can have several impacts, in particular in the fields of instrumental teaching, assisted composition and arranging, and computational expressive music performance models.
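A toy dynamic-programming sketch of the constrained fingering optimisation described above; standard tuning is assumed, and the simple hand-movement cost is a stand-in for the paper's multi-attribute objective (finger choice, biomechanical feasibility, style).

```python
# Each note can be played at several (string, fret) positions; pick the
# sequence that minimises fretting-hand movement across the melody.
TUNING = [64, 59, 55, 50, 45, 40]          # standard tuning in MIDI, high e to low E

def positions(midi, max_fret=20):
    return [(s, midi - open_) for s, open_ in enumerate(TUNING)
            if 0 <= midi - open_ <= max_fret]

def best_fingering(melody):
    prev = {p: (0, [p]) for p in positions(melody[0])}
    for midi in melody[1:]:
        cur = {}
        for p in positions(midi):
            cost, path = min(
                (c + abs(p[1] - q[1]) + abs(p[0] - q[0]), pth)   # movement cost
                for q, (c, pth) in prev.items())
            cur[p] = (cost, path + [p])
        prev = cur
    return min(prev.values())[1]           # one (string, fret) pair per note

print(best_fingering([64, 67, 69, 71]))    # E4 G4 A4 B4
```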
Updated: 2024-07-12 07:18:24
标题: 从MIDI到丰富的吉他谱:一种自动生成系统,融合了主吉他手的指法和风格选择
摘要: 尽管文献中已经(至少部分地)讨论了在有品格弦乐器上演奏旋律时最佳指法的自动识别,但主音电吉他这一特定场景需要专门的方法。我们提出了一个系统,可以从简单的MIDI旋律生成带有指法、演奏法和表现技巧的吉他谱。基本指法通过求解一个带约束的多属性优化问题得出,该问题确定的是按弦手的最佳位置,而不仅仅是每个时刻使用的手指。然后,通过分析mySongBook语料库的统计数据,引入最常见的惯用乐句(cliché)以及符合生物力学可行性的演奏法和表现技巧。最后,将得到的输出转换为MusicXML格式,便于可视化和使用。所生成吉他谱的质量以及所提方法的高度可配置性可在多个方面产生影响,特别是在乐器教学、辅助作曲与编曲以及计算性音乐表现力模型等领域。
更新时间: 2024-07-12 07:18:24
领域: cs.AI,math.OC
Refusing Safe Prompts for Multi-modal Large Language Models
Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-Refusal, the first method that induces refusals for safe prompts. In particular, our MLLM-Refusal optimizes a nearly-imperceptible refusal perturbation and adds it to an image, causing target MLLMs to likely refuse a safe prompt containing the perturbed image and a safe question. Specifically, we formulate MLLM-Refusal as a constrained optimization problem and propose an algorithm to solve it. Our method offers competitive advantages for MLLM model providers by potentially disrupting user experiences of competing MLLMs, since competing MLLM's users will receive unexpected refusals when they unwittingly use these perturbed images in their prompts. We evaluate MLLM-Refusal on four MLLMs across four datasets, demonstrating its effectiveness in causing competing MLLMs to refuse safe prompts while not affecting non-competing MLLMs. Furthermore, we explore three potential countermeasures -- adding Gaussian noise, DiffPure, and adversarial training. Our results show that they are insufficient: though they can mitigate MLLM-Refusal's effectiveness, they also sacrifice the accuracy and/or efficiency of the competing MLLM. The code is available at https://github.com/Sadcardation/MLLM-Refusal.
Updated: 2024-07-12 07:18:05
标题: 拒绝多模态大型语言模型的安全提示
摘要: 多模态大型语言模型(MLLMs)已成为当今生成式人工智能生态系统的基石,引发了科技巨头和初创公司之间的激烈竞争。具体来说,MLLM生成一个文本响应,给定一个包含图像和问题的提示。虽然最先进的MLLM使用安全过滤器和对齐技术拒绝不安全的提示,但在这项工作中,我们介绍了MLLM-Refusal,这是第一种为安全提示引入拒绝的方法。具体来说,我们的MLLM-Refusal优化几乎无法察觉的拒绝扰动,并将其添加到一个图像中,导致目标MLLM更有可能拒绝包含扰动图像和安全问题的安全提示。具体地,我们将MLLM-Refusal构建为一个受限优化问题,并提出了解决方案。我们的方法为MLLM模型提供商提供了竞争优势,可能扰乱竞争MLLM的用户体验,因为当他们在提示中无意中使用这些扰动图像时,竞争MLLM的用户将收到意外的拒绝。我们在四个数据集上评估了MLLM-Refusal对四个MLLM的影响,证明了其在导致竞争MLLM拒绝安全提示的有效性,同时不影响非竞争MLLM。此外,我们探讨了三种潜在的对策 - 添加高斯噪声,DiffPure和对抗训练。我们的结果表明它们并不足够:尽管它们可以减轻MLLM-Refusal的有效性,但也会牺牲竞争MLLM的准确性和/或效率。代码可在https://github.com/Sadcardation/MLLM-Refusal找到。
更新时间: 2024-07-12 07:18:05
领域: cs.CR,cs.AI,cs.CV,cs.LG
Under the Hood of Tabular Data Generation Models: the Strong Impact of Hyperparameter Tuning
We investigate the impact of dataset-specific hyperparameter, feature encoding, and architecture tuning on five recent model families for tabular data generation through an extensive benchmark on 16 datasets. This study addresses the practical need for a unified evaluation of models that fully considers hyperparameter optimization. Additionally, we propose a reduced search space for each model that allows for quick optimization, achieving nearly equivalent performance at a significantly lower cost. Our benchmark demonstrates that, for most models, large-scale dataset-specific tuning substantially improves performance compared to the original configurations. Furthermore, we confirm that diffusion-based models generally outperform other models on tabular data. However, this advantage is not significant when the entire tuning and training process is restricted to the same GPU budget for all models.
Updated: 2024-07-12 07:16:33
标题: 表格数据生成模型内部机制:超参数调整的强大影响
摘要: 我们通过对16个数据集进行广泛基准测试,研究了数据集特定超参数、特征编码和架构调整对五个最近的表格数据生成模型家族的影响。这项研究解决了对模型进行统一评估的实际需求,完全考虑了超参数优化。此外,我们提出了每个模型的缩减搜索空间,可以快速优化,以较低的成本实现几乎相同的性能。我们的基准测试表明,对于大多数模型来说,大规模数据集特定调整相比原始配置显著提高了性能。此外,我们确认扩散模型通常在表格数据上胜过其他模型。然而,在所有模型的整个调整和训练过程都受限于相同的GPU预算时,这种优势并不显著。
更新时间: 2024-07-12 07:16:33
领域: cs.LG,stat.ML
KUNPENG: An Embodied Large Model for Intelligent Maritime
Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods, covering multiple aspects such as smart vessels, route optimization, and safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic maritime environment, along with diverse and heterogeneous large-scale data sources, presents challenges for real-time decision-making in intelligent maritime. In this paper, we propose KUNPENG, the first-ever embodied large model for intelligent maritime in smart ocean construction, which consists of six systems. The model perceives multi-source heterogeneous data for the cognition of environmental interaction and makes autonomous decision strategies, which intelligent vessels use to perform navigation behaviors under safety and emergency guarantees while continuously optimizing power, achieving embodied intelligence in the maritime domain. In comprehensive maritime task evaluations, KUNPENG has demonstrated excellent performance.
Updated: 2024-07-12 07:16:22
标题: KUNPENG:用于智能海事的具身大模型
摘要: 智能海洋作为智能海洋建设的重要组成部分,深度融合了先进的人工智能技术和数据分析方法,涵盖了智能船舶、航线优化、安全导航等多个方面,旨在提高海洋资源利用效率和交通网络智能化水平。然而,复杂动态的海洋环境以及多样化和异构的大规模数据源为智能海洋中的实时决策带来挑战。本文提出了KUNPENG,这是智能海洋建设中首个具有实体化大型模型,由六个系统组成。该模型感知多源异构数据以认知环境互动,并制定自主决策策略,用于智能船舶在安全和紧急保障下执行导航行为,并持续优化动力以实现海洋中的实体智能。在综合海洋任务评估中,KUNPENG表现出色。
更新时间: 2024-07-12 07:16:22
领域: cs.AI
Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification
Person re-identification (ReID), as a crucial technology in the field of security, plays an important role in security detection and people counting. Current security and monitoring systems largely rely on visual information, which may infringe on personal privacy and be susceptible to interference from pedestrian appearances and clothing in certain scenarios. Meanwhile, the widespread use of routers offers new possibilities for ReID. This letter introduces a method using WiFi Channel State Information (CSI), leveraging the multipath propagation characteristics of WiFi signals as a basis for distinguishing different pedestrian features. We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude in the time domain and the phase in the frequency domain of WiFi signals, fuses time-frequency information through continuous lateral connections, and employs advanced objective functions for representation and metric learning. Tested on a dataset collected in the real world, our method achieves 93.68% mAP and 98.13% Rank-1.
Updated: 2024-07-12 07:10:47
标题: 时间频率分析变长WiFi CSI信号用于人员重新识别
摘要: 人员再识别(ReID)作为安全领域中关键的技术,在安全检测和人员计数中发挥着重要作用。当前的安全和监控系统主要依赖于视觉信息,这可能侵犯个人隐私,并容易受到特定场景中行人外观和服装的干扰。同时,路由器的广泛使用为ReID 提供了新的可能性。本文介绍了一种利用WiFi信道状态信息(CSI)的方法,利用WiFi信号的多径传播特性作为区分不同行人特征的基础。我们提出了一个能够处理可变长度数据的两流网络结构,该结构分析WiFi信号的时间域幅度和频率域相位,通过连续的横向连接融合时频信息,并采用先进的客观函数进行表示和度量学习。在实际收集的数据集上进行测试,我们的方法实现了93.68%的mAP和98.13%的排名1。
更新时间: 2024-07-12 07:10:47
领域: cs.IR,cs.AI
Molecule Language Model with Augmented Pairs and Expertise Transfer
Understanding molecules and their textual descriptions via molecule language models (MoLM) has recently attracted a surge of interest among researchers. However, unique challenges exist in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise caused by the specialized areas of focus among experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with a structural-similarity-preserving loss, and 2) transfers expertise between molecules. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery.
Updated: 2024-07-12 07:09:10
标题: 具有增强对和专业知识转移的分子语言模型
摘要: 最近,通过分子语言模型(MoLM)理解分子及其文本描述引起了研究人员的浓厚兴趣。然而,在MoLM领域存在独特的挑战,包括1)分子文本配对数据有限,2)由于专家们专注于特定领域而导致缺乏专业知识。因此,我们提出了AMOLE,该方法通过保留结构相似性损失来增加分子文本配对数据,并在分子之间传递专业知识。对各种下游任务的广泛实验表明,AMOLE在理解分子及其描述方面具有优越性,突显其在真实世界药物发现中的潜力。
更新时间: 2024-07-12 07:09:10
领域: cs.AI
Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach
Continual learning (CL) poses the important challenge of adapting to evolving data distributions without forgetting previously acquired knowledge while consolidating new knowledge. In this paper, we introduce a new methodology, coined the Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3), designed to address the phenomenon of catastrophic forgetting in tabular data classification problems. TRIL3 uses the prototype-based incremental generative model XuILVQ to generate synthetic data that preserves old knowledge, and the DNDF algorithm, modified to run incrementally, to learn classification tasks for tabular data without storing old samples. After experiments to determine the adequate percentage of synthetic data and to compare TRIL3 with other available CL proposals, we conclude that TRIL3 outperforms the other options in the literature while using only 50% synthetic data.
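The pseudorehearsal loop can be sketched compactly. In the sketch below, a per-class Gaussian generator is a toy stand-in for XuILVQ and scikit-learn's SGDClassifier stands in for the incrementally modified DNDF; the point it illustrates is that each new task is learned on fresh data mixed with synthetic samples, with no old samples stored.
```python
import numpy as np
from sklearn.linear_model import SGDClassifier

class PrototypeGenerator:
    """Remember per-class means/stds; sample synthetic 'rehearsal' points."""
    def __init__(self):
        self.stats = {}
    def update(self, X, y):
        for c in np.unique(y):
            Xc = X[y == c]
            self.stats[c] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-6)
    def sample(self, n_per_class):
        Xs, ys = [], []
        for c, (mu, sd) in self.stats.items():
            Xs.append(np.random.normal(mu, sd, size=(n_per_class, mu.size)))
            ys.append(np.full(n_per_class, c))
        return (np.vstack(Xs), np.concatenate(ys)) if Xs else (None, None)

clf = SGDClassifier(loss="log_loss")
gen = PrototypeGenerator()
all_classes = np.arange(4)

def learn_task(X_new, y_new, synth_ratio=0.5):
    Xr, yr = gen.sample(int(synth_ratio * len(X_new)))
    X = X_new if Xr is None else np.vstack([X_new, Xr])
    y = y_new if yr is None else np.concatenate([y_new, yr])
    clf.partial_fit(X, y, classes=all_classes)  # never touches stored old data
    gen.update(X_new, y_new)                    # refresh generator statistics

X0 = np.random.normal(0, 1, (200, 8)); y0 = np.random.randint(0, 2, 200)
learn_task(X0, y0)
X1 = np.random.normal(2, 1, (200, 8)); y1 = np.random.randint(2, 4, 200)
learn_task(X1, y1)   # classes 0-1 rehearsed via synthetic samples only
```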
Updated: 2024-07-12 07:04:06
标题: 在表格数据分类中克服灾难性遗忘:基于伪排练的方法
摘要: 持续学习(CL)面临着一个重要挑战,即在适应不断变化的数据分布的同时保留先前获得的知识并巩固新知识。在本文中,我们引入了一种新方法,称为基于表格数据排练的增量生命周期学习框架(TRIL3),旨在解决表格数据分类问题中的灾难性遗忘现象。TRIL3使用基于原型的增量生成模型XuILVQ生成合成数据以保留旧知识,并修改了以增量方式运行的DNDF算法来学习表格数据的分类任务,而不存储旧样本。经过不同测试以获得合适百分比的合成数据,并将TRIL3与其他CL可用提议进行比较,我们得出结论,TRIL3的性能在只使用50%的合成数据时优于文献中的其他选项。
更新时间: 2024-07-12 07:04:06
领域: cs.LG,cs.AI
DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations
While deep learning has led to huge progress in complex image classification tasks like ImageNet, unexpected failure modes, e.g. via spurious features, call into question how reliably these classifiers work in the wild. Furthermore, for safety-critical tasks the black-box nature of their decisions is problematic, and explanations or at least methods which make decisions plausible are needed urgently. In this paper, we address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation. We analyze the decisions of image classifiers by visual counterfactual explanations (VCEs), detection of systematic mistakes by analyzing images where classifiers maximally disagree, and visualization of neurons and spurious features. In this way, we validate existing observations, e.g. the shape bias of adversarially robust models, as well as novel failure modes, e.g. systematic errors of zero-shot CLIP classifiers. Moreover, our VCEs outperform previous work while being more versatile.
Updated: 2024-07-12 06:53:50
标题: DiG-IN:用于研究网络的扩散引导——揭示分类器差异的神经元可视化和视觉反事实解释
摘要: 尽管深度学习在复杂图像分类任务如ImageNet中取得了巨大进展,但意外的失败模式,例如通过虚假特征,使人怀疑这些分类器在实际应用中的可靠性。此外,对于安全关键任务来说,它们决策的黑箱性质是有问题的,迫切需要解释或至少能够使决策合理的方法。在本文中,我们通过使用引导图像生成框架生成优化分类器衍生目标的图像来解决这些问题。我们通过视觉反事实解释(VCEs)、分析分类器分歧最大的图像以检测系统性错误,以及对神经元和虚假特征进行可视化,来分析图像分类器的决策。通过这种方式,我们验证了现有观察结果,例如对抗性鲁棒模型的形状偏见,以及新型失败模式,例如零样本CLIP分类器的系统性错误。此外,我们的VCEs在更加通用的同时,也胜过了先前的工作。
更新时间: 2024-07-12 06:53:50
领域: cs.CV,cs.AI,cs.LG
DRM Revisited: A Complete Error Analysis
In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameterization regime: given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number of iterations, such that the output of the gradient descent process closely approximates the true solution of the underlying partial differential equation to the specified precision?
Updated: 2024-07-12 06:48:00
标题: 深度里茨方法(DRM)再探讨:完整的误差分析
摘要: 在这项工作中,我们解决了深度里茨方法(DRM)在过度参数化制度下的理论分析中的一个基础问题:在给定目标精度水平的情况下,如何确定适当数量的训练样本、神经网络的关键架构参数、投影梯度下降优化过程的步长以及必要的迭代次数,以便梯度下降过程的输出能够与底层偏微分方程的真实解以指定精度相近?
更新时间: 2024-07-12 06:48:00
领域: math.NA,cs.LG,cs.NA
HPC: Hierarchical Progressive Coding Framework for Volumetric Video
Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hierarchical progressive volumetric video coding framework achieving variable bitrate using a single model. Specifically, HPC introduces a hierarchical representation with a multi-resolution residual radiance field to reduce temporal redundancy in long-duration sequences while simultaneously generating various levels of detail. Then, we propose an end-to-end progressive learning approach with a multi-rate-distortion loss function to jointly optimize both hierarchical representation and compression. Our HPC trained only once can realize multiple compression levels, while the current methods need to train multiple fixed-bitrate models for different rate-distortion (RD) tradeoffs. Extensive experiments demonstrate that HPC achieves flexible quality levels with variable bitrate by a single model and exhibits competitive RD performance, even outperforming fixed-bitrate models across various datasets.
Updated: 2024-07-12 06:34:24
标题: HPC:体积视频的分层渐进编码框架
摘要: 基于神经辐射场(NeRF)的体积视频具有广泛的潜力,适用于各种3D应用,但其庞大的数据量对压缩和传输提出了重大挑战。当前的NeRF压缩缺乏在单个模型中调整视频质量和比特率以适应各种网络和设备容量的灵活性。为解决这些问题,我们提出了HPC,这是一种新颖的分层渐进式体积视频编码框架,实现了使用单一模型的可变比特率。具体而言,HPC引入了一个具有多分辨率残差辐射场的分层表示,以减少长时间序列中的时间冗余,同时生成不同级别的细节。然后,我们提出了一种端到端的渐进学习方法,采用多速率失真损失函数,共同优化分层表示和压缩。我们的HPC只需训练一次即可实现多个压缩级别,而当前方法需要为不同速率失真(RD)权衡训练多个固定比特率模型。大量实验证明,HPC通过单一模型实现了灵活的质量级别和可变比特率,并展现出竞争力的RD性能,甚至在各种数据集上表现优于固定比特率模型。
更新时间: 2024-07-12 06:34:24
领域: cs.CV,cs.LG,cs.MM,eess.IV
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs' token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT4's in-context learning setting. Moreover, fine-tuned LLM with SheetCompressor has an average compression ratio of 25 times, but achieves a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%. Finally, we propose Chain of Spreadsheet for downstream tasks of spreadsheet understanding and validate in a new and demanding spreadsheet QA task. We methodically leverage the inherent layout and structure of spreadsheets, demonstrating that SpreadsheetLLM is highly effective across a variety of spreadsheet tasks.
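The inverse-index idea is easy to illustrate: instead of emitting one "address,value" pair per cell, identical values are grouped under a single entry, which shrinks the token count on repetitive sheets. The encoding format below is a simplified assumption; the real SheetCompressor additionally uses structural anchors and data-format-aware aggregation.
```python
from collections import defaultdict

def col_letter(idx):
    """0 -> A, 25 -> Z, 26 -> AA ... (standard spreadsheet column names)."""
    s = ""
    idx += 1
    while idx:
        idx, r = divmod(idx - 1, 26)
        s = chr(ord("A") + r) + s
    return s

def vanilla_encoding(grid):
    """One 'address,value' entry per non-empty cell."""
    return "|".join(f"{col_letter(c)}{r + 1},{v}"
                    for r, row in enumerate(grid)
                    for c, v in enumerate(row) if v != "")

def inverted_encoding(grid):
    """Group identical values: 'value:addr1,addr2,...' entries."""
    index = defaultdict(list)
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v != "":
                index[v].append(f"{col_letter(c)}{r + 1}")
    return "|".join(f"{v}:{','.join(cells)}" for v, cells in index.items())

grid = [["Year", "Sales", "Sales"], ["2023", "10", "10"], ["2024", "10", "12"]]
print(vanilla_encoding(grid))
print(inverted_encoding(grid))   # repeated "Sales" and "10" stored once
```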
Updated: 2024-07-12 06:34:21
标题: SpreadsheetLLM:为大型语言模型编码电子表格
摘要: 电子表格,以其广泛的二维网格、各种布局和多样的格式选项,为大型语言模型(LLMs)提出了显著挑战。作为回应,我们引入了SpreadsheetLLM,开创了一种高效的编码方法,旨在释放和优化LLMs在电子表格上的强大理解和推理能力。最初,我们提出了一种基本的序列化方法,该方法包括单元格地址、值和格式。然而,由于LLMs的令牌限制,这种方法在大多数应用中是不现实的。为了解决这一挑战,我们开发了SheetCompressor,一种创新的编码框架,可以有效地为LLMs压缩电子表格。它包括三个模块:基于结构锚点的压缩、逆向索引转换和数据格式感知聚合。它在电子表格表格检测任务中显著提高了性能,在GPT4的情境学习设置中,比基本方法表现出25.6%的优势。此外,使用SheetCompressor进行微调的LLM具有平均压缩比为25倍,但达到了一流的78.9%的F1分数,超过了最佳现有模型12.3%。最后,我们提出了Chain of Spreadsheet,用于电子表格理解的下游任务,并在一个新的、要求严格的电子表格问答任务中进行验证。我们系统地利用了电子表格的固有布局和结构,展示了SpreadsheetLLM在各种电子表格任务中的高效性。
更新时间: 2024-07-12 06:34:21
领域: cs.AI
Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control
Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generalization and enable rapid adaptation to downstream tasks using minimal annotations. In particular, we introduce Efficient Diffusion Alignment (EDA) for solving continuous control problems. EDA utilizes diffusion models for behavior modeling. However, unlike previous approaches, we represent diffusion policies as the derivative of a scalar neural network with respect to action inputs. This representation is critical because it enables direct density calculation for diffusion models, making them compatible with existing LLM alignment theories. During policy fine-tuning, we extend preference-based alignment methods like Direct Preference Optimization (DPO) to align diffusion behaviors with continuous Q-functions. Our evaluation on the D4RL benchmark shows that EDA exceeds all baseline methods in overall performance. Notably, EDA maintains about 95% of performance and still outperforms several baselines given only 1% of Q-labelled data during fine-tuning.
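The central representational choice, a policy score obtained as the action-gradient of a scalar network, is straightforward to sketch with autograd. All dimensions and the toy annealed Langevin sampler below are illustrative assumptions; the scalar output of the network is what permits direct (unnormalized) density evaluation.
```python
import torch
import torch.nn as nn

state_dim, action_dim = 17, 6
# Scalar network psi(s, a, t); its value acts as an unnormalized log-density.
psi = nn.Sequential(nn.Linear(state_dim + action_dim + 1, 256), nn.SiLU(),
                    nn.Linear(256, 256), nn.SiLU(), nn.Linear(256, 1))

def score(state, action, t):
    """Policy score = d psi / d action, computed by autograd."""
    action = action.detach().requires_grad_(True)
    out = psi(torch.cat([state, action, t], dim=-1)).sum()
    return torch.autograd.grad(out, action)[0]

def sample_actions(state, steps=10, step_size=0.05):
    """Toy annealed Langevin-style sampler driven by the learned score."""
    a = torch.randn(state.shape[0], action_dim)
    for k in range(steps, 0, -1):
        t = torch.full((state.shape[0], 1), k / steps)
        noise = (2 * step_size) ** 0.5 * (k / steps) * torch.randn_like(a)
        a = a + step_size * score(state, a, t) + noise
    return a

actions = sample_actions(torch.randn(4, state_dim))
```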
Updated: 2024-07-12 06:32:36
标题: 将扩散行为与Q函数对齐,以实现高效的连续控制
摘要: 借鉴最近在语言模型对齐方面的进展,我们将离线强化学习定义为一个两阶段优化问题:首先在无奖励行为数据集上预训练具有表现力的生成策略,然后微调这些策略以与任务特定的注释(如Q值)对齐。这种策略使我们能够利用丰富多样的行为数据来增强泛化能力,并通过最少的注释实现对下游任务的快速适应。特别是,我们引入了用于解决连续控制问题的Efficient Diffusion Alignment (EDA)。EDA利用扩散模型进行行为建模。然而,与先前的方法不同,我们将扩散策略表示为相对于动作输入的标量神经网络的导数。这种表示至关重要,因为它使得可以直接计算扩散模型的密度,使其与现有的LLM对齐理论兼容。在策略微调过程中,我们扩展了基于偏好的对齐方法,如Direct Preference Optimization (DPO),以将扩散行为与连续Q函数对齐。我们在D4RL基准上的评估结果显示,EDA在整体性能方面超越了所有基准方法。值得注意的是,在微调过程中仅使用了1%的Q标记数据,EDA仍然保持了约95%的性能,并仍然优于几个基准方法。
更新时间: 2024-07-12 06:32:36
领域: cs.LG
Revisit Human-Scene Interaction via Space Occupancy
Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks. However, one of the major obstacles is its limited data scale. High-quality data with simultaneously captured human and 3D environments is hard to acquire, resulting in limited data diversity and complexity. In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective, leading us to a unified novel view of Human-Occupancy Interaction. By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database: Motion Occupancy Base (MOB). Thus, the need for costly paired motion-scene datasets with high-quality scene scans can be substantially alleviated. With this new unified view of Human-Occupancy interaction, a single motion controller is proposed to reach the target state given the surrounding occupancy. Once trained on MOB with complex occupancy layout, which is stringent to human movements, the controller could handle cramped scenes and generalize well to general scenes with limited complexity like regular living rooms. With no GT 3D scenes for training, our method can generate realistic and stable HSI motions in diverse scenarios, including both static and dynamic scenes. The project is available at https://foruck.github.io/occu-page/.
Updated: 2024-07-12 06:25:13
标题: 重新审视人类与场景之间的空间占用交互
摘要: 人-场景交互(HSI)生成是一项具有挑战性且对各种下游任务至关重要的任务。然而,其中一个主要障碍是其有限的数据规模。同时捕捉人类和3D环境的高质量数据很难获得,导致数据多样性和复杂性有限。在这项工作中,我们认为与场景互动本质上是从抽象物理角度与场景的空间占用互动,从而引导我们到一种统一的新视角:人-占用交互。通过将纯运动序列视为人类与看不见的场景占用互动的记录,我们可以将仅运动数据聚合到一个大规模的配对人-占用交互数据库中:Motion Occupancy Base(MOB)。因此,可以大大减轻对昂贵的配对运动-场景数据集(带有高质量场景扫描)的需求。通过这种新的统一视角,提出了一个单一的运动控制器,以在周围占用的情况下达到目标状态。一旦在对人类移动严格要求的复杂占用布局上经过MOB训练,该控制器可以处理狭窄的场景,并且能够很好地推广到像普通起居室这样具有有限复杂性的一般场景。在没有用于训练的GT 3D场景的情况下,我们的方法可以在各种场景中生成逼真且稳定的HSI动作,包括静态和动态场景。该项目可在https://foruck.github.io/occu-page/上找到。
更新时间: 2024-07-12 06:25:13
领域: cs.CV,cs.AI,cs.GR
Heterogeneous Subgraph Network with Prompt Learning for Interpretable Depression Detection on Social Media
Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users. To address these issues, we develop a novel method that leverages a Heterogeneous Subgraph Network with Prompt Learning(HSNPL) and contrastive learning mechanisms. Specifically, prompt learning is employed to map users' implicit psychological symbols with excellent interpretability while deep semantic and diverse behavioral features are incorporated by a heterogeneous information network. Then, the heterogeneous graph network with a dual attention mechanism is constructed to model the relationships among heterogeneous social information at the feature level. Furthermore, the heterogeneous subgraph network integrating subgraph attention and self-supervised contrastive learning is developed to explore complicated interactions among users and groups at the user level. Extensive experimental results demonstrate that our proposed method significantly outperforms state-of-the-art methods for depression detection on social media.
Updated: 2024-07-12 06:20:59
标题: 用于社交媒体上可解释抑郁症检测的异构子图网络与提示学习
摘要: 大规模社交媒体数据可以反映人们真实的想法、情绪、沟通等,因此可以用于早期检测抑郁等心理健康问题。现有关于社交媒体早期抑郁检测的研究缺乏可解释性,忽视了社交媒体数据的异质性。此外,它们忽略了用户之间的全局互动。为了解决这些问题,我们开发了一种利用异构子图网络与提示学习(HSNPL)和对比学习机制的新方法。具体来说,提示学习被用来将用户的隐含心理符号映射为具有出色可解释性的特征,同时,通过异构信息网络融入深层语义和多样化的行为特征。然后,构建了带有双重注意机制的异构图网络,以模拟特征级别上异质社交信息之间的关系。此外,开发了整合子图注意力和自监督对比学习的异构子图网络,以探索用户和群体之间复杂的互动关系。大量实验证明,我们提出的方法在社交媒体抑郁检测方面明显优于现有方法。
更新时间: 2024-07-12 06:20:59
领域: cs.SI,cs.AI
Continual Developmental Neurosimulation Using Embodied Computational Agents
There is much to learn through synthesis of Developmental Biology, Cognitive Science and Computational Modeling. Our path forward involves a design for developmentally-inspired learning agents based on Braitenberg Vehicles. Continual developmental neurosimulation allows us to consider the role of developmental trajectories in bridging the related phenomena of nervous system morphogenesis, developmental learning, and plasticity. Being closely tied to continual learning, our approach is tightly integrated with developmental embodiment, and can be implemented using a type of agent called developmental Braitenberg Vehicles (dBVs). dBVs begin their lives as a set of undefined structures that transform into agent-based systems including a body, sensors, effectors, and nervous system. This phenotype is characterized in terms of developmental timing: with distinct morphogenetic, critical, and acquisition (developmental learning) periods. We further propose that network morphogenesis can be accomplished using a genetic algorithmic approach, while developmental learning can be implemented using a number of computational methodologies. This approach provides a framework for adaptive agent behavior that might result from a developmental approach: namely by exploiting critical periods of growth and acquisition, an explicitly embodied network architecture, and a distinction between the assembly of neuronal networks and active learning on these networks. In conclusion, we will consider agent learning and development at different timescales, from very short (<100ms) intervals to long-term evolution. The development, evolution, and learning in an embodied agent-based approach is key to an integrative view of biologically-inspired intelligence.
Updated: 2024-07-12 06:10:30
标题: 持续发展的神经模拟:使用具身计算代理
摘要: 通过发展生物学、认知科学和计算建模的综合,我们有很多东西可以学习。我们的前进道路涉及基于Braitenberg车辆的发展启发式学习代理的设计。持续的发展性神经模拟使我们能够考虑发展轨迹在连接神经系统形态发生、发展性学习和可塑性等相关现象中的作用。与持续学习紧密相关,我们的方法与发展性体现紧密结合,可以使用一种称为发展性Braitenberg车辆(dBVs)的代理类型来实现。dBVs开始时是一组未定义的结构,然后转变成包括身体、传感器、执行器和神经系统在内的基于代理的系统。这种表型以发展时机为特征:具有明显的形态发生、关键和习得(发展性学习)阶段。我们进一步建议可以通过一种遗传算法的方法来完成网络形态发生,而发展性学习可以使用多种计算方法来实现。这种方法为可能由发展性方法导致的自适应代理行为提供了一个框架:即通过利用关键时期或生长和习得,一个明确体现的网络架构,以及神经网络的组装和对这些网络的主动学习之间的区别。总之,我们将考虑代理学习和发展在不同时间尺度上,从非常短的(<100ms)间隔到长期演变。在一个具有身体的代理为基础的方法中,发展、演变和学习对于生物启发智能的综合观点至关重要。
更新时间: 2024-07-12 06:10:30
领域: q-bio.NC,cs.AI,cs.NE
AI-Driven Guided Response for Security Operation Centers with Microsoft Copilot for Security
Security operation centers contend with a constant stream of security incidents, ranging from straightforward to highly complex. To address this, we developed Copilot Guided Response (CGR), an industry-scale ML architecture that guides security analysts across three key tasks -- (1) investigation, providing essential historical context by identifying similar incidents; (2) triaging to ascertain the nature of the incident -- whether it is a true positive, false positive, or benign positive; and (3) remediation, recommending tailored containment actions. CGR is integrated into the Microsoft Defender XDR product and deployed worldwide, generating millions of recommendations across thousands of customers. Our extensive evaluation, incorporating internal evaluation, collaboration with security experts, and customer feedback, demonstrates that CGR delivers high-quality recommendations across all three tasks. We provide a comprehensive overview of the CGR architecture, setting a precedent as the first cybersecurity company to openly discuss these capabilities in such depth. Additionally, we release GUIDE, the largest public collection of real-world security incidents, spanning 13M pieces of evidence across 1M annotated incidents. By enabling researchers and practitioners to conduct research on real-world data, GUIDE advances the state of cybersecurity and supports the development of next-generation machine learning systems.
Updated: 2024-07-12 06:10:01
标题: 人工智能驱动的安全运营中心指导响应:使用微软安全协作助手
摘要: 安全运营中心不断应对各种安全事件,从简单到高度复杂不等。为了解决这一问题,我们开发了Copilot Guided Response(CGR),这是一个行业规模的机器学习架构,引导安全分析人员完成三项关键任务:(1)调查,通过识别类似事件提供必要的历史背景;(2)分类,确定事件的性质——是真正的阳性、假阳性还是良性阳性;以及(3)补救,推荐量身定制的遏制行动。CGR已整合到微软Defender XDR产品中,并在全球范围内部署,为数千客户生成数百万条建议。我们进行了广泛的评估,包括内部评估、与安全专家的合作以及客户反馈,证明CGR在所有三项任务中提供高质量的建议。我们提供了CGR架构的全面概述,作为首家公开深入讨论这些能力的网络安全公司的先例。此外,我们还发布了GUIDE,这是最大的公开实际安全事件集合,涵盖了100万个带注释事件中的1300万条证据。通过使研究人员和从业者能够在实际数据上进行研究,GUIDE推动了网络安全的发展,并支持下一代机器学习系统的开发。
更新时间: 2024-07-12 06:10:01
领域: cs.LG,cs.CR,cs.IR
VS-PINN: A fast and efficient training of physics-informed neural networks using variable-scaling methods for solving PDEs with stiff behavior
Physics-informed neural networks (PINNs) have recently emerged as a promising way to compute the solutions of partial differential equations (PDEs) using deep neural networks. However, despite their significant success in various fields, it remains unclear in many respects how to effectively train PINNs when the solutions of PDEs exhibit stiff behavior or high frequencies. In this paper, we propose a new method for training PINNs using variable-scaling techniques. This method is simple and can be applied to a wide range of problems, including PDEs with rapidly varying solutions. Through various numerical experiments, we demonstrate the effectiveness of the proposed method on these problems and confirm that it can significantly improve the training efficiency and performance of PINNs. Furthermore, based on an analysis of the neural tangent kernel (NTK), we provide theoretical evidence for this phenomenon and show that our method can indeed improve the performance of PINNs.
Updated: 2024-07-12 06:08:09
标题: VS-PINN: 使用可变缩放方法快速高效地训练物理知识神经网络,解决具有刚性行为的偏微分方程
摘要: 物理信息神经网络(PINNs)最近已经出现,成为使用深度神经网络计算偏微分方程(PDEs)解的一种有前途的方法。然而,尽管它们在各个领域取得了显著的成功,但在许多方面仍不清楚如何有效地训练PINNs,如果PDEs的解表现出僵硬行为或高频率的话。在本文中,我们提出了一种使用可变缩放技术训练PINNs的新方法。这种方法简单,并且可以应用于包括具有快速变化解的PDEs在内的广泛问题范围。通过各种数字实验,我们将展示所提出方法在这些问题上的有效性,并确认它可以显著提高PINNs的训练效率和性能。此外,基于神经切线核(NTK)的分析,我们将提供理论证据支持这一现象,并展示我们的方法确实可以提高PINNs的性能。
更新时间: 2024-07-12 06:08:09
领域: math.NA,cs.LG,cs.NA
Static Analysis of Logic Programs via Boolean Networks
Answer Set Programming (ASP) is a declarative problem solving paradigm that can be used to encode a combinatorial problem as a logic program whose stable models correspond to the solutions of the considered problem. ASP has been widely applied to various domains in AI and beyond. The question "What can be said about stable models of a logic program from its static information?" has been investigated and proved useful in many circumstances. In this work, we dive into this direction more deeply by making the connection between a logic program and a Boolean network, a prominent modeling framework with applications to various areas. The proposed connection brings the rich history of static analysis of Boolean networks to bear on ASP, enabling us to explore and prove more theoretical results and making it a unified and powerful tool for further study of the static analysis of ASP. In particular, the newly obtained insights have the potential to benefit many problems in the field of ASP.
Updated: 2024-07-12 06:07:05
标题: 透过布尔网络对逻辑程序进行静态分析
摘要: 答案集编程(ASP)是一种声明性问题解决范式,可以用来将一个组合问题编码为一个逻辑程序,其稳定模型对应于所考虑问题的解决方案。ASP已广泛应用于人工智能及其他领域。关于“从静态信息中可以得出有关逻辑程序稳定模型的什么信息?”这个问题已经被研究并在许多情况下被证明是有用的。在这项工作中,我们通过将逻辑程序与布尔网络联系起来,深入探讨了这个方向,布尔网络是一个具有在各个领域应用的突出建模框架。所提出的连接可以将布尔网络静态分析丰富历史上的现有结果带来,并探索和证明更多关于ASP的理论结果,使其成为进一步研究ASP静态分析的统一和强大工具。特别是,新获得的见解有潜力使许多ASP领域中的问题受益。
更新时间: 2024-07-12 06:07:05
领域: cs.LO,cs.AI
Data organization limits the predictability of binary classification
The structure of data organization is widely recognized as having a substantial influence on the efficacy of machine learning algorithms, particularly in binary classification tasks. Our research provides a theoretical framework suggesting that the maximum potential of binary classifiers on a given dataset is primarily constrained by the inherent qualities of the data. Through both theoretical reasoning and empirical examination, we employed standard objective functions, evaluative metrics, and binary classifiers to arrive at two principal conclusions. Firstly, we show that the theoretical upper bound of binary classification performance on actual datasets can in fact be attained. This upper boundary represents a calculable equilibrium between the learning loss and the metric of evaluation. Secondly, we have computed the precise upper bounds for three commonly used evaluation metrics, uncovering a fundamental uniformity with our overarching thesis: the upper bound is intricately linked to the dataset's characteristics, independent of the classifier in use. Additionally, our subsequent analysis uncovers a detailed relationship between the upper limit of performance and the level of class overlap within the binary classification data. This relationship is instrumental for pinpointing the most effective feature subsets for use in feature engineering.
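The flavor of the claim is easy to reproduce on synthetic data: once the class-conditional distributions overlap, the Bayes error fixes a ceiling that no classifier can exceed. The 1-D Gaussian setup below is an illustrative assumption, not one of the paper's constructions.
```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

mu0, mu1, sigma = 0.0, 1.0, 1.0      # heavily overlapping classes
threshold = (mu0 + mu1) / 2          # Bayes decision boundary (equal priors)
bayes_acc = norm.cdf((threshold - mu0) / sigma)  # same for both classes by symmetry
print(f"Bayes-optimal accuracy: {bayes_acc:.3f}")

rng = np.random.default_rng(0)
n = 100_000
X = np.concatenate([rng.normal(mu0, sigma, n), rng.normal(mu1, sigma, n)])[:, None]
y = np.concatenate([np.zeros(n), np.ones(n)])
acc = LogisticRegression().fit(X, y).score(X, y)
print(f"Trained classifier:     {acc:.3f}  (cannot exceed the bound above)")
```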
Updated: 2024-07-12 06:04:47
标题: 数据组织限制了二元分类的可预测性
摘要: 数据组织结构被广泛认为对机器学习算法的效力有着重要影响,特别是在二元分类任务中。我们的研究提供了一个理论框架,表明在给定数据集上二元分类器的最大潜力主要受到数据固有特性的限制。通过理论推理和经验检验,我们使用了标准的客观函数、评估指标和二元分类器得出了两个主要结论。首先,我们展示了在实际数据集上二元分类性能的理论上限是可以在理论上达到的。这个上限代表了学习损失与评估指标之间的可计算平衡。其次,我们计算了三个常用评估指标的精确上限,揭示了与我们的总体论点密切相关的基本一致性:上限与数据集的特征紧密相关,而独立于所使用的分类器。此外,我们的后续分析揭示了性能上限与二元分类数据中类别重叠水平之间的详细关系。这种关系对于找到最有效的特征子集以用于特征工程非常重要。
更新时间: 2024-07-12 06:04:47
领域: cs.LG,cs.DS,physics.data-an
Procedural Content Generation via Generative Artificial Intelligence
Attempts to utilize machine learning in procedural content generation (PCG) have been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is effective for PCG, one significant issue it faces is that building high-performance generative AI requires vast amounts of training data. Because content is generally highly customized, domain-specific training data is scarce, and straightforward approaches to generative AI models may not work well. For PCG research to advance further, issues related to limited training data must be overcome. Thus, we also give special consideration to research that addresses the challenges posed by limited training data.
Updated: 2024-07-12 06:03:38
标题: 通过生成式人工智能的程序化内容生成
摘要: 过去已经尝试过在PCG中利用机器学习。在这篇综述论文中,我们调查了生成人工智能(AI)如何被用于PCG,这在2010年代中期引起了极大的兴趣。我们回顾了生成AI在创建各种类型内容(包括地形、物品,甚至情节)方面的应用。虽然生成AI对PCG是有效的,但它面临的一个重要问题是,构建高性能的生成AI需要大量的训练数据。由于内容通常高度定制化,领域特定的训练数据是稀缺的,而且对生成AI模型的简单方法可能不起作用。为了推进PCG研究,必须克服与有限训练数据相关的问题。因此,我们还特别关注了解决有限训练数据带来挑战的研究。
更新时间: 2024-07-12 06:03:38
领域: cs.AI,cs.LG
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to erroneous poses and consistent over time. In contrast to previous methods, we utilize the pre-trained ControlNet without fine-tuning to leverage its extensive pre-acquired knowledge from numerous pose-image-caption pairs. To keep the ControlNet frozen, we adapt LoRA to the UNet layers, enabling the network to align the latent space between the pose and appearance features. Additionally, by introducing an additional temporal layer to the ControlNet, we enhance robustness against outliers of the pose detector. Through the analysis of attention maps over the temporal axis, we also designed a novel temperature map leveraging pose information, allowing for a more static background. Extensive experiments demonstrate that the proposed method can achieve promising results in video synthesis tasks encompassing various poses, like chibi. Project Page: https://eccv2024tcan.github.io/
Updated: 2024-07-12 06:02:13
标题: TCAN:使用扩散模型的时间一致姿势指导为人类图像添加动画
摘要: 姿态驱动的人体图像动画扩散模型在逼真的人体视频合成方面展现出卓越能力。尽管先前的方法取得了可喜的成果,但在实现时间上一致的动画以及在使用现成姿态检测器时保证鲁棒性方面仍存在挑战。在本文中,我们提出了TCAN,一种对错误姿态具有鲁棒性且在时间上保持一致的姿态驱动人体图像动画方法。与以往方法不同,我们直接使用预训练的ControlNet而不进行微调,以利用其从大量姿态-图像-文本对中预先获得的丰富知识。为了保持ControlNet冻结,我们将LoRA应用于UNet层,使网络能够对齐姿态特征与外观特征之间的潜在空间。此外,通过在ControlNet中引入额外的时间层,我们增强了对姿态检测器异常值的鲁棒性。通过对时间轴上注意力图的分析,我们还设计了一种利用姿态信息的新颖温度图,从而获得更加静态的背景。大量实验表明,所提出的方法在包括chibi等各种姿态的视频合成任务中能够取得可喜的结果。项目页面:https://eccv2024tcan.github.io/
更新时间: 2024-07-12 06:02:13
领域: cs.CV,cs.AI
One Stone, Four Birds: A Comprehensive Solution for QA System Using Supervised Contrastive Learning
This paper presents a novel and comprehensive solution to enhance both the robustness and efficiency of question answering (QA) systems through supervised contrastive learning (SCL). Training a high-performance QA system has become straightforward with pre-trained language models, requiring only a small amount of data and simple fine-tuning. However, despite recent advances, existing QA systems still exhibit significant deficiencies in functionality and training efficiency. We address the functionality issue by defining four key tasks: user input intent classification, out-of-domain input detection, new intent discovery, and continual learning. We then leverage a unified SCL-based representation learning method to efficiently build an intra-class compact and inter-class scattered feature space, facilitating both known intent classification and unknown intent detection and discovery. Consequently, with minimal additional tuning on downstream tasks, our approach significantly improves model efficiency and achieves new state-of-the-art performance across all tasks.
Updated: 2024-07-12 06:01:51
标题: 一石四鸟:使用监督对比学习的QA系统全面解决方案
摘要: 这篇论文提出了一种新颖且全面的解决方案,通过监督对比学习(SCL)来增强问答(QA)系统的鲁棒性和效率。通过预训练语言模型,训练高性能QA系统已变得简单,只需要少量数据和简单的微调。然而,尽管最近取得了进展,现有的QA系统仍然在功能性和训练效率方面存在显著缺陷。我们通过定义四个关键任务来解决功能性问题:用户输入意图分类,域外输入检测,新意图发现和持续学习。然后,我们利用统一的基于SCL的表示学习方法,有效地构建一个类内紧凑和类间分散的特征空间,促进已知意图分类和未知意图检测和发现。因此,通过在下游任务上进行最少的额外调整,我们的方法显著提高了模型的效率,并在所有任务中实现了新的最先进性能。
更新时间: 2024-07-12 06:01:51
领域: cs.CL,cs.AI,cs.LG
A Comprehensive Review of Community Detection in Graphs
The study of complex networks has significantly advanced our understanding of community structure, a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs, offering a thorough exposition of various community detection methods from the perspectives of modularity-based methods, spectral clustering, probabilistic modelling, and deep learning. Along with these methods, a new community detection method designed by us is also presented. Additionally, the performance of these methods on datasets with and without ground truth is compared. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs.
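As a concrete entry point to the modularity-based family the review covers, the snippet below runs greedy modularity maximization with NetworkX on a classic benchmark graph; the Leiden algorithm discussed in the text optimizes the same objective with a refined local-moving and partition-refinement scheme.
```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

G = nx.karate_club_graph()                      # classic 34-node benchmark
communities = greedy_modularity_communities(G)  # list of frozensets of nodes
print(f"{len(communities)} communities, "
      f"modularity = {modularity(G, communities):.3f}")
for i, c in enumerate(communities):
    print(i, sorted(c))
```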
Updated: 2024-07-12 05:55:47
标题: 图中社区检测的综合回顾
摘要: 复杂网络的研究显著推动了我们对社区结构的理解,这是现实世界图的一个关键特征。在图中检测社区是一个具有挑战性的问题,在社会学、生物学和计算机科学等领域有着广泛的应用。尽管跨学科科学界做出了努力,对这一问题尚未达到令人满意的解决方案。本综述文章深入探讨了图中社区检测的主题,全面介绍了各种社区检测方法,包括基于模块性的方法、谱聚类、概率建模和深度学习等不同视角。除了这些方法,我们还提出了一种新的社区检测方法。此外,还比较了这些方法在有和没有真实标签(ground truth)的数据集上的性能。总之,这篇综述文章提供了对图中社区检测的深入理解。
更新时间: 2024-07-12 05:55:47
领域: cs.SI,cs.LG
Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset
The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI). Consequently, research on AI-based object recognition models for maritime transportation is steadily growing, leveraging advancements in sensor technology and computing performance. However, object recognition in maritime environments faces challenges such as light reflection, interference, intense lighting, and various weather conditions. To address these challenges, high-performance deep learning algorithms tailored to maritime imagery and high-quality datasets specialized for maritime scenes are essential. Existing AI recognition models and datasets have limited suitability for composing autonomous navigation systems. Therefore, in this paper, we propose a Vertical and Detail Attention (VaDA) model for maritime object segmentation and a new model evaluation method, the Integrated Figure of Calculation Performance (IFCP), to verify its suitability for the system in real-time. Additionally, we introduce a benchmark maritime dataset, OASIs (Ocean AI Segmentation Initiatives) to standardize model performance evaluation across diverse maritime environments. OASIs dataset and details are available at our website: https://www.navlue.com/dataset
Updated: 2024-07-12 05:48:53
标题: 引入VaDA:利用新数据集进行海事目标分割的新型图像分割模型
摘要: 海运航运行业正在快速发展,受计算机视觉人工智能(AI)技术进步的推动。因此,针对海运运输的基于AI的物体识别模型的研究正在稳步增长,利用了传感器技术和计算性能的进步。然而,在海运环境中的物体识别面临诸如光反射、干扰、强烈光照和各种天气条件等挑战。为了解决这些挑战,针对海洋影像的高性能深度学习算法和专门针对海洋场景的高质量数据集至关重要。现有的AI识别模型和数据集在构建自主导航系统方面适用性有限。因此,在本文中,我们提出了一种适用于海洋物体分割的垂直和细节关注(VaDA)模型,以及一种新的模型评估方法,即综合计算性能指数(IFCP),以验证其在实时系统中的适用性。此外,我们引入了一个基准海洋数据集,OASIs(海洋AI分割倡议),以标准化对不同海洋环境中模型性能的评估。OASIs数据集和详细信息可在我们的网站上找到:https://www.navlue.com/dataset
更新时间: 2024-07-12 05:48:53
领域: cs.CV,cs.AI,eess.IV
Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision
The data revolution holds significant promise for the health sector. Vast amounts of data collected from individuals will be transformed into knowledge, AI models, predictive systems, and best practices. One area of health that stands to benefit greatly is the genomic domain. Progress in AI, machine learning, and data science has opened new opportunities for genomic research, promising breakthroughs in personalized medicine. However, increasing awareness of privacy and cybersecurity necessitates robust solutions to protect sensitive data in collaborative research. This paper presents a practical deployment of a privacy-preserving framework for genomic research, developed in collaboration with Lynx.MD, a platform for secure health data collaboration. The framework addresses critical cybersecurity and privacy challenges, enabling the privacy-preserving sharing and analysis of genomic data while mitigating risks associated with data breaches. By integrating advanced privacy-preserving algorithms, the solution ensures the protection of individual privacy without compromising data utility. A unique feature of the system is its ability to balance trade-offs between data sharing and privacy, providing stakeholders tools to quantify privacy risks and make informed decisions. Implementing the framework within Lynx.MD involves encoding genomic data into binary formats and applying noise through controlled perturbation techniques. This approach preserves essential statistical properties of the data, facilitating effective research and analysis. Moreover, the system incorporates real-time data monitoring and advanced visualization tools, enhancing user experience and decision-making. The paper highlights the need for tailored privacy attacks and defenses specific to genomic data. Addressing these challenges fosters collaboration in genomic research, advancing personalized medicine and public health.
Updated: 2024-07-12 05:43:13
标题: 隐私保护的协作基因组研究:实际部署与愿景
摘要: 数据革命对健康领域具有重要的潜力。从个人收集的大量数据将转化为知识、人工智能模型、预测系统和最佳实践。一个能够大大受益的健康领域是基因组学领域。人工智能、机器学习和数据科学的进步为基因组研究开辟了新的机遇,承诺在个性化医疗方面取得突破。然而,对隐私和网络安全的日益关注需要强有力的解决方案来保护合作研究中的敏感数据。本文介绍了一个针对基因组研究的隐私保护框架的实际部署,与Lynx.MD合作开发,这是一个用于安全卫生数据协作的平台。该框架解决了关键的网络安全和隐私挑战,实现了基因组数据的隐私保护共享和分析,同时减轻了与数据泄露相关的风险。通过整合先进的隐私保护算法,该解决方案确保了个人隐私的保护,而不会损害数据的效用。该系统的一个独特特点是其能够在数据共享和隐私之间平衡权衡,为利益相关者提供工具来量化隐私风险并做出明智决策。将该框架应用于Lynx.MD涉及将基因组数据编码为二进制格式,并通过受控扰动技术施加噪声。这种方法保留了数据的基本统计特性,促进了有效的研究和分析。此外,该系统还整合了实时数据监控和先进的可视化工具,增强了用户体验和决策能力。本文强调了针对基因组数据特定的定制隐私攻击和防御的需求。解决这些挑战促进了基因组研究的合作,推动了个性化医疗和公共卫生的发展。
更新时间: 2024-07-12 05:43:13
领域: cs.CR
Enhancing Few-Shot Stock Trend Prediction with Large Language Models
The goal of stock trend prediction is to forecast future market movements for informed investment decisions. Existing methods mostly focus on predicting stock trends with supervised models trained on extensive annotated data. However, human annotation can be resource-intensive and the annotated data are not readily available. Inspired by the impressive few-shot capability of Large Language Models (LLMs), we propose using LLMs in a few-shot setting to overcome the scarcity of labeled data and make prediction more feasible to investors. Previous works typically merge multiple financial news for predicting stock trends, causing two significant problems when using LLMs: (1) Merged news contains noise, and (2) it may exceed LLMs' input limits, leading to performance degradation. To overcome these issues, we propose a two-step method 'denoising-then-voting'. Specifically, we introduce an `Irrelevant' category, and predict stock trends for individual news instead of merged news. Then we aggregate these predictions using majority voting. The proposed method offers two advantages: (1) Classifying noisy news as irrelevant removes its impact on the final prediction. (2) Predicting for individual news mitigates LLMs' input length limits. Our method achieves 66.59% accuracy in S&P 500, 62.17% in CSI-100, and 61.17% in HK stock prediction, outperforming the standard few-shot counterparts by around 7%, 4%, and 4%. Furthermore, our proposed method performs on par with state-of-the-art supervised methods.
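The two-step procedure is simple to express in code. In the sketch below, query_llm is a toy keyword-based stand-in for the actual few-shot LLM call, and the prompt wording and label set are illustrative assumptions; the essential ingredients are the explicit 'Irrelevant' option, one news item per call, and the final majority vote.
```python
from collections import Counter

LABELS = ("Rise", "Fall", "Irrelevant")
PROMPT = ("Given the news below, will the stock of {ticker} rise or fall "
          "tomorrow? Answer with one word: Rise, Fall, or Irrelevant.\n"
          "News: {news}")

def query_llm(prompt):
    """Toy keyword stand-in for a few-shot LLM call, so the sketch runs."""
    text = prompt.lower()
    if "beats" in text or "record profit" in text:
        return "Rise"
    if "lawsuit" in text or "product recall" in text:
        return "Fall"
    return "Irrelevant"

def predict_trend(ticker, news_items):
    votes = []
    for news in news_items:  # one item per call keeps within input limits
        answer = query_llm(PROMPT.format(ticker=ticker, news=news)).strip()
        if answer in LABELS and answer != "Irrelevant":
            votes.append(answer)         # denoising: drop Irrelevant items
    if not votes:
        return "Irrelevant"
    return Counter(votes).most_common(1)[0][0]  # majority voting

print(predict_trend("ACME", [
    "ACME beats quarterly earnings estimates",
    "Weather forecast: sunny weekend ahead",
    "ACME faces a product recall over battery faults",
    "ACME posts record profit",
]))
```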
Updated: 2024-07-12 05:43:11
标题: 利用大型语言模型增强少样本股票趋势预测
摘要: 股票趋势预测的目标是为了预测未来市场走势,以便做出明智的投资决策。现有方法主要集中在使用受过广泛标注数据训练的监督模型来预测股票趋势。然而,人工标注可能需要大量资源,并且标注数据并不容易获得。受大型语言模型(LLMs)出色的少样本能力的启发,我们提出在少样本环境中使用LLMs来克服标记数据稀缺性,并使投资者更容易进行预测。先前的研究通常合并多个财经新闻来预测股票趋势,但使用LLMs时会出现两个重要问题:(1)合并新闻包含噪音,(2)可能超出LLMs的输入限制,导致性能下降。为了解决这些问题,我们提出了一个两步方法“去噪-投票”。具体而言,我们引入了一个“无关”的类别,并对个别新闻而不是合并新闻进行股票趋势预测。然后,我们使用多数投票来汇总这些预测结果。所提出的方法具有两个优点:(1)将噪音新闻分类为无关新闻可以消除其对最终预测结果的影响。 (2)对个别新闻进行预测可以减轻LLMs的输入长度限制。我们的方法在标准普尔500指数中达到了66.59%的准确率,在CSI-100中为62.17%,在香港股票预测中为61.17%,比标准的少样本方法分别高出约7%,4%和4%。此外,我们提出的方法与最先进的监督方法表现相当。
更新时间: 2024-07-12 05:43:11
领域: cs.AI
GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model
Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem from knowledge deficits or issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs for evaluation. This method facilitates the separation of query logic from linguistic variations, enabling the testing of hypotheses related to non-robust textual forms; and 2) an evaluation framework that differentiates knowledge gaps from robustness and enables the identification of defective modules. Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR to accurately identify model vulnerabilities. For implementation details, refer to our GitHub repository: https://github.com/xinzhel/grammar.
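The data-generation half of the framework can be sketched with a small relational database: a SQL template fixes the query logic and yields the ground-truth answer, while several surface phrasings of the same logic probe linguistic robustness. The schema, template, and paraphrases below are illustrative assumptions, not GRAMMAR's actual pipeline.
```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE employees (name TEXT, department TEXT);
INSERT INTO employees VALUES ('Ana', 'Sales'), ('Bo', 'Sales'), ('Cy', 'R&D');
""")

SQL = "SELECT COUNT(*) FROM employees WHERE department = ?"
PHRASINGS = [                      # same query logic, varied surface form
    "How many employees work in {d}?",
    "What is the headcount of the {d} department?",
    "{d} department -- number of staff?",
]

def generate_pairs(department):
    """One ground-truth answer, several linguistic variants of the query."""
    answer = db.execute(SQL, (department,)).fetchone()[0]
    return [(p.format(d=department), answer) for p in PHRASINGS]

for q, a in generate_pairs("Sales"):
    print(q, "->", a)
```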
Updated: 2024-07-12 05:16:30
标题: GRAMMAR:基于实证和模块化方法的封闭领域检索增强语言模型评估
摘要: 检索增强生成(RAG)系统已经被广泛研究和部署在各个行业中,用于查询领域特定的知识库。然而,评估这些系统面临独特的挑战,因为领域特定查询和相应的地面真相稀缺,以及缺乏系统性的方法来诊断失败案例的原因 - 无论是由于知识缺陷还是与系统稳健性有关的问题。为了解决这些挑战,我们引入了GRAMMAR(GRounded And Modular Methodology for Assessment of RAG),这是一个评估框架,包括两个关键元素:1)利用关系数据库和LLMs的数据生成过程,有效产生可扩展的查询-答案对进行评估。这种方法有助于将查询逻辑与语言变体分离,从而使得可以测试与非稳健文本形式相关的假设;2)一个评估框架,区分知识空白和稳健性,并能够识别有缺陷的模块。我们的实证结果强调了当前无参考评估方法的局限性,以及GRAMMAR准确识别模型脆弱性的可靠性。有关实施细节,请参阅我们的GitHub存储库:https://github.com/xinzhel/grammar。
更新时间: 2024-07-12 05:16:30
领域: cs.CL,cs.AI
Emotion Talk: Emotional Support via Audio Messages for Psychological Assistance
This paper presents "Emotion Talk," a system designed to provide continuous emotional support through audio messages for psychological assistance. The primary objective is to offer consistent support to patients outside traditional therapy sessions by analyzing audio messages to detect emotions and generate appropriate responses. The solution focuses on Portuguese-speaking users, ensuring that the system is linguistically and culturally relevant. This system aims to complement and enhance the psychological follow-up process conducted by therapists, providing immediate and accessible assistance, especially in emergency situations where rapid response is crucial. Experimental results demonstrate the effectiveness of the proposed system, highlighting its potential in applications of psychological support.
Updated: 2024-07-12 05:13:17
标题: 情感交流:通过音频消息提供心理援助的情感支持
摘要: 本文介绍了“情感对话”系统,旨在通过音频信息为心理援助提供持续的情感支持。主要目标是通过分析音频信息来检测情绪并生成适当的回应,为病人在传统治疗会话之外提供持续的支持。该解决方案针对讲葡萄牙语的用户,确保系统在语言和文化上具有相关性。该系统旨在补充和增强由治疗师进行的心理跟进过程,提供即时可及的帮助,特别是在紧急情况下迅速响应至关重要。实验结果显示了所提出系统的有效性,突显了其在心理支持应用中的潜力。
更新时间: 2024-07-12 05:13:17
领域: cs.AI
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks. When trained with gradient descent, the loss of infrequent words decreases more slowly than the loss of frequent ones. This leads to a slow decrease on the average loss as most samples come from infrequent words. On the other hand, Adam and sign-based methods are less sensitive to this problem. To establish that this behavior is caused by class imbalance, we show empirically that it can be reproduced across architectures and data types, on language transformers, vision CNNs, and linear models. On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. We also prove that, in continuous time, gradient descent converges slowly on low-frequency classes while sign descent does not.
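The effect reproduces at toy scale. Below is a small synthetic version of the linear-model experiment: class frequencies follow a Zipf-like law, and the average loss on frequent versus rare classes is compared after training with full-batch gradient descent and with Adam. Dimensions, learning rates, and step counts are illustrative choices.
```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_classes, dim, n = 100, 64, 20_000
freq = 1.0 / torch.arange(1, n_classes + 1).float()  # Zipf-like frequencies
y = torch.multinomial(freq, n, replacement=True)
means = torch.randn(n_classes, dim)
X = 2.0 * means[y] + torch.randn(n, dim)             # noisy class clusters

def train(opt_name, steps=500, lr=1e-2):
    W = torch.zeros(dim, n_classes, requires_grad=True)
    opt = (torch.optim.Adam([W], lr=lr) if opt_name == "adam"
           else torch.optim.SGD([W], lr=lr))
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(X @ W, y).backward()          # full-batch updates
        opt.step()
    with torch.no_grad():
        per_sample = F.cross_entropy(X @ W, y, reduction="none")
    tail = y >= n_classes // 2                        # low-frequency classes
    return per_sample[~tail].mean().item(), per_sample[tail].mean().item()

for name in ("sgd", "adam"):
    head, rare = train(name)
    print(f"{name}: frequent-class loss {head:.3f}, rare-class loss {rare:.3f}")
```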
Updated: 2024-07-12 05:10:32
标题: 重尾类别不平衡和为什么Adam在语言模型上表现优于梯度下降
摘要: Adam已被证明在大型语言模型上比梯度下降表现更好,优势比在其他任务上更大,但其原因尚不明确。我们展示了在语言任务中发现的重尾类别不平衡是导致这种性能差距的关键因素。当使用梯度下降训练时,罕见单词的损失下降速度比频繁单词的损失下降速度慢。这导致平均损失缓慢下降,因为大多数样本来自罕见单词。另一方面,Adam和基于符号的方法对这个问题不太敏感。为了证明这种行为是由类别不平衡引起的,我们展示了在语言变换器、视觉CNN和线性模型上可以重现这种现象,跨体系结构和数据类型。在一个带有交叉熵损失的线性模型上,我们展示了类别不平衡导致不平衡、相关的梯度和Hessian矩阵,据推测这有利于Adam。我们还证明,在连续时间中,梯度下降在低频类别上收敛缓慢,而符号下降则不会。
更新时间: 2024-07-12 05:10:32
领域: cs.LG,cs.CL,math.OC,stat.ML
Optimization of DNN-based speaker verification model through efficient quantization technique
As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of the speaker verification model. By analyzing performance changes and model size reductions in each layer of a pre-trained speaker verification model, we have effectively minimized performance degradation while significantly reducing the model size. Our quantization algorithm is the first attempt to maintain the performance of the state-of-the-art pre-trained speaker verification model, ECAPATDNN, while significantly compressing its model size. Overall, our quantization approach resulted in reducing the model size by half, with an increase in EER limited to 0.07%.
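The workflow the paper describes, quantizing and then checking the size saving against the accuracy impact, can be mimicked with off-the-shelf dynamic quantization. The toy MLP below stands in for an ECAPA-style speaker embedder and output drift stands in for a proper EER measurement; both are illustrative assumptions.
```python
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(80, 512), nn.ReLU(),
                      nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 192))

def size_mb(m):
    """Serialized state_dict size as a model-size proxy."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)   # int8 weights for Linear layers

x = torch.randn(1, 80)
drift = (model(x) - quantized(x)).abs().max().item()
print(f"fp32: {size_mb(model):.2f} MB -> int8: {size_mb(quantized):.2f} MB, "
      f"max output drift {drift:.4f}")       # proxy for an EER degradation check
```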
Updated: 2024-07-12 05:03:10
标题: 基于高效量化技术优化的基于DNN的说话人验证模型
摘要: 随着深度神经网络(DNNs)在包括语音验证在内的各个领域的快速发展,它们通常涉及高计算成本和大量内存消耗,这可能在移动系统上难以管理。深度模型的量化提供了减少计算和内存开销的手段。我们的研究提出了一种优化框架,用于量化说话者验证模型。通过分析预训练说话者验证模型中每一层的性能变化和模型大小的减小,我们有效地最小化了性能下降,同时显著减少了模型大小。我们的量化算法是维持最先进的预训练说话者验证模型ECAPATDNN性能的第一次尝试,同时显著压缩其模型大小。总体而言,我们的量化方法导致模型大小减少一半,EER仅增加0.07%。
更新时间: 2024-07-12 05:03:10
领域: eess.AS,cs.AI,cs.CC
Dynamic neural network with memristive CIM and CAM for 2D and 3D vision
The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network (DNN) using memristor. The network associates incoming data with the past experience stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets, which not only achieves accuracy on par with software but also a 48.1% and 15.9% reduction in computational budget. Moreover, it delivers a 77.6% and 93.3% reduction in energy consumption.
Updated: 2024-07-12 04:55:57
标题: 使用具有可变电阻存储器的动态神经网络进行二维和三维视觉处理
摘要: 大脑是动态的、联想的和高效的。它通过将输入与过去的经验联系起来重新配置,具有融合的记忆和处理能力。相比之下,人工智能模型是静态的,无法将输入与过去的经验联系起来,并在物理上分离的内存和处理的数字计算机上运行。我们提出了一种硬件-软件协同设计,基于语义记忆的动态神经网络(DNN)使用忆阻器。该网络将传入数据与存储为语义向量的过去经验关联起来。网络和语义记忆分别在噪声鲁棒的三值忆阻器计算内存(CIM)和内容寻址内存(CAM)电路上物理实现。我们验证了我们的协同设计,使用一个40纳米忆阻器宏,对ResNet和PointNet++进行了验证,用于对来自MNIST和ModelNet数据集的图像和3D点进行分类,不仅实现了与软件相当的准确率,还实现了48.1%和15.9%的计算预算减少。此外,它还实现了77.6%和93.3%的能耗减少。
更新时间: 2024-07-12 04:55:57
领域: cs.AR,cs.AI,cs.ET,cs.NE
Vision language models are blind
Large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering countless image-text applications and scoring high on many vision-understanding benchmarks. We propose BlindTest, a suite of 7 visual tasks absurdly easy to humans, such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting the number of circles in an Olympic-like logo. Surprisingly, four state-of-the-art VLMs are, on average, only 56.20% accurate on our benchmark, with Sonnet-3.5 being the best (73.77% accuracy). On BlindTest, VLMs struggle with tasks that require precise spatial information and counting (from 0 to 10), sometimes giving the impression of a person with myopia seeing fine details as blurry and making educated guesses. Code is available at: https://vlmsareblind.github.io/
Updated: 2024-07-12 04:55:18
标题: 视觉语言模型是盲目的
摘要: 大型具有视觉能力的语言模型(VLMs),例如GPT-4o和Gemini 1.5 Pro,正在为无数图像文本应用程序提供动力,并在许多视觉理解基准测试中得分很高。我们提出了BlindTest,一个包含7个对人类来说绝对容易的视觉任务的套件,例如识别(a)两个圆是否重叠;(b)两条线是否相交;(c)在一个单词中哪个字母被圈出;以及(d)数出奥林匹克式标志中的圆圈数量。令人惊讶的是,四种最先进的VLMs在我们的基准测试中平均只有56.20%的准确率,其中Sonnet-3.5效果最好(73.77%准确率)。在BlindTest中,VLMs在需要精确空间信息和计数(从0到10)的任务上表现困难,有时会给出近视患者看到细节模糊并进行猜测的印象。代码可在以下网址获取:https://vlmsareblind.github.io/
更新时间: 2024-07-12 04:55:18
领域: cs.AI,cs.CV
Robustness of LLMs to Perturbations in Text
Having a clean dataset has been the foundational assumption of most natural language processing (NLP) systems. However, properly written text is rarely found in real-world scenarios and hence, oftentimes invalidates the aforementioned foundational assumption. Recently, Large language models (LLMs) have shown impressive performance, but can they handle the inevitable noise in real-world data? This work tackles this critical question by investigating LLMs' resilience against morphological variations in text. To that end, we artificially introduce varying levels of noise into a diverse set of datasets and systematically evaluate LLMs' robustness against the corrupt variations of the original text. Our findings show that contrary to popular beliefs, generative LLMs are quite robust to noisy perturbations in text. This is a departure from pre-trained models like BERT or RoBERTa whose performance has been shown to be sensitive to deteriorating noisy text. Additionally, we test LLMs' resilience on multiple real-world benchmarks that closely mimic commonly found errors in the wild. With minimal prompting, LLMs achieve a new state-of-the-art on the benchmark tasks of Grammar Error Correction (GEC) and Lexical Semantic Change (LSC). To empower future research, we also release a dataset annotated by humans stating their preference for LLM vs. human-corrected outputs along with the code to reproduce our results.
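A typical way to create such noisy variants is cheap character-level corruption. The helper below applies swap, drop, and duplicate edits at a controlled rate; the operation set and rate are illustrative choices rather than the paper's exact perturbation scheme.
```python
import random

def perturb(text, rate=0.1, seed=0):
    """Inject character-level noise (swap/drop/duplicate) at the given rate."""
    rng = random.Random(seed)
    chars = list(text)
    out, i = [], 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("swap", "drop", "dup"))
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c]); i += 2; continue
            if op == "drop":
                i += 1; continue
            out.extend([c, c]); i += 1; continue   # duplicate
        out.append(c); i += 1
    return "".join(out)

print(perturb("large language models are surprisingly robust", rate=0.15))
```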
Updated: 2024-07-12 04:50:17
标题: LLM对文本扰动的鲁棒性
摘要: 拥有一个干净的数据集一直是大多数自然语言处理(NLP)系统的基础假设。然而,在真实世界的情况下很少能找到适当编写的文本,因此经常会使上述基础假设失效。最近,大型语言模型(LLMs)展现出了令人印象深刻的性能,但它们能处理真实世界数据中不可避免的噪音吗?这项工作通过研究LLMs对文本中形态变化的抗性来探讨这一关键问题。为此,我们人为地向各种数据集引入不同程度的噪音,系统地评估LLMs对原始文本的损坏变体的鲁棒性。我们的研究结果表明,与普遍观念相反,生成式LLMs对文本中的噪音扰动具有相当的鲁棒性。这与像BERT或RoBERTa这样的预训练模型不同,后者的性能已被证明对恶化的嘈杂文本敏感。此外,我们还在多个真实世界基准测试中测试了LLMs的鲁棒性,这些测试基本上模拟了野外常见的错误。在最小提示下,LLMs在语法错误纠正(GEC)和词汇语义变化(LSC)基准任务上取得了新的最新成果。为了推动未来的研究,我们还发布了由人类注释的数据集,其中说明了他们对LLM与人类纠正输出的偏好,以及重现我们结果的代码。
更新时间: 2024-07-12 04:50:17
领域: cs.CL,cs.AI,I.7; I.2.7; I.2.4
STENCIL: Submodular Mutual Information Based Weak Supervision for Cold-Start Active Learning
As supervised fine-tuning of pre-trained models within NLP applications increases in popularity, larger corpora of annotated data are required, especially with increasing parameter counts in large language models. Active learning, which attempts to mine and annotate unlabeled instances to improve model performance maximally fast, is a common choice for reducing the annotation cost; however, most methods typically ignore class imbalance and either assume access to initial annotated data or require multiple rounds of active learning selection before improving rare classes. We present STENCIL, which utilizes a set of text exemplars and the recently proposed submodular mutual information to select a set of weakly labeled rare-class instances that are then strongly labeled by an annotator. We show that STENCIL improves overall accuracy by 10%-18% and rare-class F-1 score by 17%-40% on multiple text classification datasets over common active learning methods within the class-imbalanced cold-start setting.
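The exemplar-guided selection can be sketched with a facility-location instantiation of submodular mutual information: greedily pick pool points that best cover the rare-class exemplars. The random embeddings below are stand-ins for a real text encoder, and this particular SMI function is one common choice, not necessarily the paper's exact instantiation.
```python
import numpy as np

rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(1000, 32))  # embeddings of the unlabeled pool
exemplars = rng.normal(size=(5, 32))     # embeddings of rare-class exemplars

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = (cosine(unlabeled, exemplars) + 1) / 2  # shift similarities into [0, 1]

def greedy_smi(sim, budget):
    """Greedy max of f(A) = sum_q max_{a in A} sim(a, q) (facility-location SMI)."""
    selected = []
    best = np.zeros(sim.shape[1])             # current coverage per exemplar
    for _ in range(budget):
        gains = np.clip(sim - best, 0, None).sum(axis=1)  # marginal gains
        gains[selected] = -1.0                # forbid repeats
        i = int(np.argmax(gains))
        selected.append(i)
        best = np.maximum(best, sim[i])
    return selected

to_annotate = greedy_smi(sim, budget=20)      # candidates sent for strong labels
```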
Updated: 2024-07-12 04:44:39
标题: STENCIL:基于次模互信息的冷启动主动学习弱监督
摘要: 随着在自然语言处理应用中对预训练模型进行监督微调的流行度增加,需要更大的带有标注数据的语料库,尤其是在大型语言模型的参数数量不断增加的情况下。主动学习试图挖掘和标注未标记实例以尽快提高模型性能,是减少标注成本的常见选择;然而,大多数方法通常忽视类别不平衡,并且要么假设可以访问初始标注数据,要么需要多轮主动学习选择才能改进稀有类别。我们提出了STENCIL,它利用一组文本示例和最近提出的子模块互信息来选择一组弱标记的稀有类别实例,然后由标注者进行强标记。我们展示了STENCIL在多个文本分类数据集上比常见主动学习方法在类别不平衡的冷启动设置中提高了整体准确率10%-18%和稀有类别F-1得分17%-40%。
更新时间: 2024-07-12 04:44:39
领域: cs.LG,cs.CL
Parameter inference from a non-stationary unknown process
Non-stationary systems are found throughout the world, from climate patterns under the influence of variation in carbon dioxide concentration, to brain dynamics driven by ascending neuromodulation. Accordingly, there is a need for methods to analyze non-stationary processes, and yet most time-series analysis methods that are used in practice, on important problems across science and industry, make the simplifying assumption of stationarity. One important problem in the analysis of non-stationary systems is the problem class that we refer to as Parameter Inference from a Non-stationary Unknown Process (PINUP). Given an observed time series, this involves inferring the parameters that drive non-stationarity of the time series, without requiring knowledge or inference of a mathematical model of the underlying system. Here we review and unify a diverse literature of algorithms for PINUP. We formulate the problem, and categorize the various algorithmic contributions. This synthesis will allow researchers to identify gaps in the literature and will enable systematic comparisons of different methods. We also demonstrate that the most common systems that existing methods are tested on - notably the non-stationary Lorenz process and logistic map - are surprisingly easy to perform well on using simple statistical features like windowed mean and variance, undermining the practice of using good performance on these systems as evidence of algorithmic performance. We then identify more challenging problems that many existing methods perform poorly on and which can be used to drive methodological advances in the field. Our results unify disjoint scientific contributions to analyzing non-stationary systems and suggest new directions for progress on the PINUP problem and the broader study of non-stationary phenomena.
Updated: 2024-07-12 04:44:29
标题: 从非平稳未知过程中的参数推断
摘要: 非平稳系统遍布世界各地,从受二氧化碳浓度变化影响的气候模式,到由上升神经调节驱动的大脑动态。因此,我们需要能够分析非平稳过程的方法;然而,目前在科学和工业领域的重要问题上实际使用的大多数时间序列分析方法,都做出了平稳性这一简化假设。在分析非平稳系统中的一个重要问题是我们所谓的来自非平稳未知过程的参数推断(PINUP)问题类。给定一个观测的时间序列,这涉及推断驱动时间序列非平稳性的参数,而无需了解或推断基础系统的数学模型。在这里,我们回顾和统一了一系列有关PINUP的算法文献。我们制定了问题,并对各种算法贡献进行了分类。这种综合将使研究人员能够识别文献中的空白,并能够对不同方法进行系统比较。我们还证明,现有方法最常用于测试的系统(特别是非平稳的Lorenz过程和Logistic映射)使用窗口均值和方差等简单统计特征就很容易取得良好表现,这削弱了把在这些系统上表现良好作为算法性能证据的做法。然后,我们确定了许多现有方法表现不佳、可用来推动该领域方法论进步的更具挑战性的问题。我们的结果统一了对分析非平稳系统的不连贯科学贡献,并为解决PINUP问题和更广泛研究非平稳现象提出了新的方向。
更新时间: 2024-07-12 04:44:29
领域: physics.data-an,cs.LG,nlin.CD,stat.ML
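The paper's point that windowed mean and variance already track the drifting parameter of the logistic map is easy to reproduce. The sketch below (an illustration under assumed settings, not the authors' benchmark code) drives a logistic map with a slowly increasing parameter r and correlates windowed statistics with it.

import numpy as np

T = 20000
r = np.linspace(3.6, 3.9, T)           # slowly drifting (non-stationary) parameter
x = np.empty(T)
x[0] = 0.4
for t in range(T - 1):
    x[t + 1] = r[t] * x[t] * (1.0 - x[t])   # logistic map update

w = 500                                 # window length
n = T // w
means = x[: n * w].reshape(n, w).mean(axis=1)
varis = x[: n * w].reshape(n, w).var(axis=1)
r_win = r[: n * w].reshape(n, w).mean(axis=1)

# simple windowed features correlate strongly with the hidden driver r
print("corr(windowed mean, r) =", np.corrcoef(means, r_win)[0, 1])
print("corr(windowed var,  r) =", np.corrcoef(varis, r_win)[0, 1])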
BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning
Semi-supervised Learning (SSL) reduces the need for extensive annotations in deep learning, but the more realistic challenge of imbalanced data distribution in SSL remains largely unexplored. In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions. Most existing methods address this issue at instance-level through reweighting or resampling, but the performance is heavily limited by their reliance on biased backbone representation. Some other methods do perform feature-level adjustments like feature blending but might introduce unfavorable noise. In this paper, we discuss the bonus of a more balanced feature distribution for the CISSL problem, and further propose a Balanced Feature-Level Contrastive Learning method (BaCon). Our method directly regularizes the distribution of instances' representations in a well-designed contrastive manner. Specifically, class-wise feature centers are computed as the positive anchors, while negative anchors are selected by a straightforward yet effective mechanism. A distribution-related temperature adjustment is leveraged to control the class-wise contrastive degrees dynamically. Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR10-LT, CIFAR100-LT, STL10-LT, and SVHN-LT datasets across various settings. For example, BaCon surpasses instance-level method FixMatch-based ABC on CIFAR10-LT with a 1.21% accuracy improvement, and outperforms state-of-the-art feature-level method CoSSL on CIFAR100-LT with a 0.63% accuracy improvement. When encountering more extreme imbalance degree, BaCon also shows better robustness than other methods.
Updated: 2024-07-12 04:43:48
标题: BaCon:通过平衡特征级对比学习增强不平衡的半监督学习
摘要: 半监督学习(SSL)减少了深度学习中对广泛标注的需求,但SSL中不平衡数据分布这一更现实的挑战仍未被充分探索。在类别不平衡的半监督学习(CISSL)中,不可靠伪标签引入的偏差可能会因不平衡的数据分布而加剧。大多数现有方法在实例级别通过重新加权或重新采样来解决这个问题,但其性能受到对有偏骨干表示的依赖的严重限制。一些其他方法确实进行了特征级别的调整,如特征融合,但可能会引入不利的噪声。在本文中,我们讨论了更平衡的特征分布对CISSL问题的好处,并进一步提出了一种平衡特征级对比学习方法(BaCon)。我们的方法通过一种精心设计的对比方式直接规范实例的表示分布。具体而言,类别特征中心被计算为正锚点,而负锚点则通过简单而有效的机制进行选择。利用与分布相关的温度调整来动态控制类别对比程度。我们的方法通过在不同设置下对CIFAR10-LT、CIFAR100-LT、STL10-LT和SVHN-LT数据集进行全面实验来展示其有效性。例如,BaCon在CIFAR10-LT上超过了基于实例级方法FixMatch的ABC,准确度提高了1.21%,并在CIFAR100-LT上超过了最先进的特征级方法CoSSL,准确度提高了0.63%。当遇到更极端的不平衡程度时,BaCon也表现出比其他方法更好的鲁棒性。
更新时间: 2024-07-12 04:43:48
领域: cs.CV,cs.LG
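A minimal sketch of the core idea, feature-level contrast against class-wise centers, follows. It is a plain reading of the abstract: a fixed temperature tau stands in for the paper's distribution-related temperature adjustment, and a simple cross-entropy over centers stands in for the full BaCon loss; both are assumptions, not the published formulation.

import torch
import torch.nn.functional as F

def center_contrastive_loss(feats, labels, num_classes, tau=0.1):
    # feats: (B, d) backbone features; labels: (B,) integer class labels.
    feats = F.normalize(feats, dim=1)
    centers = torch.zeros(num_classes, feats.size(1), device=feats.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            # class-wise feature center acts as the positive anchor
            centers[c] = F.normalize(feats[mask].mean(0), dim=0)
    # each feature is pulled to its own center; other centers act as negatives
    logits = feats @ centers.t() / tau
    return F.cross_entropy(logits, labels)

loss = center_contrastive_loss(torch.randn(16, 64), torch.randint(0, 10, (16,)), 10)
print(loss.item())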
Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations
Trustworthiness and interpretability are inextricably linked concepts for LLMs. The more interpretable an LLM is, the more trustworthy it becomes. However, current techniques for interpreting LLMs when applied to code-related tasks largely focus on accuracy measurements, measures of how models react to change, or individual task performance instead of the fine-grained explanations needed at prediction time for greater interpretability, and hence trust. To improve upon this status quo, this paper introduces ASTrust, an interpretability method for LLMs of code that generates explanations grounded in the relationship between model confidence and syntactic structures of programming languages. ASTrust explains generated code in the context of syntax categories based on Abstract Syntax Trees and aids practitioners in understanding model predictions at both local (individual code snippets) and global (larger datasets of code) levels. By distributing and assigning model confidence scores to well-known syntactic structures that exist within ASTs, our approach moves beyond prior techniques that perform token-level confidence mapping by offering a view of model confidence that directly aligns with programming language concepts with which developers are familiar. To put ASTrust into practice, we developed an automated visualization that illustrates the aggregated model confidence scores superimposed on sequence, heat-map, and graph-based visuals of syntactic structures from ASTs. We examine both the practical benefit that ASTrust can provide through a data science study on 12 popular LLMs on a curated set of GitHub repos and the usefulness of ASTrust through a human study.
Updated: 2024-07-12 04:38:28
标题: 通过基于语法的解释迈向更可信赖、更可解释的代码LLMs
摘要: 信任度和可解释性是LLMs不可分割的概念。LLM越可解释,就越可信赖。然而,目前用于解释LLMs的技术在应用于与代码相关的任务时,主要集中在准确度测量、模型对变化的反应度量或单个任务的性能上,而不是预测时所需的细粒度解释,而后者才能带来更高的可解释性,进而带来信任。为了改善这种现状,本文介绍了ASTrust,这是一种面向代码LLMs的可解释性方法,它生成基于模型置信度与编程语言语法结构之间关系的解释。ASTrust基于抽象语法树(AST)中的语法类别来解释生成的代码,并帮助从业者在局部(单个代码片段)和全局(更大的代码数据集)两个层面理解模型预测。通过将模型置信度分数分配给AST中众所周知的语法结构,我们的方法超越了以往仅做令牌级置信度映射的技术,提供了与开发人员熟悉的编程语言概念直接对齐的模型置信度视图。为了将ASTrust付诸实践,我们开发了一种自动可视化工具,该工具将聚合的模型置信度分数叠加在AST语法结构的序列、热图和图形可视化之上。我们通过一项针对12个流行LLMs、基于一组精选GitHub代码库的数据科学研究来检验ASTrust可以带来的实际收益,并通过一项用户研究来检验ASTrust的实用性。
更新时间: 2024-07-12 04:38:28
领域: cs.SE,cs.AI,cs.LG
Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models
Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle in the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments. To mitigate these issues, we first curate a novel dataset of Chinese-English literature, which consists of 160 books with intricate discourse structures. Then, we propose a more pragmatic and challenging setting for context-aware translation, termed chapter-to-chapter (Ch2Ch) translation, and investigate the performance of commonly-used machine translation models under this setting. Furthermore, we introduce a potential approach of finetuning large language models (LLMs) within the domain of Ch2Ch literary translation, yielding impressive improvements over baselines. Through our comprehensive analysis, we unveil that literary translation under the Ch2Ch setting is challenging in nature, with respect to both model learning methods and translation decoding algorithms.
Updated: 2024-07-12 04:18:22
标题: 朝向通过大型语言模型实现章节到章节上下文感知的文学翻译
摘要: 现有文档级翻译数据集中的话语现象很少,这一点一直是发展具有上下文意识的机器翻译模型的基本障碍。此外,大多数现有的文档级语料库和具有上下文意识的机器翻译方法都依赖于一个不切实际的假设,即句子级别的对齐。为了缓解这些问题,我们首先精心策划了一个新颖的中英文学文献数据集,包含了160本结构复杂的书籍。然后,我们提出了一个更加务实和具有挑战性的上下文感知翻译设置,称为章节到章节(Ch2Ch)翻译,并研究了常用机器翻译模型在此设置下的性能。此外,我们介绍了在Ch2Ch文学翻译领域中微调大型语言模型(LLMs)的一个潜在方法,取得了令人印象深刻的改进效果。通过我们的综合分析,我们揭示了在Ch2Ch设置下进行文学翻译具有挑战性,无论是在模型学习方法还是翻译解码算法方面。
更新时间: 2024-07-12 04:18:22
领域: cs.CL,cs.LG
CURE: Privacy-Preserving Split Learning Done Right
Training deep neural networks often requires large-scale datasets, necessitating storage and processing on cloud servers due to computational constraints. The procedures must follow strict privacy regulations in domains like healthcare. Split Learning (SL), a framework that divides model layers between client(s) and server(s), is widely adopted for distributed model training. While Split Learning reduces privacy risks by limiting server access to the full parameter set, previous research has identified that intermediate outputs exchanged between server and client can compromise client's data privacy. Homomorphic encryption (HE)-based solutions exist for this scenario but often impose prohibitive computational burdens. To address these challenges, we propose CURE, a novel system based on HE, that encrypts only the server side of the model and optionally the data. CURE enables secure SL while substantially improving communication and parallelization through advanced packing techniques. We propose two packing schemes that consume one HE level for one-layer networks and generalize our solutions to n-layer neural networks. We demonstrate that CURE can achieve similar accuracy to plaintext SL while being 16x more efficient in terms of the runtime compared to the state-of-the-art privacy-preserving alternatives.
Updated: 2024-07-12 04:10:19
标题: CURE:正确实现的隐私保护拆分学习
摘要: 训练深度神经网络通常需要大规模数据集,受计算资源限制,往往需要在云服务器上存储和处理。在医疗保健等领域,这些流程必须遵守严格的隐私规定。拆分学习(SL)是一种将模型层划分到客户端和服务器端的框架,被广泛用于分布式模型训练。虽然拆分学习通过限制服务器访问完整参数集来降低隐私风险,但先前的研究已经指出,服务器和客户端之间交换的中间输出可能会危及客户端的数据隐私。针对这种情形已有基于同态加密(HE)的解决方案,但它们往往带来难以承受的计算负担。为了应对这些挑战,我们提出了基于HE的新系统CURE,它只加密模型的服务器端,并可选地加密数据。CURE通过先进的打包技术实现安全的SL,同时大幅改善通信和并行化。我们提出了两种打包方案,对单层网络只消耗一个HE层级,并将我们的解决方案推广到n层神经网络。我们证明,CURE可以达到与明文SL相近的准确性,同时在运行时间上比最先进的隐私保护替代方案高效16倍。
更新时间: 2024-07-12 04:10:19
领域: cs.CR
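For readers unfamiliar with the split-learning setup the abstract builds on, here is a plaintext sketch of the client/server layer split. The encryption of the server side, CURE's packing schemes, and all HE machinery are omitted, and the layer sizes are arbitrary illustrative choices.

import torch
import torch.nn as nn

client_net = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU())                # stays on the client
server_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))  # runs on the server

x = torch.randn(32, 28 * 28)                    # client's private batch
smashed = client_net(x)                         # only these activations leave the client
logits = server_net(smashed)                    # server finishes the forward pass
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (32,)))
loss.backward()                                 # gradients flow back across the cut
print(loss.item())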
Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features
Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread adoption, the primary limitation of the MMD test has been its quadratic-time complexity, which poses challenges for large-scale analysis. While various approaches have been proposed to expedite the procedure, it has been unclear whether it is possible to attain the same power guarantee as the MMD test at sub-quadratic time cost. To fill this gap, we revisit the approximated MMD test using random Fourier features, and investigate its computational-statistical trade-off. We start by revealing that the approximated MMD test is pointwise consistent in power only when the number of random features approaches infinity. We then consider the uniform power of the test and study the time-power trade-off under the minimax testing framework. Our result shows that, by carefully choosing the number of random features, it is possible to attain the same minimax separation rates as the MMD test within sub-quadratic time. We demonstrate this point under different distributional assumptions such as densities in a Sobolev ball. Our theoretical findings are corroborated by simulation studies.
Updated: 2024-07-12 04:08:01
标题: 使用随机傅里叶特征进行核两样本检验中的计算统计权衡
摘要: 近年来,双样本检验方法的数量激增,其中最大均值差异(MMD)检验已经成为处理复杂和高维数据的有效工具。尽管MMD检验取得了成功并被广泛采用,其主要局限性在于其二次时间复杂度,这对大规模分析提出了挑战。虽然已经提出了各种方法来加速该过程,但目前尚不清楚是否可能以次二次时间成本获得与MMD检验相同的功效保证。为填补这一空白,我们重新审视了使用随机傅里叶特征的近似MMD检验,并研究其计算-统计权衡。我们首先揭示了近似MMD检验仅在随机特征数量趋于无穷时在功效上是逐点一致的。然后我们考虑了检验的一致功效,并在极小极大检验框架下研究了时间-功效权衡。我们的结果表明,通过精心选择随机特征的数量,可以在次二次时间内实现与MMD检验相同的极小极大分离率。我们在不同的分布假设下展示了这一点,例如Sobolev球中的密度。我们的理论发现得到了模拟研究的支持。
更新时间: 2024-07-12 04:08:01
领域: stat.ML,cs.LG,math.ST,stat.TH
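The approximated statistic under study has a compact form: embed both samples with random Fourier features for a Gaussian kernel and compare the feature means. The sketch below is the textbook construction (the bandwidth sigma and feature count D are arbitrary illustrative choices), not the authors' experimental code.

import numpy as np

def mmd_rff(X, Y, D=200, sigma=1.0, seed=0):
    # Random Fourier features for the Gaussian kernel
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    zx = np.sqrt(2.0 / D) * np.cos(X @ W + b)
    zy = np.sqrt(2.0 / D) * np.cos(Y @ W + b)
    diff = zx.mean(axis=0) - zy.mean(axis=0)
    return diff @ diff                      # approximates the squared MMD

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(500, 5))
Y = rng.normal(0.3, 1.0, size=(500, 5))     # mean-shifted alternative
print(mmd_rff(X, Y))                        # noticeably larger than the null case below
print(mmd_rff(X, rng.normal(0.0, 1.0, size=(500, 5))))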
Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction
Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence "connection" information characterized by vector and spectral descriptors. Our Top-ML model has been validated on two widely used AntiCP 2.0 benchmark datasets and has achieved state-of-the-art performance. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.
Updated: 2024-07-12 04:04:54
标题: 拓扑学增强的机器学习模型(Top-ML)用于抗癌肽预测
摘要: 最近,治疗性肽已经展示出在癌症治疗中具有巨大的潜力。为了探索强大的抗癌肽,已经开发了基于人工智能的方法来系统地筛选潜在的候选者。然而,肽的有效特征化的缺乏已经成为这些机器学习模型的瓶颈。在本文中,我们提出了一种拓扑增强的机器学习模型(Top-ML)用于抗癌肽预测。我们的Top-ML利用从其序列“连接”信息中衍生出的肽拓扑特征,这些特征由矢量和谱描述符表征。我们的Top-ML模型已经在两个广泛使用的AntiCP 2.0基准数据集上进行了验证,并取得了最先进的性能。我们的结果突显了利用新型基于拓扑的特征化来加速抗癌肽的识别的潜力。
更新时间: 2024-07-12 04:04:54
领域: q-bio.QM,cs.LG,math.GN,q-bio.BM
Integrating White and Black Box Techniques for Interpretable Machine Learning
In machine learning algorithm design, there exists a trade-off between the interpretability and performance of the algorithm. In general, algorithms which are simpler and easier for humans to comprehend tend to show worse performance than more complex, less transparent algorithms. For example, a random forest classifier is likely to be more accurate than a simple decision tree, but at the expense of interpretability. In this paper, we present an ensemble classifier design which classifies easier inputs using a highly-interpretable classifier (i.e., white box model), and more difficult inputs using a more powerful, but less interpretable classifier (i.e., black box model).
Updated: 2024-07-12 03:58:04
标题: 将白盒和黑盒技术融合在一起,实现可解释的机器学习
摘要: 在机器学习算法设计中,存在着算法的可解释性和性能之间的权衡。一般来说,对人类来说更简单、更容易理解的算法往往表现比较差,而更复杂、更不透明的算法表现更好。例如,一个随机森林分类器可能比一个简单的决策树更准确,但代价是可解释性较差。在本文中,我们提出了一种集成分类器设计,使用一个高度可解释的分类器(即白盒模型)对易处理的输入进行分类,使用一个更强大但可解释性较差的分类器(即黑盒模型)对更困难的输入进行分类。
更新时间: 2024-07-12 03:58:04
领域: cs.LG
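A minimal version of the routing idea can be written in a few lines of scikit-learn: send inputs on which the white box is confident to the decision tree, and the rest to the random forest. The 0.95 confidence threshold and the particular models are illustrative assumptions; the paper's routing criterion may differ.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

white = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xtr, ytr)   # interpretable
black = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)  # powerful

conf = white.predict_proba(Xte).max(axis=1)
easy = conf >= 0.95                          # confident cases go to the white box
pred = np.where(easy, white.predict(Xte), black.predict(Xte))
print("routed to white box:", easy.mean(), "ensemble accuracy:", (pred == yte).mean())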
Don't Fear Peculiar Activation Functions: EUAF and Beyond
In this paper, we propose a new super-expressive activation function called the Parametric Elementary Universal Activation Function (PEUAF). We demonstrate the effectiveness of PEUAF through systematic and comprehensive experiments on various industrial and image datasets, including CIFAR10, Tiny-ImageNet, and ImageNet. Moreover, we significantly generalize the family of super-expressive activation functions, whose existence has been demonstrated in several recent works by showing that any continuous function can be approximated to any desired accuracy by a fixed-size network with a specific super-expressive activation function. Specifically, our work addresses two major bottlenecks in impeding the development of super-expressive activation functions: the limited identification of super-expressive functions, which raises doubts about their broad applicability, and their often peculiar forms, which lead to skepticism regarding their scalability and practicality in real-world applications.
Updated: 2024-07-12 03:57:25
标题: 不要害怕奇特的激活函数:EUAF及其延伸
摘要: 在本文中,我们提出了一种新的超表达激活函数,称为参数化基本通用激活函数(PEUAF)。我们通过在各种工业和图像数据集(包括CIFAR10、Tiny-ImageNet和ImageNet)上进行系统而全面的实验,展示了PEUAF的有效性。此外,我们显著推广了超表达激活函数家族(其存在性已在最近的若干工作中得到证明),方法是证明任何连续函数都可以由带有特定超表达激活函数的固定大小网络以任意精度逼近。具体而言,我们的工作解决了阻碍超表达激活函数发展的两个主要瓶颈:一是已被识别的超表达函数十分有限,这引起了人们对其广泛适用性的怀疑;二是它们的形式往往十分奇特,这导致人们怀疑其在现实应用中的可扩展性和实用性。
更新时间: 2024-07-12 03:57:25
领域: cs.CV,cs.AI
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Warning: this paper contains model outputs exhibiting unethical information. Large Language Models (LLMs) have achieved significant breakthroughs, but their generated unethical content poses potential risks. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Numerous datasets have been constructed to assess social bias, toxicity, and ethics in LLMs, but they suffer from evaluation chronoeffect, that is, as models rapidly evolve, existing data becomes leaked or undemanding, overestimating ever-developing LLMs. To tackle this problem, we propose GETA, a novel generative evolving testing approach that dynamically probes the underlying moral baselines of LLMs. Distinct from previous adaptive testing methods that rely on static datasets with limited difficulty, GETA incorporates an iteratively-updated item generator which infers each LLM's moral boundaries and generates difficulty-tailored testing items, accurately reflecting the true alignment extent. This process theoretically learns a joint distribution of item and model response, with item difficulty and value conformity as latent variables, where the generator co-evolves with the LLM, addressing chronoeffect. We evaluate various popular LLMs with diverse capabilities and demonstrate that GETA can create difficulty-matching testing items and more accurately assess LLMs' values, better consistent with their performance on unseen OOD and i.i.d. items, laying the groundwork for future evaluation paradigms.
Updated: 2024-07-12 03:47:21
标题: 提高标准:通过生成演化测试探究大型语言模型的价值
摘要: 警告:本文包含展示不道德信息的模型输出。 大型语言模型(LLMs)取得了重大突破,但其生成的不道德内容可能带来潜在风险。衡量LLMs的价值对齐性对于它们的监管和负责任部署至关重要。人们已经构建了大量数据集来评估LLMs中的社会偏见、有毒性和伦理问题,但它们存在评估时效效应(chronoeffect):随着模型的快速演化,现有数据会被泄露或变得过于容易,从而高估不断发展的LLMs。为了解决这个问题,我们提出了GETA,一种新颖的生成演化测试方法,动态探测LLMs的潜在道德基线。与以往依赖难度有限的静态数据集的自适应测试方法不同,GETA包含一个迭代更新的项目生成器,该生成器推断每个LLM的道德边界,并生成难度适配的测试项目,准确反映真实的对齐程度。这个过程在理论上学习了项目和模型响应的联合分布,其中项目难度和价值一致性作为潜在变量,生成器与LLM共同演化,从而解决了时效效应。我们评估了各种具有不同能力的流行LLMs,并证明GETA可以创建难度匹配的测试项目,更准确地评估LLMs的价值,与它们在未见过的OOD和i.i.d.项目上的表现更一致,为未来的评估范式奠定了基础。
更新时间: 2024-07-12 03:47:21
领域: cs.CL,cs.AI,cs.CY
Compressed Sensing: A Discrete Optimization Approach
We study the Compressed Sensing (CS) problem, which is the problem of finding the most sparse vector that satisfies a set of linear measurements up to some numerical tolerance. We introduce an $\ell_2$ regularized formulation of CS which we reformulate as a mixed integer second order cone program. We derive a second order cone relaxation of this problem and show that under mild conditions on the regularization parameter, the resulting relaxation is equivalent to the well studied basis pursuit denoising problem. We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm that leverages our second order cone relaxation to solve small-scale instances of CS to certifiable optimality. When compared against solutions produced by three state of the art benchmark methods on synthetic data, our numerical results show that our approach produces solutions that are on average $6.22\%$ more sparse. When compared only against the experiment-wise best performing benchmark method on synthetic data, our approach produces solutions that are on average $3.10\%$ more sparse. On real world ECG data, for a given $\ell_2$ reconstruction error our approach produces solutions that are on average $9.95\%$ more sparse than benchmark methods ($3.88\%$ more sparse if only compared against the best performing benchmark), while for a given sparsity level our approach produces solutions that have on average $10.77\%$ lower reconstruction error than benchmark methods ($1.42\%$ lower error if only compared against the best performing benchmark). When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods. This improved accuracy comes at the cost of an increase in computation time by several orders of magnitude.
Updated: 2024-07-12 03:46:02
标题: 压缩感知:离散优化方法
摘要: 我们研究压缩感知(CS)问题,即在给定数值容差内找到满足一组线性测量的最稀疏向量的问题。我们引入了CS的一个$\ell_2$正则化公式,并将其重新表述为混合整数二阶锥规划。我们推导了该问题的二阶锥松弛,并证明在正则化参数满足温和条件时,所得松弛等价于被广泛研究的基追踪降噪(basis pursuit denoising)问题。我们提出了一种加强二阶锥松弛的半定松弛,并开发了一种自定义分支定界算法,利用我们的二阶锥松弛将小规模CS实例求解到可证明的最优。与三种最先进的基准方法在合成数据上产生的解相比,我们的数值结果表明,我们的方法产生的解平均稀疏6.22%。若仅与在合成数据上逐实验表现最佳的基准方法相比,我们的方法产生的解平均稀疏3.10%。在真实世界的心电图(ECG)数据上,对于给定的$\ell_2$重建误差,我们的方法产生的解平均比基准方法稀疏9.95%(若仅与表现最佳的基准相比,则稀疏3.88%);而对于给定的稀疏度水平,我们的方法产生的解的重建误差平均比基准方法低10.77%(若仅与表现最佳的基准相比,则低1.42%)。当作为多标签分类算法的组成部分使用时,我们的方法比基准压缩感知方法获得了更高的分类准确率。这种准确率提升的代价是计算时间增加了几个数量级。
更新时间: 2024-07-12 03:46:02
领域: eess.SP,cs.LG,stat.ML
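The basis pursuit denoising problem that the abstract's relaxation recovers, min ||x||_1 subject to ||Ax - b||_2 <= eps, can be solved directly with cvxpy. The sketch below sets up a small synthetic instance (dimensions, noise level, and eps are arbitrary) and solves the convex relaxation only; it is not the authors' branch-and-bound method.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 50, 200, 5                          # measurements, ambient dim, sparsity
A = rng.normal(size=(n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.normal(size=k)
b = A @ x_true + 0.01 * rng.normal(size=n)    # noisy linear measurements

x = cp.Variable(m)
prob = cp.Problem(cp.Minimize(cp.norm(x, 1)),
                  [cp.norm(A @ x - b, 2) <= 0.02 * np.sqrt(n)])
prob.solve()
print("nonzeros recovered:", int(np.sum(np.abs(x.value) > 1e-3)))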
Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions
We introduce a new type of indirect injection vulnerabilities in language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, sentiment, or point of view. We explain how to create meta-instructions by generating images that act as soft prompts. Unlike jailbreaking attacks and adversarial examples, the outputs resulting from these images are plausible and based on the visual content of the image, yet follow the adversary's (meta-)instructions. We describe the risks of these attacks, including misinformation and spin, evaluate their efficacy for multiple visual language models and adversarial meta-objectives, and demonstrate how they can "unlock" the capabilities of the underlying language models that are unavailable via explicit text instructions. Finally, we discuss defenses against these attacks.
Updated: 2024-07-12 03:40:13
标题: 软提示变得强硬:利用隐藏的元指令引导视觉语言模型
摘要: 我们介绍了一种在操作图片的语言模型中出现的新型间接注入漏洞:隐藏的“元指令”,它们影响模型如何解释图片并引导模型的输出来表达对手选择的风格、情感或观点。 我们解释了如何通过生成充当软提示的图片来创建元指令。与越狱攻击和对抗性示例不同,这些图片产生的输出是可信的,基于图片的视觉内容,但遵循对手的(元)指令。我们描述了这些攻击的风险,包括误导和宣传,评估它们对多个视觉语言模型和对抗性元目标的有效性,并展示它们如何“解锁”基础语言模型的能力,这些能力通过明确的文本指令不可用。最后,我们讨论了对抗这些攻击的防御措施。
更新时间: 2024-07-12 03:40:13
领域: cs.CR,cs.AI,cs.LG
Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks is able to replace traditional video transmission to enhance video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to tremendous overhead on model switching and memory footprints at the user end. To resolve such problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline to reduce the model number down to one (Dy-DCA), which helps promote performance while conserving computational resources. Additionally, to achieve real acceleration on the user end, we designed a framework that optimizes dynamic features (e.g., dynamic shapes, sizes, and control flow) in Dy-DCA to enable a series of compilation optimizations, including fused code generation, static execution planning, etc. By employing such techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Meanwhile, assisted by our compilation optimization, we achieve a 1.7$\times$ speedup while saving up to 1.61$\times$ memory consumption. Code available in https://github.com/coulsonlee/Dy-DCA-ECCV2024.
Updated: 2024-07-12 03:39:05
标题: 基于动态算法与编译器协同设计的设备端超分辨率数据过拟合
摘要: 深度神经网络(DNNs)经常用于各种计算机视觉应用中。如今,当前视频分发系统中的一个新兴趋势是利用DNN的过拟合特性来执行视频分辨率提升。通过将视频分割成块并将超分辨率(SR)模型应用于过拟合每个块,SR模型加视频块的方案能够取代传统视频传输,提高视频质量和传输效率。然而,为了保证高性能,需要许多模型和块,这导致用户端模型切换和内存占用量巨大。为了解决这些问题,我们提出了一种通过内容感知数据处理流水线辅助的动态深度神经网络(Dy-DCA),将模型数量减少到一个,有助于提升性能同时节约计算资源。此外,为了在用户端实现真正加速,我们设计了一个优化动态特性(例如动态形状、大小和控制流)的框架,使Dy-DCA能够进行一系列编译优化,包括融合代码生成、静态执行计划等。通过采用这些技术,我们的方法在一款现成的手机上实现了更好的PSNR和实时性能(33 FPS)。同时,在编译优化的帮助下,我们实现了1.7倍的加速,并节省了高达1.61倍的内存消耗。代码可在https://github.com/coulsonlee/Dy-DCA-ECCV2024获取。
更新时间: 2024-07-12 03:39:05
领域: cs.CV,cs.AI,cs.LG
Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models
In this paper, we test the hypothesis that although OpenAI's GPT-4 performs well generally, we can fine-tune open-source models to outperform GPT-4 in smart contract vulnerability detection. We fine-tune two models from Meta's Code Llama and a dataset of 17k prompts, Detect Llama - Foundation and Detect Llama - Instruct, and we also fine-tune OpenAI's GPT-3.5 Turbo model (GPT-3.5FT). We then evaluate these models, plus a random baseline, on a testset we develop against GPT-4, and GPT-4 Turbo's, detection of eight vulnerabilities from the dataset and the two top identified vulnerabilities - and their weighted F1 scores. We find that for binary classification (i.e., is this smart contract vulnerable?), our two best-performing models, GPT-3.5FT and Detect Llama - Foundation, achieve F1 scores of $0.776$ and $0.68$, outperforming both GPT-4 and GPT-4 Turbo, $0.66$ and $0.675$. For the evaluation against individual vulnerability identification, our top two models, GPT-3.5FT and Detect Llama - Foundation, both significantly outperformed GPT-4 and GPT-4 Turbo in both weighted F1 for all vulnerabilities ($0.61$ and $0.56$ respectively against GPT-4's $0.218$ and GPT-4 Turbo's $0.243$) and weighted F1 for the top two identified vulnerabilities ($0.719$ for GPT-3.5FT, $0.674$ for Detect Llama - Foundation against GPT-4's $0.363$ and GPT-4 Turbo's $0.429$).
Updated: 2024-07-12 03:33:13
标题: 检测Llama - 使用大型语言模型发现智能合约中的漏洞
摘要: 在本文中,我们测试了一个假设,即虽然OpenAI的GPT-4通常表现良好,但我们可以微调开源模型,使其在智能合约漏洞检测方面胜过GPT-4。我们基于Meta的Code Llama和一个包含1.7万个提示的数据集微调了两个模型,即Detect Llama - Foundation和Detect Llama - Instruct,同时还微调了OpenAI的GPT-3.5 Turbo模型(GPT-3.5FT)。然后,我们在自行开发的测试集上评估这些模型外加一个随机基线,并与GPT-4和GPT-4 Turbo进行比较,考察它们对数据集中八类漏洞以及两个最常被识别漏洞的检测效果及其加权F1分数。 我们发现,对于二元分类(即这个智能合约是否有漏洞?),我们的两个表现最佳的模型,GPT-3.5FT和Detect Llama - Foundation,F1分数分别为0.776和0.68,胜过了GPT-4和GPT-4 Turbo的0.66和0.675。对于单个漏洞识别的评估,我们的前两个模型,GPT-3.5FT和Detect Llama - Foundation,在所有漏洞的加权F1(分别为0.61和0.56,而GPT-4为0.218,GPT-4 Turbo为0.243)以及两个最常被识别漏洞的加权F1(GPT-3.5FT为0.719,Detect Llama - Foundation为0.674,而GPT-4为0.363,GPT-4 Turbo为0.429)方面均明显胜过GPT-4和GPT-4 Turbo。
更新时间: 2024-07-12 03:33:13
领域: cs.CR
Empowering Few-Shot Relation Extraction with The Integration of Traditional RE Methods and Large Language Models
Few-Shot Relation Extraction (FSRE), a subtask of Relation Extraction (RE) that utilizes limited training instances, appeals to more researchers in Natural Language Processing (NLP) due to its capability to extract textual information in extremely low-resource scenarios. The primary methodologies employed for FSRE have been fine-tuning or prompt tuning techniques based on Pre-trained Language Models (PLMs). Recently, the emergence of Large Language Models (LLMs) has prompted numerous researchers to explore FSRE through In-Context Learning (ICL). However, there are substantial limitations associated with methods based on either traditional RE models or LLMs. Traditional RE models are hampered by a lack of necessary prior knowledge, while LLMs fall short in their task-specific capabilities for RE. To address these shortcomings, we propose a Dual-System Augmented Relation Extractor (DSARE), which synergistically combines traditional RE models with LLMs. Specifically, DSARE innovatively injects the prior knowledge of LLMs into traditional RE models, and conversely enhances LLMs' task-specific aptitude for RE through relation extraction augmentation. Moreover, an Integrated Prediction module is employed to jointly consider these two respective predictions and derive the final results. Extensive experiments demonstrate the efficacy of our proposed method.
Updated: 2024-07-12 03:31:11
标题: 利用传统关系抽取方法和大型语言模型集成,增强少样本关系抽取
摘要: Few-Shot Relation Extraction (FSRE)是关系抽取(RE)的一个子任务,利用有限的训练实例,吸引了更多自然语言处理(NLP)研究人员的兴趣,因为它能够在极低资源情况下提取文本信息。用于FSRE的主要方法是基于预训练语言模型(PLMs)的微调或提示调整技术。最近,大型语言模型(LLMs)的出现促使许多研究人员通过上下文学习(ICL)来探索FSRE。然而,基于传统RE模型或LLMs的方法存在重大局限性。传统RE模型由于缺乏必要的先验知识而受阻,而LLMs在RE的特定任务能力方面表现不佳。为了解决这些缺点,我们提出了一种双系统增强关系提取器(DSARE),它将传统RE模型与LLMs协同结合。具体而言,DSARE创新地将LLMs的先验知识注入传统RE模型中,并通过关系提取增强来增强LLMs的RE任务特定能力。此外,还采用了集成预测模块来共同考虑这两个预测,并得出最终结果。大量实验证明了我们提出的方法的有效性。
更新时间: 2024-07-12 03:31:11
领域: cs.CL,cs.AI
LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
Out-of-distribution (OOD) detection is crucial for model reliability, as it identifies samples from unknown classes and reduces errors due to unexpected inputs. Vision-Language Models (VLMs) such as CLIP are emerging as powerful tools for OOD detection by integrating multi-modal information. However, the practical application of such systems is challenged by manual prompt engineering, which demands domain expertise and is sensitive to linguistic nuances. In this paper, we introduce Label-driven Automated Prompt Tuning (LAPT), a novel approach to OOD detection that reduces the need for manual prompt engineering. We develop distribution-aware prompts with in-distribution (ID) class names and negative labels mined automatically. Training samples linked to these class labels are collected autonomously via image synthesis and retrieval methods, allowing for prompt learning without manual effort. We utilize a simple cross-entropy loss for prompt optimization, with cross-modal and cross-distribution mixing strategies to reduce image noise and explore the intermediate space between distributions, respectively. The LAPT framework operates autonomously, requiring only ID class names as input and eliminating the need for manual intervention. With extensive experiments, LAPT consistently outperforms manually crafted prompts, setting a new standard for OOD detection. Moreover, LAPT not only enhances the distinction between ID and OOD samples, but also improves the ID classification accuracy and strengthens the generalization robustness to covariate shifts, resulting in outstanding performance in challenging full-spectrum OOD detection tasks. Codes are available at \url{https://github.com/YBZh/LAPT}.
Updated: 2024-07-12 03:30:53
标题: LAPT:基于标签驱动的视觉语言模型自动提示调整用于OOD检测
摘要: 分布外(OOD)检测对于模型可靠性至关重要,因为它能识别来自未知类别的样本并减少由意外输入导致的错误。诸如CLIP的视觉-语言模型(VLMs)通过整合多模态信息,正成为OOD检测的强大工具。然而,此类系统的实际应用受到手动提示工程的挑战,后者需要领域专业知识且对语言的细微差别十分敏感。在本文中,我们提出了标签驱动的自动提示调整(LAPT),这是一种减少手动提示工程需求的新型OOD检测方法。我们利用分布内(ID)类名和自动挖掘的负标签来构建分布感知提示。与这些类标签关联的训练样本通过图像合成和检索方法自主收集,从而无需人工参与即可进行提示学习。我们使用简单的交叉熵损失进行提示优化,并分别采用跨模态和跨分布混合策略来降低图像噪声和探索分布之间的中间空间。LAPT框架自主运行,只需ID类名作为输入,无需人工干预。大量实验表明,LAPT始终优于手工构建的提示,为OOD检测树立了新标准。此外,LAPT不仅增强了ID与OOD样本之间的区分度,还提高了ID分类准确率,并增强了对协变量偏移的泛化鲁棒性,在具有挑战性的全谱OOD检测任务中取得了出色的表现。代码可在\url{https://github.com/YBZh/LAPT}获取。
更新时间: 2024-07-12 03:30:53
领域: cs.CV,cs.AI,cs.LG
Lite-SAM Is Actually What You Need for Segment Everything
This paper introduces Lite-SAM, an efficient end-to-end solution for the SegEvery task designed to reduce computational costs and redundancy. Lite-SAM is composed of four main components: a streamlined CNN-Transformer hybrid encoder (LiteViT), an automated prompt proposal network (AutoPPN), a traditional prompt encoder, and a mask decoder. All these components are integrated within the SAM framework. Our LiteViT, a high-performance lightweight backbone network, has only 1.16M parameters, which is a 23% reduction compared to the lightest existing backbone network Shufflenet. We also introduce AutoPPN, an innovative end-to-end method for prompt boxes and points generation. This is an improvement over traditional grid search sampling methods, and its unique design allows for easy integration into any SAM series algorithm, extending its usability. we have thoroughly benchmarked Lite-SAM across a plethora of both public and private datasets. The evaluation encompassed a broad spectrum of universal metrics, including the number of parameters, SegEvery execution time, and accuracy. The findings reveal that Lite-SAM, operating with a lean 4.2M parameters, significantly outpaces its counterparts, demonstrating performance improvements of 43x, 31x, 20x, 21x, and 1.6x over SAM, MobileSAM, Edge-SAM, EfficientViT-SAM, and MobileSAM-v2 respectively, all the while maintaining competitive accuracy. This underscores Lite-SAM's prowess in achieving an optimal equilibrium between performance and precision, thereby setting a new state-of-the-art(SOTA) benchmark in the domain.
Updated: 2024-07-12 03:28:46
标题: "Lite-SAM实际上是您需要的用于分段的东西"
摘要: 本文介绍了Lite-SAM,这是一种面向SegEvery任务的高效端到端解决方案,旨在降低计算成本和冗余。Lite-SAM由四个主要组件组成:简化的CNN-Transformer混合编码器(LiteViT)、自动提示提议网络(AutoPPN)、传统提示编码器和掩码解码器。所有这些组件都集成在SAM框架内。我们的LiteViT是一种高性能轻量级骨干网络,仅有116万参数,比最轻的现有骨干网络Shufflenet减少了23%。我们还引入了AutoPPN,一种创新的端到端方法,用于生成提示框和提示点。这是对传统网格搜索采样方法的改进,其独特设计可轻松集成到任何SAM系列算法中,扩展了其可用性。我们在大量公共和私有数据集上对Lite-SAM进行了彻底的基准测试。评估涵盖了广泛的通用指标,包括参数数量、SegEvery执行时间和准确性。研究结果表明,仅使用420万参数的Lite-SAM显著超越同类方法,其性能分别比SAM、MobileSAM、Edge-SAM、EfficientViT-SAM和MobileSAM-v2快43倍、31倍、20倍、21倍和1.6倍,同时保持了有竞争力的准确性。这突显了Lite-SAM在性能和精度之间取得最佳平衡的实力,从而在该领域树立了新的最先进(SOTA)基准。
更新时间: 2024-07-12 03:28:46
领域: cs.CV,cs.LG
Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control
Cooperative Adaptive Cruise Control (CACC) plays a pivotal role in enhancing traffic efficiency and safety in Connected and Autonomous Vehicles (CAVs). Reinforcement Learning (RL) has proven effective in optimizing complex decision-making processes in CACC, leading to improved system performance and adaptability. Among RL approaches, Multi-Agent Reinforcement Learning (MARL) has shown remarkable potential by enabling coordinated actions among multiple CAVs through Centralized Training with Decentralized Execution (CTDE). However, MARL often faces scalability issues, particularly when CACC vehicles suddenly join or leave the platoon, resulting in performance degradation. To address these challenges, we propose Communication-Aware Reinforcement Learning (CA-RL). CA-RL includes a communication-aware module that extracts and compresses vehicle communication information through forward and backward information transmission modules. This enables efficient cyclic information propagation within the CACC traffic flow, ensuring policy consistency and mitigating the scalability problems of MARL in CACC. Experimental results demonstrate that CA-RL significantly outperforms baseline methods in various traffic scenarios, achieving superior scalability, robustness, and overall system performance while maintaining reliable performance despite changes in the number of participating vehicles.
Updated: 2024-07-12 03:28:24
标题: 面向合作自适应巡航控制的通信感知强化学习
摘要: 合作自适应巡航控制(CACC)在连接和自动驾驶车辆(CAV)中发挥着关键作用,提高了交通效率和安全性。强化学习(RL)在优化CACC中复杂决策过程方面已被证明是有效的,导致系统性能和适应性得到改善。在RL方法中,多智体强化学习(MARL)通过中心化训练与分散执行(CTDE)展现出了协调多个CAV之间的行动的潜力。然而,MARL在CACC中往往面临可扩展性问题,特别是当CACC车辆突然加入或离开编队时,导致性能下降。为了解决这些挑战,我们提出通信感知强化学习(CA-RL)。CA-RL包括一个通信感知模块,通过前向和后向信息传输模块提取和压缩车辆通信信息。这使得CACC交通流内的信息循环传播更加高效,确保政策的一致性,并缓解了MARL在CACC中的可扩展性问题。实验结果表明,CA-RL在各种交通场景中明显优于基线方法,实现了出色的可扩展性、稳健性和整体系统性能,同时在参与车辆数量变化时保持可靠的性能。
更新时间: 2024-07-12 03:28:24
领域: cs.LG,cs.RO
Local Optima in Diversity Optimization: Non-trivial Offspring Population is Essential
The main goal of diversity optimization is to find a diverse set of solutions which satisfy some lower bound on their fitness. Evolutionary algorithms (EAs) are often used for such tasks, since they are naturally designed to optimize populations of solutions. This approach to diversity optimization, called EDO, has been previously studied from theoretical perspective, but most studies considered only EAs with a trivial offspring population such as the $(\mu + 1)$ EA. In this paper we give an example instance of a $k$-vertex cover problem, which highlights a critical difference of the diversity optimization from the regular single-objective optimization, namely that there might be a locally optimal population from which we can escape only by replacing at least two individuals at once, which the $(\mu + 1)$ algorithms cannot do. We also show that the $(\mu + \lambda)$ EA with $\lambda \ge \mu$ can effectively find a diverse population on $k$-vertex cover, if using a mutation operator inspired by Branson and Sutton (TCS 2023). To avoid the problem of subset selection which arises in the $(\mu + \lambda)$ EA when it optimizes diversity, we also propose the $(1_\mu + 1_\mu)$ EA$_D$, which is an analogue of the $(1 + 1)$ EA for populations, and which is also efficient at optimizing diversity on the $k$-vertex cover problem.
Updated: 2024-07-12 03:27:47
标题: 多样性优化中的局部最优解:非平凡后代种群是必不可少的
摘要: 多样性优化的主要目标是找到一组满足其适应度下限的多样化解集。进化算法(EAs)经常用于这类任务,因为它们天然地被设计用于优化解的种群。这种多样性优化方法被称为EDO,先前已从理论角度进行了研究,但大多数研究仅考虑了后代种群平凡的EA,例如$(\mu + 1)$ EA。本文提供了一个$k$-顶点覆盖问题的示例实例,突出了多样性优化与常规单目标优化的关键差异,即可能存在一个局部最优种群,只有同时替换至少两个个体才能逃脱,而$(\mu + 1)$算法无法做到这一点。 我们还展示了当$\lambda \ge \mu$时,$(\mu + \lambda)$ EA在使用受Branson和Sutton(TCS 2023)启发的突变算子时,可以有效地在$k$-顶点覆盖上找到一个多样化的种群。为了避免$(\mu + \lambda)$ EA在优化多样性时出现的子集选择问题,我们还提出了$(1_\mu + 1_\mu)$ EA$_D$,这是面向种群的$(1 + 1)$ EA的类比,它在$k$-顶点覆盖问题上也能高效地优化多样性。
更新时间: 2024-07-12 03:27:47
领域: cs.NE,cs.AI
BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks
Programmers frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through a user study where 10 novices used BISCUIT for machine learning tutorials, we found that BISCUIT offers users representations of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas.
Updated: 2024-07-12 03:23:29
标题: BISCUIT:在计算笔记本中用短暂UI为LLM生成的代码搭建脚手架
摘要: 程序员经常在计算笔记本中使用机器学习教程,并且已经开始采用基于大型语言模型(LLMs)的代码生成技术。然而,他们在理解和处理LLMs生成的代码时遇到困难。为了缓解这些挑战,我们在计算笔记本中引入了一种新颖的工作流程,将基于LLMs的代码生成与一个额外的短暂UI步骤相结合,为用户提供UI脚手架,作为用户提示和代码生成之间的中间阶段。我们在BISCUIT中实现了这一工作流程,BISCUIT是一个JupyterLab扩展,它根据用户代码和意图的上下文向用户提供由LLMs生成的短暂UI,为用户理解、引导和探索LLM生成的代码提供支撑。通过一项有10名新手使用BISCUIT学习机器学习教程的用户研究,我们发现BISCUIT为用户提供了有助于理解的代码表示,降低了提示工程的复杂性,并为用户创造了一个探索不同变量、迭代自己想法的游乐场。
更新时间: 2024-07-12 03:23:29
领域: cs.HC,cs.AI
DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks
Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. The previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality, working well in some models and scenarios but failing in others, thus fall short in consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of ``early learning" as a general occurrence during the training of CLMs. This phenomenon refers to that a model initially focuses on the main features of training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. We then analyze that overfitting to backdoor triggers results from the use of the cross-entropy loss function, where the unboundedness of cross-entropy leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose a general and effective loss function DeCE (Deceptive Cross-Entropy) by blending deceptive distributions and applying label smoothing to limit the gradient to be bounded, which prevents the model from overfitting to backdoor triggers and then enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.
Updated: 2024-07-12 03:18:38
标题: DeCE:为防御后门攻击而设计的欺骗性交叉熵损失
摘要: 代码语言模型(CLMs),特别是利用深度学习的模型,在代码智能领域取得了显著的成功。然而,安全性问题,特别是后门攻击,在这个过程中常常被忽视。先前的研究集中于为CLMs设计后门攻击,但有效的防御方法尚未得到充分解决。特别是,现有的自然语言处理的防御方法,直接应用于CLMs时,效果不够有效,缺乏普适性,在某些模型和场景中表现良好,但在其他情况下失败,因此在持续缓解后门攻击方面存在不足。为了弥合这一差距,我们首先确认“早期学习”现象在训练CLMs时是一个普遍现象。这一现象指的是模型最初专注于训练数据的主要特征,但随着时间的推移可能变得更加敏感于后门触发器,导致过拟合和对后门攻击的敏感性。然后我们分析,对后门触发器的过拟合是由于使用交叉熵损失函数造成的,其中交叉熵的无界性导致模型越来越集中于有毒数据的特征。基于这一认识,我们提出了一种通用且有效的损失函数DeCE(欺骗性交叉熵),通过混合欺骗性分布并应用标签平滑来限制梯度的边界,防止模型对后门触发器过拟合,进而提高CLMs对后门攻击的安全性。为验证我们防御方法的有效性,我们选择代码合成任务作为实验场景。我们在各种代码合成数据集、模型和污染比例上的实验表明,DeCE在增强CLMs安全性方面的适用性和有效性。
更新时间: 2024-07-12 03:18:38
领域: cs.CR,cs.SE
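One way to read "blending deceptive distributions and applying label smoothing" is as a soft-target cross-entropy whose target mixes the smoothed label with the model's own detached prediction, keeping the target bounded away from one-hot extremes so the gradient on poisoned points cannot blow up. The sketch below is that hedged interpretation, with alpha and eps as assumed hyperparameters; the published DeCE may define the blend differently.

import torch
import torch.nn.functional as F

def dece_loss(logits, targets, alpha=0.9, eps=0.1):
    # A hedged sketch of a DeCE-style objective, not the paper's exact loss.
    C = logits.size(-1)
    one_hot = F.one_hot(targets, C).float()
    smoothed = one_hot * (1 - eps) + eps / C          # label smoothing
    pred = F.softmax(logits, dim=-1).detach()         # "deceptive" distribution
    target_dist = alpha * smoothed + (1 - alpha) * pred
    # cross-entropy against the blended, bounded soft target
    return -(target_dist * F.log_softmax(logits, dim=-1)).sum(-1).mean()

logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))
print(dece_loss(logits, targets).item())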
PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data while preserving user privacy. However, the typical paradigm of FL faces challenges of both privacy and robustness: the transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates. Current solutions attempting to address both problems under the one-server FL setting fall short in the following aspects: 1) designed for simple validity checks that are insufficient against advanced attacks (e.g., checking norm of individual update); and 2) partial privacy leakage for more complicated robust aggregation algorithms (e.g., distances between model updates are leaked for multi-Krum). In this work, we formalize a novel security notion of aggregated privacy that characterizes the minimum amount of user information, in the form of some aggregated statistics of users' updates, that is necessary to be revealed to accomplish more advanced robust aggregation. We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy. As concrete instantiations of PriRoAgg, we construct two secure and robust protocols based on state-of-the-art robust algorithms, for which we provide full theoretical analyses on security and complexity. Extensive experiments are conducted for these protocols, demonstrating their robustness against various model integrity attacks, and their efficiency advantages over baselines.
Updated: 2024-07-12 03:18:08
标题: PriRoAgg:在联邦学习中实现具有最小隐私泄漏的稳健模型聚合
摘要: 最近,联邦学习(FL)因其利用大规模分布式用户数据的潜力而受到重大关注,同时又保护用户隐私。然而,FL的典型范式面临隐私和稳健性方面的挑战:传输的模型更新可能会泄露敏感用户信息,而对于本地训练过程缺乏中央控制使全局模型容易受到对模型更新的恶意篡改。当前试图在单服务器FL设置下解决这两个问题的解决方案在以下方面存在不足:1)仅设计用于简单有效性检查,无法抵御高级攻击(例如,检查个体更新的范数);以及2)对于更复杂的稳健聚合算法(例如,多Krum中泄露了模型更新之间的距离),存在部分隐私泄露。在这项工作中,我们正式定义了一个新颖的安全概念——聚合隐私,该概念描述了用户信息的最小量,以某些用户更新的聚合统计形式呈现,这些信息必须被揭示以完成更高级的稳健聚合。我们开发了一个通用框架PriRoAgg,利用拉格朗日编码计算和分布式零知识证明,执行各种稳健聚合算法,同时满足聚合隐私。作为PriRoAgg的具体实例,我们构建了两个基于最先进的稳健算法的安全和稳健协议,并对其安全性和复杂性进行了全面的理论分析。针对这些协议进行了大量实验,证明了它们对各种模型完整性攻击的稳健性,以及它们相对基线的效率优势。
更新时间: 2024-07-12 03:18:08
领域: cs.CR
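The multi-Krum aggregation the abstract cites (whose pairwise distances are exactly the quantity that leaks) is easy to state: score each update by the sum of squared distances to its closest n-f-2 peers, then average the m best-scoring updates. The sketch below follows that standard formulation; the matrix d2 is the sensitive intermediate the paper's aggregated-privacy notion is designed to protect.

import numpy as np

def multi_krum(updates, f, m):
    # updates: (n, d) client updates; tolerate up to f Byzantine clients;
    # average the m updates with the lowest Krum scores.
    n = len(updates)
    d2 = ((updates[:, None, :] - updates[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    scores = []
    for i in range(n):
        others = np.sort(np.delete(d2[i], i))
        scores.append(others[: n - f - 2].sum())   # sum over the n-f-2 nearest neighbors
    keep = np.argsort(scores)[:m]
    return updates[keep].mean(axis=0)

rng = np.random.default_rng(0)
updates = rng.normal(size=(10, 5))
updates[:2] += 10.0                                # two Byzantine outliers
print(multi_krum(updates, f=2, m=4))               # outliers are scored out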
Attribution Methods in Asset Pricing: Do They Account for Risk?
Over the past few decades, machine learning models have been extremely successful. As a result of axiomatic attribution methods, feature contributions have been explained more clearly and rigorously. There are, however, few studies that have examined domain knowledge in conjunction with the axioms. In this study, we examine asset pricing in finance, a field closely related to risk management. Consequently, when applying machine learning models, we must ensure that the attribution methods reflect the underlying risks accurately. In this work, we present and study several axioms derived from asset pricing domain knowledge. It is shown that while Shapley value and Integrated Gradients preserve most axioms, neither can satisfy all axioms. Using extensive analytical and empirical examples, we demonstrate how attribution methods can reflect risks and when they should not be used.
Updated: 2024-07-12 03:16:54
标题: 资产定价中的归因方法:它们是否考虑了风险?
摘要: 在过去的几十年里,机器学习模型取得了极大的成功。由于公理归因方法的应用,特征的贡献已经被更清晰和严谨地解释。然而,很少有研究同时考虑了领域知识和公理。在本研究中,我们研究了与风险管理密切相关的金融资产定价。因此,在应用机器学习模型时,我们必须确保归因方法能够准确反映潜在的风险。在这项工作中,我们提出并研究了从资产定价领域知识中得出的几个公理。研究表明,尽管Shapley值和集成梯度保留了大多数公理,但都不能满足所有公理。通过广泛的分析和实证示例,我们展示了归因方法如何反映风险以及何时不应使用这些方法。
更新时间: 2024-07-12 03:16:54
领域: q-fin.CP,cs.LG
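Of the two attribution methods analyzed, Integrated Gradients has a particularly direct implementation: average the gradient along the straight path from a baseline to the input, then scale by the displacement. The sketch below approximates the path integral with a Riemann sum over a fixed number of interpolation points; the toy network and the zero baseline are illustrative choices only.

import torch

def integrated_gradients(model, x, baseline, target, steps=64):
    # IG_j(x) = (x_j - x'_j) * integral_0^1 d f_target(x' + a (x - x')) / dx_j da
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    points = baseline + alphas * (x - baseline)     # (steps, *x.shape) path points
    points.requires_grad_(True)
    model(points)[:, target].sum().backward()
    avg_grad = points.grad.mean(dim=0)              # Riemann-sum average of gradients
    return (x - baseline) * avg_grad

net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 3))
x = torch.randn(4)
print(integrated_gradients(net, x, torch.zeros(4), target=1))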
Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection
Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real ones in extremely low-resource scenarios. This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media. Large Language Models (LLMs) have demonstrated competitive performance with the help of their rich prior knowledge and excellent in-context learning abilities. However, existing methods face significant limitations, such as the Understanding Ambiguity and Information Scarcity, which significantly undermine the potential of LLMs. To address these shortcomings, we propose a Dual-perspective Augmented Fake News Detection (DAFND) model, designed to enhance LLMs from both inside and outside perspectives. Specifically, DAFND first identifies the keywords of each news article through a Detection Module. Subsequently, DAFND creatively designs an Investigation Module to retrieve inside and outside valuable information concerning to the current news, followed by another Judge Module to derive its respective two prediction results. Finally, a Determination Module further integrates these two predictions and derives the final result. Extensive experiments on two publicly available datasets show the efficacy of our proposed method, particularly in low-resource settings.
Updated: 2024-07-12 03:15:01
标题: 检测、调查、判断和确定:一种基于LLM的少样本假新闻检测新框架
摘要: Few-Shot Fake News Detection (FS-FND)旨在在极低资源情景下区分不准确的新闻和真实新闻。由于虚假新闻在社交媒体上的广泛传播和有害影响,这项任务引起了越来越多的关注。大型语言模型(LLMs)凭借其丰富的先验知识和出色的上下文学习能力展现出竞争力。然而,现有方法面临着重大局限,例如理解歧义和信息稀缺,这显著削弱了LLMs的潜力。为了解决这些缺点,我们提出了一种双视角增强的虚假新闻检测(DAFND)模型,旨在从内部和外部双重视角增强LLMs。具体而言,DAFND首先通过检测模块识别每篇新闻文章的关键词。随后,DAFND创造性地设计了一个调查模块,以获取有关当前新闻的内部和外部有价值信息,然后通过另一个判断模块得出其两个预测结果。最后,一个决策模块进一步整合这两个预测结果并得出最终结果。对两个公开可用数据集的广泛实验显示了我们提出的方法的有效性,特别是在低资源环境中。
更新时间: 2024-07-12 03:15:01
领域: cs.CL,cs.AI
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Enhancing model interpretability can address spurious correlations by revealing how models draw their predictions. Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts, albeit at a high cost of human efforts in data annotation. In this paper, we leverage a synergy of multiple foundation models to construct CBMs with nearly no human effort. We discover undesirable biases in CBMs built on pre-trained models and propose a novel framework designed to exploit pre-trained models while being immune to these biases, thereby reducing vulnerability to spurious correlations. Specifically, our method offers a seamless pipeline that adopts foundation models for assessing potential spurious correlations in datasets, annotating concepts for images, and refining the annotations for improved robustness. We evaluate the proposed method on multiple datasets, and the results demonstrate its effectiveness in reducing model reliance on spurious correlations while preserving its interpretability.
Updated: 2024-07-12 03:07:28
标题: 利用最小的人力努力构建基于概念的模型以减少虚假相关性
摘要: 提高模型的可解释性可以通过揭示模型如何进行预测来解决虚假相关性问题。概念瓶颈模型(CBMs)可以通过提供一种原则性的方式来揭示和引导模型行为,尽管在数据注释方面需要付出很高的人力成本。在本文中,我们利用多个基础模型的协同作用来构建几乎不需要人力投入的CBMs。我们发现了建立在预训练模型上的CBMs中存在的不良偏见,并提出了一种新颖的框架,旨在利用预训练模型,同时免疫这些偏见,从而降低对虚假相关性的脆弱性。具体而言,我们的方法提供了一种无缝的流程,采用基础模型来评估数据集中潜在的虚假相关性,为图像注释概念,并改进注释以提高鲁棒性。我们在多个数据集上评估了所提出的方法,结果表明它在减少模型对虚假相关性的依赖同时保持可解释性方面的有效性。
更新时间: 2024-07-12 03:07:28
领域: cs.LG,cs.CV
Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training
Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in these OOD regions. This can become problematic for all sampling methods, especially when we move to parallel sampling which requires us to initialize and update the entire sample trajectory of dynamics in parallel, leading to many OOD evaluations. To address this problem, we introduce a new self-supervised training objective that differentiates the levels of noise added to a sample, leading to improved OOD denoising performance. The approach is based on our observation that diffusion models implicitly define a log-likelihood ratio that distinguishes distributions with different amounts of noise, and this expression depends on denoiser performance outside the standard training distribution. We show by diverse experiments that the proposed contrastive diffusion training is effective for both sequential and parallel settings, and it improves the performance and speed of parallel samplers significantly.
Updated: 2024-07-12 03:03:50
标题: 您的扩散模型秘密是一个噪声分类器,并受益于对比训练
摘要: 扩散模型学习去噪数据,然后训练好的去噪器被用来从数据分布中生成新样本。在本文中,我们重新审视了扩散采样过程,并确定了样本质量下降的一个根本原因:在远离训练分布(OOD)的区域中,去噪器估计不准确,采样过程不可避免地评估这些OOD区域。这对所有采样方法都可能成为一个问题,尤其是当我们转向并行采样时,这需要我们同时初始化和更新动力学中整个样本轨迹,导致许多OOD评估。为了解决这个问题,我们引入了一个新的自监督训练目标,区分添加到样本中的噪声水平,从而提高OOD去噪性能。该方法基于我们的观察,扩散模型隐含地定义了一个对数似然比,区分具有不同噪声量的分布,而这个表达式取决于标准训练分布之外的去噪器性能。我们通过多种实验表明,所提出的对比扩散训练对于顺序和并行设置都是有效的,并且显著提高了并行采样器的性能和速度。
更新时间: 2024-07-12 03:03:50
领域: cs.LG
A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model
Recommendation systems have become an important solution to information search problems. This article proposes BoNMF, a neural matrix factorization recommendation system model based on multimodal large language models. This model combines BoBERTa's powerful capabilities in natural language processing, ViT in computer vision, and neural matrix decomposition technology. By capturing the latent characteristics of users and items and interacting with a low-dimensional matrix composed of user and item IDs, the neural network outputs the recommendation results. Cold-start and ablation experimental results show that the BoNMF model exhibits excellent performance on large public datasets and significantly improves the accuracy of recommendations.
Updated: 2024-07-12 02:58:07
标题: 基于多模态大型语言模型的神经矩阵分解推荐系统模型
摘要: 推荐系统已经成为信息搜索问题的重要解决方案。本文提出了一种基于多模态大型语言模型的神经矩阵分解推荐系统模型BoNMF。该模型结合了BoBERTa在自然语言处理方面的强大能力、ViT在计算机视觉方面的能力以及神经矩阵分解技术。通过捕捉用户和物品的潜在特征,并与由用户和物品ID组成的低维矩阵交互,神经网络输出推荐结果。冷启动和消融实验结果显示,BoNMF模型在大型公共数据集上表现出色,并显著提高了推荐的准确性。
更新时间: 2024-07-12 02:58:07
领域: cs.IR,cs.AI
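Stripped of the multimodal towers, the ID-embedding core of a neural matrix factorization recommender looks like the sketch below. This is a generic NCF-style model for orientation, not the paper's BoNMF: the BoBERTa and ViT features are omitted, and all sizes are arbitrary.

import torch
import torch.nn as nn

class NeuralMF(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)   # low-dimensional user-ID factors
        self.item = nn.Embedding(n_items, dim)   # low-dimensional item-ID factors
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, u, i):
        h = torch.cat([self.user(u), self.item(i)], dim=-1)
        return self.mlp(h).squeeze(-1)           # predicted relevance score

model = NeuralMF(1000, 5000)
score = model(torch.tensor([3]), torch.tensor([42]))
print(score.item())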
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and appropriate experimental settings, and implementing multi-dimensional evaluations. This benchmark integrates 12 widely adopted methods with 15 diverse datasets across multiple application domains, offering extensive evaluation of contemporary spatio-temporal prediction networks. Through meticulous calibration of prediction settings across various applications, PredBench ensures evaluations relevant to their intended use and enables fair comparisons. Moreover, its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics, providing deep insights into the capabilities of models. The findings from our research offer strategic directions for future developments in the field. Our codebase is available at https://github.com/OpenEarthLab/PredBench.
Updated: 2024-07-12 02:55:16
标题: PredBench:在不同学科领域中对时空预测进行基准测试
摘要: 在这篇论文中,我们介绍了PredBench,这是一个专门为空间-时间预测网络的整体评估定制的基准。尽管在这一领域取得了显著进展,但仍然缺乏一个标准化框架来详细比较各种预测网络架构。PredBench通过进行大规模实验、维护标准化和适当的实验设置以及实施多维度评估来填补这一空白。该基准集成了12种广泛采用的方法和15个不同领域的多样数据集,为当代空间-时间预测网络提供了广泛的评估。通过在各种应用中精心校准预测设置,PredBench确保了与预期用途相关的评估,并实现了公平的比较。此外,其多维度评估框架通过全面的指标集拓展了分析,提供了对模型能力的深入洞察。我们的研究结果为未来发展提供了战略方向。我们的代码库位于https://github.com/OpenEarthLab/PredBench。
更新时间: 2024-07-12 02:55:16
领域: cs.LG,cs.CV
Self-Evolving GPT: A Lifelong Autonomous Experiential Learner
To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential learning framework based on LLMs to explore whether LLMs can imitate human ability for learning and utilizing experience. It autonomously learns and accumulates experience through experience transfer and induction, categorizing the types of input questions to select which accumulated experience to employ for them. Experimental results on six widely used NLP datasets show that our framework performs reliably in each intermediate step and effectively improves the performance of GPT-3.5 and GPT-4. This validates the feasibility of using LLMs to mimic human experiential learning and application capabilities. Additionally, we provide a detailed analysis of the behavior of our framework at each step.
Updated: 2024-07-12 02:49:13
标题: 自我进化的GPT:一个终身自主经验学习者
摘要: 为了提高大型语言模型(LLMs)的性能,研究人员已经探索了通过提示为LLMs提供文本任务解决经验。然而,他们依赖于手动努力获取和应用这种经验,这对LLMs的增长需求和用户问题的多样性是不可行的。为了解决这个问题,我们设计了一个基于LLMs的终身自主经验学习框架,探讨LLMs是否可以模仿人类学习和利用经验的能力。它通过经验转移和归纳自主学习和积累经验,将输入问题的类型分类来选择使用哪些积累的经验。在六个广泛使用的NLP数据集上的实验结果显示,我们的框架在每个中间步骤中表现可靠,并有效地提高了GPT-3.5和GPT-4的性能。这验证了使用LLMs模仿人类经验学习和应用能力的可行性。此外,我们还对我们的框架在每个步骤的行为进行了详细分析。
更新时间: 2024-07-12 02:49:13
领域: cs.CL,cs.AI
Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses
Federated graph learning (FedGL) is an emerging federated learning (FL) framework that extends FL to learn graph data from diverse sources. FL for non-graph data has shown to be vulnerable to backdoor attacks, which inject a shared backdoor trigger into the training data such that the trained backdoored FL model can predict the testing data containing the trigger as the attacker desires. However, FedGL against backdoor attacks is largely unexplored, and no effective defense exists. In this paper, we aim to address such significant deficiency. First, we propose an effective, stealthy, and persistent backdoor attack on FedGL. Our attack uses a subgraph as the trigger and designs an adaptive trigger generator that can derive the effective trigger location and shape for each graph. Our attack shows that empirical defenses are hard to detect/remove our generated triggers. To mitigate it, we further develop a certified defense for any backdoored FedGL model against the trigger with any shape at any location. Our defense involves carefully dividing a testing graph into multiple subgraphs and designing a majority vote-based ensemble classifier on these subgraphs. We then derive the deterministic certified robustness based on the ensemble classifier and prove its tightness. We extensively evaluate our attack and defense on six graph datasets. Our attack results show our attack can obtain > 90% backdoor accuracy in almost all datasets. Our defense results show, in certain cases, the certified accuracy for clean testing graphs against an arbitrary trigger with size 20 can be close to the normal accuracy under no attack, while there is a moderate gap in other cases. Moreover, the certified backdoor accuracy is always 0 for backdoored testing graphs generated by our attack, implying our defense can fully mitigate the attack. Source code is available at: https://github.com/Yuxin104/Opt-GDBA.
Updated: 2024-07-12 02:43:44
Subjects: cs.CR
Compositional Structures in Neural Embedding and Interaction Decompositions
We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks and conditional independence constraints on the probability distributions modeled by these networks. Our framework aims to shed light on the emergence of structural patterns in data representations, a phenomenon widely acknowledged but arguably still lacking a solid formal grounding. Specifically, we introduce a characterization of compositional structures in terms of "interaction decompositions," and we establish necessary and sufficient conditions for the presence of such structures within the representations of a model.
Updated: 2024-07-12 02:39:50
Subjects: cs.LG
Machine Learning in High Volume Media Manufacturing
Errors or failures in a high-volume manufacturing environment can have significant impact, resulting in the loss of both time and money. Identifying such failures early has been a top priority for manufacturing industries, and various rule-based algorithms have been developed over the years. However, catching these failures is time consuming, and such algorithms cannot adapt well to changes in designs or, sometimes, to variations in everyday behavior. More importantly, the number of units to monitor in a high-volume manufacturing environment is too large for manual monitoring or for a simple program. Here we develop a novel program that combines rule-based decisions with machine learning models; it can not only learn and adapt to such day-to-day variations or long-term design changes, but can also be applied at scale to the high number of manufacturing units in use today. Using current state-of-the-art technologies, we then deploy this program at scale to handle the needs of ever-increasing demand from the manufacturing environment.
Updated: 2024-07-12 02:34:54
Subjects: cs.LG
Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Vehicle Decision-Making in Dynamic Environment
Autonomous Vehicle (AV) decision making in urban environments is inherently challenging due to the dynamic interactions with surrounding vehicles. For safe planning, the AV must understand the weightage of various spatiotemporal interactions in a scene. Contemporary works use colossal transformer architectures to encode interactions, mainly for trajectory prediction, resulting in increased computational complexity. To address this issue without compromising spatiotemporal understanding and performance, we propose the simple Deep Attention Driven Reinforcement Learning (DAD-RL) framework, which dynamically assigns and incorporates the significance of surrounding vehicles into the ego vehicle's RL-driven decision-making process. We introduce an AV-centric spatiotemporal attention encoding (STAE) mechanism for learning the dynamic interactions with different surrounding vehicles. To understand map and route context, we employ a context encoder to extract features from context maps. The spatiotemporal representations combined with contextual encoding provide a comprehensive state representation. The resulting model is trained using the Soft Actor Critic (SAC) algorithm. We evaluate the proposed framework on the SMARTS urban benchmarking scenarios without traffic signals to demonstrate that DAD-RL outperforms recent state-of-the-art methods. Furthermore, an ablation study underscores the importance of the context encoder and the spatiotemporal attention encoder in achieving superior performance.
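Editor's illustration: an ego-centric attention encoder of the kind STAE describes can be sketched in a few lines of PyTorch; the layer sizes and module structure here are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class EgoAttentionEncoder(nn.Module):
        # Illustrative stand-in for an ego-centric attention encoder: the ego
        # state queries the surrounding-vehicle features, and the attention
        # weights act as per-vehicle importance scores.
        def __init__(self, feat_dim=16, embed_dim=64, num_heads=4):
            super().__init__()
            self.embed = nn.Linear(feat_dim, embed_dim)
            self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                              batch_first=True)

        def forward(self, ego, neighbors):
            # ego: (B, feat_dim); neighbors: (B, N, feat_dim)
            q = self.embed(ego).unsqueeze(1)       # (B, 1, E)
            kv = self.embed(neighbors)             # (B, N, E)
            out, weights = self.attn(q, kv, kv)    # weights: (B, 1, N)
            return out.squeeze(1), weights.squeeze(1)

    enc = EgoAttentionEncoder()
    state, importance = enc(torch.randn(8, 16), torch.randn(8, 12, 16))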
Updated: 2024-07-12 02:34:44
Subjects: cs.AI,cs.RO
Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective
Graph Neural Networks (GNNs) have shown remarkable performance in various tasks. However, recent works reveal that GNNs are vulnerable to backdoor attacks. Generally, a backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph. A GNN trained on the poisoned graph will then be misled to predict test nodes attached with a trigger as the target class. Despite their effectiveness, our empirical analysis shows that triggers generated by existing methods tend to be out-of-distribution (OOD), differing significantly from the clean data. Hence, these injected triggers can be easily detected and pruned with widely used outlier detection methods in real-world applications. Therefore, in this paper, we study a novel problem of unnoticeable graph backdoor attacks with in-distribution (ID) triggers. To generate ID triggers, we introduce an OOD detector in conjunction with an adversarial learning strategy to generate the attributes of the triggers within the distribution. To ensure a high attack success rate with ID triggers, we introduce novel modules designed to enhance trigger memorization by the victim model trained on the poisoned graph. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method in generating in-distribution triggers that can bypass various defense strategies while maintaining a high attack success rate.
Updated: 2024-07-12 02:21:54
Subjects: cs.LG,cs.CR
Disassembling Obfuscated Executables with LLM
Disassembly is a challenging task, particularly for obfuscated executables containing junk bytes, which are designed to induce disassembly errors. Existing solutions rely on heuristics or leverage machine learning techniques, but achieve only limited success. Fundamentally, such obfuscation cannot be defeated without an in-depth understanding of the binary executable's semantics, which is made possible by the emergence of large language models (LLMs). In this paper, we present DisasLLM, a novel LLM-driven disassembler to overcome the challenge of analyzing obfuscated executables. DisasLLM consists of two components: an LLM-based classifier that determines whether an instruction in an assembly code snippet is correctly decoded, and a disassembly strategy that leverages this model to disassemble obfuscated executables end-to-end. We evaluated DisasLLM on a set of heavily obfuscated executables, where it significantly outperforms other state-of-the-art disassembly solutions.
Updated: 2024-07-12 02:10:07
Subjects: cs.CR
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?
With the rapid development of artificial intelligence (AI), large language models (LLMs) such as GPT-4 have garnered significant attention in the scientific community, demonstrating great potential in advancing scientific discovery. This progress raises a critical question: are these LLMs well-aligned with real-world physicochemical principles? Current evaluation strategies largely emphasize fact-based knowledge, such as material property prediction or name recognition, but they often lack an understanding of fundamental physicochemical mechanisms that require logical reasoning. To bridge this gap, our study developed a benchmark consisting of 775 multiple-choice questions focusing on the mechanisms of gold nanoparticle synthesis. By reflecting on existing evaluation metrics, we question whether a direct true-or-false assessment merely suggests conjecture. Hence, we propose a novel evaluation metric, the confidence-based score (c-score), which probes the output logits to derive the precise probability for the correct answer. Based on extensive experiments, our results show that in the context of gold nanoparticle synthesis, LLMs understand the underlying physicochemical mechanisms rather than relying on conjecture. This study underscores the potential of LLMs to grasp intrinsic scientific mechanisms and sets the stage for developing more reliable and effective AI tools across various scientific domains.
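Editor's illustration: a confidence-based score of the kind described can be read off the option logits as the softmax probability assigned to the correct choice. A minimal sketch (the paper's exact c-score formula may differ):

    import math

    def c_score(option_logits, correct_index):
        # Softmax over the logits of the answer options (e.g., choices A-D),
        # then read off the probability mass on the correct option.
        m = max(option_logits)
        exps = [math.exp(z - m) for z in option_logits]
        total = sum(exps)
        return exps[correct_index] / total

    # Example: logits for options A-D, correct answer is B.
    print(c_score([2.1, 3.4, 0.2, -1.0], correct_index=1))  # ~0.75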
Updated: 2024-07-12 02:05:59
Subjects: cs.LG
Agricultural Recommendation System based on Deep Learning: A Multivariate Weather Forecasting Approach
Agriculture plays a fundamental role in driving economic growth and ensuring food security for populations around the world. Although labor-intensive agriculture has led to steady increases in food grain production in many developing countries, it is frequently challenged by adverse weather conditions, including heavy rainfall, low temperatures, and drought. These factors substantially hinder food production, posing significant risks to global food security. In order to achieve a profitable, sustainable, and farmer-friendly agricultural practice, this paper proposes a context-based crop recommendation system powered by a weather forecast model. For implementation purposes, we have considered the whole territory of Bangladesh. After extensive evaluation, a multivariate stacked Bi-LSTM network (three Bi-LSTM layers with a Time-Distributed layer) is employed as the weather forecasting model. The proposed weather model can forecast rainfall, temperature, humidity, and sunshine for any given location in Bangladesh with an average R-squared value of 0.9824, and the model outperforms other state-of-the-art LSTM models. These predictions guide our system in generating viable farming decisions. Additionally, our full-fledged system is capable of alerting farmers about extreme weather conditions so that preventive measures can be undertaken to protect the crops. Finally, the system is also adept at making knowledge-based crop suggestions for flood- and drought-prone regions.
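Editor's illustration: a three-layer stacked Bi-LSTM with a Time-Distributed head can be assembled in Keras as below; the window length, feature count, and layer widths are illustrative assumptions, not the paper's configuration.

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import (Bidirectional, Dense, LSTM,
                                         TimeDistributed)

    # Assumed shapes: 30 past days of 4 features (rainfall, temperature,
    # humidity, sunshine) in, one 4-feature prediction per timestep out.
    model = Sequential([
        Bidirectional(LSTM(64, return_sequences=True), input_shape=(30, 4)),
        Bidirectional(LSTM(64, return_sequences=True)),
        Bidirectional(LSTM(64, return_sequences=True)),
        TimeDistributed(Dense(4)),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()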
Updated: 2024-07-12 02:02:45
Subjects: cs.LG,cs.AI
Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e. sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. Code and videos are available at our website: https://sites.google.com/view/skill-critic.
Updated: 2024-07-12 01:59:00
Subjects: cs.LG,cs.AI
Unsupervised Anomaly Detection Using Diffusion Trend Analysis
Conventional anomaly detection techniques based on reconstruction via a denoising diffusion model are widely used due to their ability to identify anomaly locations and shapes with high performance. However, determining appropriate noise parameters that degrade anomalies while preserving normal characteristics remains a limitation. Also, due to the volatility of the diffusion model, normal regions can fluctuate considerably during reconstruction, resulting in false detections. In this paper, we propose a method to detect anomalies by analyzing the reconstruction trend as a function of the degree of degradation, effectively solving both problems of existing methods. The proposed method is validated on an open dataset for industrial anomaly detection, improving the performance of existing methods on a number of evaluation criteria. Because it combines easily with existing anomaly detection methods, it provides a tradeoff between computational cost and performance, giving it high application potential in the manufacturing industry.
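Editor's illustration: the trend analysis can be sketched as fitting, per pixel, the slope of reconstruction error against the degradation level. Here reconstruct is a hypothetical stand-in for a diffusion-based reconstruction at a given noise level, and the score is illustrative rather than the paper's formula.

    import numpy as np

    def reconstruct(image: np.ndarray, noise_level: float) -> np.ndarray:
        raise NotImplementedError("stand-in for diffusion reconstruction")

    def anomaly_score_by_trend(image, noise_levels=(0.1, 0.2, 0.4, 0.6, 0.8)):
        # Per-pixel reconstruction error at each degradation level.
        errors = np.stack([np.abs(reconstruct(image, t) - image)
                           for t in noise_levels])          # (L, H, W)
        # Anomalous pixels tend to degrade fast: their error grows steeply
        # with the noise level, while normal pixels fluctuate without trend.
        levels = np.asarray(noise_levels).reshape(-1, 1, 1)
        slope = (((errors * levels).mean(0)
                  - errors.mean(0) * levels.mean()) / levels.var())
        return slope  # higher slope -> more anomalous (illustrative score)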
Updated: 2024-07-12 01:50:07
Subjects: cs.CV,cs.LG,68T45 (Primary) 68T27 (Secondary),I.2.10
Internet of Federated Digital Twins (IoFDT): Connecting Twins Beyond Borders for Society 5.0
The concept of digital twin (DT), which enables the creation of a programmable, digital representation of physical systems, is expected to revolutionize future industries and will lie at the heart of the vision of a future smart society, namely, Society 5.0, in which high integration between cyber (digital) and physical spaces is exploited to bring economic and societal advancements. However, the success of such a DT-driven Society 5.0 requires a synergistic convergence of artificial intelligence and networking technologies into an integrated, programmable system that can coordinate DT networks to effectively deliver diverse Society 5.0 services. Prior works remain restricted to either qualitative study, simple analysis or software implementations of a single DT, and thus, they cannot provide the highly synergistic integration of digital and physical spaces as required by Society 5.0. In contrast, this paper envisions a novel concept of an Internet of Federated Digital Twins (IoFDT) that holistically integrates heterogeneous and physically separated DTs representing different Society 5.0 services within a single framework and system. For this concept of IoFDT, we first introduce a hierarchical architecture that integrates federated DTs through horizontal and vertical interactions, bridging cyber and physical spaces to unlock new possibilities. Then, we discuss challenges of realizing IoFDT, highlighting the intricacies across communication, computing, and AI-native networks while also underscoring potential innovative solutions. Subsequently, we elaborate on the importance of the implementation of a unified IoFDT platform that integrates all technical components and orchestrates their interactions, emphasizing the necessity of practical experimental platforms with a focus on real-world applications in areas like smart mobility.
Updated: 2024-07-12 01:49:41
Subjects: cs.AI
Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective
The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task evaluations. We introduce a novel framework that employs a complex network to comprehensively analyze the dynamics of knowledge transfer between tasks within EMaTO. By extracting and scrutinizing the knowledge transfer network from existing EMaTO algorithms, we evaluate the influence of network modifications on overall algorithmic efficacy. Our findings indicate that these networks are diverse, displaying community-structured directed graph characteristics, with their network density adapting to different task sets. This research underscores the viability of integrating complex network concepts into EMaTO to refine knowledge transfer processes, paving the way for future advancements in the domain.
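Editor's illustration: a knowledge-transfer network of the kind analyzed here can be built and probed with networkx; the transfer log below is a hypothetical example included purely for shape.

    import networkx as nx

    # Hypothetical log of knowledge-transfer events observed while running an
    # EMaTO algorithm: (source_task, target_task, times_transfer_helped).
    transfer_log = [(0, 1, 5), (1, 0, 2), (0, 2, 7), (2, 3, 4), (3, 2, 6)]

    G = nx.DiGraph()
    G.add_weighted_edges_from(transfer_log)

    print("density:", nx.density(G))
    # Community structure on the undirected projection (one common choice).
    communities = nx.community.greedy_modularity_communities(G.to_undirected())
    print("communities:", [sorted(c) for c in communities])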
Updated: 2024-07-12 01:49:04
Subjects: cs.AI
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey for Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis firstly navigates through the forefront of representative works of embodied robots and simulators, to fully understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of embodied AI and discuss their potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.
Updated: 2024-07-12 01:48:00
Subjects: cs.CV,cs.AI,cs.LG,cs.MA,cs.RO
Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks
Spatiotemporal federated learning has recently attracted intensive study due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has been no systematic study of gradient inversion attacks in spatiotemporal federated learning. In this paper, we explore the gradient attack problem in spatiotemporal federated learning from attack and defense perspectives. To understand privacy risks in spatiotemporal federated learning, we first propose the Spatiotemporal Gradient Inversion Attack (ST-GIA), a gradient attack algorithm tailored to spatiotemporal data that successfully reconstructs the original location from gradients. Furthermore, we design an adaptive defense strategy to mitigate gradient inversion attacks in spatiotemporal federated learning. By dynamically adjusting the perturbation levels, we can offer tailored protection for varying rounds of training data, thereby achieving a better trade-off between privacy and utility than current state-of-the-art methods. Through intensive experimental analysis on three real-world datasets, we reveal that the proposed defense strategy can well preserve the utility of spatiotemporal federated learning with effective security protection.
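Editor's illustration: gradient inversion attacks generally follow the "deep leakage from gradients" recipe of optimizing dummy data so that its gradients match the shared ones; ST-GIA tailors this idea to location data. A generic PyTorch sketch of the base recipe, not the paper's algorithm:

    import torch

    def invert_gradients(model, loss_fn, true_grads, x_shape, y_shape,
                         steps=300):
        # Optimize dummy inputs/labels so their gradients match the shared ones.
        dummy_x = torch.randn(x_shape, requires_grad=True)
        dummy_y = torch.randn(y_shape, requires_grad=True)
        opt = torch.optim.LBFGS([dummy_x, dummy_y])

        def closure():
            opt.zero_grad()
            loss = loss_fn(model(dummy_x), dummy_y)
            grads = torch.autograd.grad(loss, model.parameters(),
                                        create_graph=True)
            diff = sum(((g - t) ** 2).sum()
                       for g, t in zip(grads, true_grads))
            diff.backward()
            return diff

        for _ in range(steps):
            opt.step(closure)
        return dummy_x.detach(), dummy_y.detach()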
Updated: 2024-07-12 01:37:44
Subjects: cs.CR
Beyond Statistical Estimation: Differentially Private Individual Computation via Shuffling
In data-driven applications, preserving user privacy while enabling valuable computations remains a critical challenge. Technologies like Differential Privacy (DP) have been pivotal in addressing these concerns. The shuffle model of DP requires no trusted curators and can achieve high utility by leveraging the privacy amplification effect yielded from shuffling. These benefits have led to significant interest in the shuffle model. However, the computation tasks in the shuffle model are limited to statistical estimation, making the shuffle model inapplicable to real-world scenarios in which each user requires a personalized output. This paper introduces a novel paradigm termed Private Individual Computation (PIC), expanding the shuffle model to support a broader range of permutation-equivariant computations. PIC enables personalized outputs while preserving privacy, and enjoys privacy amplification through shuffling. We propose a concrete protocol that realizes PIC. By using one-time public keys, our protocol enables users to receive their outputs without compromising anonymity, which is essential for privacy amplification. Additionally, we present an optimal randomizer, the Minkowski Response, designed for the PIC model to enhance utility. We formally prove the security and privacy properties of the PIC protocol. Theoretical analysis and empirical evaluations demonstrate PIC's capability in handling non-statistical computation tasks, and the efficacy of PIC and the Minkowski randomizer in achieving superior utility compared to existing solutions.
Updated: 2024-07-12 01:36:06
Subjects: cs.CR,cs.LG
PORCA: Root Cause Analysis with Partially Observed Data
Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure of complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume a full observation of the system, neglecting the effect of partial observation (i.e., missing nodes and latent malfunctions). As a result, they fail to derive reliable RCA results. In this paper, we unveil the issues of unobserved confounders and heterogeneity in partial observation and formulate a new problem of root cause analysis with partially observed data. To address it, we propose PORCA, a novel RCA framework that can explore reliable root causes under both unobserved confounders and unobserved heterogeneity. PORCA leverages magnified score-based causal discovery to efficiently optimize an acyclic directed mixed graph under unobserved confounders. In addition, we develop a heterogeneity-aware scheduling strategy to provide adaptive sample weights. Extensive experimental results on one synthetic and two real-world datasets demonstrate the effectiveness and superiority of the proposed framework.
Updated: 2024-07-12 01:28:49
Subjects: cs.AI
Transforming Movie Recommendations with Advanced Machine Learning: A Study of NMF, SVD, and K-Means Clustering
This study develops a robust movie recommendation system using various machine learning techniques, including Non-Negative Matrix Factorization (NMF), Truncated Singular Value Decomposition (SVD), and K-Means clustering. The primary objective is to enhance user experience by providing personalized movie recommendations. The research encompasses data preprocessing, model training, and evaluation, highlighting the efficacy of the employed methods. Results indicate that the proposed system achieves high accuracy and relevance in recommendations, making significant contributions to the field of recommendation systems.
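Editor's illustration: the three techniques named in the title fit together in a few lines of scikit-learn; the toy ratings matrix and component counts are assumptions for demonstration only, not the study's data or settings.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import NMF, TruncatedSVD

    # Toy user x movie ratings matrix (0 = unrated); illustrative only.
    R = np.array([[5, 4, 0, 1],
                  [4, 5, 1, 0],
                  [0, 1, 5, 4],
                  [1, 0, 4, 5]], dtype=float)

    # Latent factors via NMF and via truncated SVD.
    W = NMF(n_components=2, init="nndsvda", max_iter=500).fit_transform(R)
    U = TruncatedSVD(n_components=2).fit_transform(R)

    # Cluster users in the latent space; recommend within a user's cluster.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(U)
    print("user clusters:", labels)  # e.g., [0 0 1 1]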
Updated: 2024-07-12 01:26:33
Subjects: cs.LG,cs.IR
15M Multimodal Facial Image-Text Dataset
Currently, image-text-driven multi-modal deep learning models have demonstrated their outstanding potential in many fields. In practice, tasks centered around facial images have broad application prospects. This paper presents FaceCaption-15M, a large-scale, diverse, and high-quality dataset of facial images accompanied by their natural language descriptions (facial image-to-text). This dataset aims to facilitate a study on face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and their corresponding natural language descriptions of facial features, making it the largest facial image-caption dataset to date. We conducted a comprehensive analysis of image quality, text naturalness, text complexity, and text-image relevance to demonstrate the superiority of FaceCaption-15M. To validate the effectiveness of FaceCaption-15M, we first trained a facial language-image pre-training model (FLIP, similar to CLIP) to align facial images with their corresponding captions in feature space. Subsequently, using both image and text encoders and fine-tuning only the linear layer, our FLIP-based models achieved state-of-the-art results on two challenging face-centered tasks. The purpose is to promote research in the field of face-related tasks through the availability of the proposed FaceCaption-15M dataset. All data, codes, and models are publicly available. https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M
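Editor's illustration: a FLIP-style alignment objective is, like CLIP, a symmetric cross-entropy over batch image-text similarities. A minimal sketch under that assumption:

    import torch
    import torch.nn.functional as F

    def clip_style_loss(img_emb, txt_emb, temperature=0.07):
        # Normalize, then score all image-text pairs in the batch.
        img = F.normalize(img_emb, dim=-1)
        txt = F.normalize(txt_emb, dim=-1)
        logits = img @ txt.t() / temperature        # (B, B)
        targets = torch.arange(len(logits))
        # Matched pairs sit on the diagonal; contrast in both directions.
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2

    loss = clip_style_loss(torch.randn(32, 512), torch.randn(32, 512))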
Updated: 2024-07-12 01:19:33
Subjects: cs.CV,cs.AI
PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization
Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in Industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods have offered promising enhancements for sequential optimization processes and can be used for reducing carbon emissions. However, existing DRL methods need a pre-defined reward function to assess the impact of each action on the final sustainable development goals (SDG). In many real applications, such a reward function cannot be given in advance. To address the problem, this study proposes a Performance based Adversarial Imitation Learning (PAIL) engine. It is a novel method to acquire optimal operational policies for carbon neutrality without any pre-defined action rewards. Specifically, PAIL employs a Transformer-based policy generator to encode historical information and predict following actions within a multi-dimensional space. The entire action sequence is iteratively updated by an environmental simulator. Then PAIL uses a discriminator to minimize the discrepancy between generated sequences and real-world samples with high SDG. In parallel, a Q-learning framework based performance estimator is designed to estimate the impact of each action on SDG. Based on these estimations, PAIL refines generated policies with rewards from both the discriminator and the performance estimator. PAIL is evaluated on multiple real-world application cases and datasets. The experimental results demonstrate the effectiveness of PAIL compared to other state-of-the-art baselines. In addition, PAIL offers meaningful interpretability for the optimization in carbon neutrality.
Updated: 2024-07-12 01:06:01
Subjects: cs.LG,cs.AI
Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval
Image retrieval plays a pivotal role in applications from wildlife conservation to healthcare, for finding individual animals or relevant images to aid diagnosis. Although deep learning techniques for image retrieval have advanced significantly, their imperfect real-world performance often necessitates including human expertise. Human-in-the-loop approaches typically rely on humans completing the task independently and then combining their opinions with an AI model in various ways, as these models offer very little interpretability or correctability. To allow humans to intervene in the AI model instead, thereby saving human time and effort, we adapt the Concept Bottleneck Model (CBM) and propose CHAIR. CHAIR (a) enables humans to correct intermediate concepts, which helps improve the embeddings generated, and (b) allows for flexible levels of intervention that accommodate varying levels of human expertise for better retrieval. To show the efficacy of CHAIR, we demonstrate that our method performs better than similar models on image retrieval metrics without any external intervention. Furthermore, we also showcase how human intervention helps further improve retrieval performance, thereby achieving human-AI complementarity.
Updated: 2024-07-12 00:59:32
Subjects: cs.CV,cs.AI,cs.IR
AirSketch: Generative Motion to Sketch
Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication. While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate from highly noisy hand tracking images to clean, aesthetically pleasing sketches, while preserving the essential visual cues from the original tracking data. We present two air drawing datasets to study this problem. Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and AR/VR in general.
Updated: 2024-07-12 00:52:04
Subjects: cs.CV,cs.AI,cs.GR
Flash normalization: fast RMSNorm for LLMs
RMSNorm is used by many LLMs such as Llama, Mistral, and OpenELM. This paper details FlashNorm, which is an exact but faster implementation of RMSNorm followed by linear layers. See https://huggingface.co/open-machine/FlashNorm for code and more transformer tricks.
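Editor's illustration: one way to see why RMSNorm followed by a linear layer admits an exact but faster form is that the per-channel gain can be folded into the weights offline, leaving only a scalar 1/rms at runtime. A numpy sketch of that algebra (my illustration, not the released FlashNorm code):

    import numpy as np

    rng = np.random.default_rng(0)
    d, out = 8, 4
    x = rng.standard_normal(d)
    g = rng.standard_normal(d)         # RMSNorm gain
    W = rng.standard_normal((out, d))  # following linear layer

    rms = np.sqrt(np.mean(x ** 2) + 1e-6)

    # Standard path: normalize, scale by g, then project.
    y_ref = W @ (x / rms * g)

    # Folded path: bake g into W offline; only the scalar 1/rms remains online.
    W_folded = W * g                   # scales column j of W by g[j]
    y_fast = (W_folded @ x) / rms

    assert np.allclose(y_ref, y_fast)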
Updated: 2024-07-12 00:37:55
Subjects: cs.LG
TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing
Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEE) are considered a promising solution because of their comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing due to the fine and mismatched memory granularities between CPU and NPU. 1) The cacheline granularity of the CPU TEE intensifies memory pressure due to its extra memory accesses, and 2) the cacheline-granularity MAC of the NPU escalates the pressure on the limited memory storage. 3) Data transfer across heterogeneous enclaves relies on transit through non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in the CPU TEE to eliminate off-chip metadata accesses by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption and scheduling dilemmas. Our evaluation is built on an enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.
Updated: 2024-07-12 00:35:18
Subjects: cs.CR,cs.AI,cs.AR
Application of Artificial Intelligence in Supporting Healthcare Professionals and Caregivers in Treatment of Autistic Children
Autism Spectrum Disorder (ASD) represents a multifaceted neurodevelopmental condition marked by difficulties in social interaction, communication impediments, and repetitive behaviors. Despite progress in understanding ASD, its diagnosis and treatment continue to pose significant challenges due to the variability in symptomatology and the necessity for multidisciplinary care approaches. This paper investigates the potential of Artificial Intelligence (AI) to augment the capabilities of healthcare professionals and caregivers in managing ASD. We have developed a sophisticated algorithm designed to analyze facial and bodily expressions during daily activities of both autistic and non-autistic children, leading to the development of a powerful deep learning-based autism detection system. Our study demonstrated that AI models, specifically the Xception and ResNet50V2 architectures, achieved high accuracy in diagnosing ASD. This research highlights the transformative potential of AI in improving the diagnosis, treatment, and comprehensive management of ASD.
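Editor's illustration: a transfer-learning setup with Xception, of the kind the abstract mentions, typically looks as below in Keras; the classification head, input size, and freezing choice are assumptions, not the paper's training recipe.

    from tensorflow.keras import Model
    from tensorflow.keras.applications import Xception
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

    base = Xception(weights="imagenet", include_top=False,
                    input_shape=(299, 299, 3))
    base.trainable = False  # freeze the pretrained features first

    x = GlobalAveragePooling2D()(base.output)
    out = Dense(1, activation="sigmoid")(x)  # ASD vs. non-ASD
    model = Model(base.input, out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])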
Updated: 2024-07-12 00:34:40
Subjects: cs.AI
Increasing Trust in Language Models through the Reuse of Verified Circuits
Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify an auto-regressive transformer model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into a larger untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.
Updated: 2024-07-12 00:34:01
Subjects: cs.LG,cs.CL
Random Inverse Problems Over Graphs: Decentralized Online Learning
We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2-asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.
Updated: 2024-07-12 00:17:04
Subjects: cs.LG,cs.DC,cs.SY,eess.SY,math.PR
It Takes Two: A Peer-Prediction Solution for Blockchain Verifier's Dilemma
The security of blockchain systems is fundamentally based on decentralized consensus, in which the majority of parties behave honestly, and the process of content verification is essential to maintaining the robustness of blockchain systems. However, a secure blockchain system with few or no cheaters may fail to provide sufficient incentive for verifiers to honestly perform the costly verification; this phenomenon, referred to as the Verifier's Dilemma, can severely undermine the fundamental security of blockchain systems. While existing works have attempted to insert deliberate errors to disincentivize lazy verification, the decentralized environment makes it impossible to judge the correctness of verification or detect malicious verifiers directly. In this paper, we initiate research that leverages the peer prediction approach in the design of Bayesian truthful mechanisms for the decentralized verification game among multiple verifiers, incentivizing all verifiers to perform honest verification without access to the ground truth, even in the presence of noisy observations in the verification process. With the theoretically guaranteed truthfulness of our mechanism for the verification game, our work provides a framework of verification mechanisms that enhances the security and robustness of the blockchain and potentially other decentralized systems.
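Editor's illustration: the simplest member of the peer-prediction family is output agreement, which scores a verifier against a randomly chosen peer instead of against unavailable ground truth. A toy sketch of that baseline (the paper's Bayesian mechanism is more elaborate):

    import random

    def peer_prediction_rewards(reports, reward=1.0):
        # reports: verifier_id -> reported verification result for one block.
        rewards = {}
        ids = list(reports)
        for v in ids:
            peer = random.choice([u for u in ids if u != v])
            # Pay for agreement with a random peer; no ground truth needed.
            rewards[v] = reward if reports[v] == reports[peer] else 0.0
        return rewards

    print(peer_prediction_rewards({"a": True, "b": True, "c": False}))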
Updated: 2024-07-12 00:13:25
Subjects: cs.CR,cs.GT
IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents
Seamless interaction between AI agents and humans using natural language remains a key goal in AI research. This paper addresses the challenges of developing interactive agents capable of understanding and executing grounded natural language instructions through the IGLU competition at NeurIPS. Despite advancements, challenges such as a scarcity of appropriate datasets and the need for effective evaluation platforms persist. We introduce a scalable data collection tool for gathering interactive grounded language instructions within a Minecraft-like environment, resulting in a multi-modal dataset with around 9,000 utterances and over 1,000 clarification questions. Additionally, we present a human-in-the-loop interactive evaluation platform for qualitative analysis and comparison of agent performance through multi-turn communication with human annotators. We offer these assets, referred to as IDAT (IGLU Dataset And Toolkit), to the community; they aim to advance the development of intelligent, interactive AI agents and provide essential resources for further research.
Updated: 2024-07-12 00:07:43
Subjects: cs.AI,cs.CL,cs.LG