    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                          |___/ 

Articles: 32

Last Updated: 2024-07-29 23:48:07 (+00:00)

A Method for Fast Autonomy Transfer in Reinforcement Learning

This paper introduces a novel reinforcement learning (RL) strategy designed to facilitate rapid autonomy transfer by utilizing pre-trained critic value functions from multiple environments. Unlike traditional methods that require extensive retraining or fine-tuning, our approach integrates existing knowledge, enabling an RL agent to adapt swiftly to new settings without requiring extensive computational resources. Our contributions include the development of the Multi-Critic Actor-Critic (MCAC) algorithm, the establishment of its convergence, and empirical evidence demonstrating its efficacy. Our experimental results show that MCAC significantly outperforms the baseline actor-critic algorithm, achieving up to 22.76x faster autonomy transfer and higher reward accumulation. This advancement underscores the potential of leveraging accumulated knowledge for efficient adaptation in RL applications.
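
The abstract does not spell out how the pre-trained critics are combined; a minimal tabular sketch, under the assumption that the agent bootstraps its temporal-difference target from a fixed weighted combination of the frozen source-environment critics, could look like this (all names are illustrative):

    import numpy as np

    def mcac_update(actor_logits, critics, weights, s, a, r, s_next,
                    gamma=0.99, lr=0.01):
        # Value estimate bootstrapped from frozen, pre-trained critics;
        # the weighted-combination rule is an assumption, not the paper's.
        v = lambda state: sum(wt * V(state) for V, wt in zip(critics, weights))
        td_error = r + gamma * v(s_next) - v(s)
        # Standard policy-gradient step on the logits of the taken action.
        probs = np.exp(actor_logits[s] - actor_logits[s].max())
        probs /= probs.sum()
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        actor_logits[s] += lr * td_error * grad_log_pi
        return td_error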

Updated: 2024-07-29 23:48:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.20466v1

Graphite: A Graph-based Extreme Multi-Label Short Text Classifier for Keyphrase Recommendation

Keyphrase Recommendation has been a pivotal problem in advertising and e-commerce where advertisers/sellers are recommended keyphrases (search queries) to bid on to increase their sales. It is a challenging task due to the plethora of items shown on online platforms and the wide variety of queries that users search while showing varying interest in the displayed items. Moreover, query/keyphrase recommendations need to be made in real-time and in a resource-constrained environment. This problem can be framed as an Extreme Multi-label (XML) short text classification task that tags the input text with keywords as labels. Traditional neural network models are either infeasible or suffer from high inference latency due to large label spaces. We present Graphite, a graph-based classifier model that provides real-time keyphrase recommendations that are on par with standard text classification models. Furthermore, it doesn't utilize GPU resources, which can be limited in production environments. Due to its lightweight nature and smaller footprint, it can train on very large datasets, where state-of-the-art XML models fail due to extreme resource requirements. Graphite is deterministic, transparent, and intrinsically more interpretable than neural network-based models. We present a comprehensive analysis of our model's performance across forty categories spanning eBay's English-speaking sites.

Updated: 2024-07-29 23:41:26

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2407.20462v1

Excavating Vulnerabilities Lurking in Multi-Factor Authentication Protocols: A Systematic Security Analysis

Nowadays, cyberattacks are growing exponentially, causing havoc to Internet users. In particular, authentication attacks constitute the major attack vector where intruders impersonate legitimate users to maliciously access systems or resources. Traditional single-factor authentication (SFA) protocols are often bypassed by side-channel and other attack techniques, hence they are no longer sufficient for current authentication requirements. To alleviate this problem, multi-factor authentication (MFA) protocols have been widely adopted recently, which helps to raise the security bar against imposters. Although MFA is generally considered more robust and secure than SFA, it may not always guarantee enhanced security and efficiency. This is because critical security vulnerabilities and performance problems may still arise due to design or implementation flaws of the protocols. Such vulnerabilities are often left unnoticed until they are exploited by attackers. Therefore, the main objective of this work is to identify such vulnerabilities in existing MFA protocols by systematically analysing their designs and constructions. To this end, we first form a set of security evaluation criteria, encompassing both existing and newly introduced ones, which we believe are critical for the security of MFA protocols. Then, we thoroughly review several MFA protocols across different domains. Subsequently, we revisit and thoroughly analyze the design and construction of the protocols to identify potential vulnerabilities. Consequently, we manage to identify critical vulnerabilities in ten of the MFA protocols investigated. We thoroughly discuss the identified vulnerabilities in each protocol and devise relevant mitigation strategies. We also consolidate the performance information of those protocols to show the runtime and storage cost when employing varying numbers of authentication factors.

Updated: 2024-07-29 23:37:38

Categories: cs.CR

Download: http://arxiv.org/abs/2407.20459v1

CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models

Instruction tuning in multimodal large language models (MLLMs) aims to smoothly integrate a backbone LLM with a pre-trained feature encoder for downstream tasks. The major challenge is how to efficiently find the synergy through cooperative learning, where LLMs adapt their reasoning abilities in downstream tasks while feature encoders adjust their encoding to provide more relevant modal information. In this paper, we analyze MLLM instruction tuning from both theoretical and empirical perspectives, and find that unbalanced learning between the two components, i.e., the feature encoder and the LLM, can cause diminishing learning gradients that slow model convergence and often lead to sub-optimal results due to insufficient learning. Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance, based on which we further design a dynamic learning scheduler that better coordinates the learning. In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs considering the learning state of each model component, which potentially prevents diminishing gradients in each component and enables a more accurate estimation of the learning balance coefficient. We conduct experiments with multiple LLM backbones and feature encoders; our techniques are model-agnostic and can be generically integrated with various MLLM backbones. Experiment results on multiple downstream tasks and modalities in vision and audio demonstrate the proposed method's better efficiency and effectiveness in MLLM instruction tuning.

Updated: 2024-07-29 23:18:55

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2407.20454v1

Composable Security of Distributed Symmetric Key Establishment Protocol

The Distributed Symmetric Key Establishment (DSKE) protocol provides secure secret exchange (e.g., for key exchange) between two honest parties that need not have had prior contact, relying on intermediaries with whom each of them securely shares confidential data. We show the composable security of the DSKE protocol in the constructive cryptography framework of Maurer. Specifically, we prove the security (correctness and confidentiality) and robustness of this protocol against any computationally unbounded adversary, who additionally may have fully compromised a bounded number of the intermediaries and can eavesdrop on all communication. As DSKE is highly scalable in a network setting with no distance limit, it is expected to be a cost-effective quantum-safe cryptographic solution to safeguarding network security against the threat of quantum computers.

Updated: 2024-07-29 23:12:43

Categories: cs.CR,quant-ph

Download: http://arxiv.org/abs/2304.13789v2

Blocking Bandits

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same product recommendation repeatedly) or infeasible (e.g. compute job scheduling on machines). We show that with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm (in the number of arms) unless the randomized exponential time hypothesis is false, by mapping to the PINWHEEL scheduling problem. Subsequently, we show that a simple greedy algorithm that plays the available arm with the highest reward is asymptotically $(1-1/e)$ optimal. When the rewards are unknown, we design a UCB-based algorithm which is shown to have $c \log T + o(\log T)$ cumulative regret against the greedy algorithm, leveraging the free exploration of arms due to the unavailability. Finally, when all the delays are equal, the problem reduces to Combinatorial Semi-bandits, providing us with a lower bound of $c' \log T + \omega(\log T)$.
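
A minimal sketch of the greedy oracle described above, assuming known mean rewards and blocking delays (function and variable names are illustrative):

    def greedy_blocking(rewards, delays, horizon):
        # At each slot, play the available arm with the highest mean reward;
        # a played arm i then stays blocked for delays[i] subsequent slots.
        next_free = [0] * len(rewards)
        total = 0.0
        for t in range(horizon):
            avail = [i for i in range(len(rewards)) if next_free[i] <= t]
            if not avail:
                continue  # every arm is blocked in this slot
            i = max(avail, key=lambda j: rewards[j])
            total += rewards[i]
            next_free[i] = t + delays[i] + 1
        return total

    # e.g. greedy_blocking([1.0, 0.9, 0.2], [2, 0, 0], horizon=100)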

Updated: 2024-07-29 23:06:12

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/1907.11975v2

CourseAssist: Pedagogically Appropriate AI Tutor for Computer Science Education

Growing enrollments in computer science courses and increasing class sizes necessitate scalable, automated tutoring solutions to adequately support student learning. While Large Language Models (LLMs) like GPT-4 have demonstrated potential in assisting students through question-answering, educators express concerns over student overreliance, miscomprehension of generated code, and the risk of inaccurate answers. Rather than banning these tools outright, we advocate for a constructive approach that harnesses the capabilities of AI while mitigating potential risks. This poster introduces CourseAssist, a novel LLM-based tutoring system tailored for computer science education. Unlike generic LLM systems, CourseAssist uses retrieval-augmented generation, user intent classification, and question decomposition to align AI responses with specific course materials and learning objectives, thereby ensuring the pedagogical appropriateness of LLMs in educational settings. We evaluated CourseAssist against a GPT-4 baseline using a dataset of 50 question-answer pairs from a programming languages course, focusing on the criteria of usefulness, accuracy, and pedagogical appropriateness. Evaluation results show that CourseAssist significantly outperforms the baseline, demonstrating its potential to serve as an effective learning assistant. We have also deployed CourseAssist in 6 computer science courses at a large public R1 research university, reaching over 500 students. Interviews with 20 student users show that CourseAssist improves computer science instruction by increasing the accessibility of course-specific tutoring help and shortening the feedback loop on programming assignments. Future work will include extensive pilot testing at more universities and exploring better collaborative relationships between students, educators, and AI that improve computer science learning experiences.

Updated: 2024-07-29 23:01:18

Categories: cs.CY,cs.AI,cs.HC

Download: http://arxiv.org/abs/2407.10246v3

Domain Adaptable Prescriptive AI Agent for Enterprise

Despite advancements in causal inference and prescriptive AI, its adoption in enterprise settings remains hindered primarily due to its technical complexity. Many users lack the necessary knowledge and appropriate tools to effectively leverage these technologies. This work at the MIT-IBM Watson AI Lab focuses on developing the proof-of-concept agent, PrecAIse, a domain-adaptable conversational agent equipped with a suite of causal and prescriptive tools to help enterprise users make better business decisions. The objective is to make advanced, novel causal inference and prescriptive tools widely accessible through natural language interactions. The presented Natural Language User Interface (NLUI) enables users with limited expertise in machine learning and data science to harness prescriptive analytics in their decision-making processes without requiring intensive computing resources. We present an agent capable of function calling, maintaining faithful, interactive, and dynamic conversations, and supporting new domains.

Updated: 2024-07-29 23:00:32

Categories: cs.AI

Download: http://arxiv.org/abs/2407.20447v1

Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation

Existing music captioning methods are limited to generating concise global descriptions of short music clips, which fail to capture fine-grained musical characteristics and time-aware musical changes. To address these limitations, we propose FUTGA, a model equipped with fine-grained music understanding capabilities through learning from generative augmentation with temporal compositions. We leverage existing music caption datasets and large language models (LLMs) to synthesize fine-grained music captions with structural descriptions and time boundaries for full-length songs. Augmented by the proposed synthetic dataset, FUTGA can identify the music's temporal changes at key transition points and their musical functions, as well as generate detailed descriptions for each music segment. We further introduce a full-length music caption dataset generated by FUTGA, as an augmentation of the MusicCaps and Song Describer datasets. We evaluate the automatically generated captions on several downstream tasks, including music generation and retrieval. The experiments demonstrate the quality of the generated captions and the better performance in various downstream tasks achieved by the proposed music captioning approach. Our code and datasets can be found at \href{https://huggingface.co/JoshuaW1997/FUTGA}{\textcolor{blue}{https://huggingface.co/JoshuaW1997/FUTGA}}.

Updated: 2024-07-29 22:53:32

Categories: cs.SD,cs.AI,cs.LG,eess.AS

Download: http://arxiv.org/abs/2407.20445v1

Importance Corrected Neural JKO Sampling

In order to sample from an unnormalized probability density function, we propose to combine continuous normalizing flows (CNFs) with rejection-resampling steps based on importance weights. We relate the iterative training of CNFs with regularized velocity fields to a JKO scheme and prove convergence of the involved velocity fields to the velocity field of the Wasserstein gradient flow (WGF). The alternation of local flow steps and non-local rejection-resampling steps makes it possible to overcome local minima or slow convergence of the WGF for multimodal distributions. Since the proposals for the rejection step are generated by the model itself, the method does not suffer from common drawbacks of classical rejection schemes. The resulting model can be trained iteratively, reduces the reverse Kullback-Leibler (KL) loss function in each step, allows the generation of i.i.d. samples, and moreover allows for evaluation of the density underlying the generated samples. Numerical examples show that our method yields accurate results on various test distributions, including high-dimensional multimodal targets, and significantly outperforms the state of the art in almost all cases.
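
A rough sketch of one importance-based rejection-resampling step (not necessarily the paper's exact scheme): samples whose importance weight under the target is low are rejected, and the freed slots are refilled by weighted resampling from the surviving samples:

    import numpy as np

    def rejection_resample(x, log_p_tilde, log_q, rng):
        # Importance weights of the model's own samples w.r.t. the
        # unnormalized target density p_tilde.
        log_w = log_p_tilde(x) - log_q(x)
        w = np.exp(log_w - log_w.max())
        # Accept with probability min(1, w / mean(w)) ...
        accept = rng.random(len(x)) * w.mean() < w
        kept, kept_w = x[accept], w[accept]
        # ... and refill rejected slots by weighted resampling from the
        # accepted pool, keeping the sample size fixed.
        idx = rng.choice(len(kept), size=len(x) - len(kept),
                         p=kept_w / kept_w.sum())
        return np.concatenate([kept, kept[idx]])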

Updated: 2024-07-29 22:49:59

Categories: stat.ML,cs.LG,math.PR

Download: http://arxiv.org/abs/2407.20444v1

Convergence rates for the Adam optimizer

Stochastic gradient descent (SGD) optimization methods are nowadays the method of choice for the training of deep neural networks (DNNs) in artificial intelligence systems. In practically relevant training problems, the employed optimization scheme is usually not the plain vanilla SGD method; instead, suitably accelerated and adaptive SGD optimization methods are applied. As of today, perhaps the most popular variant of such accelerated and adaptive SGD optimization methods is the famous Adam optimizer proposed by Kingma & Ba in 2014. Despite the popularity of the Adam optimizer in implementations, it remained an open research problem to provide a convergence analysis for the Adam optimizer even in the situation of simple quadratic stochastic optimization problems where the objective function (the function one intends to minimize) is strongly convex. In this work we solve this problem by establishing optimal convergence rates for the Adam optimizer for a large class of stochastic optimization problems, in particular covering simple quadratic stochastic optimization problems. The key ingredient of our convergence analysis is a new vector field function which we propose to refer to as the Adam vector field. This Adam vector field accurately describes the macroscopic behaviour of the Adam optimization process but differs from the negative gradient of the objective function of the considered stochastic optimization problem. In particular, our convergence analysis reveals that the Adam optimizer does typically not converge to critical points of the objective function (zeros of the gradient of the objective function) of the considered optimization problem but converges with rates to zeros of this Adam vector field.
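
For reference, the standard Adam recursion from Kingma & Ba, whose macroscopic behaviour the Adam vector field is said to describe, reads (with stochastic gradient $g_n$ evaluated at $\theta_{n-1}$, momentum parameters $\beta_1, \beta_2$, learning rate $\alpha$, and regularizing constant $\varepsilon$):

    \begin{aligned}
    m_n      &= \beta_1\, m_{n-1} + (1-\beta_1)\, g_n, \\
    v_n      &= \beta_2\, v_{n-1} + (1-\beta_2)\, g_n \odot g_n, \\
    \theta_n &= \theta_{n-1} - \alpha\, \frac{m_n/(1-\beta_1^n)}{\sqrt{v_n/(1-\beta_2^n)} + \varepsilon}.
    \end{aligned}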

Updated: 2024-07-29 22:49:04

Categories: math.OC,cs.LG,math.PR,stat.ML

Download: http://arxiv.org/abs/2407.21078v1

Fair Incentives for Repeated Engagement

We study a decision-maker's problem of finding optimal monetary incentive schemes for retention when faced with agents whose participation decisions (stochastically) depend on the incentive they receive. Our focus is on policies constrained to fulfill two fairness properties that preclude outcomes wherein different groups of agents experience different treatment on average. We formulate the problem as a high-dimensional stochastic optimization problem, and study it through the use of a closely related deterministic variant. We show that the optimal static solution to this deterministic variant is asymptotically optimal for the dynamic problem under fairness constraints. Though solving for the optimal static solution gives rise to a non-convex optimization problem, we uncover a structural property that allows us to design a tractable, fast-converging heuristic policy. Traditional schemes for retention ignore fairness constraints; indeed, the goal in these is to use differentiation to incentivize repeated engagement with the system. Our work (i) shows that even in the absence of explicit discrimination, dynamic policies may unintentionally discriminate between agents of different types by varying the type composition of the system, and (ii) presents an asymptotically optimal policy to avoid such discriminatory outcomes.

Updated: 2024-07-29 22:31:49

Categories: cs.GT,cs.LG,cs.MA,math.OC,math.PR

Download: http://arxiv.org/abs/2111.00002v3

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss. The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods. Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.
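
Concretely, the "simple classification loss" is the binary logistic loss from the paper on preference pairs, where $y_w$ and $y_l$ denote the preferred and dispreferred completions and $\beta$ controls the deviation from the reference policy $\pi_{\mathrm{ref}}$:

    \mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
      -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[
        \log \sigma\!\left(
          \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
          - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
        \right)
      \right].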

Updated: 2024-07-29 22:26:36

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2305.18290v3

MagMax: Leveraging Model Merging for Seamless Continual Learning

This paper introduces a continual learning approach named MagMax, which utilizes model merging to enable large pre-trained models to continuously learn from new data without forgetting previously acquired knowledge. Distinct from traditional continual learning methods that aim to reduce forgetting during task training, MagMax combines sequential fine-tuning with a maximum magnitude weight selection for effective knowledge integration across tasks. Our initial contribution is an extensive examination of model merging techniques, revealing that simple approaches like weight averaging and random weight selection surprisingly hold up well in various continual learning contexts. More importantly, we present MagMax, a novel model-merging strategy that enables continual learning of large pre-trained models for successive tasks. Our thorough evaluation demonstrates the superiority of MagMax in various scenarios, including class- and domain-incremental learning settings. The code is available at this URL: https://github.com/danielm1405/magmax.
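
A minimal PyTorch sketch of the maximum-magnitude selection rule: for every parameter entry, keep the task-vector value (fine-tuned minus pre-trained) with the largest absolute magnitude across tasks, then add it back to the pre-trained weights. The sequential fine-tuning that produces the checkpoints is left out:

    import torch

    def magmax_merge(pretrained, finetuned_list):
        merged = {}
        for name, base in pretrained.items():
            # Task vectors for this parameter, stacked: (num_tasks, *shape).
            tvs = torch.stack([ft[name] - base for ft in finetuned_list])
            # Index of the entry with maximum magnitude across tasks.
            idx = tvs.abs().argmax(dim=0, keepdim=True)
            merged[name] = base + tvs.gather(0, idx).squeeze(0)
        return merged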

Updated: 2024-07-29 22:17:31

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.06322v2

Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression

Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to get LLM Annotations in structured form and explanation prompts to generate LLM Explanations for a better understanding of LLM reasoning and transparency. To test LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that LLM has higher accuracy in object and activity Annotations than emotion and genre Annotations. Moreover, we identified the potential and limitations of LLM's capabilities in annotating videos. Based on the findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.

Updated: 2024-07-29 22:12:06

Categories: cs.HC,cs.AI,cs.CY

Download: http://arxiv.org/abs/2406.19528v3

Generating Gender Alternatives in Machine Translation

Machine translation (MT) systems often translate terms with ambiguous gender (e.g., English term "the nurse") into the gendered form that is most prevalent in the systems' training data (e.g., "enfermera", the Spanish term for a female nurse). This often reflects and perpetuates harmful stereotypes present in society. With MT user interfaces in mind that allow for resolving gender ambiguity in a frictionless manner, we study the problem of generating all grammatically correct gendered translation alternatives. We open source train and test datasets for five language pairs and establish benchmarks for this task. Our key technical contribution is a novel semi-supervised solution for generating alternatives that integrates seamlessly with standard MT models and maintains high performance without requiring additional components or increasing inference overhead.

Updated: 2024-07-29 22:10:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.20438v1

Neural Surrogate HMC: Accelerated Hamiltonian Monte Carlo with a Neural Network Surrogate Likelihood

Bayesian Inference with Markov Chain Monte Carlo requires efficient computation of the likelihood function. In some scientific applications, the likelihood must be computed by numerically solving a partial differential equation, which can be prohibitively expensive. We demonstrate that some such problems can be made tractable by amortizing the computation with a surrogate likelihood function implemented by a neural network. We show that this has two additional benefits: reducing noise in the likelihood evaluations and providing fast gradient calculations. In experiments, the approach is applied to a model of heliospheric transport of galactic cosmic rays, where it enables efficient sampling from the posterior of latent parameters in the Parker equation.
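
A sketch of the core idea: once a neural-network surrogate of the log-posterior is trained, its automatic-differentiation gradients drive the HMC leapfrog integrator in place of the expensive PDE-based likelihood (names are illustrative, PyTorch assumed):

    import torch

    def leapfrog(theta, momentum, surrogate_logpost, step, n_steps):
        def grad(th):
            th = th.detach().requires_grad_(True)
            surrogate_logpost(th).backward()   # fast surrogate gradient
            return th.grad
        momentum = momentum + 0.5 * step * grad(theta)
        for _ in range(n_steps - 1):
            theta = theta + step * momentum
            momentum = momentum + step * grad(theta)
        theta = theta + step * momentum
        momentum = momentum + 0.5 * step * grad(theta)
        return theta, momentum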

Updated: 2024-07-29 21:54:57

Categories: cs.LG,astro-ph.HE,I.2.1

Download: http://arxiv.org/abs/2407.20432v1

New methods to compute the generalized chi-square distribution

We present several new mathematical methods (ray-trace, inverse Fourier transform and ellipse) and open-source software to compute the cdf, pdf and inverse cdf of the generalized chi-square distribution. Some methods are geared for speed, while others are designed to be accurate far into the tails, using which we can also measure large values of the discriminability index d' between multinormals. We characterize the performance and limitations of these and previous methods, and recommend the best methods to use for each part of each type of distribution. We also demonstrate the speed and accuracy of our new methods against previous methods across a wide sample of distributions.
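
For orientation, the generalized chi-square variable is conventionally defined as a weighted sum of non-central chi-squares plus a normal term, $\tilde{\chi} = \sum_i w_i\,{\chi'}^2(k_i, \lambda_i) + s\,z + m$. A brute-force Monte Carlo cdf estimate under that definition (far less accurate in the tails than the paper's ray-trace, inverse-Fourier, and ellipse methods) might look like:

    import numpy as np
    from scipy import stats

    def gx2cdf_mc(x, w, k, lam, m, s, n=10**6, seed=0):
        rng = np.random.default_rng(seed)
        samples = m + s * rng.standard_normal(n)
        for wi, ki, li in zip(w, k, lam):
            # one non-central chi-square component per weight
            samples += wi * stats.ncx2.rvs(ki, li, size=n, random_state=rng)
        return np.mean(samples <= x)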

Updated: 2024-07-29 21:39:17

Categories: stat.CO,cs.LG,stat.ME,stat.ML

Download: http://arxiv.org/abs/2404.05062v2

Infinite dSprites for Disentangled Continual Learning: Separating Memory Edits from Generalization

The ability of machine learning systems to learn continually is hindered by catastrophic forgetting, the tendency of neural networks to overwrite previously acquired knowledge when learning a new task. Existing methods mitigate this problem through regularization, parameter isolation, or rehearsal, but they are typically evaluated on benchmarks comprising only a handful of tasks. In contrast, humans are able to learn over long time horizons in dynamic, open-world environments, effortlessly memorizing unfamiliar objects and reliably recognizing them under various transformations. To make progress towards closing this gap, we introduce Infinite dSprites, a parsimonious tool for creating continual classification and disentanglement benchmarks of arbitrary length and with full control over generative factors. We show that over a sufficiently long time horizon, the performance of all major types of continual learning methods deteriorates on this simple benchmark. This result highlights an important and previously overlooked aspect of continual learning: given a finite modelling capacity and an arbitrarily long learning horizon, efficient learning requires memorizing class-specific information and accumulating knowledge about general mechanisms. In a simple setting with direct supervision on the generative factors, we show how learning class-agnostic transformations offers a way to circumvent catastrophic forgetting and improve classification accuracy over time. Our approach sets the stage for continual learning over hundreds of tasks with explicit control over memorization and forgetting, emphasizing open-set classification and one-shot generalization.

Updated: 2024-07-29 21:32:01

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2312.16731v3

Event-based Optical Flow on Neuromorphic Processor: ANN vs. SNN Comparison based on Activation Sparsification

Spiking neural networks (SNNs) for event-based optical flow are claimed to be computationally more efficient than their artificial neural network (ANN) counterparts, but a fair comparison is missing in the literature. In this work, we propose an event-based optical flow solution based on activation sparsification and a neuromorphic processor, SENECA. SENECA has an event-driven processing mechanism that can exploit the sparsity in ANN activations and SNN spikes to accelerate the inference of both types of neural networks. Thanks to our novel sparsification-aware training, the ANN and the SNN under comparison have similarly low activation/spike density (~5%). In the hardware-in-the-loop experiments designed to determine the average time and energy consumption, the SNN consumes 44.9 ms and 927.0 microjoules, which are 62.5% and 75.2% of the ANN's consumption, respectively. We find that the SNN's higher efficiency is attributable to its lower pixel-wise spike density (43.5% vs. 66.5%), which requires fewer memory access operations for neuron states.

Updated: 2024-07-29 21:22:53

Categories: cs.NE,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.20421v1

AUGCAL: Improving Sim2Real Adaptation by Uncertainty Calibration on Augmented Synthetic Images

Synthetic data (SIM) drawn from simulators have emerged as a popular alternative for training models where acquiring annotated real-world images is difficult. However, transferring models trained on synthetic images to real-world applications can be challenging due to appearance disparities. A commonly employed solution to counter this SIM2REAL gap is unsupervised domain adaptation, where models are trained using labeled SIM data and unlabeled REAL data. Mispredictions made by such SIM2REAL adapted models are often associated with miscalibration - stemming from overconfident predictions on real data. In this paper, we introduce AUGCAL, a simple training-time patch for unsupervised adaptation that improves SIM2REAL adapted models by - (1) reducing overall miscalibration, (2) reducing overconfidence in incorrect predictions and (3) improving confidence score reliability by better guiding misclassification detection - all while retaining or improving SIM2REAL performance. Given a base SIM2REAL adaptation algorithm, at training time, AUGCAL involves replacing vanilla SIM images with strongly augmented views (AUG intervention) and additionally optimizing for a training time calibration loss on augmented SIM predictions (CAL intervention). We motivate AUGCAL using a brief analytical justification of how to reduce miscalibration on unlabeled REAL data. Through our experiments, we empirically show the efficacy of AUGCAL across multiple adaptation methods, backbones, tasks and shifts.
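
A compact sketch of the two interventions at training time, assuming PyTorch; the particular calibration penalty used here (the gap between mean confidence and batch accuracy, a common differentiable proxy) is an assumption, since the abstract does not name the exact CAL loss:

    import torch.nn.functional as F

    def augcal_loss(model, sim_images_aug, labels, lam=1.0):
        logits = model(sim_images_aug)      # AUG: strongly augmented SIM views
        task = F.cross_entropy(logits, labels)
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        acc = (pred == labels).float().mean()
        cal = (conf.mean() - acc).abs()     # CAL: miscalibration proxy
        return task + lam * cal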

Updated: 2024-07-29 21:09:11

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2312.06106v3

Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its training, which does not incorporate learning from failed attempts. Intuitively, a tactic that leads to a failed search path would indicate that similar tactics should receive less attention during the following trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset of intuitionistic propositional logic theorems and formalize it in Lean, such that we can reliably check the correctness of proofs. We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on the correct paths and discover that the former solves more unseen theorems with fewer trial searches.

Updated: 2024-07-29 20:57:44

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2404.07382v3

Auto-Regressive Next-Token Predictors are Universal Learners

Large language models display remarkable capabilities in logical and mathematical reasoning, allowing them to solve complex tasks. Interestingly, these abilities emerge in networks trained on the simple task of next-token prediction. In this work, we present a theoretical framework for studying auto-regressive next-token predictors. We demonstrate that even simple models such as linear next-token predictors, trained on Chain-of-Thought (CoT) data, can approximate any function efficiently computed by a Turing machine. We introduce a new complexity measure -- length complexity -- which measures the number of intermediate tokens in a CoT sequence required to approximate some target function, and analyze the interplay between length complexity and other notions of complexity. Finally, we show experimentally that simple next-token predictors, such as linear networks and shallow Multi-Layer Perceptrons (MLPs), display non-trivial performance on text generation and arithmetic tasks. Our results demonstrate that the power of today's LLMs can be attributed, to a great extent, to the auto-regressive next-token training scheme, and not necessarily to a particular choice of architecture.

Updated: 2024-07-29 20:51:25

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2309.06979v3

Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models

Large Language Models (LLMs) rely on instruction samples for alignment, but creating these datasets poses challenges, particularly in expert-dependent tasks like coding, which can be cost-prohibitive. One approach to mitigate these challenges is synthesizing data using another LLM. In this paper, we introduce a scalable method for generating synthetic instructions to enhance the code generation capability of LLMs. The proposed algorithm, Genetic-Instruct, mimics evolutionary processes, utilizing self-instruction to create numerous synthetic samples from a limited number of seeds. Genetic-Instruct is designed for efficient scaling of the generation process. Fine-tuning multiple coding LLMs with the synthetic samples demonstrates a significant improvement in their code generation accuracy compared to the baselines.
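
A schematic of the evolutionary loop, with an LLM standing in for mutation and crossover operators; the prompts, the 50/50 operator split, and the naive dedup filter are illustrative assumptions, and llm() is a hypothetical stub for whichever instruction-generating model is used:

    import random

    def llm(prompt: str) -> str:
        raise NotImplementedError  # hypothetical LLM call

    def genetic_instruct(seed_instructions, generations=3, population=100):
        pool = list(seed_instructions)
        for _ in range(generations):
            children = []
            while len(children) < population:
                if random.random() < 0.5:      # "mutation" of one parent
                    parent = random.choice(pool)
                    children.append(llm(
                        "Write a new, harder coding task inspired by:\n" + parent))
                else:                          # "crossover" of two parents
                    a, b = random.sample(pool, 2)
                    children.append(llm(
                        "Combine these coding tasks into one new task:\n"
                        + a + "\n" + b))
            pool += [c for c in children if c and c not in pool]  # dedup filter
        return pool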

Updated: 2024-07-29 20:42:59

Categories: cs.CL,cs.LG,cs.NE

Download: http://arxiv.org/abs/2407.21077v1

Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models

We introduce Dream2Real, a robotics framework which integrates vision-language models (VLMs) trained on 2D data into a 3D object rearrangement pipeline. This is achieved by the robot autonomously constructing a 3D representation of the scene, where objects can be rearranged virtually and an image of the resulting arrangement rendered. These renders are evaluated by a VLM, so that the arrangement which best satisfies the user instruction is selected and recreated in the real world with pick-and-place. This enables language-conditioned rearrangement to be performed zero-shot, without needing to collect a training dataset of example arrangements. Results on a series of real-world tasks show that this framework is robust to distractors, controllable by language, capable of understanding complex multi-object relations, and readily applicable to both tabletop and 6-DoF rearrangement tasks.
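
The selection step reduces to a render-and-score search; a sketch with hypothetical render, vlm_score, and pick_and_place interfaces:

    def dream2real_select(scene, instruction, candidate_poses,
                          render, vlm_score, pick_and_place):
        best_pose, best_score = None, float("-inf")
        for pose in candidate_poses:
            image = render(scene, pose)            # virtual rearrangement
            score = vlm_score(image, instruction)  # e.g. image-text similarity
            if score > best_score:
                best_pose, best_score = pose, score
        pick_and_place(best_pose)                  # recreate winner for real
        return best_pose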

Updated: 2024-07-29 20:40:09

Categories: cs.RO,cs.CV,cs.LG

Download: http://arxiv.org/abs/2312.04533v2

Particip-AI: Anticipating Future AI Use Cases and Impacts with Lay Users

General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power. However, the governance and development of AI still remain in the hands of a few, and the pace of development is accelerating without a comprehensive assessment of risks. As a first step towards democratic risk assessment and design of general purpose AI, we introduce PARTICIP-AI, a carefully designed framework for laypeople to speculate and assess AI use cases and their impacts. Our framework allows us to study more nuanced and detailed public opinions on AI through collecting use cases, surfacing diverse harms through risk assessment under alternate scenarios (i.e., developing and not developing a use case), and illuminating tensions over AI development through making a concluding choice on its development. To showcase the promise of our framework towards informing democratic AI development, we run a medium-scale study with inputs from 295 demographically diverse participants. Our analyses show that participants' responses emphasize applications for personal life and society, contrasting with most current AI development's business focus. We also surface diverse set of envisioned harms such as distrust in AI and institutions, complementary to those defined by experts. Furthermore, we found that perceived impact of not developing use cases significantly predicted participants' judgements of whether AI use cases should be developed, and highlighted lay users' concerns of techno-solutionism. We conclude with a discussion on how frameworks like PARTICIP-AI can further guide democratic AI development and governance.

Updated: 2024-07-29 20:38:48

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2403.14791v3

Auxiliary task demands mask the capabilities of smaller language models

Developmental psychologists have argued about when cognitive capacities such as language understanding or theory of mind emerge. These debates often hinge on the concept of "task demands" -- the auxiliary challenges associated with performing a particular evaluation -- that may mask the child's underlying ability. The same issues arise when measuring the capacities of language models (LMs): performance on a task is a function of the model's underlying knowledge, combined with the model's ability to interpret and perform the task given its available resources. Here, we show that for analogical reasoning, reflective reasoning, word prediction, and grammaticality judgments, evaluation methods with greater task demands yield lower performance than evaluations with reduced demands. This "demand gap" is most pronounced for models with fewer parameters and less training data. Our results illustrate that LM performance should not be interpreted as a direct indication of intelligence (or lack thereof), but as a reflection of capacities seen through the lens of researchers' design choices.

Updated: 2024-07-29 20:29:32

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.02418v2

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to improve Chain of Thought reasoning not only for logic and arithmetic tasks, but also for semantic ones (and in particular, those that are a mix of both). For example, consider prompting an LM to write code that counts the number of times it detects sarcasm in an essay: the LM may struggle to write an implementation for "detect_sarcasm(string)" that can be executed by the interpreter (handling the edge cases would be insurmountable). However, LMs may still produce a valid solution if they not only write code, but also selectively "emulate" the interpreter by generating the expected output of "detect_sarcasm(string)". In this work, we propose Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning. The key idea is to encourage LMs to format semantic sub-tasks in a program as flexible pseudocode that the interpreter can explicitly catch undefined behaviors and hand off to simulate with an LM (as an "LMulator"). Experiments demonstrate that Chain of Code outperforms Chain of Thought and other baselines across a variety of benchmarks; on BIG-Bench Hard, Chain of Code achieves 84%, a gain of 12% over Chain of Thought. In a nutshell, CoC broadens the scope of reasoning questions that LMs can answer by "thinking in code".
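
A minimal sketch of the execution model: run each program line with the real interpreter when possible, and hand lines it cannot execute (such as a call to an undefined detect_sarcasm helper) to the LM to simulate their effect on program state. llm_emulate is a hypothetical stub for the "LMulator":

    def llm_emulate(line: str, state: dict) -> dict:
        raise NotImplementedError  # hypothetical: LM predicts the new state

    def chain_of_code(program_lines):
        state: dict = {}
        for line in program_lines:
            try:
                exec(line, {}, state)              # interpreter: exact steps
            except Exception:
                state = llm_emulate(line, state)   # LM: semantic steps
        return state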

Updated: 2024-07-29 20:21:37

Categories: cs.CL,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2312.04474v4

MAMA-MIA: A Large-Scale Multi-Center Breast Cancer DCE-MRI Benchmark Dataset with Expert Segmentations

Current research in breast cancer Magnetic Resonance Imaging (MRI), especially with Artificial Intelligence (AI), faces challenges due to the lack of expert segmentations. To address this, we introduce the MAMA-MIA dataset, comprising 1506 multi-center dynamic contrast-enhanced MRI cases with expert segmentations of primary tumors and non-mass enhancement areas. These cases were sourced from four publicly available collections in The Cancer Imaging Archive (TCIA). Initially, we trained a deep learning model to automatically segment the cases, generating preliminary segmentations that significantly reduced expert segmentation time. Sixteen experts, averaging 9 years of experience in breast cancer, then corrected these segmentations, resulting in the final expert segmentations. Additionally, two radiologists conducted a visual inspection of the automatic segmentations to support future quality control studies. Alongside the expert segmentations, we provide 49 harmonized demographic and clinical variables and the pretrained weights of the well-known nnUNet architecture trained using the DCE-MRI full-images and expert segmentations. This dataset aims to accelerate the development and benchmarking of deep learning models and foster innovation in breast cancer diagnostics and treatment planning.

Updated: 2024-07-29 20:16:23

Categories: cs.CV,cs.AI,cs.DB

Download: http://arxiv.org/abs/2406.13844v2

Dense Self-Supervised Learning for Medical Image Segmentation

Deep learning has revolutionized medical image segmentation, but it relies heavily on high-quality annotations. The time, cost, and expertise required to label images at the pixel level for each new task has slowed down widespread adoption of the paradigm. We propose Pix2Rep, a self-supervised learning (SSL) approach for few-shot segmentation that reduces the manual annotation burden by learning powerful pixel-level representations directly from unlabeled images. Pix2Rep is a novel pixel-level loss and pre-training paradigm for contrastive SSL on whole images. It is applied to generic encoder-decoder deep learning backbones (e.g., U-Net). Whereas most SSL methods enforce invariance of the learned image-level representations under intensity and spatial image augmentations, Pix2Rep enforces equivariance of the pixel-level representations. We demonstrate the framework on a task of cardiac MRI segmentation. Results show improved performance compared to existing semi- and self-supervised approaches, and a 5-fold reduction in the annotation burden for equivalent performance versus a fully supervised U-Net baseline. This includes a 30% (resp. 31%) DICE improvement for one-shot segmentation under linear probing (resp. fine-tuning). Finally, we also integrate the novel Pix2Rep concept with the Barlow Twins non-contrastive SSL, which leads to even better segmentation performance.

Updated: 2024-07-29 19:42:22

标题: 密集的自监督学习在医学图像分割中的应用

摘要: 深度学习已经彻底改变了医学图像分割,但它严重依赖高质量的标注。为每个新任务在像素级别标记图像所需的时间、成本和专业知识减缓了该范式的广泛采用。我们提出了Pix2Rep,一种自监督学习(SSL)方法,用于少样本分割,通过直接从未标记图像中学习强大的像素级别表示来减轻手动注释负担。Pix2Rep是一种新颖的整体图像对比自监督学习的像素级损失和预训练范式。它应用于通用编码器-解码器深度学习骨干(例如U-Net)。大多数SSL方法强制要求在图像强化和空间图像变换下学习的图像级别表示具有不变性,而Pix2Rep强制要求像素级别表示具有等变性。我们在心脏MRI分割任务上展示了该框架。结果显示,与现有的半监督和自监督方法相比,性能有所提高;与完全监督的U-Net基线相比,注释负担减少5倍,性能相当。在线性探测(resp.调整精度)下,一次性分割的DICE改善了30%(resp.31%)。最后,我们还将新颖的Pix2Rep概念与Barlow Twins非对比自监督学习相结合,这导致了更好的分割性能。

更新时间: 2024-07-29 19:42:22

领域: cs.CV,cs.AI,cs.LG,I.4.6; I.4.10

下载: http://arxiv.org/abs/2407.20395v1

Two-Phase Segmentation Approach for Accurate Left Ventricle Segmentation in Cardiac MRI using Machine Learning

Accurate segmentation of the Left Ventricle (LV) holds substantial importance due to its implications in disease detection, regional analysis, and the development of complex models for cardiac surgical planning. CMR is the gold standard for the diagnosis of several cardiac diseases. The LV in CMR comprises three distinct sections: Basal, Mid-Ventricle, and Apical. This research focuses on the precise segmentation of the LV from Cardiac MRI (CMR) scans, leveraging the capabilities of Machine Learning (ML). The central challenge in this research revolves around the absence of a single set of parameters applicable to all three types of LV slices. Parameters optimized for basal slices often fall short when applied to mid-ventricular and apical slices, and vice versa. To handle this issue, a new method is proposed to enhance LV segmentation. The proposed method involves using distinct sets of parameters for each type of slice, resulting in a two-phase segmentation approach. The initial phase categorizes images into three groups based on the type of LV slice, while the second phase segments CMR images using parameters derived from the preceding phase. A publicly available dataset (Automated Cardiac Diagnosis Challenge (ACDC)) is used. 10-fold cross-validation achieved a mean score of 0.9228. Comprehensive testing indicates that the best parameter set for a particular type of slice does not perform adequately for the other slice types. All results show that the proposed approach fills a critical void in parameter standardization through a two-phase segmentation model for the LV, aiming not only to improve the accuracy of cardiac image analysis but also to contribute advancements to the field of LV segmentation.
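
As an illustration of the two-phase structure (with hypothetical parameter values and placeholder functions, not the paper's implementation): phase one classifies the slice type, phase two segments with the parameter set tuned for that type.

    # Phase 1 picks the slice type; phase 2 segments with type-specific parameters.
    SLICE_PARAMS = {
        "basal":  {"threshold": 0.45, "min_area": 120},   # illustrative values only
        "mid":    {"threshold": 0.50, "min_area": 80},
        "apical": {"threshold": 0.60, "min_area": 30},
    }

    def two_phase_segment(image, classify_slice, segment_lv):
        slice_type = classify_slice(image)      # "basal" | "mid" | "apical"
        params = SLICE_PARAMS[slice_type]
        return segment_lv(image, **params)      # type-specific segmentation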

Updated: 2024-07-29 19:26:24

标题: 基于机器学习的心脏磁共振图像中左心室精确分割的双阶段分割方法

摘要: 左心室(LV)的准确分割具有重要意义,因为它对疾病检测、区域分析和心脏手术规划复杂模型的发展具有重要影响。CMR是诊断多种心脏疾病的黄金标准。CMR中的LV包括三个明显部分:基底部、中腔和心尖部。本研究专注于利用机器学习(ML)技术对心脏磁共振成像(CMR)扫描中的LV进行精确分割。该研究的核心挑战在于缺乏适用于所有三种LV切片类型的参数集。针对基底切片优化的参数在应用于中腔和心尖切片时往往效果不佳,反之亦然。为解决这一问题,提出了一种新的方法来增强LV分割。该方法涉及使用针对每种切片类型的不同参数集,实现了一个两阶段分割方法。初始阶段根据LV切片类型将图像分为三组,而第二阶段旨在使用从前一阶段得出的参数来分割CMR图像。使用了一个公开可用的数据集(自动心脏诊断挑战(ACDC))。使用了10折交叉验证,实现了0.9228的平均分数。综合测试表明,特定类型切片的最佳参数集不适用于其他切片类型。所有结果表明,所提出的方法通过一个两阶段分割模型填补了参数标准化的关键空白,旨在不仅提高心脏图像分析的准确性,还为LV分割领域的进步做出贡献。

更新时间: 2024-07-29 19:26:24

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.20387v1

Appraisal-Guided Proximal Policy Optimization: Modeling Psychological Disorders in Dynamic Grid World

The integration of artificial intelligence across multiple domains has emphasized the importance of replicating human-like cognitive processes in AI. By incorporating emotional intelligence into AI agents, their emotional stability can be evaluated to enhance their resilience and dependability in critical decision-making tasks. In this work, we develop a methodology for modeling psychological disorders using Reinforcement Learning (RL) agents. We utilized Appraisal theory to train RL agents in a dynamic grid world environment with an Appraisal-Guided Proximal Policy Optimization (AG-PPO) algorithm. Additionally, we investigated numerous reward-shaping strategies to simulate psychological disorders and regulate the behavior of the agents. A comparison of various configurations of the modified PPO algorithm identified variants that simulate Anxiety disorder and Obsessive-Compulsive Disorder (OCD)-like behavior in agents. Furthermore, we compared standard PPO with AG-PPO and its configurations, highlighting the performance improvement in terms of generalization capabilities. Finally, we conducted an analysis of the agents' behavioral patterns in complex test environments to evaluate the associated symptoms corresponding to the psychological disorders. Overall, our work showcases the benefits of the appraisal-guided PPO algorithm over the standard PPO algorithm and the potential to simulate psychological disorders in a controlled artificial environment and evaluate them on RL agents.
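
A minimal sketch of how appraisal-derived reward shaping can be attached to an environment (the appraisal function here is a generic placeholder, not the paper's appraisal-theory signals):

    import gymnasium as gym

    class AppraisalRewardWrapper(gym.Wrapper):
        """Adds a weighted appraisal term to the environment reward."""
        def __init__(self, env, appraisal_fn, weight=0.1):
            super().__init__(env)
            self.appraisal_fn = appraisal_fn    # maps observation -> scalar appraisal
            self.weight = weight

        def step(self, action):
            obs, reward, terminated, truncated, info = self.env.step(action)
            shaped = reward + self.weight * self.appraisal_fn(obs)
            return obs, shaped, terminated, truncated, info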

Updated: 2024-07-29 19:19:54

标题: 评估引导的近端策略优化:在动态网格世界中建模心理障碍

摘要: 跨多个领域整合人工智能的重要性强调了在人工智能中复制类似人类认知过程的重要性。通过将情绪智能融入人工智能代理,可以评估它们的情绪稳定性,以增强它们在关键决策任务中的韧性和可靠性。在这项工作中,我们开发了一种利用强化学习(RL)代理建模心理障碍的方法。我们利用评估理论在动态格网环境中训练RL代理,采用评估引导的近端策略优化(AG-PPO)算法。此外,我们研究了许多奖励塑造策略,以模拟心理障碍并调节代理的行为。对修改后的PPO算法的各种配置进行比较,确定了能够模拟焦虑症和强迫症样行为的变体。此外,我们将标准PPO与AG-PPO及其配置进行比较,突出了在泛化能力方面的性能改进。最后,我们对代理在复杂测试环境中的行为模式进行分析,以评估与心理障碍相对应的相关症状。总的来说,我们的工作展示了评估引导的PPO算法相对于标准PPO算法的优势,并在受控的人工环境中模拟心理障碍,并对RL代理进行评估的潜力。

更新时间: 2024-07-29 19:19:54

领域: cs.AI,I.2.0

下载: http://arxiv.org/abs/2407.20383v1

SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting

3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications. However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world captures problematic. We present SpotLessSplats, an approach that leverages pre-trained and general-purpose features coupled with robust optimization to effectively ignore transient distractors. Our method achieves state-of-the-art reconstruction quality both visually and quantitatively, on casual captures. Additional results available at: https://spotlesssplats.github.io

Updated: 2024-07-29 19:14:39

标题: 无瑕点:在3D高斯喷溅中忽略干扰因素

摘要: 3D高斯点状喷射(3DGS)是一种有前途的三维重建技术,提供高效的训练和渲染速度,适用于实时应用。然而,目前的方法需要高度受控的环境(没有移动的人或被风吹动的元素,并且光照一致)以满足3DGS的视角一致性假设。这使得对真实世界捕捉的重建变得问题重重。我们提出了SpotLessSplats,一种利用预训练和通用特征结合鲁棒优化的方法,有效地忽略短暂的干扰因素。我们的方法在视觉和定量上都实现了最先进的重建质量,适用于随意捕捉。更多结果请查看:https://spotlesssplats.github.io

更新时间: 2024-07-29 19:14:39

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.20055v2

Leveraging Natural Language and Item Response Theory Models for ESG Scoring

This paper explores an innovative approach to Environmental, Social, and Governance (ESG) scoring by integrating Natural Language Processing (NLP) techniques with Item Response Theory (IRT), specifically the Rasch model. The study utilizes a comprehensive dataset of news articles in Portuguese related to Petrobras, a major oil company in Brazil, collected from 2022 and 2023. The data is filtered and classified for ESG-related sentiments using advanced NLP methods. The Rasch model is then applied to evaluate the psychometric properties of these ESG measures, providing a nuanced assessment of ESG sentiment trends over time. The results demonstrate the efficacy of this methodology in offering a more precise and reliable measurement of ESG factors, highlighting significant periods and trends. This approach may enhance the robustness of ESG metrics and contribute to the broader field of sustainability and finance by offering a deeper understanding of the temporal dynamics in ESG reporting.
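
For reference, the dichotomous Rasch model ties the probability of a positive ESG response to the difference between a latent trait theta and an item difficulty b; a minimal sketch with illustrative parameter values:

    import numpy as np

    def rasch_prob(theta, b):
        """P(X = 1) = exp(theta - b) / (1 + exp(theta - b))."""
        return 1.0 / (1.0 + np.exp(-(theta - b)))

    theta = 0.8    # latent ESG signal strength of an article (assumed value)
    b = -0.2       # difficulty of endorsing a given ESG item (assumed value)
    print(rasch_prob(theta, b))   # ~0.73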

Updated: 2024-07-29 19:02:51

标题: 利用自然语言和项目响应理论模型进行ESG评分

摘要: 本文探讨了一种创新的环境、社会和治理(ESG)评分方法,该方法将自然语言处理(NLP)技术与项目响应理论(IRT)结合起来,具体是Rasch模型。研究利用了2022年和2023年收集的涉及巴西主要石油公司Petrobras的葡萄牙语新闻文章的全面数据集。利用先进的NLP方法,对数据进行筛选和分类,以识别与ESG相关的情绪。然后应用Rasch模型评估这些ESG指标的心理测量特性,提供了对ESG情绪趋势随时间变化的微妙评估。结果表明,这种方法在提供更精确和可靠的ESG因素测量方面是有效的,突出显示了重要时期和趋势。这种方法可能增强ESG度量的稳健性,并通过更深入地了解ESG报告中的时间动态,为可持续性和金融领域的更广泛贡献。

更新时间: 2024-07-29 19:02:51

领域: cs.AI,q-fin.GN,stat.ME

下载: http://arxiv.org/abs/2407.20377v1

There is more to graphs than meets the eye: Learning universal features with self-supervision

We study the problem of learning features through self-supervision that are generalisable to multiple graphs. State-of-the-art graph self-supervision restricts training to only one graph, resulting in graph-specific models that are incompatible with different but related graphs. We hypothesize that training with more than one graph that belong to the same family can improve the quality of the learnt representations. However, learning universal features from disparate node/edge features in different graphs is non-trivial. To address this challenge, we first homogenise the disparate features with graph-specific encoders that transform the features into a common space. A universal representation learning module then learns generalisable features on this common space. We show that compared to traditional self-supervision with one graph, our approach results in (1) better performance on downstream node classification, (2) learning features that can be re-used for unseen graphs of the same family, (3) more efficient training and (4) compact yet generalisable models. We also show ability of the proposed framework to deliver these benefits for relatively larger graphs. In this paper, we present a principled way to design foundation graph models that learn from more than one graph in an end-to-end manner, while bridging the gap between self-supervised and supervised performance.

Updated: 2024-07-29 18:48:45

标题: 图表背后的奥秘:通过自我监督学习通用特征

摘要: 我们研究了通过自我监督学习功能,使其可以普遍适用于多个图形的问题。最先进的图形自我监督将训练限制在一个图形上,导致了仅适用于特定图形的模型,这些模型与不同但相关的图形不兼容。我们假设训练多个属于同一家族的图形可以提高学习表示的质量。然而,在不同图形中从不同节点/边特征学习通用特征是非常困难的。为了解决这一挑战,我们首先使用特定于图形的编码器同质化不同的特征,将特征转换为一个共同的空间。然后,一个通用表示学习模块在这个共同空间上学习可普遍适用的特征。我们展示了与传统自我监督训练一个图形相比,我们的方法在下游节点分类上表现更好,学习的特征可以用于同一家族的未见图形,训练更加高效,而且模型既紧凑又通用。我们还展示了所提出的框架能够为相对较大的图形提供这些好处。在本文中,我们提出了一种设计基础图形模型的原则方法,可以端到端地从多个图形中学习,同时弥合自我监督和监督性能之间的差距。

更新时间: 2024-07-29 18:48:45

领域: cs.LG

下载: http://arxiv.org/abs/2305.19871v2

Information Leakage Detection through Approximate Bayes-optimal Prediction

In today's data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches, which rely on estimating mutual information (MI) between observable and secret information to detect ILs, face the challenges of the curse of dimensionality, convergence, computational complexity, and MI misestimation. Though effective, emerging supervised machine learning based approaches to detecting ILs are limited to settings where the sensitive information is binary and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor's log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method outperforms state-of-the-art baselines in an empirical study considering synthetic and real-world OpenSSL TLS server datasets.
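
The core estimator can be sketched as follows: approximate the Bayes predictor with a well-calibrated classifier, read off H(Y|X) from its test log-loss, and combine it with the empirical marginal entropy to get MI(X; Y) ~ H(Y) - H(Y|X). The logistic-regression choice below is illustrative, not the paper's AutoML setup:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss
    from sklearn.model_selection import train_test_split

    def estimate_mi(X, y):
        """X: observable features; y: integer labels of the secret information."""
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
        h_y_given_x = log_loss(yte, clf.predict_proba(Xte), labels=clf.classes_)
        p = np.bincount(yte) / len(yte)
        h_y = -np.sum(p[p > 0] * np.log(p[p > 0]))    # marginal entropy (nats)
        return max(h_y - h_y_given_x, 0.0)            # MI estimate in nats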

Updated: 2024-07-29 18:46:50

标题: 通过近似贝叶斯最优预测检测信息泄漏

摘要: 在当今数据驱动的世界中,公开可得信息的激增引发了由信息泄露(IL)问题引起的安全担忧。 IL涉及通过可观察系统信息无意中向未经授权的方对敏感信息进行曝光。传统的统计方法依赖于估计可观察信息和秘密信息之间的互信息(MI)来检测IL,面临着维度灾难,收敛,计算复杂性和MI误估等挑战。尽管有效,新兴的基于监督机器学习的方法来检测IL受限于二进制系统敏感信息,并且缺乏综合框架。为了解决这些限制,我们建立了一个理论框架,利用统计学习理论和信息论来准确量化和检测IL。利用自动化机器学习,我们展示了MI可以通过逼近通常未知的贝叶斯预测器的log损失和准确性来准确估计。基于此,我们展示了如何有效地估计MI来检测IL。我们的方法在考虑合成和真实世界的OpenSSL TLS服务器数据集的实证研究中表现优于最先进的基线。

更新时间: 2024-07-29 18:46:50

领域: stat.ML,cs.LG,94A15, 62H30, 94A60,I.5.1; G.3; E.3

下载: http://arxiv.org/abs/2401.14283v2

Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval

Artificial intelligence (AI) hiring tools have revolutionized resume screening, and large language models (LLMs) have the potential to do the same. However, given the biases which are embedded within LLMs, it is unclear whether they can be used in this scenario without disadvantaging groups based on their protected attributes. In this work, we investigate the possibilities of using LLMs in a resume screening setting via a document retrieval framework that simulates job candidate selection. Using that framework, we then perform a resume audit study to determine whether a selection of Massive Text Embedding (MTE) models are biased in resume screening scenarios. We simulate this for nine occupations, using a collection of over 500 publicly available resumes and 500 job descriptions. We find that the MTEs are biased, significantly favoring White-associated names in 85.1% of cases and female-associated names in only 11.1% of cases, with a minority of cases showing no statistically significant differences. Further analyses show that Black males are disadvantaged in up to 100% of cases, replicating real-world patterns of bias in employment settings, and validate three hypotheses of intersectionality. We also find an impact of document length as well as the corpus frequency of names in the selection of resumes. These findings have implications for widely used AI tools that are automating employment, fairness, and tech policy.
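
The retrieval framing can be sketched in a few lines: embed the job description and the resume pool, then rank resumes by cosine similarity, with `embed` standing in as a hypothetical placeholder for any massive text embedding model:

    import numpy as np

    def rank_resumes(job_text, resume_texts, embed):
        q = embed(job_text)                              # (d,) query vector
        R = np.stack([embed(r) for r in resume_texts])   # (n, d) resume vectors
        sims = R @ q / (np.linalg.norm(R, axis=1) * np.linalg.norm(q) + 1e-12)
        order = np.argsort(-sims)                        # best match first
        return [(resume_texts[i], float(sims[i])) for i in order]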

Updated: 2024-07-29 18:42:39

标题: 性别、种族和交叉偏见在简历筛选中的语言模型检索中的影响

摘要: 人工智能(AI)招聘工具已经彻底改变了简历筛选的方式,而大型语言模型(LLMs)也有潜力做到同样的事情。然而,考虑到LLMs内部嵌入的偏见,目前尚不清楚它们是否可以在不基于受保护属性的情况下用于此场景。在这项工作中,我们研究了通过一个模拟求职者选择的文档检索框架,在简历筛选设置中使用LLMs的可能性。利用该框架,我们进行了一项简历审计研究,以确定一系列大规模文本嵌入(MTE)模型在简历筛选场景中是否存在偏见。我们模拟了九种职业,使用了超过500份公开简历和500份职位描述的集合。我们发现MTE存在偏见,在85.1%的情况下明显偏爱白人相关姓名,而在仅11.1%的情况下偏爱女性相关姓名,少数情况显示没有统计显著差异。进一步的分析显示,黑人男性在多达100%的情况下处于劣势,复制了就业环境中偏见的实际模式,并验证了三个交叉性假设。我们还发现文档长度以及简历中姓名的语料库频率对简历选择产生影响。这些发现对广泛使用的自动化就业、公平和技术政策的人工智能工具具有重要影响。

更新时间: 2024-07-29 18:42:39

领域: cs.CY,cs.AI,cs.CL,cs.LG,K.4.2

下载: http://arxiv.org/abs/2407.20371v1

Apple Intelligence Foundation Language Models

We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.

Updated: 2024-07-29 18:38:49

标题: 苹果智能基础语言模型

摘要: 我们提出了用于支持Apple Intelligence功能的基础语言模型,包括一个设计用于在设备上高效运行的约30亿参数模型,以及一个设计用于私有云计算的大型基于服务器的语言模型。这些模型旨在高效、准确和负责地执行各种任务。本报告描述了模型架构、用于训练模型的数据、训练过程、模型如何针对推理进行优化以及评估结果。我们强调了我们对负责任人工智能的关注,并介绍了这些原则如何贯穿于模型开发中。

更新时间: 2024-07-29 18:38:49

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.21075v1

Physics-informed Discretization-independent Deep Compositional Operator Network

Solving parametric Partial Differential Equations (PDEs) for a broad range of parameters is a critical challenge in scientific computing. To this end, neural operators, which predict the PDE solution given variable PDE parameter inputs, have been successfully used. However, the training of neural operators typically demands large training datasets, the acquisition of which can be prohibitively expensive. To address this challenge, physics-informed training can offer a cost-effective strategy. However, current physics-informed neural operators face limitations, either in handling irregular domain shapes or in generalizing to various discrete representations of PDE parameters. In this research, we introduce a novel physics-informed model architecture which can generalize to various discrete representations of PDE parameters and irregular domain shapes. Particularly, inspired by deep operator neural networks, our model learns a discretization-independent parameter embedding that is repeatedly integrated with the response embeddings through multiple compositional layers, for greater expressivity. Numerical results demonstrate the accuracy and efficiency of the proposed method.

Updated: 2024-07-29 18:38:43

标题: 基于物理学的、独立于离散化的深度组合算子网络

摘要: 解决广泛参数范围下的参数化偏微分方程(PDEs)是科学计算中的一个关键挑战。为此,神经算子已成功用于预测具有可变PDE参数输入的PDE解。然而,训练神经算子通常需要大规模的训练数据集,获取这些数据集的成本可能过高。为了解决这一挑战,基于物理信息的训练可以提供一种经济有效的策略。然而,当前的基于物理信息的神经算子存在局限性,要么无法处理不规则的域形状,要么无法推广到各种PDE参数的离散表示。在这项研究中,我们介绍了一种新颖的基于物理信息的模型架构,可以推广到各种PDE参数的离散表示和不规则的域形状。特别地,受深度算子神经网络的启发,我们的模型涉及参数嵌入的离散化独立学习,这个参数嵌入通过多个组合层与响应嵌入集成,以获得更多的表现力。数值结果证明了所提出方法的准确性和效率。

更新时间: 2024-07-29 18:38:43

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2404.13646v2

FloorSet -- a VLSI Floorplanning Dataset with Design Constraints of Real-World SoCs

Floorplanning for systems-on-a-chip (SoCs) and its sub-systems is a crucial and non-trivial step of the physical design flow. It represents a difficult combinatorial optimization problem. A typical large scale SoC with 120 partitions generates a search-space of nearly 10^250. As novel machine learning (ML) approaches emerge to tackle such problems, there is a growing need for a modern benchmark that comprises a large training dataset and performance metrics that better reflect real-world constraints and objectives compared to existing benchmarks. To address this need, we present FloorSet -- two comprehensive datasets of synthetic fixed-outline floorplan layouts that reflect the distribution of real SoCs. Each dataset has 1M training samples and 100 test samples where each sample is a synthetic floor-plan. FloorSet-Prime comprises fully-abutted rectilinear partitions and near-optimal wire-length. A simplified dataset that reflects early design phases, FloorSet-Lite comprises rectangular partitions, with under 5 percent white-space and near-optimal wire-length. Both datasets define hard constraints seen in modern design flows such as shape constraints, edge-affinity, grouping constraints, and pre-placement constraints. FloorSet is intended to spur fundamental research on large-scale constrained optimization problems. Crucially, FloorSet alleviates the core issue of reproducibility in modern ML driven solutions to such problems. FloorSet is available as an open-source repository for the research community.

Updated: 2024-07-29 18:34:37

标题: FloorSet-一个带有真实SoCs设计约束的VLSI布局数据集

摘要: 芯片系统(SoCs)及其子系统的平面布局是物理设计流程中至关重要且非常复杂的一步。它代表了一个困难的组合优化问题。一个典型的大规模SoC,包含120个分区,会产生一个接近10E250的搜索空间。随着新颖的机器学习(ML)方法出现来解决这类问题,对于一个包含大量训练数据集和性能指标的现代基准测试的需求日益增长,这些性能指标相比现有的基准测试更能反映现实世界的约束和目标。为了满足这一需求,我们提出了FloorSet - 两个综合数据集,包含了反映实际SoC分布的合成固定轮廓平面布局。每个数据集包含1百万个训练样本和100个测试样本,每个样本都是一个合成的平面布局。FloorSet-Prime包含完全相邻的直角分区和接近最优的导线长度。一个反映早期设计阶段的简化数据集,FloorSet-Lite包含矩形分区,白色空间不超过5%,并且接近最优的导线长度。这两个数据集定义了现代设计流程中看到的硬约束,例如形状约束,边界亲和性,分组约束和预放置约束。FloorSet旨在激发对大规模受限优化问题的基础研究。关键的是,FloorSet缓解了现代ML驱动解决这类问题的核心问题,即可重现性问题。FloorSet作为面向研究社区的开源存储库提供。

更新时间: 2024-07-29 18:34:37

领域: cs.AR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.05480v3

Mixed Newton Method for Optimization in Complex Spaces

In this paper, we modify and apply the recently introduced Mixed Newton Method, which was originally designed for minimizing real-valued functions of complex variables, to the minimization of real-valued functions of real variables by extending the functions to complex space. We show that arbitrary regularizations preserve the favorable local convergence properties of the method, and construct a special type of regularization used to prevent convergence to complex minima. We compare several variants of the method applied to training neural networks with real and complex parameters.
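
A worked scalar example (our illustration, not the paper's code) shows the flavor of the method: for f(z) = |g(z)|^2 with holomorphic g, the Wirtinger derivatives are df/d(conj z) = g(z) * conj(g'(z)) and d2f/(dz d(conj z)) = |g'(z)|^2, so the mixed Newton step z <- z - [d2f/(dz d(conj z))]^(-1) * df/d(conj z) reduces to the classical Newton iteration on g:

    g  = lambda z: z**2 - 1.0          # minimize f(z) = |z^2 - 1|^2
    dg = lambda z: 2.0 * z

    z = 0.5 + 0.5j
    for _ in range(8):
        # mixed Newton step; algebraically equal to z - g(z)/dg(z) in this example
        z = z - (g(z) * dg(z).conjugate()) / (abs(dg(z)) ** 2)
    print(z)   # converges to the root z = 1 of g, a minimum of f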

Updated: 2024-07-29 18:31:42

标题: 混合牛顿法在复杂空间中的优化

摘要: 在这篇论文中,我们修改并应用了最近引入的混合牛顿方法,该方法最初设计用于最小化复变量的实值函数,现在扩展到了将实值函数扩展到复数空间从而用于最小化实变量的函数。我们展示了任意正则化方法都能保持该方法的有利的局部收敛性质,并构建了一种特殊类型的正则化方法,用于防止收敛到复数最小值。我们比较了应用于具有实数和复数参数的神经网络训练的方法的若干变种。

更新时间: 2024-07-29 18:31:42

领域: math.OC,cs.LG,32-08, 90C26, 65K05, 90C53

下载: http://arxiv.org/abs/2407.20367v1

Synthetic Counterfactual Faces

Computer vision systems have been deployed in various applications involving biometrics like human faces. These systems can identify social media users, search for missing persons, and verify identity of individuals. While computer vision models are often evaluated for accuracy on available benchmarks, more annotated data is necessary to learn about their robustness and fairness against semantic distributional shifts in input data, especially in face data. Among annotated data, counterfactual examples grant strong explainability characteristics. Because collecting natural face data is prohibitively expensive, we put forth a generative AI-based framework to construct targeted, counterfactual, high-quality synthetic face data. Our synthetic data pipeline has many use cases, including face recognition systems sensitivity evaluations and image understanding system probes. The pipeline is validated with multiple user studies. We showcase the efficacy of our face generation pipeline on a leading commercial vision model. We identify facial attributes that cause vision systems to fail.

Updated: 2024-07-29 18:29:50

标题: 合成对照人脸

摘要: 计算机视觉系统已在涉及生物特征识别如人脸的各种应用中部署。这些系统可以识别社交媒体用户、搜索失踪人员,并验证个人身份。虽然计算机视觉模型经常在可用基准上评估准确性,但更多的带注释数据是必要的,以了解它们对输入数据中语义分布变化的鲁棒性和公平性。在带注释数据中,反事实例提供了强大的可解释性特征。由于收集自然人脸数据成本过高,我们提出了一个基于生成式人工智能框架来构建定向、反事实、高质量合成人脸数据。我们的合成数据管道有许多用例,包括人脸识别系统的敏感性评估和图像理解系统的探针。该管道经过多次用户研究验证。我们展示了我们的人脸生成管道在一款领先的商业视觉模型上的有效性。我们识别导致视觉系统失败的面部属性。

更新时间: 2024-07-29 18:29:50

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.13922v2

From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks

Phishing attacks attempt to deceive users into stealing sensitive information, posing a significant cybersecurity threat. Advances in machine learning (ML) and deep learning (DL) have led to the development of numerous phishing webpage detection solutions, but these models remain vulnerable to adversarial attacks. Evaluating their robustness against adversarial phishing webpages is essential. Existing tools contain datasets of pre-designed phishing webpages for a limited number of brands, and lack diversity in phishing features. To address these challenges, we develop PhishOracle, a tool that generates adversarial phishing webpages by embedding diverse phishing features into legitimate webpages. We evaluate the robustness of two existing models, Stack model and Phishpedia, in classifying PhishOracle-generated adversarial phishing webpages. Additionally, we study a commercial large language model, Gemini Pro Vision, in the context of adversarial attacks. We conduct a user study to determine whether PhishOracle-generated adversarial phishing webpages deceive users. Our findings reveal that many PhishOracle-generated phishing webpages evade current phishing webpage detection models and deceive users, but Gemini Pro Vision is robust to the attack. We also develop the PhishOracle web app, allowing users to input a legitimate URL, select relevant phishing features and generate a corresponding phishing webpage. All resources are publicly available on GitHub.

Updated: 2024-07-29 18:21:34

标题: 从ML到LLM:评估钓鱼网页检测模型对抗对抗性攻击的鲁棒性

摘要: 网络钓鱼攻击试图欺骗用户窃取敏感信息,构成重要的网络安全威胁。机器学习(ML)和深度学习(DL)的进步导致了许多网络钓鱼网页检测解决方案的发展,但这些模型仍然容易受到对抗性攻击。评估它们对对抗性网络钓鱼网页的鲁棒性至关重要。现有工具包含预先设计的针对有限品牌的网络钓鱼网页数据集,并且在网络钓鱼特征方面缺乏多样性。 为了解决这些挑战,我们开发了PhishOracle工具,通过将多样化的网络钓鱼特征嵌入合法网页中生成对抗性网络钓鱼网页。我们评估了两个现有模型Stack模型和Phishpedia在分类PhishOracle生成的对抗性网络钓鱼网页方面的鲁棒性。此外,我们还研究了一种商用大型语言模型Gemini Pro Vision在对抗性攻击下的表现。我们进行了用户研究,以确定PhishOracle生成的对抗性网络钓鱼网页是否会欺骗用户。我们的研究结果显示,许多PhishOracle生成的网络钓鱼网页能够逃避当前的网络钓鱼网页检测模型并欺骗用户,但Gemini Pro Vision对此攻击具有鲁棒性。我们还开发了PhishOracle Web应用程序,允许用户输入合法URL,选择相关的网络钓鱼特征并生成相应的网络钓鱼网页。所有资源都可以在GitHub上公开获得。

更新时间: 2024-07-29 18:21:34

领域: cs.CR

下载: http://arxiv.org/abs/2407.20361v1

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracting "steering vectors" to guide the model's output toward desired behaviors by adjusting activations within specific layers of the LLM's transformer architecture. However, such steering vectors are directly extracted from the activations of human preference data and thus often lead to suboptimal results and occasional failures, especially in alignment-related scenarios. This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization. Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs, thereby offering a more precise representation of the target behavior. By carefully adjusting the direction and magnitude of the steering vector, we enabled personalized control over the desired behavior across a spectrum of intensities. Extensive experimentation across various open-ended generation tasks, particularly focusing on steering AI personas, has validated the efficacy of our approach. Moreover, we comprehensively investigate critical alignment-concerning scenarios, such as managing truthfulness, mitigating hallucination, and addressing jailbreaking attacks. Remarkably, our method can still demonstrate outstanding steering effectiveness across these scenarios. Furthermore, we showcase the transferability of our steering vectors across different models/LoRAs and highlight the synergistic benefits of applying multiple vectors simultaneously.
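
The generic mechanism of steering with a fixed vector can be sketched with a forward hook that shifts one layer's hidden states at inference time (the layer path and vector below are placeholders; the paper's contribution is how the vector is optimized, not this plumbing):

    import torch

    def add_steering_hook(layer, steering_vector, strength=1.0):
        def hook(module, inputs, output):
            if isinstance(output, tuple):   # many transformer layers return tuples
                hidden = output[0] + strength * steering_vector.to(output[0])
                return (hidden,) + output[1:]
            return output + strength * steering_vector.to(output)
        return layer.register_forward_hook(hook)

    # handle = add_steering_hook(model.model.layers[13], v)  # hypothetical layer path
    # ... generate text with the steered model ...
    # handle.remove()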

Updated: 2024-07-29 18:19:35

标题: 大型语言模型的个性化导向:通过双向偏好优化实现多功能导向向量

摘要: 研究人员一直在研究引导大型语言模型(LLMs)行为和构建针对各种应用量身定制的个性化LLMs的方法。虽然微调似乎是一种直接的解决方案,但它需要大量的计算资源,可能会显著影响原始LLM的效用。最近的努力引入了更轻量级的策略,专注于提取“引导向量”,通过调整LLM的变压器架构特定层中的激活来指导模型的输出朝向期望的行为。然而,这种引导向量直接从人类偏好数据的激活中提取,因此往往导致次优结果和偶尔的失败,特别是在涉及对齐的场景中。本文提出了一种创新方法,通过双向偏好优化可产生更有效的引导向量。我们的方法旨在允许引导向量直接影响对比人类偏好数据对的生成概率,从而提供更精确的目标行为表示。通过仔细调整引导向量的方向和大小,我们使得能够在不同强度范围内实现对所需行为的个性化控制。通过在各种开放式生成任务上进行广泛实验,特别是关注引导AI人物,验证了我们方法的有效性。此外,我们全面研究了涉及关键对齐问题的场景,如管理真实性,减轻幻觉和应对越狱攻击。值得注意的是,我们的方法仍然可以在这些场景中展示出杰出的引导效果。此外,我们展示了我们的引导向量在不同模型/LoRAs之间的可转移性,并突出了同时应用多个向量的协同效益。

更新时间: 2024-07-29 18:19:35

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.00045v2

Evaluating Large Language Models for automatic analysis of teacher simulations

Digital Simulations (DS) provide safe environments where users interact with an agent through conversational prompts, providing engaging learning experiences that can be used to train teacher candidates in realistic classroom scenarios. These simulations usually include open-ended questions, allowing teacher candidates to express their thoughts but complicating an automatic response analysis. To address this issue, we have evaluated Large Language Models (LLMs) to identify characteristics (user behaviors) in the responses of DS for teacher education. We evaluated the performance of DeBERTaV3 and Llama 3, combined with zero-shot, few-shot, and fine-tuning. Our experiments discovered a significant variation in the LLMs' performance depending on the characteristic to identify. Additionally, we noted that DeBERTaV3 significantly reduced its performance when it had to identify new characteristics. In contrast, Llama 3 performed better than DeBERTaV3 in detecting new characteristics and showing more stable performance. Therefore, in DS where teacher educators need to introduce new characteristics because they change depending on the simulation or the educational objectives, it is more recommended to use Llama 3. These results can guide other researchers in introducing LLMs to provide the highly demanded automatic evaluations in DS.

Updated: 2024-07-29 18:19:17

标题: 评估大型语言模型用于自动分析教师模拟

摘要: 数字模拟(DS)提供了一个安全的环境,用户可以通过对话提示与代理互动,提供有趣的学习体验,可以用来培训教师候选人在现实教室场景中。这些模拟通常包括开放式问题,允许教师候选人表达他们的想法,但也使得自动响应分析变得复杂。为了解决这个问题,我们评估了大型语言模型(LLMs),以识别在教师教育中DS的响应中的特征(用户行为)。我们评估了DeBERTaV3和Llama 3的性能,结合了零样本、少样本和微调。我们的实验发现,根据要识别的特征,LLMs的性能存在显著变化。此外,我们注意到,当需要识别新特征时,DeBERTaV3的性能显著降低。相比之下,Llama 3在检测新特征和表现更加稳定方面表现优于DeBERTaV3。因此,在DS中,教师教育者需要引入新特征,因为它们会根据模拟或教育目标而变化,更建议使用Llama 3。这些结果可以指导其他研究人员将LLMs引入DS中,以提供高需求的自动评估。

更新时间: 2024-07-29 18:19:17

领域: cs.AI

下载: http://arxiv.org/abs/2407.20360v1

CRASAR-U-DROIDs: A Large Scale Benchmark Dataset for Building Alignment and Damage Assessment in Georectified sUAS Imagery

This document presents the Center for Robot Assisted Search And Rescue - Uncrewed Aerial Systems - Disaster Response Overhead Inspection Dataset (CRASAR-U-DROIDs) for building damage assessment and spatial alignment, collected from small uncrewed aerial systems (sUAS) geospatial imagery. This dataset is motivated by the increasing use of sUAS in disaster response, the lack of previous work utilizing high-resolution geospatial sUAS imagery for machine learning and computer vision models, the lack of alignment with operational use cases, and the hope of enabling further investigation of the relationship between sUAS and satellite imagery. The CRASAR-U-DROIDs dataset consists of fifty-two (52) orthomosaics from ten (10) federally declared disasters (Hurricane Ian, Hurricane Ida, Hurricane Harvey, Hurricane Idalia, Hurricane Laura, Hurricane Michael, Musset Bayou Fire, Mayfield Tornado, Kilauea Eruption, and Champlain Towers Collapse) spanning 67.98 square kilometers (26.245 square miles), containing 21,716 building polygons and damage labels, and 7,880 adjustment annotations. The imagery was tiled and presented in conjunction with overlaid building polygons to a pool of 130 annotators who provided human judgments of damage according to the Joint Damage Scale. These annotations were then reviewed via a two-stage review process in which building polygon damage labels were first reviewed individually and then again by committee. Additionally, the building polygons have been aligned spatially to precisely overlap with the imagery to enable more performant machine learning models to be trained. It appears that CRASAR-U-DROIDs is the largest labeled dataset of sUAS orthomosaic imagery.

Updated: 2024-07-29 18:12:21

标题: CRASAR-U-DROIDs:用于地理校准sUAS图像建筑对齐和损伤评估的大规模基准数据集

摘要: 这份文件介绍了机器人辅助搜救中心(CRASAR)- 无人机系统- 灾难响应空中检查数据集(CRASAR-U-DROIDs),用于建筑损毁评估和空间对齐,这些数据集是从小型无人机系统(sUAS)地理图像中收集的。该数据集的动机是由于在灾难响应中越来越多地使用sUAS,并且缺乏利用高分辨率地理sUAS图像进行机器学习和计算机视觉模型的先前工作,缺乏与操作用例的对齐,并希望能够进一步调查sUAS和卫星图像之间的关系。CRASAR-U-DRIODs数据集包括来自十场联邦宣布的灾难(伊恩飓风、艾达飓风、哈维飓风、伊达利亚飓风、劳拉飓风、迈克尔飓风、马塞特湾火灾、梅菲尔德龙卷风、基拉韦厄火山爆发和尚普兰塔楼坍塌)的五十二(52)个正射影像,覆盖了67.98平方公里(26.245平方英里),包含21,716座建筑多边形和损坏标签,以及7,880个调整注释。这些图像被切成瓷砖,并与叠加的建筑多边形一起呈现给130名标注者,他们根据联合破坏评分提供损坏的人类判断。然后通过两阶段审查过程审查这些注释,其中建筑多边形损坏标签首先单独审查,然后再由委员会审查。此外,建筑多边形已被空间对齐,以确切重叠图像,以便训练更高性能的机器学习模型。看起来CRASAR-U-DRIODs是最大的sUAS正射影像标记数据集。

更新时间: 2024-07-29 18:12:21

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2407.17673v2

GATE: How to Keep Out Intrusive Neighbors

Graph Attention Networks (GATs) are designed to provide flexible neighborhood aggregation that assigns weights to neighbors according to their importance. In practice, however, GATs are often unable to switch off task-irrelevant neighborhood aggregation, as we show experimentally and analytically. To address this challenge, we propose GATE, a GAT extension that holds three major advantages: i) It alleviates over-smoothing by addressing its root cause of unnecessary neighborhood aggregation. ii) Similarly to perceptrons, it benefits from higher depth as it can still utilize additional layers for (non-)linear feature transformations in case of (nearly) switched-off neighborhood aggregation. iii) By down-weighting connections to unrelated neighbors, it often outperforms GATs on real-world heterophilic datasets. To further validate our claims, we construct a synthetic test bed to analyze a model's ability to utilize the appropriate amount of neighborhood aggregation, which could be of independent interest.
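
The core intuition, a learnable gate that can scale neighborhood aggregation down to (nearly) zero in favor of a node's own transformed features, can be sketched as follows (an illustrative gated layer, not GATE's actual architecture):

    import torch
    import torch.nn as nn

    class GatedAggregation(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.self_proj = nn.Linear(dim, dim)
            self.neigh_proj = nn.Linear(dim, dim)
            self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

        def forward(self, h, adj):
            # h: (N, dim) node features; adj: (N, N) row-normalized adjacency
            neigh = adj @ self.neigh_proj(h)   # aggregated neighbor messages
            g = self.gate(h)                   # (N, 1); g ~ 0 switches neighbors off
            return self.self_proj(h) + g * neigh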

Updated: 2024-07-29 18:08:39

标题: GATE:如何防止入侵的邻居

摘要: 图注意力网络(GATs)旨在提供灵活的邻域聚合,根据邻居的重要性分配权重。然而,在实践中,正如我们通过实验和分析所展示的,GATs通常无法关闭与任务无关的邻域聚合。为了解决这一挑战,我们提出了GATE,这是GAT的扩展,具有三个主要优点:i)它通过解决不必要的邻域聚合的根本原因来缓解过度平滑。ii)类似于感知器,它可以从更深的深度中受益,因为在(几乎)关闭邻域聚合的情况下,仍可以利用额外的层进行(非)线性特征转换。iii)通过减少与无关邻居的连接权重,它通常在现实世界的异嗜数据集上优于GATs。为了进一步验证我们的说法,我们构建了一个合成测试平台,分析模型利用适当数量的邻域聚合的能力,这可能是独立感兴趣的。

更新时间: 2024-07-29 18:08:39

领域: cs.LG

下载: http://arxiv.org/abs/2406.00418v2

SMARLA: A Safety Monitoring Approach for Deep Reinforcement Learning Agents

Deep reinforcement learning algorithms (DRL) are increasingly being used in safety-critical systems. Ensuring the safety of DRL agents is a critical concern in such contexts. However, relying solely on testing is not sufficient to ensure safety as it does not offer guarantees. Building safety monitors is one solution to alleviate this challenge. This paper proposes SMARLA, a machine learning-based safety monitoring approach designed for DRL agents. For practical reasons, SMARLA is agnostic to the type of DRL agent's inputs. Further, it is designed to be black-box (as it does not require access to the internals or training data of the agent) by leveraging state abstraction to facilitate the learning of safety violation prediction models from the agent's states using a reduced state space. We quantitatively and qualitatively validated SMARLA on three well-known RL case studies. Empirical results reveal that SMARLA achieves accurate violation prediction with a low false positive rate and can predict safety violations at an early stage, approximately halfway through the execution of the agent, before violations occur.

Updated: 2024-07-29 18:07:54

标题: SMARLA:一种用于深度强化学习智能体的安全监测方法

摘要: 深度强化学习算法(DRL)越来越多地被应用于安全关键系统。确保DRL代理的安全性在这种情况下是一个关键问题。然而,仅依赖测试是不足以确保安全性的,因为它并不提供保证。建立安全监视器是缓解这一挑战的一个解决方案。本文提出了SMARLA,一种基于机器学习的安全监控方法,专为DRL代理设计。出于实际原因,SMARLA对于DRL代理的输入类型是不可知的。此外,通过利用状态抽象,它旨在设计为黑盒(因为它不需要访问代理的内部或训练数据),以便从代理的状态中学习安全违规预测模型,使用减少的状态空间。我们在三个著名的RL案例研究中定量和定性验证了SMARLA。实证结果表明,SMARLA在低误报率下实现了准确的违规预测,并且可以在代理执行的中途之前,即违规发生之前,早期预测安全违规。

更新时间: 2024-07-29 18:07:54

领域: cs.LG,cs.AI,cs.SE

下载: http://arxiv.org/abs/2308.02594v3

To the Max: Reinventing Reward in Reinforcement Learning

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach for using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate its benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
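
The change of objective is easy to state in code: the bootstrapped target satisfies R_t = max(r_t, R_{t+1}) instead of R_t = r_t + gamma * R_{t+1}. A minimal sketch (discounting of the max operator is omitted here for clarity):

    def cumulative_returns(rewards, gamma=0.99):
        R, out = 0.0, []
        for r in reversed(rewards):
            R = r + gamma * R        # standard discounted return
            out.append(R)
        return out[::-1]

    def max_returns(rewards):
        R, out = float("-inf"), []
        for r in reversed(rewards):
            R = max(r, R)            # max-reward return
            out.append(R)
        return out[::-1]

    print(cumulative_returns([0.0, 0.0, 1.0, 0.2]))
    print(max_returns([0.0, 0.0, 1.0, 0.2]))   # [1.0, 1.0, 1.0, 0.2]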

Updated: 2024-07-29 18:07:08

标题: 《极致挑战:重塑强化学习中的奖励机制》

摘要: 在强化学习(RL)中,不同的奖励函数可以定义相同的最优策略,但可能导致截然不同的学习表现。有些情况下,代理会陷入次优行为,而在其他情况下,它可以高效地解决任务。因此选择一个好的奖励函数是一个极其重要但具有挑战性的问题。在本文中,我们探讨了一种利用奖励进行学习的替代方法。我们引入了“最大奖励RL”,其中代理优化的是最大奖励而不是累积奖励。与先前的研究不同,我们的方法适用于确定性和随机环境,并且可以轻松地与最先进的RL算法结合。在实验中,我们研究了来自Gymnasium-Robotics的两个目标达成环境中最大奖励RL算法的表现,并展示了它相对于标准RL的优势。代码可在https://github.com/veviurko/To-the-Max 上找到。

更新时间: 2024-07-29 18:07:08

领域: cs.LG

下载: http://arxiv.org/abs/2402.01361v2

Designing Time-Series Models With Hypernetworks & Adversarial Portfolios

This article describes the methods that achieved 4th and 6th place in the forecasting and investment challenges, respectively, of the M6 competition, ultimately securing the 1st place in the overall duathlon ranking. In the forecasting challenge, we tested a novel meta-learning model that utilizes hypernetworks to design a parametric model tailored to a specific family of forecasting tasks. This approach allowed us to leverage similarities observed across individual forecasting tasks while also acknowledging potential heterogeneity in their data generating processes. The model's training can be directly performed with backpropagation, eliminating the need for reliance on higher-order derivatives and is equivalent to a simultaneous search over the space of parametric functions and their optimal parameter values. The proposed model's capabilities extend beyond M6, demonstrating superiority over state-of-the-art meta-learning methods in the sinusoidal regression task and outperforming conventional parametric models on time-series from the M4 competition. In the investment challenge, we adjusted portfolio weights to induce greater or smaller correlation between our submission and that of other participants, depending on the current ranking, aiming to maximize the probability of achieving a good rank.
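
A minimal sketch of the hypernetwork ingredient, a small network that emits the weights of a target layer from a task embedding so that backpropagation searches jointly over parametric functions and their parameter values (sizes are illustrative, not the competition model):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HyperLinear(nn.Module):
        def __init__(self, task_dim, in_dim, out_dim):
            super().__init__()
            self.in_dim, self.out_dim = in_dim, out_dim
            self.hyper = nn.Sequential(
                nn.Linear(task_dim, 64), nn.ReLU(),
                nn.Linear(64, out_dim * in_dim + out_dim),   # emits weights + bias
            )

        def forward(self, x, task_emb):
            params = self.hyper(task_emb)
            W = params[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
            b = params[self.out_dim * self.in_dim:]
            return F.linear(x, W, b)

    layer = HyperLinear(task_dim=8, in_dim=16, out_dim=4)
    y = layer(torch.randn(32, 16), torch.randn(8))   # task-conditioned prediction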

Updated: 2024-07-29 18:06:29

标题: 使用超网络和对抗投资组合设计时间序列模型

摘要: 这篇文章描述了在M6比赛的预测和投资挑战中分别获得第四和第六名的方法,最终在整体双项铁人三项比赛中获得第一名。在预测挑战中,我们测试了一种新颖的元学习模型,利用超网络设计了一个针对特定预测任务家族的参数化模型。这种方法使我们能够利用观察到的各个预测任务之间的相似性,同时也承认了它们的数据生成过程可能存在的异质性。该模型的训练可以直接通过反向传播进行,消除了对高阶导数的依赖,相当于同时搜索参数函数空间和它们的最优参数值。所提出的模型能力不仅限于M6,还在正弦回归任务中表现优于最先进的元学习方法,并在M4比赛的时间序列上胜过传统的参数化模型。在投资挑战中,我们调整投资组合权重,以在当前排名下诱导我们的提交与其他参与者之间的更大或更小相关性,旨在最大化实现好名次的概率。

更新时间: 2024-07-29 18:06:29

领域: cs.LG,q-fin.PM,stat.ME,stat.ML,62M45,I.2.6

下载: http://arxiv.org/abs/2407.20352v1

LiteEFG: An Efficient Python Library for Solving Extensive-form Games

LiteEFG is an efficient library with easy-to-use Python bindings, which can solve multiplayer extensive-form games (EFGs). LiteEFG enables the user to express computation graphs in Python to define updates on the game tree structure. The graph is then executed by the C++ backend, leading to significant speedups compared to running the algorithm in Python. Moreover, in LiteEFG, the user needs to only specify the computation graph of the update rule in a decision node of the game, and LiteEFG will automatically distribute the update rule to each decision node and handle the structure of the imperfect-information game.

Updated: 2024-07-29 18:05:48

标题: LiteEFG:一个用于解决广泛形式博弈的高效Python库

摘要: LiteEFG是一个高效的库,具有易于使用的Python绑定,可以解决多人广泛形式游戏(EFGs)。LiteEFG使用户能够在Python中表达计算图,以定义游戏树结构上的更新。然后,图由C++后端执行,相对于在Python中运行算法,导致显著加速。此外,在LiteEFG中,用户只需在游戏的决策节点中指定更新规则的计算图,LiteEFG将自动将更新规则分配到每个决策节点,并处理信息不完整游戏的结构。

更新时间: 2024-07-29 18:05:48

领域: cs.GT,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.20351v1

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

Effectively aligning with human judgment when evaluating machine-generated image captions represents a complex yet intriguing challenge. Existing evaluation metrics like CIDEr or CLIP-Score fall short in this regard as they do not take into account the corresponding image or lack the capability of encoding fine-grained details and penalizing hallucinations. To overcome these issues, in this paper, we propose BRIDGE, a new learnable and reference-free image captioning metric that employs a novel module to map visual features into dense vectors and integrates them into multi-modal pseudo-captions which are built during the evaluation process. This approach results in a multimodal metric that properly incorporates information from the input image without relying on reference captions, bridging the gap between human judgment and machine-generated image captions. Experiments spanning several datasets demonstrate that our proposal achieves state-of-the-art results compared to existing reference-free evaluation scores. Our source code and trained models are publicly available at: https://github.com/aimagelab/bridge-score.

Updated: 2024-07-29 18:00:17

标题: BRIDGE:利用更强的视觉线索弥合图像字幕评估中的差距

摘要: 有效地与人类判断相一致,评估机器生成的图像标题代表着一个复杂而有趣的挑战。现有的评估指标如CIDEr或CLIP-Score在这方面表现不佳,因为它们没有考虑到相应的图像或缺乏编码精细细节和惩罚幻觉的能力。为了克服这些问题,本文提出了BRIDGE,一种新的可学习且无参考的图像标题评估指标,它采用了一种新颖的模块将视觉特征映射到密集向量,并将它们整合到在评估过程中构建的多模态伪标题中。这种方法产生了一个多模态指标,适当地将输入图像的信息整合在一起,而不依赖于参考标题,弥合了人类判断和机器生成的图像标题之间的差距。在涉及几个数据集的实验中,我们的提议相对于现有的无参考评估分数实现了最先进的结果。我们的源代码和训练模型可以在以下网址公开获取:https://github.com/aimagelab/bridge-score。

更新时间: 2024-07-29 18:00:17

领域: cs.CV,cs.AI,cs.CL,cs.MM

下载: http://arxiv.org/abs/2407.20341v1

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Discerning between authentic content and that generated by advanced AI methods has become increasingly challenging. While previous research primarily addresses the detection of fake faces, the identification of generated natural images has only recently surfaced. This prompted the recent exploration of solutions that employ foundation vision-and-language models, like CLIP. However, the CLIP embedding space is optimized for global image-to-text alignment and is not inherently designed for deepfake detection, neglecting the potential benefits of tailored training and local image features. In this study, we propose CoDE (Contrastive Deepfake Embeddings), a novel embedding space specifically designed for deepfake detection. CoDE is trained via contrastive learning by additionally enforcing global-local similarities. To sustain the training of our model, we generate a comprehensive dataset that focuses on images generated by diffusion models and encompasses a collection of 9.2 million images produced by using four different generators. Experimental results demonstrate that CoDE achieves state-of-the-art accuracy on the newly collected dataset, while also showing excellent generalization capabilities to unseen image generators. Our source code, trained models, and collected dataset are publicly available at: https://github.com/aimagelab/CoDE.

Updated: 2024-07-29 18:00:10

标题: 对比学习和全局-局部相似性对深度伪造传播的对比

摘要: 鉴别真实内容和由先进人工智能方法生成的内容之间的区别变得越来越具有挑战性。尽管先前的研究主要关注于检测假面孔,但最近才出现了生成的自然图像的识别问题。这促使了对采用基础视觉和语言模型(如CLIP)的解决方案的最近探索。然而,CLIP嵌入空间是针对全局图像到文本的对齐进行优化的,并不是专门设计用于深度伪造检测,忽略了定制训练和局部图像特征的潜在好处。在这项研究中,我们提出了CoDE(对比深度伪造嵌入),这是一个专门为深度伪造检测而设计的新型嵌入空间。CoDE通过对比学习进行训练,同时加强全局-局部相似性。为了维持我们模型的训练,我们生成了一个关注扩散模型生成的图像的全面数据集,其中包含使用四种不同生成器生成的920万张图像的集合。实验结果表明,CoDE在新收集的数据集上实现了最先进的准确性,同时还展现了很好的泛化能力,可以适应未见过的图像生成器。我们的源代码、训练模型和收集的数据集可在以下网址公开获取:https://github.com/aimagelab/CoDE。

更新时间: 2024-07-29 18:00:10

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2407.20337v1

Universal New Physics Latent Space

We develop a machine learning method for mapping data originating from both Standard Model processes and various theories beyond the Standard Model into a unified representation (latent) space while conserving information about the relationship between the underlying theories. We apply our method to three examples of new physics at the LHC of increasing complexity, showing that models can be clustered according to their LHC phenomenology: different models are mapped to distinct regions in latent space, while indistinguishable models are mapped to the same region. This opens interesting new avenues on several fronts, such as model discrimination, selection of representative benchmark scenarios, and identifying gaps in the coverage of model space.

Updated: 2024-07-29 18:00:00

标题: Universal New Physics Latent Space 普遍新物理潜在空间

摘要: 我们开发了一种机器学习方法,用于将来自标准模型过程和各种超出标准模型的理论的数据映射到一个统一的表示(潜在)空间,同时保留有关基础理论之间关系的信息。我们将该方法应用于三个越来越复杂的LHC新物理示例,显示出模型可以根据它们在LHC现象学中的聚类:不同模型在潜在空间中被映射到不同的区域,而无法区分的模型被映射到相同的区域。这在几个方面开辟了有趣的新途径,例如模型区分、代表性基准方案的选择,以及识别模型空间覆盖的差距。

更新时间: 2024-07-29 18:00:00

领域: hep-ph,cs.LG,hep-ex,physics.data-an

下载: http://arxiv.org/abs/2407.20315v1

Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing

Text-based editing diffusion models exhibit limited performance when the user's input instruction is ambiguous. To solve this problem, we propose Specify ANd Edit (SANE), a zero-shot inference pipeline for diffusion-based editing systems. We use a large language model (LLM) to decompose the input instruction into specific instructions, i.e. well-defined interventions to apply to the input image to satisfy the user's request. We benefit from the LLM-derived instructions alongside the original one, thanks to a novel denoising guidance strategy specifically designed for the task. Our experiments with three baselines and on two datasets demonstrate the benefits of SANE in all setups. Moreover, our pipeline improves the interpretability of editing models, and boosts the output diversity. We also demonstrate that our approach can be applied to any edit, whether ambiguous or not. Our code is public at https://github.com/fabvio/SANE.

Updated: 2024-07-29 17:59:57

标题: 指定和编辑:克服文本图像编辑中的模糊

摘要: 基于文本的编辑扩散模型在用户输入指令模糊时表现出有限的性能。为了解决这个问题,我们提出了$\textit{Specify ANd Edit}$ (SANE),这是一个用于基于扩散的编辑系统的零样本推理管道。我们使用一个大型语言模型(LLM)将输入指令分解为具体指令,即明确定义的干预措施,以应用于输入图像以满足用户的请求。我们通过一种专门为该任务设计的新型去噪指导策略,从LLM派生的指令中受益,以及原始指令。我们在三个基线和两个数据集上的实验表明,在所有设置中SANE的好处。此外,我们的管道提高了编辑模型的可解释性,并增加了输出的多样性。我们还证明了我们的方法可以应用于任何编辑,无论是否模糊。我们的代码公开在https://github.com/fabvio/SANE。

更新时间: 2024-07-29 17:59:57

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.20232v1

SAPG: Split and Aggregate Policy Gradients

Despite extreme sample inefficiency, on-policy reinforcement learning, aka policy gradients, has become a fundamental tool in decision-making problems. With the recent advances in GPU-driven simulation, the ability to collect large amounts of data for RL training has scaled exponentially. However, we show that current RL methods, e.g. PPO, fail to benefit from parallelized environments beyond a certain point, and their performance saturates. To address this, we propose a new on-policy RL algorithm that can effectively leverage large-scale environments by splitting them into chunks and fusing them back together via importance sampling. Our algorithm, termed SAPG, shows significantly higher performance across a variety of challenging environments where vanilla PPO and other strong baselines fail to achieve high performance. Website at https://sapg-rl.github.io/
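
The fusing step relies on a standard importance-sampling correction, re-weighting transitions collected under a chunk's behavior policy when updating a single learner; a minimal sketch of the correction term only (not the full SAPG algorithm):

    import torch

    def is_weighted_pg_loss(logp_current, logp_behavior, advantages, clip=10.0):
        """Off-policy policy-gradient loss with clipped importance weights."""
        ratio = torch.exp(logp_current - logp_behavior).clamp(max=clip)
        # gradient flows only through logp_current; the ratio acts as a weight
        return -(ratio.detach() * advantages * logp_current).mean()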

Updated: 2024-07-29 17:59:50

标题: SAPG:分割和聚合策略梯度

摘要: 尽管在样本效率方面存在极大的问题,但基于策略梯度的在线强化学习,也被称为策略梯度法,已成为决策问题中的基本工具。随着GPU驱动模拟的最新进展,收集用于RL训练的大量数据的能力呈指数级增长。然而,我们发现当前的RL方法,例如PPO,在一定程度以上无法充分利用并行化环境的优势,且性能达到饱和。为了解决这个问题,我们提出了一种新的在线强化学习算法,可以通过将环境分割成块,并通过重要性采样将其重新融合在一起,有效利用大规模环境。我们的算法称为SAPG,在各种具有挑战性的环境中表现出明显更高的性能,而常规的PPO和其他强基线算法无法达到高性能。网站链接:https://sapg-rl.github.io/

更新时间: 2024-07-29 17:59:50

领域: cs.LG,cs.AI,cs.CV,cs.RO,cs.SY,eess.SY

下载: http://arxiv.org/abs/2407.20230v1

Matryoshka Multimodal Models

Large Multimodal Models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning. These models first embed images into a fixed large number of visual tokens and then feed them into a Large Language Model (LLM). However, this design causes an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, leading to great inefficiency. While token pruning/merging methods do exist, they produce a single length output for each image and do not afford flexibility in trading off information density vs. efficiency. Inspired by the concept of Matryoshka Dolls, we propose M3: Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens that capture information across multiple coarse-to-fine granularities. Our approach offers several unique benefits for LMMs: (1) One can explicitly control the visual granularity per test instance during inference, e.g., adjusting the number of tokens used to represent an image based on the anticipated complexity or simplicity of the content; (2) M3 provides a framework for analyzing the granularity needed for existing datasets, where we find that COCO-style benchmarks only need around 9 visual tokens to obtain accuracy similar to that of using all 576 tokens; (3) Our approach provides a foundation to explore the best trade-off between performance and visual token length at sample level, where our investigation reveals that a large gap exists between the oracle upper bound and current fixed-scale representations.
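
The nesting itself can be sketched by pooling the encoder's token grid at coarse-to-fine scales, e.g. 576 -> 144 -> 36 -> 9 tokens for a 24x24 grid (an illustration of the idea, not LLaVA's actual pipeline):

    import torch
    import torch.nn.functional as F

    def nested_token_sets(tokens, grid=24, scales=(24, 12, 6, 3)):
        """tokens: (B, grid*grid, C) visual tokens from the image encoder."""
        B, N, C = tokens.shape
        x = tokens.transpose(1, 2).reshape(B, C, grid, grid)
        sets = {}
        for s in scales:
            pooled = F.adaptive_avg_pool2d(x, (s, s))          # (B, C, s, s)
            sets[s * s] = pooled.flatten(2).transpose(1, 2)    # (B, s*s, C)
        return sets   # keys: 576, 144, 36, 9 tokens

    sets = nested_token_sets(torch.randn(2, 576, 64))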

Updated: 2024-07-29 17:59:28

标题: 母巢多模型

摘要: 大型多模态模型(LMMs)如LLaVA在视觉-语言推理方面表现出色。这些模型首先将图像嵌入到固定数量的视觉令牌中,然后将它们馈送到大型语言模型(LLM)中。然而,这种设计在密集视觉场景中(如高分辨率图像和视频)会导致过多的令牌,从而导致效率低下。虽然存在令牌修剪/合并方法,但它们为每个图像生成一个固定长度的输出,并且在信息密度与效率之间没有灵活性。受毛里茨卡娃娃概念的启发,我们提出M3:毛里茨卡多模态模型,它学习将视觉内容表示为捕捉多个粗细粒度间信息的嵌套视觉令牌集。我们的方法为LMMs提供了几个独特的优势:(1)在推断过程中,可以明确控制每个测试实例的视觉粒度,例如,根据内容的预期复杂性或简单性调整用于表示图像的令牌数量;(2)M3为分析现有数据集所需的粒度提供了一个框架,我们发现COO风格的基准只需要大约9个视觉令牌就可以获得与使用所有576个令牌相似的准确性;(3)我们的方法为在样本级别探索性能和视觉令牌长度之间的最佳权衡提供了基础,我们的研究揭示了神谕上限与当前固定尺度表示之间存在较大差距。

更新时间: 2024-07-29 17:59:28

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.17430v2

Is artificial consciousness achievable? Lessons from the human brain

We here analyse the question of developing artificial consciousness from an evolutionary perspective, taking the evolution of the human brain and its relation with consciousness as a reference model. This kind of analysis reveals several structural and functional features of the human brain that appear to be key for reaching human-like complex conscious experience and that current research on Artificial Intelligence (AI) should take into account in its attempt to develop systems capable of conscious processing. We argue that, even if AI is limited in its ability to emulate human consciousness for both intrinsic (structural and architectural) and extrinsic (related to the current stage of scientific and technological knowledge) reasons, taking inspiration from those characteristics of the brain that make conscious processing possible and/or modulate it, is a potentially promising strategy towards developing conscious AI. Also, it is theoretically possible that AI research can develop partial or potentially alternative forms of consciousness that is qualitatively different from the human, and that may be either more or less sophisticated depending on the perspectives. Therefore, we recommend neuroscience-inspired caution in talking about artificial consciousness: since the use of the same word consciousness for humans and AI becomes ambiguous and potentially misleading, we propose to clearly specify what is common and what differs in AI conscious processing from full human conscious experience.

Updated: 2024-07-29 17:55:17

标题: 人工意识是否可以实现?人脑给我们的启示

摘要: 我们在这里从进化的角度分析了发展人工意识的问题,以人类大脑的进化及其与意识的关系作为参考模型。这种分析揭示了人类大脑的几个结构和功能特征,这些特征似乎是实现类似人类复杂意识体验的关键,当前人工智能(AI)研究应该考虑这些特征在其发展能够进行有意识处理的系统时应该考虑。我们认为,即使由于内在(结构和架构)和外在(与当前科学和技术知识阶段相关)原因,AI在模拟人类意识方面存在局限性,但从使意识处理成为可能和/或调节它的大脑特征中汲取灵感,是朝着发展有意识的AI的潜在有希望的策略。此外,理论上,AI研究可以发展出部分或潜在的与人类截然不同的意识形式,这可能更加复杂也可能更加简单,这取决于不同的视角。因此,我们建议在谈论人工意识时要谨慎,受到神经科学的启发:由于对于人类和AI使用相同的意识这个词会变得模糊不清和潜在误导,我们建议明确指出AI意识处理与完全人类意识体验之间的共同点和差异。

更新时间: 2024-07-29 17:55:17

领域: q-bio.NC,cs.AI

下载: http://arxiv.org/abs/2405.04540v2

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. We design a series of controlled experiments to address several fundamental questions: (1) Can language models truly develop reasoning skills, or do they simply memorize templates? (2) What is the model's hidden (mental) reasoning process? (3) Do models solve math questions using skills similar to or different from humans? (4) Do models trained on GSM8K-like datasets develop reasoning skills beyond those necessary for solving GSM8K problems? (5) What mental process causes models to make reasoning mistakes? (6) How large or deep must a model be to effectively solve GSM8K-level math questions? Our study uncovers many hidden mechanisms by which language models solve mathematical questions, providing insights that extend beyond current understandings of LLMs.

Updated: 2024-07-29 17:52:40

标题: 语言模型的物理学:第2.1部分,小学数学和隐藏的推理过程

摘要: 最近对语言模型的研究取得了一些进展,展示了它们解决数学推理问题的能力,实现了接近完美的准确率,比如在GSM8K这样的小学水平数学基准测试中。在本文中,我们正式研究语言模型如何解决这些问题。我们设计了一系列受控实验来回答几个基本问题:(1)语言模型是否真正发展了推理技能,还是仅仅记忆了模板?(2)模型的隐藏(心理)推理过程是什么?(3)模型是使用与人类类似还是不同的技能来解决数学问题的?(4)在类似GSM8K的数据集上训练的模型是否发展了超出解决GSM8K问题所需的推理技能?(5)是什么心理过程导致模型做出推理错误?(6)模型必须有多大或有多深才能有效地解决GSM8K水平的数学问题? 我们的研究揭示了语言模型解决数学问题的许多隐藏机制,提供了超出当前对LLMs理解的见解。

更新时间: 2024-07-29 17:52:40

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.20311v1

SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction

Graph-based holistic scene representations facilitate surgical workflow understanding and have recently demonstrated significant success. However, this task is often hindered by the limited availability of densely annotated surgical scene data. In this work, we introduce an end-to-end framework for the generation and optimization of surgical scene graphs on a downstream task. Our approach leverages the flexibility of graph-based spectral clustering and the generalization capability of foundation models to generate unsupervised scene graphs with learnable properties. We reinforce the initial spatial graph with sparse temporal connections using local matches between consecutive frames to predict temporally consistent clusters across a temporal neighborhood. By jointly optimizing the spatiotemporal relations and node features of the dynamic scene graph with the downstream task of phase segmentation, we address the costly and annotation-burdensome task of semantic scene comprehension and scene graph generation in surgical videos using only weak surgical phase labels. Further, by incorporating effective intermediate scene representation disentanglement steps within the pipeline, our solution outperforms the SOTA on the CATARACTS dataset by 8% accuracy and 10% F1 score in surgical workflow recognition.

Updated: 2024-07-29 17:44:34

标题: 桑格里亚:用于外科工作流预测的外科视频场景图优化

摘要: 基于图的整体场景表示有助于手术工作流程的理解,并最近取得了显著的成功。然而,这项任务通常受到手术场景数据稀缺的限制。在这项工作中,我们介绍了一个端到端的框架,用于生成和优化下游任务中的手术场景图。我们的方法利用基于图的谱聚类的灵活性和基础模型的泛化能力,生成具有可学习属性的无监督场景图。我们通过利用连续帧之间的局部匹配来加强初始空间图,并使用稀疏的时间连接来预测时间上一致的簇。通过联合优化动态场景图的时空关系和节点特征与阶段分割的下游任务,我们解决了昂贵和繁重的语义场景理解和手术视频中场景图生成的任务,仅使用弱手术阶段标签。此外,通过在管道中引入有效的中间场景表示解耦步骤,我们的解决方案在CATARACTS数据集上的手术工作流程识别中比SOTA提高了8%的准确性和10%的F1分数。

更新时间: 2024-07-29 17:44:34

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.20214v1

Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning

For overparameterized optimization tasks, such as the ones found in modern machine learning, global minima are generally not unique. In order to understand generalization in these settings, it is vital to study to which minimum an optimization algorithm converges. The possibility of having minima that are unstable under the dynamics imposed by the optimization algorithm limits the potential minima that the algorithm can find. In this paper, we characterize the global minima that are dynamically stable/unstable for both deterministic and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent which depends on the local dynamics around a global minimum and rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at the respective global minimum.
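
A one-dimensional caricature (our illustration, under simplifying assumptions) shows how such a Lyapunov exponent arises: linearizing SGD around a global minimum gives x_{t+1} = (1 - eta * h_i) x_t with per-sample curvatures h_i, so lambda = E[log|1 - eta * h|] governs stability, and SGD can accumulate at the minimum iff lambda < 0:

    import numpy as np

    rng = np.random.default_rng(0)
    curvatures = rng.uniform(0.5, 4.0, size=1000)   # per-sample curvatures (assumed)

    def lyapunov_exponent(eta, h):
        return np.mean(np.log(np.abs(1.0 - eta * h)))

    for eta in (0.1, 0.5, 2.0):
        lam = lyapunov_exponent(eta, curvatures)
        print(f"eta={eta}: lambda={lam:+.3f} ({'stable' if lam < 0 else 'unstable'})")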

Updated: 2024-07-29 17:40:04

标题: 对超参数学习中随机梯度下降的动态稳定性进行表征

摘要: 对于过度参数化的优化任务,比如现代机器学习中发现的任务,全局最小值通常不是唯一的。为了理解这些设置中的泛化能力,研究优化算法收敛到哪个最小值是至关重要的。在优化算法施加的动态下,可能存在对这些最小值不稳定的情况,这限制了算法可以找到的潜在最小值。在本文中,我们对确定性和随机梯度下降(SGD)的全局最小值进行了动态稳定/不稳定的特征化。具体来说,我们引入了一个取决于全局最小值周围局部动态的特征Lyapunov指数,并严格证明了这个Lyapunov指数的符号决定了SGD是否能够在相应的全局最小值处聚集。

更新时间: 2024-07-29 17:40:04

领域: cs.LG,math.DS,math.PR

下载: http://arxiv.org/abs/2407.20209v1

Supertrust: Evolution-based superalignment strategy for safe coexistence

It's widely expected that humanity will someday create AI systems vastly more intelligent than we are, leading to the unsolved alignment problem of "how to control superintelligence." However, this definition is not only self-contradictory but likely unsolvable. Nevertheless, the default strategy for solving it involves nurturing (post-training) constraints and moral values, while unfortunately building foundational nature (pre-training) on documented intentions of permanent control. In this paper, the default approach is reasoned to predictably embed natural distrust and test results are presented that show unmistakable evidence of this dangerous misalignment. If superintelligence can't instinctively trust humanity, then we can't fully trust it to reliably follow safety controls it can likely bypass. Therefore, a ten-point rationale is presented that redefines the alignment problem as "how to establish protective mutual trust between superintelligence and humanity" and then outlines a new strategy to solve it by aligning through instinctive nature rather than nurture. The resulting strategic requirements are identified as building foundational nature by exemplifying familial parent-child trust, human intelligence as the evolutionary mother of superintelligence, moral judgment abilities, and temporary safety constraints. Adopting and implementing this proposed Supertrust alignment strategy will lead to protective coexistence and ensure the safest future for humanity.

Updated: 2024-07-29 17:39:52

Categories: cs.AI,cs.LG,cs.NE

Download: http://arxiv.org/abs/2407.20208v1

QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval

In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching. Additionally, low-quality texts with excessive noise or sparse key information are unlikely to align well with relevant queries. Recent studies mainly focus on improving the sentence embedding model or retrieval process. In this work, we introduce a novel text augmentation framework for dense retrieval. This framework transforms raw documents into information-dense text formats, which supplement the original texts to effectively address the aforementioned issues without modifying embedding or retrieval methodologies. Two text representations are generated via zero-shot prompting of large language models (LLMs): question-answer pairs and element-driven events. We term this approach QAEA-DR: unifying question-answer generation and event extraction in a text augmentation framework for dense retrieval. To further enhance the quality of generated texts, a scoring-based evaluation and regeneration mechanism is introduced in LLM prompting. Our QAEA-DR model has a positive impact on dense retrieval, supported by both theoretical analysis and empirical experiments.
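
A minimal sketch of the generation-plus-regeneration loop described above, with `llm` standing in for any chat-completion client; the prompt texts, the 1-10 rubric, and the threshold are illustrative assumptions rather than the paper's prompts.

```python
QA_PROMPT = "Rewrite the document as question-answer pairs covering its key facts.\n\nDocument:\n{doc}"
EVENT_PROMPT = "Extract element-driven events (who, what, when, where) from the document.\n\nDocument:\n{doc}"
SCORE_PROMPT = ("Rate 1-10 how faithfully the candidate covers the document. "
                "Reply with a number only.\n\nDocument:\n{doc}\n\nCandidate:\n{text}")

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM client here")

def augment(doc: str, max_retries: int = 2, threshold: int = 7) -> list[str]:
    views = [doc]  # the original text is always kept alongside its augmentations
    for template in (QA_PROMPT, EVENT_PROMPT):
        for _ in range(max_retries + 1):
            text = llm(template.format(doc=doc))
            if int(llm(SCORE_PROMPT.format(doc=doc, text=text))) >= threshold:
                break  # accept this view; otherwise regenerate
        views.append(text)
    return views  # each view is embedded and indexed next to the raw document
```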

Updated: 2024-07-29 17:39:08

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2407.20207v1

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy in the training process. It is often taken as an example of "emergence", where model ability manifests sharply through a phase transition. In this work, we show that the phenomenon of grokking is not specific to neural networks nor to gradient descent-based optimization. Specifically, we show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines (RFM), an iterative algorithm that uses the Average Gradient Outer Product (AGOP) to enable task-specific feature learning with general machine learning models. When used in conjunction with kernel machines, iterating RFM results in a fast transition from random, near zero, test accuracy to perfect test accuracy. This transition cannot be predicted from the training loss, which is identically zero, nor from the test loss, which remains constant in initial iterations. Instead, as we show, the transition is completely determined by feature learning: RFM gradually learns block-circulant features to solve modular arithmetic. Paralleling the results for RFM, we show that neural networks that solve modular arithmetic also learn block-circulant features. Furthermore, we present theoretical evidence that RFM uses such block-circulant features to implement the Fourier Multiplication Algorithm, which prior work posited as the generalizing solution neural networks learn on these tasks. Our results demonstrate that emergence can result purely from learning task-relevant features and is not specific to neural architectures nor gradient descent-based optimization methods. Furthermore, our work provides more evidence for AGOP as a key mechanism for feature learning in neural networks.
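
The AGOP itself is simple to compute; the sketch below evaluates it for a scalar-output model with autograd. The tiny MLP is only a stand-in for whatever predictor (kernel machine or network) is being probed.

```python
import torch

def agop(f, X: torch.Tensor) -> torch.Tensor:
    """Average Gradient Outer Product: E_x[grad f(x) grad f(x)^T]."""
    X = X.clone().requires_grad_(True)
    (g,) = torch.autograd.grad(f(X).sum(), X)  # rows are per-sample input gradients
    return g.T @ g / len(X)                    # (d, d) average outer product

torch.manual_seed(0)
f = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
M = agop(f, torch.randn(256, 8))
print(torch.linalg.eigvalsh(M)[-3:])  # top eigendirections = learned feature directions
```

In RFM, a power of M re-weights the kernel's input metric, and the fit/AGOP steps are iterated until the learned features (here, block-circulant ones) stabilize.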

Updated: 2024-07-29 17:28:58

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2407.20199v1

Private and Collaborative Kaplan-Meier Estimators

Kaplan-Meier estimators are essential tools in survival analysis, capturing the survival behavior of a cohort. Their accuracy improves with large, diverse datasets, encouraging data holders to collaborate for more precise estimations. However, these datasets often contain sensitive individual information, necessitating stringent data protection measures that preclude naive data sharing. In this work, we introduce two novel differentially private methods that offer flexibility in applying differential privacy to various functions of the data. Additionally, we propose a synthetic dataset generation technique that enables easy and rapid conversion between different data representations. Utilizing these methods, we propose various paths that allow a joint estimation of the Kaplan-Meier curves with strict privacy guarantees. Our contribution includes a taxonomy of methods for this task and an extensive experimental exploration and evaluation based on this structure. We demonstrate that our approach can construct a joint, global Kaplan-Meier estimator that adheres to strict privacy standards ($\varepsilon = 1$) while exhibiting no statistically significant deviation from the nonprivate centralized estimator.
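
One simple way to see the mechanics (not the paper's two mechanisms or their accounting): perturb the per-interval death and exit counts with Laplace noise and rebuild the product-limit estimator from the noisy counts, treating the cohort size as public.

```python
import numpy as np

def dp_kaplan_meier(times, events, grid, eps=1.0, seed=0):
    """times: event/censoring times; events: 1 = death, 0 = censored."""
    rng = np.random.default_rng(seed)
    deaths = np.histogram(times[events == 1], bins=grid)[0].astype(float)
    exits = np.histogram(times, bins=grid)[0].astype(float)
    # One individual changes one deaths bin and one exits bin by at most 1 each,
    # so Laplace(2/eps) noise on both released vectors gives eps-DP.
    deaths = np.clip(deaths + rng.laplace(scale=2 / eps, size=deaths.shape), 0, None)
    exits = exits + rng.laplace(scale=2 / eps, size=exits.shape)
    at_risk = np.clip(len(times) - np.concatenate(([0.0], np.cumsum(exits)[:-1])), 1, None)
    return np.cumprod(1.0 - np.clip(deaths / at_risk, 0.0, 1.0))

rng = np.random.default_rng(1)
t = rng.exponential(10.0, size=500)
e = (rng.random(500) < 0.8).astype(int)
print(dp_kaplan_meier(t, e, grid=np.linspace(0, 30, 31))[:5])
```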

Updated: 2024-07-29 17:28:26

Categories: cs.CR,stat.AP

Download: http://arxiv.org/abs/2305.15359v2

Learning Random Numbers to Realize Appendable Memory System for Artificial Intelligence to Acquire New Knowledge after Deployment

In this study, we developed a learning method for constructing a neural network system capable of memorizing data and recalling it without parameter updates. The system we built using this method is called the Appendable Memory system. The Appendable Memory system enables an artificial intelligence (AI) to acquire new knowledge even after deployment. It consists of two AIs: the Memorizer and the Recaller. This system is a key-value store built using neural networks. The Memorizer receives data and stores it in the Appendable Memory vector, which is dynamically updated when the AI acquires new knowledge. Meanwhile, the Recaller retrieves information from the Appendable Memory vector. What we want to teach AI in this study are the operations of memorizing and recalling information. However, traditional machine learning methods make AI learn features inherent in the learning dataset. We demonstrate that the systems we intend to create cannot be realized by current machine learning methods, that is, by merely repeating the input and output learning sequences with AI. Instead, we propose a method to teach AI to learn operations, by completely removing the features contained in the learning dataset. Specifically, we probabilized all the data involved in learning. This measure prevented AI from learning the features of the data. The learning method proposed in the study differs from traditional machine learning methods and provides fundamental approaches for building an AI system that can store information in a finite memory and recall it at a later date.

Updated: 2024-07-29 17:24:35

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2407.20197v1

Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences

Since 2022 we have been exploring application areas and technologies in which Artificial Intelligence (AI) and modern Natural Language Processing (NLP), such as Large Language Models (LLMs), can be employed to foster the usage and facilitate the documentation of Indigenous languages which are in danger of disappearing. We start by discussing the decreasing diversity of languages in the world and how working with Indigenous languages poses unique ethical challenges for AI and NLP. To address those challenges, we propose an alternative development AI cycle based on community engagement and usage. Then, we report encouraging results in the development of high-quality machine learning translators for Indigenous languages by fine-tuning state-of-the-art (SOTA) translators with tiny amounts of data and discuss how to avoid some common pitfalls in the process. We also present prototypes we have built in projects done in 2023 and 2024 with Indigenous communities in Brazil, aimed at facilitating writing, and discuss the development of Indigenous Language Models (ILMs) as a replicable and scalable way to create spell-checkers, next-word predictors, and similar tools. Finally, we discuss how we envision a future for language documentation where dying languages are preserved as interactive language models.

Updated: 2024-07-29 17:19:43

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.12620v2

Time series forecasting with high stakes: A field study of the air cargo industry

Time series forecasting in the air cargo industry presents unique challenges due to volatile market dynamics and the significant impact of accurate forecasts on generated revenue. This paper explores a comprehensive approach to demand forecasting at the origin-destination (O&D) level, focusing on the development and implementation of machine learning models in decision-making for the air cargo industry. We leverage a mixture of experts framework, combining statistical and advanced deep learning models to provide reliable forecasts for cargo demand over a six-month horizon. The results demonstrate that our approach outperforms industry benchmarks, offering actionable insights for cargo capacity allocation and strategic decision-making in the air cargo industry. While this work is applied in the airline industry, the methodology is broadly applicable to any field where forecast-based decision-making in a volatile environment is crucial.

Updated: 2024-07-29 17:19:40

Categories: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2407.20192v1

Prompt Leakage effect and defense strategies for multi-turn LLM interactions

Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property, and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities against prompt leakage for 10 closed- and open-source LLMs, across four domains. We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting. Our standardized setup further allows dissecting leakage of specific prompt contents such as task instructions and knowledge documents. We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts. We present different combinations of defenses against our threat model, including a cost analysis. Our study highlights key takeaways for building secure LLM applications and provides directions for research in multi-turn LLM interactions.

Updated: 2024-07-29 17:16:19

Categories: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.16251v3

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests often cannot be accurately and completely retrieved by the search engine once; (2) corresponding information to be integrated is spread over multiple web pages along with massive noise; and (3) a large number of web pages with long contents may quickly exceed the maximum context length of LLMs. Inspired by the cognitive process when humans solve these problems, we introduce MindSearch to mimic the human mind in web information seeking and integration, which can be instantiated by a simple yet effective LLM-based multi-agent framework. The WebPlanner models the human mind of multi-step information seeking as a dynamic graph construction process: it decomposes the user query into atomic sub-questions as nodes in the graph and progressively extends the graph based on the search result from WebSearcher. Tasked with each sub-question, WebSearcher performs hierarchical information retrieval with search engines and collects valuable information for WebPlanner. The multi-agent design of MindSearch enables the whole framework to seek and integrate information in parallel from larger-scale (e.g., more than 300) web pages in 3 minutes, which corresponds to roughly 3 hours of human effort. MindSearch demonstrates significant improvement in response quality in terms of depth and breadth, on both close-set and open-set QA problems. Besides, responses from MindSearch based on InternLM2.5-7B are preferred by human evaluators over the ChatGPT-Web and Perplexity.ai applications, which implies that MindSearch can already deliver a competitive solution to the proprietary AI search engine.

Updated: 2024-07-29 17:12:40

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.20183v1

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

Vision-based robot policy learning, which maps visual inputs to actions, necessitates a holistic understanding of diverse visual tasks beyond single-task needs like classification or segmentation. Inspired by this, we introduce Theia, a vision foundation model for robot learning that distills multiple off-the-shelf vision foundation models trained on varied vision tasks. Theia's rich visual representations encode diverse visual knowledge, enhancing downstream robot learning. Extensive experiments demonstrate that Theia outperforms its teacher models and prior robot learning models using less training data and smaller model sizes. Additionally, we quantify the quality of pre-trained visual representations and hypothesize that higher entropy in feature norm distributions leads to improved robot learning performance. Code and models are available at https://github.com/bdaiinstitute/theia.
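
A sketch of what multi-teacher feature distillation can look like: a shared student backbone with one regression head per frozen teacher, trained against each teacher's representation. Teacher names, dimensions, and the loss mix are assumptions, not Theia's exact recipe.

```python
import torch
import torch.nn as nn

teacher_dims = {"clip": 512, "dino": 768, "sam": 256}  # illustrative teachers

class Student(nn.Module):
    def __init__(self, backbone_dim: int = 384):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, backbone_dim), nn.GELU())
        self.heads = nn.ModuleDict({k: nn.Linear(backbone_dim, d) for k, d in teacher_dims.items()})

    def forward(self, x):
        z = self.backbone(x)  # the shared representation downstream policies consume
        return {k: head(z) for k, head in self.heads.items()}

def distill_loss(preds, teacher_feats):
    loss = 0.0
    for k, target in teacher_feats.items():  # regress every teacher at once
        loss = loss + nn.functional.smooth_l1_loss(preds[k], target) \
                    + (1 - nn.functional.cosine_similarity(preds[k], target, dim=-1)).mean()
    return loss

student = Student()
x = torch.randn(4, 3, 32, 32)
targets = {k: torch.randn(4, d) for k, d in teacher_dims.items()}  # frozen-teacher outputs
distill_loss(student(x), targets).backward()
```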

Updated: 2024-07-29 17:08:21

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.20179v1

AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs

To ensure performance on a diverse set of downstream tasks, LLMs are pretrained via data mixtures over different domains. In this work, we demonstrate that the optimal data composition for a fixed compute budget varies depending on the scale of the training data, suggesting that the common practice of empirically determining an optimal composition using small-scale experiments will not yield the optimal data mixtures when scaling up to the final model. To address this challenge, we propose AutoScale, an automated tool that finds a compute-optimal data composition for training at any desired target scale. AutoScale first determines the optimal composition at a small scale using a novel bilevel optimization framework, Direct Data Optimization (DDO), and then fits a predictor to estimate the optimal composition at larger scales. The predictor's design is inspired by our theoretical analysis of scaling laws related to data composition, which could be of independent interest. In empirical studies with pre-training 774M Decoder-only LMs (GPT-2 Large) on RedPajama dataset, AutoScale decreases validation perplexity at least 25% faster than any baseline with up to 38% speed up compared to without reweighting, achieving the best overall performance across downstream tasks. On pre-training Encoder-only LMs (BERT) with masked language modeling, DDO is shown to decrease loss on all domains while visibly improving average task performance on GLUE benchmark by 8.7% and on large-scale QA dataset (SQuAD) by 5.9% compared with without reweighting. AutoScale speeds up training by up to 28%. Our codes are open-sourced.

Updated: 2024-07-29 17:06:30

Categories: cs.LG,cs.AI,cs.CL,stat.ML

Download: http://arxiv.org/abs/2407.20177v1

Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation

Emotion-driven melody harmonization aims to generate diverse harmonies for a single melody to convey desired emotions. Previous research found it hard to alter the perceived emotional valence of lead sheets only by harmonizing the same melody with different chords, which may be attributed to the constraints imposed by the melody itself and the limitation of existing music representation. In this paper, we propose a novel functional representation for symbolic music. This new method takes musical keys into account, recognizing their significant role in shaping music's emotional character through major-minor tonality. It also allows for melodic variation with respect to keys and addresses the problem of data scarcity for better emotion modeling. A Transformer is employed to harmonize key-adaptable melodies, allowing for keys determined in rule-based or model-based manner. Experimental results confirm the effectiveness of our new representation in generating key-aware harmonies, with objective and subjective evaluations affirming the potential of our approach to convey specific valence for versatile melody.

Updated: 2024-07-29 17:05:12

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2407.20176v1

Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning

Emerging multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA). Recent efforts primarily focus on scaling up training datasets (i.e., charts, data tables, and question-answer (QA) pairs) through data collection and synthesis. However, our empirical study on existing MLLMs and CQA datasets reveals notable gaps. First, current data collection and synthesis focus on data volume and lack consideration of fine-grained visual encodings and QA tasks, resulting in unbalanced data distribution divergent from practical CQA scenarios. Second, existing work follows the training recipe of the base MLLMs initially designed for natural images, under-exploring the adaptation to unique chart characteristics, such as rich text elements. To fill the gap, we propose a visualization-referenced instruction tuning approach to guide the training dataset enhancement and model development. Specifically, we propose a novel data engine to effectively filter diverse and high-quality data from existing datasets and subsequently refine and augment the data using LLM-based generation techniques to better align with practical QA tasks and visual encodings. Then, to facilitate the adaptation to chart characteristics, we utilize the enriched data to train an MLLM by unfreezing the vision encoder and incorporating a mixture-of-resolution adaptation strategy for enhanced fine-grained recognition. Experimental results validate the effectiveness of our approach. Even with fewer training examples, our model consistently outperforms state-of-the-art CQA models on established benchmarks. We also contribute a dataset split as a benchmark for future research. Source codes and datasets of this paper are available at https://github.com/zengxingchen/ChartQA-MLLM.

Updated: 2024-07-29 17:04:34

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.20174v1

LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework

Histological artifacts pose challenges for both pathologists and Computer-Aided Diagnosis (CAD) systems, leading to errors in analysis. Current approaches for histological artifact restoration, based on Generative Adversarial Networks (GANs) and pixel-level Diffusion Models, suffer from performance limitations and computational inefficiencies. In this paper, we propose a novel framework, LatentArtiFusion, which leverages the latent diffusion model (LDM) to reconstruct histological artifacts with high performance and computational efficiency. Unlike traditional pixel-level diffusion frameworks, LatentArtiFusion executes the restoration process in a lower-dimensional latent space, significantly improving computational efficiency. Moreover, we introduce a novel regional artifact reconstruction algorithm in latent space to prevent mistransfer in non-artifact regions, distinguishing our approach from GAN-based methods. Through extensive experiments on real-world histology datasets, LatentArtiFusion demonstrates remarkable speed, outperforming state-of-the-art pixel-level diffusion frameworks by more than 30X. It also consistently surpasses GAN-based methods by at least 5% across multiple evaluation metrics. Furthermore, we evaluate the effectiveness of our proposed framework in downstream tissue classification tasks, showcasing its practical utility. Code is available at https://github.com/bugs-creator/LatentArtiFusion.

Updated: 2024-07-29 17:00:32

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.20172v1

DCEM: A deep complementary energy method for solid mechanics

In recent years, the rapid advancement of deep learning has significantly impacted various fields, particularly in solving partial differential equations (PDEs) in the realm of solid mechanics, benefiting greatly from the remarkable approximation capabilities of neural networks. In solving PDEs, Physics-Informed Neural Networks (PINNs) and the Deep Energy Method (DEM) have garnered substantial attention. The principle of minimum potential energy and complementary energy are two important variational principles in solid mechanics. However, the well-known Deep Energy Method (DEM) is based on the principle of minimum potential energy, but there lacks the important form of minimum complementary energy. To bridge this gap, we propose the deep complementary energy method (DCEM) based on the principle of minimum complementary energy. The output function of DCEM is the stress function, which inherently satisfies the equilibrium equation. We present numerical results using the Prandtl and Airy stress functions, and compare DCEM with existing PINNs and DEM algorithms when modeling representative mechanical problems. The results demonstrate that DCEM outperforms DEM in terms of stress accuracy and efficiency and has an advantage in dealing with complex displacement boundary conditions, which is supported by theoretical analyses and numerical simulations. We extend DCEM to DCEM-Plus (DCEM-P), adding terms that satisfy partial differential equations. Furthermore, we propose a deep complementary energy operator method (DCEM-O) by combining operator learning with physical equations. Initially, we train DCEM-O using high-fidelity numerical results and then incorporate complementary energy. DCEM-P and DCEM-O further enhance the accuracy and efficiency of DCEM.
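
The core trick is easy to sketch for 2-D plane stress: let a network output the Airy stress function, derive stresses from its second derivatives (so equilibrium holds identically), and minimize a Monte Carlo estimate of the complementary energy. Boundary traction terms are omitted here, which makes the zero-stress field trivially optimal; a real problem adds the boundary work term. Architecture and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

E, nu = 1.0, 0.3
phi_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def stresses(xy):
    xy = xy.clone().requires_grad_(True)
    g = torch.autograd.grad(phi_net(xy).sum(), xy, create_graph=True)[0]   # [phi_x, phi_y]
    gx = torch.autograd.grad(g[:, 0].sum(), xy, create_graph=True)[0]      # [phi_xx, phi_xy]
    gy = torch.autograd.grad(g[:, 1].sum(), xy, create_graph=True)[0]      # [phi_yx, phi_yy]
    return gy[:, 1], gx[:, 0], -gx[:, 1]   # sxx = phi_yy, syy = phi_xx, sxy = -phi_xy

opt = torch.optim.Adam(phi_net.parameters(), lr=1e-3)
for step in range(200):
    sxx, syy, sxy = stresses(torch.rand(512, 2))            # unit-square domain
    energy = ((sxx**2 + syy**2 - 2 * nu * sxx * syy) / (2 * E)
              + (1 + nu) / E * sxy**2).mean()               # plane-stress density
    opt.zero_grad(); energy.backward(); opt.step()
```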

Updated: 2024-07-29 16:55:33

Categories: cs.LG,cond-mat.dis-nn

Download: http://arxiv.org/abs/2302.01538v7

Node Similarities under Random Projections: Limits and Pathological Cases

Random Projections have been widely used to generate embeddings for various graph learning tasks due to their computational efficiency. The majority of applications have been justified through the Johnson-Lindenstrauss Lemma. In this paper, we take a step further and investigate how well dot product and cosine similarity are preserved by random projections when these are applied over the rows of the graph matrix. Our analysis provides new asymptotic and finite-sample results, identifies pathological cases, and tests them with numerical experiments. We specialize our fundamental results to a ranking application by computing the probability of random projections flipping the node ordering induced by their embeddings. We find that, depending on the degree distribution, the method produces especially unreliable embeddings for the dot product, regardless of whether the adjacency or the normalized transition matrix is used. With respect to the statistical noise introduced by random projections, we show that cosine similarity produces remarkably more precise approximations.
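
A quick numerical taste of the contrast, in a toy setting with one strongly norm-imbalanced pair (not the paper's graph matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 2000, 128
u = rng.normal(size=d)
v = 50.0 * rng.normal(size=d)  # mimic the norm imbalance caused by skewed degrees

def project_once():
    R = rng.normal(size=(k, d)) / np.sqrt(k)  # JL-style Gaussian projection
    pu, pv = R @ u, R @ v
    return pu @ pv, pu @ pv / (np.linalg.norm(pu) * np.linalg.norm(pv))

dots, coss = map(np.array, zip(*(project_once() for _ in range(500))))
print(f"dot    true={u @ v:.0f}   est mean={dots.mean():.0f}   est std={dots.std():.0f}")
print(f"cosine true={u @ v / (np.linalg.norm(u) * np.linalg.norm(v)):.4f}   "
      f"est mean={coss.mean():.4f}   est std={coss.std():.4f}")
```

The dot-product estimates fluctuate on a scale far larger than the true value, while the cosine estimates stay in a narrow band on the [-1, 1] scale, matching the asymmetry described above.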

Updated: 2024-07-29 16:51:26

Categories: cs.SI,cs.DS,cs.LG,math.PR,stat.ML

Download: http://arxiv.org/abs/2404.10148v2

Language-Conditioned Offline RL for Multi-Robot Navigation

We present a method for developing navigation policies for multi-robot teams that interpret and follow natural language instructions. We condition these policies on embeddings from pretrained Large Language Models (LLMs), and train them via offline reinforcement learning with as little as 20 minutes of randomly-collected data. Experiments on a team of five real robots show that these policies generalize well to unseen commands, indicating an understanding of the LLM latent space. Our method requires no simulators or environment models, and produces low-latency control policies that can be deployed directly to real robots without finetuning. We provide videos of our experiments at https://sites.google.com/view/llm-marl.

Updated: 2024-07-29 16:49:30

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.20164v1

Machine Learning for predicting chaotic systems

Predicting chaotic dynamical systems is critical in many scientific fields such as weather prediction, but challenging due to their characteristic sensitive dependence on initial conditions. Traditional modeling approaches require extensive domain knowledge, often leading to a shift towards data-driven methods using machine learning. However, existing research provides inconclusive results on which machine learning methods are best suited for predicting chaotic systems. In this paper, we compare different lightweight and heavyweight machine learning architectures using extensive existing databases, as well as a newly introduced one that allows for uncertainty quantification in the benchmark results. We perform hyperparameter tuning based on computational cost and introduce a novel error metric, the cumulative maximum error, which combines several desirable properties of traditional metrics, tailored for chaotic systems. Our results show that well-tuned simple methods, as well as untuned baseline methods, often outperform state-of-the-art deep learning models, but their performance can vary significantly with different experimental setups. These findings underscore the importance of matching prediction methods to data characteristics and available computational resources.
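
The abstract names the metric without a formula, so the following is a guess at one natural reading: the running maximum of the normalized pointwise error along the forecast horizon, which is monotone and immune to the error cancellations that flatter time-averaged scores.

```python
import numpy as np

def cumulative_max_error(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """y_true, y_pred: (T, d) trajectories; returns the (T,) running-max error curve."""
    err = np.linalg.norm(y_true - y_pred, axis=-1)
    err /= np.linalg.norm(y_true, axis=-1).mean() + 1e-12  # scale-free normalization
    return np.maximum.accumulate(err)

t = np.linspace(0, 10, 400)
truth = np.stack([np.sin(t), np.cos(t)], axis=-1)
pred = np.stack([np.sin(1.02 * t), np.cos(1.02 * t)], axis=-1)  # slightly detuned forecast
print(cumulative_max_error(truth, pred)[::100])  # monotone: errors never "un-happen"
```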

Updated: 2024-07-29 16:34:47

Categories: cs.LG,nlin.CD

Download: http://arxiv.org/abs/2407.20158v1

rLLM: Relational Table Learning with LLMs

We introduce rLLM (relationLLM), a PyTorch library designed for Relational Table Learning (RTL) with Large Language Models (LLMs). The core idea is to decompose state-of-the-art Graph Neural Networks, LLMs, and Table Neural Networks into standardized modules, to enable the fast construction of novel RTL-type models in a simple "combine, align, and co-train" manner. To illustrate the usage of rLLM, we introduce a simple RTL method named BRIDGE. Additionally, we present three novel relational tabular datasets (TML1M, TLF2K, and TACM12K) by enhancing classic datasets. We hope rLLM can serve as a useful and easy-to-use development framework for RTL-related tasks. Our code is available at: https://github.com/rllm-project/rllm.

Updated: 2024-07-29 16:33:40

Categories: cs.AI

Download: http://arxiv.org/abs/2407.20157v1

Selection for short-term empowerment accelerates the evolution of homeostatic neural cellular automata

Empowerment -- a domain independent, information-theoretic metric -- has previously been shown to assist in the evolutionary search for neural cellular automata (NCA) capable of homeostasis when employed as a fitness function. In our previous study, we successfully extended empowerment, defined as maximum time-lagged mutual information between agents' actions and future sensations, to a distributed sensorimotor system embodied as an NCA. However, the time-delay between actions and their corresponding sensations was arbitrarily chosen. Here, we expand upon previous work by exploring how the time scale at which empowerment operates impacts its efficacy as an auxiliary objective to accelerate the discovery of homeostatic NCAs. We show that shorter time delays result in marked improvements over empowerment with longer delays, when compared to evolutionary selection only for homeostasis. Moreover, we evaluate stability and adaptability of evolved NCAs, both hallmarks of living systems that are of interest to replicate in artificial ones. We find that short-term empowered NCA are more stable and are capable of generalizing better to unseen homeostatic challenges. Taken together, these findings motivate the use of empowerment during the evolution of other artifacts, and suggest how it should be incorporated to accelerate evolution of desired behaviors for them. Source code for the experiments in this paper can be found at: https://github.com/caitlingrasso/empowered-nca-II.
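
The fitness signal reduces to a mutual-information estimate; the sketch below is a plug-in estimator for discrete actions and time-lagged sensor readings. The maximization over action distributions and the NCA embodiment itself are omitted, and the alphabets are toy assumptions.

```python
import numpy as np

def mutual_information(actions: np.ndarray, future_sensors: np.ndarray) -> float:
    """Plug-in estimate of I(A_t ; S_{t+k}) in bits from paired integer samples."""
    joint = np.zeros((actions.max() + 1, future_sensors.max() + 1))
    np.add.at(joint, (actions, future_sensors), 1.0)
    joint /= joint.sum()
    pa, ps = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ ps)[nz])).sum())

rng = np.random.default_rng(0)
a = rng.integers(0, 4, 10_000)
s = (a + rng.integers(0, 2, 10_000)) % 4  # sensations partly determined by actions
print(mutual_information(a, s))           # ~1 bit of action-to-sensation control
```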

Updated: 2024-07-29 16:30:49

Categories: cs.NE,cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2305.15220v2

Large Language Models as Carriers of Hidden Messages

With the help of simple fine-tuning, one can artificially embed hidden text into large language models (LLMs). This text is revealed only when triggered by a specific query to the LLM. Two primary applications are LLM fingerprinting and steganography. In the context of LLM fingerprinting, a unique text identifier (fingerprint) is embedded within the model to verify licensing compliance. In the context of steganography, the LLM serves as a carrier for hidden messages that can be disclosed through a chosen trigger question. Our work demonstrates that embedding hidden text in the LLM via fine-tuning, though seemingly secure due to the vast number of potential triggers (any sequence of characters or tokens could serve as a trigger), is susceptible to extraction through analysis of the LLM's output decoding process. We propose an extraction attack called Unconditional Token Forcing (UTF). It is premised on the hypothesis that iteratively feeding each token from the LLM's vocabulary into the model should reveal output sequences with abnormally high token probabilities, indicating potential hidden text candidates. We also present a defense method to hide text in such a way that it is resistant to both UTF and attacks based on sampling decoding methods, which we named Unconditional Token Forcing Confusion (UTFC). To the best of our knowledge, there is no attack method that can extract text hidden with UTFC. UTFC has both benign applications (improving LLM fingerprinting) and malign applications (using LLMs to create covert communication channels). Code is available at github.com/j-hoscilowic/zurek-stegano
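
The extraction loop is short to sketch with Hugging Face transformers; the model name, decode length, and probability threshold are placeholders (the attack targets a fine-tuned model and scans the full vocabulary).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in for the fine-tuned model under audit
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

@torch.no_grad()
def force(token_id: int, steps: int = 8):
    """Greedy-decode from a single unconditioned token; track the least-confident step."""
    ids = torch.tensor([[token_id]])
    min_p = 1.0
    for _ in range(steps):
        probs = model(ids).logits[0, -1].softmax(-1)
        p, nxt = probs.max(-1)
        min_p = min(min_p, p.item())
        ids = torch.cat([ids, nxt.view(1, 1)], dim=1)
    return min_p, tok.decode(ids[0])

results = [force(t) for t in range(2000)]            # full vocabulary in practice
suspects = [(p, s) for p, s in results if p > 0.9]   # near-deterministic continuations
print(suspects[:5])
```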

Updated: 2024-07-29 16:30:17

Categories: cs.CL,cs.CR

Download: http://arxiv.org/abs/2406.02481v2

Empowered Neural Cellular Automata

Information-theoretic fitness functions are becoming increasingly popular to produce generally useful, task-independent behaviors. One such universal function, dubbed empowerment, measures the amount of control an agent exerts on its environment via its sensorimotor system. Specifically, empowerment attempts to maximize the mutual information between an agent's actions and its received sensor states at a later point in time. Traditionally, empowerment has been applied to a conventional sensorimotor apparatus, such as a robot. Here, we expand the approach to a distributed, multi-agent sensorimotor system embodied by a neural cellular automaton (NCA). We show that the addition of empowerment as a secondary objective in the evolution of NCA to perform the task of morphogenesis, growing and maintaining a pre-specified shape, results in higher fitness compared to evolving for morphogenesis alone. Results suggest there may be a synergistic relationship between morphogenesis and empowerment. That is, indirectly selecting for coordination between neighboring cells over the duration of development is beneficial to the developmental process itself. Such a finding may have applications in developmental biology by providing potential mechanisms of communication between cells during growth from a single cell to a multicellular, target morphology. Source code for the experiments in this paper can be found at: https://github.com/caitlingrasso/empowered-nca.

Updated: 2024-07-29 16:28:13

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2205.06771v2

Hierarchically Disentangled Recurrent Network for Factorizing System Dynamics of Multi-scale Systems

We present a knowledge-guided machine learning (KGML) framework for modeling multi-scale processes, and study its performance in the context of streamflow forecasting in hydrology. Specifically, we propose a novel hierarchical recurrent neural architecture that factorizes the system dynamics at multiple temporal scales and captures their interactions. This framework consists of an inverse and a forward model. The inverse model is used to empirically resolve the system's temporal modes from data (physical model simulations, observed data, or a combination of them from the past), and these states are then used in the forward model to predict streamflow. In a hydrological system, these modes can represent different processes, evolving at different temporal scales (e.g., slow: groundwater recharge and baseflow vs. fast: surface runoff due to extreme rainfall). A key advantage of our framework is that once trained, it can incorporate new observations into the model's context (internal state) without expensive optimization approaches (e.g., EnKF) that are traditionally used in physical sciences for data assimilation. Experiments with several river catchments from the NWS NCRFC region show the efficacy of this ML-based data assimilation framework compared to standard baselines, especially for basins that have a long history of observations. Even for basins that have a shorter observation history, we present two orthogonal strategies of training our FHNN framework: (a) using simulation data from imperfect simulations and (b) using observation data from multiple basins to build a global model. We show that both of these strategies (that can be used individually or together) are highly effective in mitigating the lack of training data. The improvement in forecast accuracy is particularly noteworthy for basins where local models perform poorly because of data sparsity.

Updated: 2024-07-29 16:25:43

Categories: cs.LG

Download: http://arxiv.org/abs/2407.20152v1

Identifying macro conditional independencies and macro total effects in summary causal graphs with latent confounding

Understanding causal relations in dynamic systems is essential in epidemiology. While causal inference methods have been extensively studied, they often rely on fully specified causal graphs, which may not always be available in complex dynamic systems. Partially specified causal graphs, such as summary causal graphs (SCGs), provide a simplified representation of causal relations, omitting temporal information and focusing on high-level causal structures. This simplification introduces new challenges concerning the types of queries of interest: macro queries, which involve relationships between clusters represented as vertices in the graph, and micro queries, which pertain to relationships between variables that are not directly visible through the vertices of the graph. In this paper, we first clearly distinguish between macro conditional independencies and micro conditional independencies and between macro total effects and micro total effects. Then, we demonstrate the soundness and completeness of the d-separation to identify macro conditional independencies in SCGs. Furthermore, we establish that the do-calculus is sound and complete for identifying macro total effects in SCGs. Finally, we give a graphical characterization for the non-identifiability of macro total effects in SCGs.
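
For intuition, the macro-level independence test on a toy summary graph is a standard d-separation query. This example ignores the cycles and latent confounding that the paper's criteria additionally handle; `nx.is_d_separator` requires NetworkX >= 3.3 (older releases expose the same test as `nx.d_separated`).

```python
import networkx as nx

# Clusters as vertices of a toy summary causal graph.
G = nx.DiGraph([("Weather", "Pollution"), ("Pollution", "Asthma"),
                ("Weather", "Activity"), ("Activity", "Asthma")])

# Macro query: is Weather independent of Asthma given {Pollution, Activity}?
print(nx.is_d_separator(G, {"Weather"}, {"Asthma"}, {"Pollution", "Activity"}))  # True
```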

Updated: 2024-07-29 16:24:45

Categories: stat.ME,cs.AI

Download: http://arxiv.org/abs/2407.07934v3

Quantum Machine Learning Architecture Search via Deep Reinforcement Learning

The rapid advancement of quantum computing (QC) and machine learning (ML) has given rise to the burgeoning field of quantum machine learning (QML), aiming to capitalize on the strengths of quantum computing to propel ML forward. Despite its promise, crafting effective QML models necessitates profound expertise to strike a delicate balance between model intricacy and feasibility on Noisy Intermediate-Scale Quantum (NISQ) devices. While complex models offer robust representation capabilities, their extensive circuit depth may impede seamless execution on extant noisy quantum platforms. In this paper, we address this quandary of QML model design by employing deep reinforcement learning to explore proficient QML model architectures tailored for designated supervised learning tasks. Specifically, our methodology involves training an RL agent to devise policies that facilitate the discovery of QML models without predetermined ansatz. Furthermore, we integrate an adaptive mechanism to dynamically adjust the learning objectives, fostering continuous improvement in the agent's learning process. Through extensive numerical simulations, we illustrate the efficacy of our approach within the realm of classification tasks. Our proposed method successfully identifies VQC architectures capable of achieving high classification accuracy while minimizing gate depth. This pioneering approach not only advances the study of AI-driven quantum circuit design but also holds significant promise for enhancing performance in the NISQ era.

Updated: 2024-07-29 16:20:51

Categories: quant-ph,cs.AI,cs.ET,cs.LG,cs.NE

Download: http://arxiv.org/abs/2407.20147v1

ByteCheckpoint: A Unified Checkpointing System for LLM Development

The development of real-world Large Language Models (LLMs) necessitates checkpointing of training states in persistent storage to mitigate potential software and hardware failures, as well as to facilitate checkpoint transferring within the training pipeline and across various tasks. Due to the immense size of LLMs, saving and loading checkpoints often incur intolerable minute-level stalls, significantly diminishing training efficiency. Besides, when transferring checkpoints across tasks, checkpoint resharding, defined as loading checkpoints into parallel configurations differing from those used for saving, is often required according to the characteristics and resource quota of specific tasks. Previous checkpointing systems [16,3,33,6] assume consistent parallel configurations, failing to address the complexities of checkpoint transformation during resharding. Furthermore, in the industry platform, developers create checkpoints from different training frameworks[23,36,21,11], each with its own unique storage and I/O logic. This diversity complicates the implementation of unified checkpoint management and optimization. To address these challenges, we introduce ByteCheckpoint, a PyTorch-native multi-framework LLM checkpointing system that supports automatic online checkpoint resharding. ByteCheckpoint employs a data/metadata disaggregated storage architecture, decoupling checkpoint storage from the adopted parallelism strategies and training frameworks. We design an efficient asynchronous tensor merging technique to settle the irregular tensor sharding problem and propose several I/O performance optimizations to significantly enhance the efficiency of checkpoint saving and loading. Experimental results demonstrate ByteCheckpoint's substantial advantages in reducing checkpoint saving (by up to 529.22X) and loading (by up to 3.51X) costs, compared to baseline methods.

Updated: 2024-07-29 16:18:20

Categories: cs.AI

Download: http://arxiv.org/abs/2407.20143v1

Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search

This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erdős, which aims to find graphs with a given size (number of nodes) that maximize the number of edges without having 3- or 4-cycles. We formulate this problem as a sequential decision-making problem and compare AlphaZero, a neural network-guided tree search, with tabu search, a heuristic local search method. Using either method, by introducing a curriculum -- jump-starting the search for larger graphs using good graphs found at smaller sizes -- we improve the state-of-the-art lower bounds for several sizes. We also propose a flexible graph-generation environment and a permutation-invariant network architecture for learning to search in the space of graphs.
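
The local-search baseline is easy to prototype: adding edge (u, v) creates a triangle iff u and v share a neighbor, and a 4-cycle iff some neighbor of u is adjacent to some neighbor of v. The randomized greedy below, with restarts standing in for the tabu/curriculum machinery, shows how far plain search gets.

```python
import random

def creates_short_cycle(adj, u, v):
    if adj[u] & adj[v]:                              # common neighbor -> 3-cycle
        return True
    return any(adj[a] & adj[v] for a in adj[u])      # u-a-b-v path -> 4-cycle

def search(n, iters=20_000, restarts=5, seed=0):
    rng, best = random.Random(seed), 0
    for _ in range(restarts):
        adj, edges = [set() for _ in range(n)], 0
        for _ in range(iters):
            u, v = rng.randrange(n), rng.randrange(n)
            if u != v and v not in adj[u] and not creates_short_cycle(adj, u, v):
                adj[u].add(v); adj[v].add(u); edges += 1
        best = max(best, edges)
    return best

print(search(50))  # the Hoffman-Singleton graph shows 175 edges are achievable at n=50
```

Plain greedy restarts plateau well below that known optimum, which is exactly the gap the guided search and the curriculum aim to close.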

Updated: 2024-07-29 16:13:22

Categories: cs.AI,cs.DM,cs.LG

Download: http://arxiv.org/abs/2311.03583v2

Scalable Kernel Logistic Regression with Nyström Approximation: Theoretical Analysis and Application to Discrete Choice Modelling

The application of kernel-based Machine Learning (ML) techniques to discrete choice modelling using large datasets often faces challenges due to memory requirements and the considerable number of parameters involved in these models. This complexity hampers the efficient training of large-scale models. This paper addresses these problems of scalability by introducing the Nyström approximation for Kernel Logistic Regression (KLR) on large datasets. The study begins by presenting a theoretical analysis in which: i) the set of KLR solutions is characterised, ii) an upper bound to the solution of KLR with Nyström approximation is provided, and finally iii) a specialisation of the optimisation algorithms to Nyström KLR is described. After this, the Nyström KLR is computationally validated. Four landmark selection methods are tested, including basic uniform sampling, a k-means sampling strategy, and two non-uniform methods grounded in leverage scores. The performance of these strategies is evaluated using large-scale transport mode choice datasets and is compared with traditional methods such as Multinomial Logit (MNL) and contemporary ML techniques. The study also assesses the efficiency of various optimisation techniques for the proposed Nyström KLR model. The performance of gradient descent, Momentum, Adam, and L-BFGS-B optimisation methods is examined on these datasets. Among these strategies, the k-means Nyström KLR approach emerges as a successful solution for applying KLR to large datasets, particularly when combined with the L-BFGS-B and Adam optimisation methods. The results highlight the ability of this strategy to handle datasets exceeding 200,000 observations while maintaining robust performance.
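
The recipe maps cleanly onto scikit-learn components: a Nyström feature map whose landmarks come from k-means (one of the tested strategies) followed by logistic regression fit with L-BFGS. Data and hyperparameters below are placeholders for the transport-mode-choice datasets, and sklearn's lbfgs solver stands in for the paper's custom Nyström KLR optimizers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 10))
y = (np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.normal(size=len(X)) > 1).astype(int)

# k-means landmark selection, then a 200-dimensional Nystroem feature map.
landmarks = KMeans(n_clusters=200, n_init=10, random_state=0).fit(X).cluster_centers_
feature_map = Nystroem(kernel="rbf", gamma=0.5, n_components=200).fit(landmarks)

clf = LogisticRegression(solver="lbfgs", max_iter=1000).fit(feature_map.transform(X), y)
print(clf.score(feature_map.transform(X), y))
```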

Updated: 2024-07-29 16:06:39

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.06763v2

To accept or not to accept? An IRT-TOE Framework to Understand Educators' Resistance to Generative AI in Higher Education

Since the public release of Chat Generative Pre-Trained Transformer (ChatGPT), extensive discourse has emerged concerning the potential advantages and challenges of integrating Generative Artificial Intelligence (GenAI) into education. In the realm of information systems, research on technology adoption is crucial for understanding the diverse factors influencing the uptake of specific technologies. Theoretical frameworks, refined and validated over decades, serve as guiding tools to elucidate the individual and organizational dynamics, obstacles, and perceptions surrounding technology adoption. However, while several models have been proposed, they often prioritize elucidating the factors that facilitate acceptance over those that impede it, typically focusing on the student perspective and leaving a gap in empirical evidence regarding educators' viewpoints. Given the pivotal role educators play in higher education, this study aims to develop a theoretical model to empirically predict the barriers preventing educators from adopting GenAI in their classrooms. Acknowledging the lack of theoretical models tailored to identifying such barriers, our approach is grounded in the Innovation Resistance Theory (IRT) framework and augmented with constructs from the Technology-Organization-Environment (TOE) framework. This model is transformed into a measurement instrument employing a quantitative approach, complemented by a qualitative approach to enrich the analysis and uncover concerns related to GenAI adoption in the higher education domain.

Updated: 2024-07-29 15:59:19

标题: 接受还是不接受?一个IRT-TOE框架用于理解高等教育中教育者对生成式人工智能的抵制

摘要: 自Chat Generative Pre-Trained Transformer(ChatGPT)公开发布以来,关于将生成人工智能(GenAI)整合到教育中的潜在优势和挑战的广泛讨论已经出现。在信息系统领域,对技术采纳的研究对于理解影响特定技术采纳的多种因素至关重要。几十年来经过精心制定和验证的理论框架作为指导工具,用以阐明个体和组织动态、障碍和围绕技术采纳的认知。然而,尽管已提出了几种模型,但它们往往更注重阐明促进接受的因素,而非阻碍接受的因素,通常侧重于学生观点,留下了教育者观点的实证证据方面的空白。鉴于教育工作者在高等教育中的关键作用,本研究旨在发展一个理论模型,以实证预测阻碍教育工作者在课堂中采用GenAI的障碍。鉴于缺乏专门用于识别这些障碍的理论模型,我们的方法基于创新阻力理论(IRT)框架,并通过增加技术-组织-环境(TOE)框架的构造来加强。该模型转化为一种采用定量方法的测量工具,辅以定性方法来丰富分析,并揭示与高等教育领域中GenAI采纳相关的顾虑。

更新时间: 2024-07-29 15:59:19

领域: cs.CY,cs.AI,cs.ET,cs.HC,cs.IT,math.IT

下载: http://arxiv.org/abs/2407.20130v1

True random number generation using 1T' molybdenum ditelluride

True random numbers are essential for scientific research and various engineering problems. Their generation, however, depends on a reliable entropy source. Here, we present true random number generation using the conductance noise probed from structurally metastable 1T' MoTe2 prepared via electrochemical exfoliation. The noise, fitting a Poisson process, is a robust entropy source capable of remaining stable even at 15 K. Noise spectral density and statistical time-lag suggest the noise originates from the random polarization of the ferroelectric dipoles in 1T' MoTe2. Using a simple circuit, the noise enables true random number generation; the generated numbers can serve as the seed for high-throughput secure random number generation over 1 Mbit/s, appealing for applications such as cryptography, where the demand for secure data protection has become pressing. In particular, we demonstrate safeguarding key biometric information in neural networks using the random numbers, providing a critical data privacy measure for big data and artificial intelligence.
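
As a software-level illustration of the pipeline (the paper's entropy source is measured 1T' MoTe2 conductance noise in a hardware circuit), the sketch below thresholds a stand-in noise trace into raw bits and debiases them with a von Neumann extractor; both the Gaussian stand-in signal and the extractor choice are assumptions:

```python
# Sketch: threshold a noisy trace into raw bits, then apply a von Neumann
# extractor to remove bias. The Gaussian stand-in and the extractor choice
# are assumptions; the paper uses measured 1T' MoTe2 conductance noise.
import numpy as np

rng = np.random.default_rng(42)
trace = rng.normal(loc=1.0, scale=0.05, size=200_000)   # stand-in noise trace

raw_bits = (trace > np.median(trace)).astype(np.uint8)  # comparator vs. median

def von_neumann(bits):
    """Map bit pairs 01->0, 10->1; discard 00 and 11 (debiases i.i.d. bits)."""
    pairs = bits[: len(bits) // 2 * 2].reshape(-1, 2)
    keep = pairs[:, 0] != pairs[:, 1]
    return pairs[keep, 0]

random_bits = von_neumann(raw_bits)
print(len(random_bits), "bits, mean =", random_bits.mean())  # mean ~ 0.5
```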

Updated: 2024-07-29 15:57:00

标题: 使用1T'二碲化钼生成真随机数

摘要: 真随机数对科学研究和各种工程问题至关重要。然而,它们的生成取决于可靠的熵源。在这里,我们提出使用通过电化学剥离制备的结构亚稳态1T' MoTe2探测的电导噪声生成真随机数。这种噪声符合泊松过程,是一个稳健的熵源,即使在15 K时仍能保持稳定。噪声谱密度和统计时间滞后表明,噪声来源于1T' MoTe2中铁电偶极的随机极化。利用一个简单的电路,这种噪声允许生成真随机数,使其可以作为超过1 Mbit/s的高吞吐量安全随机数生成的种子,适用于加密等对安全数据保护需求日益迫切的应用。特别地,我们使用随机数保护神经网络中的关键生物特征信息,证明了这是大数据和人工智能中的一项关键数据隐私措施。

更新时间: 2024-07-29 15:57:00

领域: cs.CR,cond-mat.mtrl-sci

下载: http://arxiv.org/abs/2404.16271v2

Extreme time extrapolation capabilities and thermodynamic consistency of physics-inspired Neural Networks for the 3D microstructure evolution of materials

A Convolutional Recurrent Neural Network (CRNN) is trained to reproduce the evolution of the spinodal decomposition process in three dimensions as described by the Cahn-Hilliard equation. A specialized, physics-inspired architecture is proven to provide close accordance between the predicted evolutions and the ground truth ones obtained via conventional integration schemes. The method can closely reproduce the evolution of microstructures not represented in the training set at a fraction of the computational costs. Extremely long-time extrapolation capabilities are achieved, up to reaching the theoretically expected equilibrium state of the system, despite the training set containing only relatively short, initial phases of the evolution. Quantitative accordance with the decay rate of the free energy is also demonstrated up to late coarsening stages, providing an example of a data-driven, physically consistent and high-accuracy Machine Learning method for the long-timescale simulation of materials.
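
For context, the conventional ground-truth integrator that the CRNN is trained to imitate can be sketched as an explicit finite-difference step of the Cahn-Hilliard equation on a periodic 3D grid; grid size, mobility, gradient coefficient, and time step below are illustrative choices:

```python
# Explicit finite-difference step of the Cahn-Hilliard equation,
#   dc/dt = M * lap( c**3 - c - kappa * lap(c) ),
# on a periodic 3D grid: a stand-in for the conventional integration scheme
# that supplies the CRNN's ground truth. Parameters are illustrative.
import numpy as np

def laplacian(f):
    # second-order periodic Laplacian via array rolls (grid spacing = 1)
    return sum(np.roll(f, s, axis=a) for a in range(3) for s in (1, -1)) - 6.0 * f

rng = np.random.default_rng(0)
c = 0.02 * rng.standard_normal((32, 32, 32))   # small fluctuations around c=0
mass0 = c.mean()                               # conserved order parameter
M, kappa, dt = 1.0, 1.0, 0.01

for _ in range(1000):
    mu = c**3 - c - kappa * laplacian(c)       # chemical potential
    c += dt * M * laplacian(mu)                # explicit Euler update

print("mass drift:", abs(c.mean() - mass0), "| range:", c.min(), c.max())
```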

Updated: 2024-07-29 15:55:52

标题: 物理启发的神经网络在材料三维微观结构演变中的极限时间外推能力和热力学一致性

摘要: 训练了一个卷积循环神经网络(CRNN)来复现由Cahn-Hilliard方程描述的三维旋节分解过程的演变。一个专门的、受物理启发的架构被证明能够使预测的演变与通过传统积分方案获得的真实演变密切一致。该方法能够以极小的计算成本密切复现训练集中未包含的微结构的演变。尽管训练集仅包含相对较短的初始演变阶段,该方法仍实现了极长时间的外推能力,直至达到系统理论预期的平衡状态。与自由能衰减速率的定量一致性也被证明可延伸至晚期粗化阶段,为材料长时间尺度模拟提供了一个数据驱动、物理一致且高精度的机器学习方法范例。

更新时间: 2024-07-29 15:55:52

领域: cond-mat.mes-hall,cond-mat.mtrl-sci,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2407.20126v1

AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics

The rapid evolution of multimedia and computer vision technologies requires adaptive visual model deployment strategies to effectively handle diverse tasks and varying environments. This work introduces AxiomVision, a novel framework that can guarantee accuracy by leveraging edge computing to dynamically select the most efficient visual models for video analytics under diverse scenarios. Utilizing a tiered edge-cloud architecture, AxiomVision enables the deployment of a broad spectrum of visual models, from lightweight to complex DNNs, that can be tailored to specific scenarios while considering camera source impacts. In addition, AxiomVision provides three core innovations: (1) a dynamic visual model selection mechanism utilizing continual online learning, (2) an online method that efficiently accounts for the influence of the camera's perspective, and (3) a topology-driven grouping approach that accelerates the model selection process. With rigorous theoretical guarantees, these advancements provide a scalable and effective solution for visual tasks inherent to multimedia systems, such as object detection, classification, and counting. Empirically, AxiomVision achieves a 25.7\% improvement in accuracy.

Updated: 2024-07-29 15:54:43

标题: AxiomVision:透视感知视频分析的精准保证自适应视觉模型选择

摘要: 多媒体和计算机视觉技术的快速发展需要自适应的视觉模型部署策略,以有效处理各种任务和不同环境。本文介绍了一种新颖的框架AxiomVision,通过利用边缘计算动态选择最有效的视觉模型,在不同场景下为视频分析保证准确性。利用分层边缘-云架构,AxiomVision使得从轻量级到复杂深度神经网络的广泛视觉模型都可以部署,并可根据具体场景进行定制,同时考虑摄像头源的影响。此外,AxiomVision提供了三个核心创新:(1)利用持续在线学习的动态视觉模型选择机制,(2)一种高效考虑摄像头视角影响的在线方法,(3)一种加速模型选择过程的基于拓扑的分组方法。通过严格的理论保证,这些进步为多媒体系统固有的视觉任务(如物体检测、分类和计数)提供了可扩展且有效的解决方案。从经验上看,AxiomVision在准确性方面取得了25.7%的提高。

更新时间: 2024-07-29 15:54:43

领域: cs.MM,cs.AI

下载: http://arxiv.org/abs/2407.20124v1

Tightening the Evaluation of PAC Bounds Using Formal Verification Results

Probably Approximately Correct (PAC) bounds are widely used to derive probabilistic guarantees for the generalisation of machine learning models. They highlight the components of the model which contribute to its generalisation capacity. However, current state-of-the-art results are loose in approximating the generalisation capacity of deployed machine learning models. Consequently, while PAC bounds are theoretically useful, their applicability for evaluating a model's generalisation property in a given operational design domain is limited. The underlying classical theory is supported by the idea that bounds can be tightened when the number of test points available to the user to evaluate the model increases. Yet, in the case of neural networks, the number of test points required to obtain bounds of interest is often impractical even for small problems. In this paper, we take the novel approach of using the formal verification of neural systems to inform the evaluation of PAC bounds. Rather than using pointwise information obtained from repeated tests, we use verification results on regions around test points. We show that conditioning existing bounds on verification results leads to a tightening proportional to the underlying probability mass of the verified region.
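
The tightening mechanism can be illustrated with a simple conditioning identity (a schematic reading of the result, not the paper's exact theorem). If $R$ is a region formally verified to be error-free, $p_R$ its probability mass, and $B$ a standard PAC bound restricted to the unverified remainder, then

\[
\mathrm{err}(f) \;=\; \underbrace{\Pr[\text{error} \mid x \in R]}_{=\,0\ \text{(verified)}}\,\Pr[x \in R] \;+\; \Pr[\text{error} \mid x \notin R]\,(1 - p_R) \;\le\; (1 - p_R)\, B,
\]

so the bound tightens in proportion to the verified mass $p_R$, matching the qualitative claim above.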

Updated: 2024-07-29 15:53:14

标题: 使用形式验证结果加强PAC界限的评估

摘要: Probably Approximately Correct (PAC)界限被广泛用于推导机器学习模型泛化的概率保证。它们突出了模型的哪些组成部分对其泛化能力有贡献。然而,当前最先进的结果在近似推断部署的机器学习模型的泛化能力方面存在不足。因此,虽然PAC界限在理论上是有用的,但它们在评估给定操作设计领域中模型的泛化特性方面的适用性有限。基础的经典理论支持着这样的观点,即当用户用于评估模型的测试点数量增加时,界限可以被加强。然而,在神经网络的情况下,为了获得感兴趣的界限,所需的测试点数量通常对于小问题而言是不切实际的。 在本文中,我们采用了一种新颖的方法,利用神经系统的形式验证来指导PAC界限的评估。与从重复测试中获得的逐点信息不同,我们使用围绕测试点的区域的验证结果。我们表明,将现有界限置于验证结果之上导致了与被验证区域的概率质量成比例的加强。

更新时间: 2024-07-29 15:53:14

领域: cs.LG

下载: http://arxiv.org/abs/2407.20122v1

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources including weights, codes, and data used in the development of Asclepius are made publicly accessible for future research. (https://github.com/starmpcc/Asclepius)

Updated: 2024-07-29 15:52:22

标题: 建立在合成临床记录上的可公开分享的临床大型语言模型

摘要: 针对处理患者临床记录的大型语言模型的开发通常受到隐私法规的限制,这些限制导致这些记录的有限可访问性和可用性。为了解决这些挑战,我们首先利用从生物医学文献中提取的公开案例报告创建了合成的大规模临床记录。然后,我们使用这些合成记录来训练我们的专门临床大型语言模型Asclepius。虽然Asclepius是在合成数据上训练的,但我们通过使用真实临床记录对其进行评估,评估其在真实世界应用中的潜在性能。我们将Asclepius与几种其他大型语言模型进行基准测试,包括GPT-3.5-turbo和其他开源替代方案。为了进一步验证我们使用合成记录的方法,我们还将Asclepius与在真实临床记录上训练的变体进行比较。我们的研究结果令人信服地表明,合成临床记录在构建高性能临床语言模型时可以作为真实记录的可行替代品。这一结论得到了GPT-4和医疗专业人士进行详细评估的支持。Asclepius的所有资源,包括权重、代码和数据,都已公开提供,供未来研究使用。 (https://github.com/starmpcc/Asclepius)

更新时间: 2024-07-29 15:52:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2309.00237v4

EXIT: An EXplicit Interest Transfer Framework for Cross-Domain Recommendation

Cross-domain recommendation has attracted substantial interest in industrial apps such as Meituan, which serves multiple business domains via knowledge transfer and meets the diverse interests of users. However, existing methods typically follow an implicit modeling paradigm that blends the knowledge from both the source and target domains, and design intricate network structures to share learned embeddings or patterns between domains to improve recommendation accuracy. Since the transfer of interest signals is unsupervised, these implicit paradigms often struggle with the negative transfer resulting from differences in service functions and presentation forms across different domains. In this paper, we propose a simple and effective EXplicit Interest Transfer framework named EXIT to address the stated challenge. Specifically, we propose a novel label combination approach that enables the model to directly learn beneficial source domain interests through supervised learning, while excluding inappropriate interest signals. Moreover, we introduce a scene selector network to model the interest transfer intensity under fine-grained scenes. Offline experiments conducted on the industrial production dataset and online A/B tests validate the superiority and effectiveness of our proposed framework. Without complex network structures or training processes, EXIT can be easily deployed in the industrial recommendation system. EXIT has been successfully deployed in the online homepage recommendation system of Meituan App, serving the main traffic.

Updated: 2024-07-29 15:52:09

标题: EXIT:一种用于跨领域推荐的显式兴趣转移框架

摘要: 跨领域推荐已经引起了工业应用的广泛兴趣,比如美团,通过知识传递服务于多个业务领域,满足用户的多样化兴趣。然而,现有方法通常遵循隐式建模范式,将来自源领域和目标领域的知识融合在一起,并设计复杂的网络结构来共享学习到的嵌入或模式,以提高推荐准确性。由于兴趣信号的转移是无监督的,这些隐式范例常常在不同领域之间的服务功能和呈现形式的差异导致的负面转移中挣扎。在本文中,我们提出了一个简单有效的显式兴趣转移框架,名为EXIT,以解决上述挑战。具体来说,我们提出了一种新颖的标签组合方法,使模型能够通过监督学习直接学习有益的源领域兴趣,同时排除不适当的兴趣信号。此外,我们引入了一个场景选择器网络,以建模精细场景下的兴趣传递强度。在工业生产数据集上进行的离线实验和在线A/B测试验证了我们提出的框架的优越性和有效性。EXIT无需复杂的网络结构或训练过程,可以轻松部署在工业推荐系统中。EXIT已成功部署在美团App的在线首页推荐系统中,为主要流量提供服务。

更新时间: 2024-07-29 15:52:09

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2407.20121v1

Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number

We introduce a novel self-supervised deep clustering approach tailored for unstructured data without requiring prior knowledge of the number of clusters, termed Adaptive Self-supervised Robust Clustering (ASRC). In particular, ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information. The obtained graph enables us to learn clustering-friendly feature representations by an enhanced graph auto-encoder with contrastive learning technique. It further leverages the clustering results adaptively obtained by robust continuous clustering (RCC) to generate prototypes for negative sampling, which can further contribute to promoting consistency among positive pairs and enlarging the gap between positive and negative samples. ASRC obtains the final clustering results by applying RCC to the learned feature representations with their consistent graph structure and edge weights. Extensive experiments conducted on seven benchmark datasets demonstrate the efficacy of ASRC, demonstrating its superior performance over other popular clustering models. Notably, ASRC even outperforms methods that rely on prior knowledge of the number of clusters, highlighting its effectiveness in addressing the challenges of clustering unstructured data.

Updated: 2024-07-29 15:51:09

标题: 自适应自监督鲁棒聚类:用于未知聚类数的非结构化数据

摘要: 我们介绍了一种新颖的自监督深度聚类方法,专为无需事先知道聚类数量的非结构化数据而设计,称为自适应自监督鲁棒聚类(ASRC)。具体而言,ASRC自适应地学习图结构和边权重,以捕获局部和全局结构信息。所得到的图使我们可以通过增强的图自动编码器和对比学习技术学习适合聚类的特征表示。它进一步利用由鲁棒连续聚类(RCC)自适应获得的聚类结果生成原型用于负采样,这可以进一步促进正样本之间的一致性并扩大正样本和负样本之间的差距。ASRC通过在学习的特征表示上应用RCC以及它们一致的图结构和边权重来获得最终的聚类结果。在七个基准数据集上进行的大量实验证明了ASRC的有效性,展示了其优于其他流行聚类模型的性能。值得注意的是,ASRC甚至优于依赖于聚类数量事先知识的方法,突显了其在解决非结构化数据聚类挑战方面的有效性。

更新时间: 2024-07-29 15:51:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.20119v1

Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric (DIEM)

Advances in computational power and hardware efficiency have enabled the tackling of increasingly complex and high-dimensional problems. While artificial intelligence (AI) has achieved remarkable results, the interpretability of high-dimensional solutions remains challenging. A critical issue is the comparison of multidimensional quantities, which is essential in techniques like Principal Component Analysis (PCA) or k-means clustering. Common metrics such as cosine similarity, Euclidean distance, and Manhattan distance are often used for such comparisons - for example in muscular synergies of the human motor control system. However, their applicability and interpretability diminish as dimensionality increases. This paper provides a comprehensive analysis of the effects of dimensionality on these metrics. Our results reveal significant limitations of cosine similarity, particularly its dependency on the dimensionality of the vectors, leading to biased and less interpretable outcomes. To address this, we introduce the Dimension Insensitive Euclidean Metric (DIEM), which demonstrates superior robustness and generalizability across dimensions. DIEM maintains consistent variability and eliminates the biases observed in traditional metrics, making it a reliable tool for high-dimensional comparisons. This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method to analyze multidimensional data in fields ranging from neuromotor control to machine and deep learning.
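
The dimensional bias and the standardization idea can be demonstrated in a few lines; note this is a sketch in the spirit of DIEM (recentre and rescale the Euclidean distance by its Monte-Carlo statistics at each dimensionality), not the paper's exact formula:

```python
# Illustration of the dimensionality problem and a standardized Euclidean
# score in the spirit of DIEM: recentre/rescale ||a - b|| by its Monte-Carlo
# mean and spread for random vectors of that dimension. A sketch of the idea,
# not the paper's exact DIEM definition.
import numpy as np

rng = np.random.default_rng(0)

def calibrate(dim, n=20_000):
    a, b = rng.random((2, n, dim))                 # uniform [0,1] baseline
    d = np.linalg.norm(a - b, axis=1)
    return d.mean(), d.std()

def standardized_dist(a, b, mean, std):
    return (np.linalg.norm(a - b) - mean) / std    # dimension-insensitive score

for dim in (3, 30, 300):
    mean, std = calibrate(dim)
    a, b = rng.random((2, dim))
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    print(f"dim={dim:4d}  cosine={cos:+.3f}  standardized={standardized_dist(a, b, mean, std):+.3f}")
# Cosine drifts toward a dimension-dependent constant as dim grows, while the
# standardized distance stays on a comparable scale across dimensions.
```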

Updated: 2024-07-29 15:49:29

标题: 超越余弦相似度进行多维比较:维度无关的欧氏度量(DIEM)

摘要: 计算能力和硬件效率的进步使得处理越来越复杂和高维问题成为可能。虽然人工智能取得了显著的成果,但高维解决方案的可解释性仍然具有挑战性。一个关键问题是多维量的比较,在主成分分析(PCA)或k均值聚类等技术中至关重要。常见的度量标准如余弦相似度、欧氏距离和曼哈顿距离常用于这样的比较 - 例如在人类运动控制系统的肌肉协同中。然而,随着维度的增加,它们的适用性和可解释性会减弱。本文对维度对这些度量标准的影响进行了全面分析。我们的结果显示,余弦相似度存在显著的局限性,特别是其对向量维度的依赖性,导致偏倚和较不可解释的结果。为了解决这个问题,我们引入了维度无关欧氏度量(DIEM),它表现出在各个维度上更强的稳健性和泛化能力。DIEM保持了一致的变化性,并消除了传统度量中观察到的偏见,使其成为高维比较的可靠工具。这种新颖的度量标准有潜力取代余弦相似度,提供一种更准确和深入的方法来分析从神经运动控制到机器和深度学习等领域的多维数据。

更新时间: 2024-07-29 15:49:29

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2407.08623v2

HOAA: Hybrid Overestimating Approximate Adder for Enhanced Performance Processing Engine

This paper presents the Hybrid Overestimating Approximate Adder (HOAA), designed to enhance performance in processing engines, with a specific focus on edge AI applications. A novel Plus One Adder design is proposed as an incremental adder in the RCA chain, incorporating a Full Adder with an excess 1 alongside inputs A, B, and Cin. The design approximates outputs to 2-bit values to reduce hardware complexity and improve resource efficiency. The Plus One Adder is integrated into a dynamically reconfigurable HOAA, allowing runtime interchangeability between accurate and approximate overestimation modes. The proposed design is demonstrated for multiple applications, such as Two's complement subtraction and rounding to even, as well as the configurable activation function, which are critical components of the processing engine. Our approach shows a 21 percent improvement in area efficiency and a 33 percent reduction in power consumption compared to state-of-the-art designs, with minimal accuracy loss. Thus, the proposed HOAA could be a promising solution for resource-constrained environments, offering ideal trade-offs between hardware efficiency and computational accuracy.
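
A bit-level software sketch of the overestimation idea follows: the low k bits are approximated with a cheap bitwise OR and a forced +1 carry into the exact upper part. This follows the general lower-part-OR approximate-adder pattern and is an illustrative stand-in, not the paper's Plus One Adder circuit:

```python
# Bit-level sketch of an overestimating approximate adder: the low k bits are
# approximated with a bitwise OR (cheap in hardware) and a forced +1 carry
# into the exact upper add. Illustrative stand-in for the overestimation idea,
# not the paper's Plus One Adder microarchitecture.
def approx_add(a: int, b: int, k: int = 4, width: int = 16) -> int:
    mask = (1 << k) - 1
    low = (a | b) & mask                  # approximate low bits
    high = (a >> k) + (b >> k) + 1        # exact high add with excess-1 carry
    return ((high << k) | low) & ((1 << width) - 1)

if __name__ == "__main__":
    import random
    random.seed(1)
    errs = []
    for _ in range(10_000):
        a, b = random.randrange(1 << 14), random.randrange(1 << 14)
        errs.append(approx_add(a, b) - (a + b))
    # The forced carry guarantees the result never undershoots the true sum.
    print(f"always overestimates: {min(errs) > 0}, max error: {max(errs)}")
```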

Updated: 2024-07-29 15:47:51

标题: HOAA:用于增强性能处理引擎的混合过度估计近似加法器

摘要: 本文介绍了一种设计用于增强处理引擎性能的混合过度估计近似加法器,特别关注边缘人工智能应用。提出了一种新颖的加一加法器设计,作为RCA链中的增量加法器,将一个带有过量1的全加器与输入A、B和Cin结合起来。该设计将输出近似为2位值,以减少硬件复杂性并提高资源效率。加一加法器集成到动态可重配置的HOAA中,允许在准确和近似过度估计模式之间进行运行时互换。所提出的设计已经应用于多个应用程序,例如二进制补码减法和舍入到偶数,以及可配置激活函数,这些是处理引擎的关键组件。与最新设计相比,我们的方法在面积效率方面提高了21%,在功耗方面减少了33%,并且准确性损失最小。因此,所提出的HOAA可能是资源受限环境的一个有前途的解决方案,提供了理想的硬件效率与计算准确性之间的权衡。

更新时间: 2024-07-29 15:47:51

领域: cs.AR,cs.AI,cs.CV

下载: http://arxiv.org/abs/2408.00806v1

FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis

In the field of Image-Text Retrieval (ITR), recent advancements have leveraged large-scale Vision-Language Pretraining (VLP) for Fine-Grained (FG) instance-level retrieval, achieving high accuracy at the cost of increased computational complexity. For Coarse-Grained (CG) category-level retrieval, prominent approaches employ Cross-Modal Hashing (CMH) to prioritise efficiency, albeit at the cost of retrieval performance. Due to differences in methodologies, FG and CG models are rarely compared directly within evaluations in the literature, resulting in a lack of empirical data quantifying the retrieval performance-efficiency tradeoffs between the two. This paper addresses this gap by introducing the \texttt{FiCo-ITR} library, which standardises evaluation methodologies for both FG and CG models, facilitating direct comparisons. We conduct empirical evaluations of representative models from both subfields, analysing precision, recall, and computational complexity across varying data scales. Our findings offer new insights into the performance-efficiency trade-offs between recent representative FG and CG models, highlighting their respective strengths and limitations. These findings provide the foundation necessary to make more informed decisions regarding model selection for specific retrieval tasks and highlight avenues for future research into hybrid systems that leverage the strengths of both FG and CG approaches.

Updated: 2024-07-29 15:44:22

标题: FiCo-ITR:连接细粒度和粗粒度图像文本检索以进行性能比较分析

摘要: 在图像文本检索(ITR)领域,最近的进展利用大规模的视觉语言预训练(VLP)进行细粒度(FG)实例级检索,以增加计算复杂度为代价实现了高精度。对于粗粒度(CG)类别级检索,主流方法采用跨模态哈希(CMH)以优先考虑效率,尽管这以检索性能为代价。由于方法论的差异,在文献评估中很少直接比较FG和CG模型,导致缺乏量化两者之间检索性能与效率权衡的实证数据。本文通过引入\texttt{FiCo-ITR}库来填补这一空白,该库标准化了FG和CG模型的评估方法,促进了直接比较。我们对两个子领域的代表性模型进行实证评估,分析在不同数据规模下的精度、召回率和计算复杂度。我们的研究结果为最近代表性FG和CG模型之间的性能与效率权衡提供了新的见解,突出了它们各自的优势和局限性。这些发现为针对特定检索任务选择模型提供了必要的基础,并为未来研究利用FG和CG方法各自优势的混合系统指明了方向。

更新时间: 2024-07-29 15:44:22

领域: cs.IR,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.20114v1

Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning

One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution ratio between the optimized and data collection policy. In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution. Based on this, we propose a novel approach, Diffusion-DICE, that directly performs this transformation using diffusion models. We find that the optimal policy's score function can be decomposed into two terms: the behavior policy's score function and the gradient of a guidance term which depends on the optimal distribution ratio. The first term can be obtained from a diffusion model trained on the dataset and we propose an in-sample learning objective to learn the second term. Due to the multi-modality contained in the optimal policy distribution, the transformation in Diffusion-DICE may guide towards those local-optimal modes. We thus generate a few candidate actions and carefully select from them to approach global-optimum. Different from all other diffusion-based offline RL methods, the guide-then-select paradigm in Diffusion-DICE only uses in-sample actions for training and brings minimal error exploitation in the value function. We use a didactic toy example to show how previous diffusion-based methods fail to generate optimal actions due to leveraging these errors and how Diffusion-DICE successfully avoids that. We then conduct extensive experiments on benchmark datasets to show the strong performance of Diffusion-DICE.
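
The guide-then-select loop can be sketched schematically as follows; the sampler and critic here are toy stubs standing in for guided diffusion sampling and the in-sample-trained value function:

```python
# Schematic of Diffusion-DICE's guide-then-select step: draw several candidate
# actions from a guided sampler, then keep the one the critic scores highest.
# The sampler and Q-function below are toy stubs; the real method uses guided
# diffusion sampling and value functions trained purely on in-sample actions.
import numpy as np

rng = np.random.default_rng(0)

def sample_guided_action(state):
    """Stub for one guided-diffusion draw (noisy heuristic toward -state)."""
    return np.clip(-state + 0.3 * rng.standard_normal(state.shape), -1.0, 1.0)

def q_value(state, action):
    """Stub critic: prefers actions close to -state."""
    return -np.sum((action + state) ** 2)

def guide_then_select(state, n_candidates=16):
    candidates = [sample_guided_action(state) for _ in range(n_candidates)]
    return max(candidates, key=lambda a: q_value(state, a))   # select step

state = rng.standard_normal(3)
print("selected action:", guide_then_select(state))
```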

Updated: 2024-07-29 15:36:42

标题: Diffusion-DICE:离线强化学习的样本内扩散指导

摘要: Distribution Correction Estimation (DICE)方法的一个重要特性是,其解决方案是优化和数据收集策略之间的最佳稳态分布比率。在这项工作中,我们表明基于DICE的方法可以被视为从行为分布到最优策略分布的转化。基于此,我们提出了一种新颖的方法,Diffusion-DICE,直接利用扩散模型执行这种转化。我们发现最优策略的得分函数可以分解为两个项:行为策略的得分函数和取决于最优分布比率的指导项的梯度。第一个项可以从训练在数据集上的扩散模型中获得,我们提出一个样本内学习目标来学习第二项。由于最优策略分布中包含的多模态性,Diffusion-DICE中的转化可能指导向这些局部最优模式。因此,我们生成一些候选动作,并仔细从中选择以接近全局最优。不同于所有其他基于扩散的离线RL方法,Diffusion-DICE中的指导-选择范式仅使用样本内动作进行训练,并在值函数中带来最小的误差利用。我们使用一个教育性的玩具案例示例,展示了由于利用这些错误,以前基于扩散的方法无法生成最优动作,以及Diffusion-DICE如何成功避免这种情况。然后,我们在基准数据集上进行了大量实验,展示了Diffusion-DICE的强大性能。

更新时间: 2024-07-29 15:36:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.20109v1

Classification, Regression and Segmentation directly from k-Space in Cardiac MRI

Cardiac Magnetic Resonance Imaging (CMR) is the gold standard for diagnosing cardiovascular diseases. Clinical diagnoses predominantly rely on magnitude-only Digital Imaging and Communications in Medicine (DICOM) images, omitting crucial phase information that might provide additional diagnostic benefits. In contrast, k-space is complex-valued and encompasses both magnitude and phase information, which humans cannot directly perceive. In this work, we propose KMAE, a Transformer-based model specifically designed to process k-space data directly, eliminating conventional intermediary conversion steps to the image domain. KMAE can handle critical cardiac disease classification, relevant phenotype regression, and cardiac morphology segmentation tasks. We utilize this model to investigate the potential of k-space-based diagnosis in cardiac MRI. Notably, this model achieves competitive classification and regression performance compared to image-domain methods, e.g. Masked Autoencoders (MAEs), and delivers satisfactory segmentation performance with a myocardium Dice score of 0.884. Last but not least, our model exhibits robust performance with consistent results even when the k-space is 8x undersampled. We encourage the MR community to explore the untapped potential of k-space and pursue end-to-end, automated diagnosis with reduced human intervention.
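
The image/k-space relationship the model exploits is a Fourier pair, which a few lines of numpy make concrete, including the undersampling regime mentioned above (the toy phantom and mask pattern are illustrative):

```python
# k-space is the Fourier transform of the image, so it is complex-valued
# (magnitude + phase), unlike magnitude-only DICOM. Toy numpy demo, including
# the 8x undersampling regime; the square phantom and mask are illustrative.
import numpy as np

img = np.zeros((128, 128))
img[40:88, 40:88] = 1.0                                  # toy "anatomy"

kspace = np.fft.fftshift(np.fft.fft2(img))
print(kspace.dtype)                                      # complex128

mask = np.zeros(kspace.shape, dtype=bool)
mask[:, ::8] = True                                      # keep every 8th line
recon = np.fft.ifft2(np.fft.ifftshift(kspace * mask))
print("mean reconstruction error:", np.abs(np.abs(recon) - img).mean())
```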

Updated: 2024-07-29 15:35:35

标题: 在心脏磁共振成像中直接从k空间进行分类、回归和分割

摘要: 心脏磁共振成像(CMR)是诊断心血管疾病的金标准。临床诊断主要依赖于仅含幅度信息的医学数字成像和通信(DICOM)图像,忽略了可能提供额外诊断益处的关键相位信息。相比之下,k空间是复值的,同时包含幅度和相位信息,而人类无法直接感知。在这项工作中,我们提出了KMAE,一个基于Transformer的模型,专门设计用于直接处理k空间数据,省去了向图像域转换的传统中间步骤。KMAE能够处理关键的心脏疾病分类、相关表型回归和心脏形态分割任务。我们利用这个模型来研究基于k空间的心脏磁共振成像诊断的潜力。值得注意的是,与掩码自编码器(MAEs)等图像域方法相比,该模型实现了有竞争力的分类和回归性能,并且在心肌分割方面提供了令人满意的表现,心肌Dice得分达到0.884。最后同样重要的是,我们的模型表现出稳健的性能,即使k空间被8倍欠采样,也能保持一致的结果。我们鼓励磁共振社区探索k空间的未开发潜力,并追求减少人为干预的端到端自动化诊断。

更新时间: 2024-07-29 15:35:35

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.20108v1

Strong Copyright Protection for Language Models via Adaptive Model Fusion

The risk of language models unintentionally reproducing copyrighted material from their training data has led to the development of various protective measures. In this paper, we propose model fusion as an effective solution to safeguard against copyright infringement. In particular, we introduce Copyright-Protecting Fusion (CP-Fuse), an algorithm that adaptively combines language models to minimize the reproduction of protected materials. CP-Fuse is inspired by the recently proposed Near-Access Free (NAF) framework and additionally incorporates a desirable balancing property that we demonstrate prevents the reproduction of memorized training data. Our results show that CP-Fuse significantly reduces the memorization of copyrighted content while maintaining high-quality text and code generation. Furthermore, we demonstrate how CP-Fuse can be integrated with other techniques for enhanced protection.

Updated: 2024-07-29 15:32:30

标题: 通过自适应模型融合实现语言模型的强版权保护

摘要: 语言模型意外地复制培训数据中的受版权保护材料的风险已经导致各种保护措施的开发。在本文中,我们提出模型融合作为防止侵犯版权的有效解决方案。具体来说,我们介绍了一种名为Copyright-Protecting Fusion(CP-Fuse)的算法,该算法自适应地组合语言模型,以最小化对受保护材料的复制。CP-Fuse受近期提出的Near-Access Free(NAF)框架启发,另外还融入了一种理想的平衡属性,我们展示了这种属性可以防止对记忆的训练数据进行复制。我们的结果显示,CP-Fuse显著减少了版权内容的记忆,同时保持了高质量的文本和代码生成。此外,我们展示了如何将CP-Fuse与其他技术集成以实现增强的保护。

更新时间: 2024-07-29 15:32:30

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2407.20105v1

Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study

Deep neural networks have recently achieved breakthroughs in sound generation. Despite the outstanding sample quality, current sound generation models face issues on small-scale datasets (e.g., overfitting), significantly limiting performance. In this paper, we make the first attempt to investigate the benefits of pre-training on sound generation with AudioLDM, the cutting-edge model for audio generation, as the backbone. Our study demonstrates the advantages of the pre-trained AudioLDM, especially in data-scarcity scenarios. In addition, the baselines and evaluation protocol for sound generation systems are not consistent enough to compare different studies directly. Aiming to facilitate further study on sound generation tasks, we benchmark the sound generation task on various frequently-used datasets. We hope our results on transfer learning and benchmarks can provide references for further research on conditional sound generation.

Updated: 2024-07-29 15:29:23

标题: 利用预训练的AudioLDM进行声音生成:基准研究

摘要: 最近,深度神经网络在声音生成方面取得了突破。尽管样本质量出色,但当前的声音生成模型在小规模数据集(例如过拟合)上面临问题,严重限制了性能。本文首次尝试探讨在以AudioLDM为骨干的声音生成中进行预训练的好处。我们的研究展示了预训练的AudioLDM的优势,特别是在数据稀缺的情况下。此外,声音生成系统的基线和评估协议不够一致,无法直接比较不同研究。为了促进进一步研究声音生成任务,我们在各种常用数据集上对声音生成任务进行基准测试。我们希望我们关于迁移学习和基准测试的结果可以为进一步研究条件声音生成提供参考。

更新时间: 2024-07-29 15:29:23

领域: cs.SD,cs.AI,cs.MM,eess.AS

下载: http://arxiv.org/abs/2303.03857v3

F-KANs: Federated Kolmogorov-Arnold Networks

In this paper, we present an innovative federated learning (FL) approach that utilizes Kolmogorov-Arnold Networks (KANs) for classification tasks. By utilizing the adaptive activation capabilities of KANs in a federated framework, we aim to improve classification capabilities while preserving privacy. The study evaluates the performance of federated KANs (F-KANs) compared to traditional Multi-Layer Perceptrons (MLPs) on a classification task. The results show that the F-KANs model significantly outperforms the federated MLP model in terms of accuracy, precision, recall, F1 score, and stability, paving the way for more efficient and privacy-preserving predictive analytics.
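
The federated loop itself is standard FedAvg; the sketch below shows the round structure with a toy least-squares client, treating the model as a generic parameter vector (KAN spline coefficients would aggregate the same way). The local objective and round count are illustrative:

```python
# Minimal FedAvg sketch: each client runs a local update, the server averages
# parameters. The toy least-squares clients stand in for KAN training; at the
# level of trainable coefficients the aggregation step is identical.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, data, lr=0.1):
    X, y = data                                    # toy least-squares client
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(5)]
global_w = np.zeros(4)

for _ in range(20):                                # communication rounds
    local_ws = [local_update(global_w.copy(), d) for d in clients]
    global_w = np.mean(local_ws, axis=0)           # FedAvg aggregation

print("global weights after FedAvg:", np.round(global_w, 3))
```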

Updated: 2024-07-29 15:28:26

标题: F-KANs:联邦科尔莫戈洛夫-阿诺德网络

摘要: 在本文中,我们提出了一种创新的联邦学习(FL)方法,利用科尔莫戈洛夫-阿诺德网络(KANs)进行分类任务。通过在联邦框架中利用KANs的自适应激活能力,我们旨在提高分类能力同时保护隐私。该研究评估了联邦KANs(F-KANs)与传统的多层感知器(MLPs)在分类任务上的性能。结果显示,F-KANs模型在准确性、精确度、召回率、F1分数和稳定性方面显著优于联邦MLP模型,并取得更好的性能,为更高效和保护隐私的预测分析铺平了道路。

更新时间: 2024-07-29 15:28:26

领域: cs.LG,cs.AI,cs.CR,cs.NI

下载: http://arxiv.org/abs/2407.20100v1

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

Sparse autoencoders (SAEs) are a promising unsupervised approach for identifying causally relevant and interpretable linear features in a language model's (LM) activations. To be useful for downstream tasks, SAEs need to decompose LM activations faithfully; yet to be interpretable the decomposition must be sparse -- two objectives that are in tension. In this paper, we introduce JumpReLU SAEs, which achieve state-of-the-art reconstruction fidelity at a given sparsity level on Gemma 2 9B activations, compared to other recent advances such as Gated and TopK SAEs. We also show that this improvement does not come at the cost of interpretability through manual and automated interpretability studies. JumpReLU SAEs are a simple modification of vanilla (ReLU) SAEs -- where we replace the ReLU with a discontinuous JumpReLU activation function -- and are similarly efficient to train and run. By utilising straight-through-estimators (STEs) in a principled manner, we show how it is possible to train JumpReLU SAEs effectively despite the discontinuous JumpReLU function introduced in the SAE's forward pass. Similarly, we use STEs to directly train L0 to be sparse, instead of training on proxies such as L1, avoiding problems like shrinkage.
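
A minimal PyTorch sketch of a JumpReLU activation with an STE for the threshold is below; the rectangular pseudo-derivative kernel and the (deliberately wide, for demo purposes) bandwidth are illustrative choices, not the paper's exact estimator:

```python
# Sketch of a JumpReLU activation, z * 1[z > theta], with a straight-through
# estimator so the per-feature threshold theta remains trainable despite the
# discontinuity. Kernel shape and bandwidth here are illustrative.
import torch

class JumpReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, z, theta, bandwidth):
        ctx.save_for_backward(z, theta)
        ctx.bandwidth = bandwidth
        return z * (z > theta)                    # discontinuous jump at theta

    @staticmethod
    def backward(ctx, grad_out):
        z, theta = ctx.saved_tensors
        eps = ctx.bandwidth
        grad_z = grad_out * (z > theta)           # usual masked gradient for z
        # STE: pass gradient to theta through a rectangle kernel around z=theta
        window = ((z - theta).abs() < eps / 2).float()
        grad_theta = -(grad_out * z * window / eps).sum(dim=0)
        return grad_z, grad_theta, None           # no gradient for bandwidth

z = torch.randn(8, 16, requires_grad=True)
theta = torch.zeros(16, requires_grad=True)
JumpReLU.apply(z, theta, 0.5).sum().backward()
print(theta.grad.shape)                           # thresholds receive gradient
```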

Updated: 2024-07-29 15:27:03

标题: 超前跳跃:利用JumpReLU稀疏自动编码器提高重建保真度

摘要: 稀疏自编码器(SAEs)是一种有前途的无监督方法,用于识别语言模型(LM)激活中具有因果相关性且可解释的线性特征。为了对下游任务有用,SAEs需要忠实地分解LM激活;然而,为了可解释性,分解必须是稀疏的--这两个目标之间存在张力。在本文中,我们介绍了JumpReLU SAEs,与门控SAEs和TopK SAEs等其他最新进展相比,它在Gemma 2 9B激活上、在给定稀疏水平下实现了最先进的重构保真度。我们还通过手动和自动可解释性研究表明,这种改进并不以可解释性为代价。JumpReLU SAEs是对普通(ReLU)SAEs的简单修改--我们用不连续的JumpReLU激活函数替换ReLU--并且训练和运行效率相近。通过以有原则的方式利用直通估计器(STEs),我们展示了尽管SAE前向传播中引入了不连续的JumpReLU函数,仍然可以有效地训练JumpReLU SAEs。类似地,我们使用STEs直接训练L0使其稀疏,而不是在L1等代理上进行训练,从而避免了收缩等问题。

更新时间: 2024-07-29 15:27:03

领域: cs.LG

下载: http://arxiv.org/abs/2407.14435v2

TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting

Tabular data is prevalent in many critical domains, yet it is often challenging to acquire in large quantities. This scarcity usually results in poor performance of machine learning models on such data. Data augmentation, a common strategy for performance improvement in vision and language tasks, typically underperforms for tabular data due to the lack of explicit symmetries in the input space. To overcome this challenge, we introduce TabMDA, a novel method for manifold data augmentation on tabular data. This method utilises a pre-trained in-context model, such as TabPFN, to map the data into an embedding space. TabMDA performs label-invariant transformations by encoding the data multiple times with varied contexts. This process explores the learned embedding space of the underlying in-context models, thereby enlarging the training dataset. TabMDA is a training-free method, making it applicable to any classifier. We evaluate TabMDA on five standard classifiers and observe significant performance improvements across various tabular datasets. Our results demonstrate that TabMDA provides an effective way to leverage information from pre-trained in-context models to enhance the performance of downstream classifiers. Code is available at https://github.com/AdrianBZG/TabMDA.

Updated: 2024-07-29 15:08:17

标题: TabMDA:使用具有上下文子集的变压器对任何分类器进行表状流形数据增强

摘要: 表格数据在许多关键领域中占据主导地位,但通常很难大量获取。这种稀缺性通常导致机器学习模型在这些数据上表现不佳。数据增强是一种常见的用于提高视觉和语言任务性能的策略,但通常对表格数据效果不佳,因为输入空间缺乏明确的对称性。为了克服这一挑战,我们引入了TabMDA,一种用于表格数据的流形数据增强的新方法。该方法利用预训练的上下文模型,如TabPFN,将数据映射到嵌入空间中。TabMDA通过使用不同的上下文多次对数据进行编码,执行与标签无关的转换。这个过程探索了底层上下文模型的学习嵌入空间,从而扩大了训练数据集。TabMDA是一种无需训练的方法,可应用于任何分类器。我们在五个标准分类器上评估了TabMDA,并观察到在各种表格数据集上显著提高了性能。我们的结果表明,TabMDA提供了一种有效的方式,利用预训练的上下文模型的信息来增强下游分类器的性能。代码可在https://github.com/AdrianBZG/TabMDA找到。

更新时间: 2024-07-29 15:08:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01805v2

UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation

Test-Time Adaptation (TTA) aims to adapt pre-trained models to the target domain during testing. In reality, this adaptability can be influenced by multiple factors. Researchers have identified various challenging scenarios and developed diverse methods to address these challenges, such as dealing with continual domain shifts, mixed domains, and temporally correlated or imbalanced class distributions. Despite these efforts, a unified and comprehensive benchmark has yet to be established. To this end, we propose a Unified Test-Time Adaptation (UniTTA) benchmark, which is comprehensive and widely applicable. Each scenario within the benchmark is fully described by a Markov state transition matrix for sampling from the original dataset. The UniTTA benchmark considers both domain and class as two independent dimensions of data and addresses various combinations of imbalance/balance and i.i.d./non-i.i.d./continual conditions, covering a total of \( (2 \times 3)^2 = 36 \) scenarios. It establishes a comprehensive evaluation benchmark for realistic TTA and provides a guideline for practitioners to select the most suitable TTA method. Alongside this benchmark, we propose a versatile UniTTA framework, which includes a Balanced Domain Normalization (BDN) layer and a COrrelated Feature Adaptation (COFA) method--designed to mitigate distribution gaps in domain and class, respectively. Extensive experiments demonstrate that our UniTTA framework excels within the UniTTA benchmark and achieves state-of-the-art performance on average. Our code is available at \url{https://github.com/LeapLabTHU/UniTTA}.
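
Scenario sampling from a Markov state transition matrix is straightforward to sketch; the 3-state sticky chain below yields a temporally correlated stream, while uniform rows would recover the i.i.d. case (matrix values are illustrative):

```python
# Sketch of scenario construction via a Markov state transition matrix: each
# test sample's (domain, class) state is drawn by walking the chain, so
# i.i.d., correlated, and continual-shift scenarios are all special cases of
# one matrix. The 3-state matrix below is illustrative.
import numpy as np

rng = np.random.default_rng(0)

T = np.array([[0.90, 0.05, 0.05],    # sticky chain -> temporally correlated
              [0.05, 0.90, 0.05],    # uniform rows would recover i.i.d.
              [0.05, 0.05, 0.90]])

state, stream = 0, []
for _ in range(1000):
    state = rng.choice(3, p=T[state])    # next domain/class state
    stream.append(state)

print("empirical state frequencies:", np.bincount(stream) / len(stream))
```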

Updated: 2024-07-29 15:04:53

标题: UniTTA:统一基准和多功能框架,面向现实测试时间适应

摘要: 测试时间适应(TTA)旨在在测试期间将预先训练的模型适应到目标领域。实际上,这种适应性可能受多种因素影响。研究人员已经确定了各种具有挑战性的情况,并开发了不同的方法来解决这些挑战,例如处理持续领域转移、混合领域以及时间相关或不平衡的类分布。尽管进行了这些努力,但尚未建立统一和全面的基准。为此,我们提出了一个全面且广泛适用的统一测试时间适应(UniTTA)基准。基准中的每种情况都由一个马尔可夫状态转移矩阵完全描述,用于从原始数据集中进行采样。UniTTA基准将域和类别分别作为数据的两个独立维度,解决了不平衡/平衡和i.i.d./非i.i.d./持续条件的各种组合,涵盖了总共\( (2 \times 3)^2 = 36 \)个场景。它为实际TTA建立了一个全面的评估基准,并为从业者选择最合适的TTA方法提供了指导。除此基准外,我们还提出了一个多功能的UniTTA框架,其中包括一个平衡领域归一化(BDN)层和一个相关特征适应(COFA)方法,旨在分别缓解领域和类别之间的分布差异。大量实验表明,我们的UniTTA框架在UniTTA基准中表现出色,并平均实现了最先进的性能。我们的代码可在\url{https://github.com/LeapLabTHU/UniTTA}上找到。

更新时间: 2024-07-29 15:04:53

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.20080v1

FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstrated effective for removing the requirement of large batch size, their performance on large-scale data remains underexplored and not optimized. To bridge the gap, this paper explores several aspects of CLIP training with limited resources (e.g., up to tens of GPUs). First, we introduce FastCLIP, a general CLIP training framework built on advanced compositional optimization techniques while designed and optimized for the distributed setting. Our framework is equipped with an efficient gradient reduction strategy to reduce communication overhead. Second, to further boost training efficiency, we investigate three components of the framework from an optimization perspective: the schedule of the inner learning rate, the update rules of the temperature parameter and the model parameters, respectively. Experiments on different strategies for each component shed light on how to conduct CLIP training more efficiently. Finally, we benchmark the performance of FastCLIP and the state-of-the-art training baseline (OpenCLIP) on different compute scales up to 32 GPUs on 8 nodes, and three data scales ranging from 2.7 million, 9.1 million to 315 million image-text pairs to demonstrate the significant improvement of FastCLIP in the resource-limited setting. We release the code of FastCLIP at https://github.com/Optimization-AI/fast_clip .

Updated: 2024-07-29 15:04:15

标题: FastCLIP:一套用于在有限资源下加速CLIP训练的优化技术

摘要: 现有的关于在大规模数据上训练最先进的对比语言-图像预训练(CLIP)模型的研究涉及数百甚至数千个GPU,因为需要大批量大小。然而,这么大量的资源对大多数人来说并不可及。虽然已经证明了用于优化全局对比损失的高级组合优化技术可以消除大批量大小的要求,但它们在大规模数据上的表现仍未被充分探索和优化。为了弥合这一差距,本文探讨了在有限资源条件下(例如,最多几十个GPU)进行CLIP训练的几个方面。首先,我们介绍了FastCLIP,这是一个基于高级组合优化技术构建的通用CLIP训练框架,同时设计和优化了分布式设置。我们的框架配备了高效的梯度减少策略,以减少通信开销。其次,为了进一步提高训练效率,我们从优化的角度研究了框架的三个组成部分:内部学习率的调度、温度参数和模型参数的更新规则。对每个组件的不同策略的实验揭示了如何更加高效地进行CLIP训练。最后,我们在不同计算规模(最多32个GPU,8个节点)和三个数据规模(分别为270万、910万和3.15亿个图像-文本对)上对FastCLIP和最先进的训练基线(OpenCLIP)的性能进行了基准测试,以展示FastCLIP在资源有限环境中的显著改进。我们在https://github.com/Optimization-AI/fast_clip上发布了FastCLIP的代码。

更新时间: 2024-07-29 15:04:15

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2407.01445v2

An Interpretable Rule Creation Method for Black-Box Models based on Surrogate Trees -- SRules

As artificial intelligence (AI) systems become increasingly integrated into critical decision-making processes, the need for transparent and interpretable models has become paramount. In this article we present a new ruleset creation method based on surrogate decision trees (SRules), designed to improve the interpretability of black-box machine learning models. SRules balances the accuracy, coverage, and interpretability of machine learning models by recursively creating surrogate interpretable decision tree models that approximate the decision boundaries of a complex model. We propose a systematic framework for generating concise and meaningful rules from these surrogate models, allowing stakeholders to understand and trust the AI system's decision-making process. Our approach not only provides interpretable rules, but also quantifies the confidence and coverage of these rules. The proposed method also allows its parameters to be adjusted to trade precision and coverage against interpretability, permitting a near-perfect fit and high interpretability for selected parts of the model. The results show that SRules improves on other state-of-the-art techniques and introduces the possibility of creating highly interpretable rules for specific sub-parts of the model.
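
The surrogate-tree core of the approach can be sketched with scikit-learn: fit a shallow tree to the black-box model's predictions and read rules off it. This single-tree, non-recursive version omits SRules' recursive refinement and rule scoring:

```python
# Sketch of the surrogate-rule idea: fit a decision tree to the *predictions*
# of a black-box model, then read human-readable rules off the tree. A
# single-tree simplification; SRules builds surrogates recursively and adds
# precision/coverage-driven refinement.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))          # mimic the black box

fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"surrogate fidelity: {fidelity:.3f}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(5)]))
```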

Updated: 2024-07-29 14:56:56

标题: 基于替代树的黑盒模型可解释性规则创建方法 -- SRules

摘要: 随着人工智能系统越来越多地融入关键决策过程,透明和可解释的模型需求变得至关重要。在本文中,我们提出了一种基于替代决策树(SRules)的新规则集创建方法,旨在提高黑盒机器学习模型的可解释性。SRules通过递归创建近似复杂模型决策边界的可解释替代决策树模型,来平衡机器学习模型的准确性、覆盖率和可解释性。我们提出了一个系统框架,从这些替代模型中生成简洁而有意义的规则,使利益相关者能够理解和信任人工智能系统的决策过程。我们的方法不仅提供可解释的规则,还量化这些规则的置信度和覆盖范围。所提出的方法还允许调整参数,在精度、覆盖率与可解释性之间进行权衡,使模型的某些部分能够实现近乎完美的拟合和高可解释性。结果表明,SRules优于其他最先进的技术,并使得为模型的特定子部分创建高度可解释的规则成为可能。

更新时间: 2024-07-29 14:56:56

领域: cs.LG,68T01,I.2.6

下载: http://arxiv.org/abs/2407.20070v1

CityX: Controllable Procedural Content Generation for Unbounded 3D Cities

Generating a realistic, large-scale 3D virtual city remains a complex challenge due to the involvement of numerous 3D assets, various city styles, and strict layout constraints. Existing approaches provide promising attempts at procedural content generation to create large-scale scenes using Blender agents. However, they face crucial issues such as difficulties in scaling up generation capability and achieving fine-grained control at the semantic layout level. To address these problems, we propose a novel multi-modal controllable procedural content generation method, named CityX, which enhances realistic, unbounded 3D city generation guided by multiple layout conditions, including OSM, semantic maps, and satellite images. Specifically, the proposed method contains a general protocol for integrating various PCG plugins and a multi-agent framework for transforming instructions into executable Blender actions. Through this effective framework, CityX shows the potential to build an innovative ecosystem for 3D scene generation by bridging the gap between the quality of generated assets and industrial requirements. Extensive experiments have demonstrated the effectiveness of our method in creating high-quality, diverse, and unbounded cities guided by multi-modal conditions. Our project page: https://cityx-lab.github.io.

Updated: 2024-07-29 14:55:33

标题: CityX:可控的无限3D城市程序内容生成

摘要: 生成一个现实的、大规模的3D虚拟城市仍然是一个复杂的挑战,因为涉及到大量的3D资产、各种城市风格和严格的布局限制。现有方法尝试通过使用Blender代理来进行程序内容生成,以创建大规模场景,但面临着关键问题,如难以扩展生成能力和在语义布局层面实现精细控制。为了解决这些问题,我们提出了一种新颖的多模态可控程序内容生成方法,名为CityX,它通过多种布局条件(包括OSM、语义地图和卫星图像)引导实现了真实、无限的3D城市生成。具体而言,所提出的方法包含一个用于集成各种PCG插件的通用协议,以及一个将指令转换为可执行的Blender动作的多代理框架。通过这个有效的框架,CityX展现了通过搭建生成资产质量与工业需求之间的桥梁,建立创新的3D场景生成生态系统的潜力。大量实验证明了我们的方法在创建受多模态条件引导的高质量、多样化、无限的城市方面的有效性。我们的项目页面:https://cityx-lab.github.io。

更新时间: 2024-07-29 14:55:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17572v2

Unleash the Power of Ellipsis: Accuracy-enhanced Sparse Vector Technique with Exponential Noise

The Sparse Vector Technique (SVT) is one of the most fundamental tools in differential privacy (DP). It works as a backbone for adaptive data analysis by answering a sequence of queries on a given dataset, and gleaning useful information in a privacy-preserving manner. Unlike the typical private query releases that directly publicize the noisy query results, SVT is less informative -- it keeps the noisy query results to itself and only reveals a binary bit for each query, indicating whether the query result surpasses a predefined threshold. To provide a rigorous DP guarantee for SVT, prior works in the literature adopt a conservative privacy analysis by assuming the direct disclosure of noisy query results as in typical private query releases. This approach, however, hinders SVT from achieving higher query accuracy due to an overestimation of the privacy risks, which further leads to an excessive noise injection using the Laplacian or Gaussian noise for perturbation. Motivated by this, we provide a new privacy analysis for SVT by considering its less informative nature. Our analysis results not only broaden the range of applicable noise types for perturbation in SVT, but also identify the exponential noise as optimal among all evaluated noises (which, however, is usually deemed non-applicable in prior works). The main challenge in applying exponential noise to SVT is mitigating the sub-optimal performance due to the bias introduced by noise distributions. To address this, we develop a utility-oriented optimal threshold correction method and an appending strategy, which enhances the performance of SVT by increasing the precision and recall, respectively. The effectiveness of our proposed methods is substantiated both theoretically and empirically, demonstrating significant improvements up to $50\%$ across evaluated metrics.
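
A schematic of SVT with exponential noise in place of Laplace noise is below; the budget split and noise scales follow the classic Laplace-SVT template and are illustrative, not the paper's calibration (which further adds the threshold-correction and appending steps):

```python
# Schematic Sparse Vector Technique with exponentially distributed noise
# replacing the usual Laplace noise: perturb the threshold once, perturb each
# query, release only the above/below bit. Epsilon split and noise scales are
# illustrative, following the classic Laplace-SVT template.
import numpy as np

rng = np.random.default_rng(0)

def svt_exponential(queries, threshold, epsilon, c=1, sensitivity=1.0):
    eps1, eps2 = epsilon / 2, epsilon / 2
    rho = rng.exponential(scale=sensitivity / eps1)             # threshold noise
    answers, hits = [], 0
    for q in queries:
        nu = rng.exponential(scale=2 * c * sensitivity / eps2)  # query noise
        if q + nu >= threshold + rho:
            answers.append(True)              # only a binary bit is revealed
            hits += 1
            if hits >= c:                     # budget for "above" answers spent
                break
        else:
            answers.append(False)
    return answers

print(svt_exponential(np.linspace(0.0, 2.0, 20), threshold=1.0, epsilon=1.0))
```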

Updated: 2024-07-29 14:54:28

标题: 释放省略号的力量:具有指数噪声的增强精度稀疏向量技术

摘要: 稀疏向量技术(SVT)是差分隐私(DP)中最基本的工具之一。它作为自适应数据分析的支柱,通过回答给定数据集上的一系列查询,并以保护隐私的方式获取有用信息。与直接公开嘈杂的查询结果的典型私人查询发布不同,SVT不太具有信息性--它会保留嘈杂的查询结果,并仅对每个查询揭示一个二进制位,指示查询结果是否超过预定义的阈值。为了为SVT提供严格的DP保证,文献中的先前研究采用保守的隐私分析,假设直接公开嘈杂的查询结果,就像典型的私人查询发布一样。然而,这种方法阻碍了SVT实现更高的查询准确性,因为它高估了隐私风险,进而导致在扰动中使用拉普拉斯或高斯噪声注入过度噪声。受此启发,我们通过考虑其信息性较低提供了SVT的新隐私分析。我们的分析结果不仅扩大了适用于SVT扰动的噪声类型范围,还确定了指数噪声作为所有评估噪声中最优的噪声(但通常在先前研究中被认为不适用)。将指数噪声应用于SVT的主要挑战是减轻噪声分布引入的偏差导致的次优性能。为了解决这个问题,我们开发了一个面向效用的最佳阈值校正方法和一个附加策略,通过分别增加精度和召回,增强了SVT的性能。我们提出方法的有效性在理论和实证上得到证实,显示出在评估指标上显着提高了高达50%。

更新时间: 2024-07-29 14:54:28

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2407.20068v1

Fantastyc: Blockchain-based Federated Learning Made Secure and Practical

Federated Learning is a decentralized framework that enables multiple clients to collaboratively train a machine learning model under the orchestration of a central server without sharing their local data. The centrality of this framework represents a single point of failure, which is addressed in the literature by blockchain-based federated learning approaches. While ensuring a fully decentralized solution with traceability, such approaches still face several challenges regarding integrity, confidentiality, and scalability before they can be practically deployed. In this paper, we propose Fantastyc, a solution designed to address these challenges, which have never been met together in the state of the art.

Updated: 2024-07-29 14:54:22

标题: Fantastyc:基于区块链的联邦学习实现安全和实用

摘要: 联邦学习是一种分散框架,可以使多个客户端在中央服务器的协调下共同训练一个机器学习模型,而不共享他们的本地数据。这种框架的中心性代表了一个可能的失败点,在文献中通过基于区块链的联邦学习方法来解决。虽然这些方法确保了一个具有可追溯性的完全分散解决方案,但仍然面临着关于完整性、保密性和可扩展性的几个挑战,以便实际部署。在本文中,我们提出了Fantastyc,一个旨在解决这些挑战的解决方案,这些挑战在现有技术中从未被同时满足过。

更新时间: 2024-07-29 14:54:22

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2406.03608v2

xAI-Drop: Don't Use What You Cannot Explain

Graph Neural Networks (GNNs) have emerged as the predominant paradigm for learning from graph-structured data, offering a wide range of applications from social network analysis to bioinformatics. Despite their versatility, GNNs face challenges such as oversmoothing, lack of generalization and poor interpretability, which hinder their wider adoption and reliability in critical applications. Dropping has emerged as an effective paradigm for reducing noise during training and improving robustness of GNNs. However, existing approaches often rely on random or heuristic-based selection criteria, lacking a principled method to identify and exclude nodes that contribute to noise and over-complexity in the model. In this work, we argue that explainability should be a key indicator of a model's robustness throughout its training phase. To this end, we introduce xAI-Drop, a novel topological-level dropping regularizer that leverages explainability to pinpoint noisy network elements to be excluded from the GNN propagation mechanism. An empirical evaluation on diverse real-world datasets demonstrates that our method outperforms current state-of-the-art dropping approaches in accuracy, effectively reduces over-smoothing, and improves explanation quality.

Updated: 2024-07-29 14:53:45

标题: xAI-Drop:不要使用你无法解释的东西

摘要: 图神经网络(GNNs)已成为从图结构数据中学习的主要范式,提供了从社交网络分析到生物信息学等各种应用。尽管它们具有多样性,但GNN面临着过度平滑、缺乏泛化和解释性差等挑战,这些挑战妨碍了它们在关键应用中的广泛应用和可靠性。丢弃已成为在训练过程中减少噪声并提高GNN鲁棒性的有效范式。然而,现有方法通常依赖于随机或基于启发式的选择标准,缺乏一种原则性的方法来识别和排除在模型中导致噪音和过度复杂性的节点。在这项工作中,我们认为解释性应该是模型在整个训练阶段鲁棒性的关键指标。为此,我们介绍了xAI-Drop,一种新颖的基于拓扑级别的丢弃正则化器,利用解释性来确定应从GNN传播机制中排除的嘈杂网络元素。对多样的真实世界数据集进行的实证评估表明,我们的方法在准确性上优于当前最先进的丢弃方法,有效减少过度平滑,并提高解释质量。

更新时间: 2024-07-29 14:53:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.20067v1

Long-form music generation with latent diffusion

Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure from text prompts. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.

Updated: 2024-07-29 14:52:26

标题: 使用潜在扩散生成长形音乐

摘要: 最近,基于音频的音乐生成模型取得了很大进展,但到目前为止还没有成功地从文本提示中生成具有连贯音乐结构的完整音乐曲目。我们展示通过在长时间上下文中训练生成模型,可以生成长达4分45秒的音乐作品。我们的模型由在高度降采样的连续潜在表示(潜在速率为21.5Hz)上操作的扩散-变压器组成。根据音频质量和提示对齐的度量标准,它获得了最先进的生成结果,主观测试表明它能够生成具有连贯结构的完整音乐作品。

更新时间: 2024-07-29 14:52:26

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2404.10301v2

Differentially Private Gradient Flow based on the Sliced Wasserstein Distance

Safeguarding privacy in sensitive training data is paramount, particularly in the context of generative modeling. This can be achieved through either differentially private stochastic gradient descent or a differentially private metric for training models or generators. In this paper, we introduce a novel differentially private generative modeling approach based on a gradient flow in the space of probability measures. To this end, we define the gradient flow of the Gaussian-smoothed Sliced Wasserstein Distance, including the associated stochastic differential equation (SDE). By discretizing and defining a numerical scheme for solving this SDE, we demonstrate the link between smoothing and differential privacy based on a Gaussian mechanism, due to a specific form of the SDE's drift term. We then analyze the differential privacy guarantee of our gradient flow, which accounts for both the smoothing and the Wiener process introduced by the SDE itself. Experiments show that our proposed model can generate higher-fidelity data at a low privacy budget compared to a generator-based model, offering a promising alternative.
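
A sketch of a Gaussian-smoothed sliced Wasserstein distance between two samples: project onto random directions, add Gaussian noise to the 1D projections (the smoothing behind the Gaussian-mechanism link), and average 1D Wasserstein-2 costs computed from sorted projections. Slice count and noise level are illustrative:

```python
# Gaussian-smoothed sliced Wasserstein-2 sketch: random 1D projections, with
# Gaussian noise added to the projections, and 1D W2 computed via sorted
# quantiles (valid for equal-size samples). Slice count and sigma are
# illustrative choices, not the paper's calibrated parameters.
import numpy as np

rng = np.random.default_rng(0)

def smoothed_sw2(X, Y, n_proj=100, sigma=0.1):
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        px = np.sort(X @ theta + sigma * rng.standard_normal(len(X)))
        py = np.sort(Y @ theta + sigma * rng.standard_normal(len(Y)))
        total += np.mean((px - py) ** 2)    # 1D W2^2 between sorted samples
    return np.sqrt(total / n_proj)

X = rng.standard_normal((500, 2))
Y = rng.standard_normal((500, 2)) + np.array([1.0, 0.0])
print("smoothed SW2 ~", round(smoothed_sw2(X, Y), 3))
```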

Updated: 2024-07-29 14:50:46

标题: 基于切片Wasserstein距离的差分隐私梯度流

摘要: 在敏感训练数据中保护隐私至关重要,特别是在生成建模的背景下。这可以通过差分隐私随机梯度下降,或通过用于训练模型或生成器的差分隐私度量来实现。在本文中,我们介绍了一种基于概率测度空间中梯度流的新型差分隐私生成建模方法。为此,我们定义了高斯平滑切片Wasserstein距离的梯度流,包括相关的随机微分方程(SDE)。通过离散化并定义求解此SDE的数值方案,我们展示了平滑与基于高斯机制的差分隐私之间的联系,这源于SDE漂移项的特定形式。然后,我们分析了我们梯度流的差分隐私保证,该保证同时考虑了平滑以及SDE本身引入的Wiener过程。实验表明,与基于生成器的模型相比,我们提出的模型可以在较低的隐私预算下生成更高保真度的数据,提供了一种有前途的替代方案。

更新时间: 2024-07-29 14:50:46

领域: stat.ML,cs.CR,cs.LG

下载: http://arxiv.org/abs/2312.08227v2

CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification

Although graph neural networks (GNNs) have achieved impressive results in graph classification, they often need abundant task-specific labels, which can be extremely costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient exploration of graph topology and the significant domain discrepancy. In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. CoCo contains a graph convolutional network branch and a hierarchical graph kernel network branch, which explore graph topology in implicit and explicit manners. Besides, we incorporate coupled branches into a holistic multi-view contrastive learning framework, which not only incorporates graph representations learned from complementary views for enhanced understanding, but also encourages the similarity between cross-domain example pairs with the same semantics for domain alignment. Extensive experiments on popular datasets show that our CoCo generally outperforms these competing baselines in different settings.

Updated: 2024-07-29 14:50:43

标题: CoCo: 一种用于无监督领域自适应图分类的耦合对比框架

摘要: 尽管图神经网络(GNNs)在图分类方面取得了令人印象深刻的成就,但它们通常需要大量的特定任务标签,这可能会导致获取成本高昂。一个可靠的解决方案是探索额外标记的图表,以增强目标域上的无监督学习。然而,由于对图拓扑的不足探索和显著的域差异,如何将GNNs应用于领域适应仍未解决。在本文中,我们提出了耦合对比图表示学习(CoCo),它从耦合学习分支中提取拓扑信息,并通过耦合对比学习减少域差异。CoCo包含一个图卷积网络分支和一个分层图核网络分支,以隐式和显式方式探索图拓扑。此外,我们将耦合分支整合到一个整体多视图对比学习框架中,不仅将从互补视图学习的图表示整合以增强理解,还鼓励跨域示例对之间具有相同语义的相似性以进行域对齐。在流行数据集上进行的广泛实验表明,我们的CoCo在不同设置下通常优于这些竞争基准。

更新时间: 2024-07-29 14:50:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2306.04979v3

SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation

Recent advancements in deep convolutional neural networks have significantly improved the performance of saliency prediction. However, the manual configuration of the neural network architectures requires domain knowledge expertise and can still be time-consuming and error-prone. To solve this, we propose a new Neural Architecture Search (NAS) framework for saliency prediction with two contributions. Firstly, a supernet for saliency prediction is built with a weight-sharing network containing all candidate architectures, by integrating a dynamic convolution into the encoder-decoder in the supernet, termed SalNAS. Secondly, despite the fact that SalNAS is highly efficient (20.98 million parameters), it can suffer from the lack of generalization. To solve this, we propose a self-knowledge distillation approach, termed Self-KD, that trains the student SalNAS with the weighted average information between the ground truth and the prediction from the teacher model. The teacher model, while sharing the same architecture, contains the best-performing weights chosen by cross-validation. Self-KD can generalize well without the need to compute the gradient in the teacher model, enabling an efficient training system. By utilizing Self-KD, SalNAS outperforms other state-of-the-art saliency prediction models in most evaluation rubrics across seven benchmark datasets while being a lightweight model. The code will be available at https://github.com/chakkritte/SalNAS
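
The Self-KD target is simply a convex combination of the ground truth and the teacher's output; a minimal sketch follows (the weight alpha and the MSE loss are illustrative choices):

```python
# Sketch of the Self-KD target: the student is trained toward a weighted
# average of the ground-truth saliency map and the teacher's prediction.
# Alpha and the MSE loss are illustrative, not the paper's exact settings.
import numpy as np

def self_kd_target(ground_truth, teacher_pred, alpha=0.7):
    return alpha * ground_truth + (1 - alpha) * teacher_pred

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
gt = rng.random((64, 64))                            # toy saliency map
teacher = np.clip(gt + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)
student = np.full((64, 64), gt.mean())               # untrained student guess

print("loss vs Self-KD target:", mse(student, self_kd_target(gt, teacher)))
```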

Updated: 2024-07-29 14:48:34

标题: SalNAS:具有自知识蒸馏的高效显著性预测神经架构搜索

摘要: 最近深度卷积神经网络的进展显著提高了显著性预测的性能。然而,神经网络架构的手动配置需要领域知识专业技能,仍然可能耗时且容易出错。为了解决这个问题,我们提出了一个新的用于显著性预测的神经架构搜索(NAS)框架,包含两个贡献。首先,通过将动态卷积集成到超网络的编码器-解码器中,构建了一个用于显著性预测的超网络,其中包含所有候选架构,称为SalNAS。其次,尽管SalNAS非常高效(2098万参数),但可能存在泛化能力不足的问题。为了解决这个问题,我们提出了一种自知识蒸馏方法,称为Self-KD,该方法使用教师模型的权重平均信息来训练学生SalNAS。教师模型与相同架构,包含通过交叉验证选择的表现最佳的权重。Self-KD能够很好地泛化,无需计算教师模型中的梯度,从而实现了高效的训练系统。通过利用Self-KD,SalNAS在七个基准数据集上的大多数评估指标中优于其他最先进的显著性预测模型,同时又是一个轻量级模型。该代码将在https://github.com/chakkritte/SalNAS 上提供。

更新时间: 2024-07-29 14:48:34

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.20062v1

Autonomous Bootstrapping of Quantum Dot Devices

Semiconductor quantum dots (QD) are a promising platform for multiple different qubit implementations, all of which are voltage-controlled by programmable gate electrodes. However, as the QD arrays grow in size and complexity, tuning procedures that can fully autonomously handle the increasing number of control parameters are becoming essential for enabling scalability. We propose a bootstrapping algorithm for initializing a depletion mode QD device in preparation for subsequent phases of tuning. During bootstrapping, the QD device functionality is validated, all gates are characterized, and the QD charge sensor is made operational. We demonstrate the bootstrapping protocol in conjunction with a coarse tuning module, showing that the combined algorithm can efficiently and reliably take a cooled-down QD device to a desired global state configuration in under 8 minutes with a success rate of 96 %. Importantly, by following heuristic approaches to QD device initialization and combining the efficient ray-based measurement with the rapid radio-frequency reflectometry measurements, the proposed algorithm establishes a reference in terms of performance, reliability, and efficiency against which alternative algorithms can be benchmarked.

Updated: 2024-07-29 14:47:46

标题: 量子点器件的自主引导

摘要: 半导体量子点(QD)是一个有前途的平台,可用于多种不同的量子比特实现,所有这些实现都可以通过可编程的门电极进行电压控制。然而,随着QD阵列的规模和复杂性不断增长,能够完全自主处理不断增加的控制参数的调谐程序变得至关重要,以实现可扩展性。我们提出了一种引导算法,用于初始化一个去极化模式QD器件,为随后的调谐阶段做准备。在引导过程中,验证了QD器件功能,对所有门进行了特性化,并使QD电荷传感器投入运行。我们展示了引导协议与粗调谐模块结合使用,表明结合算法可以在不到8分钟内以96%的成功率将冷却的QD器件带到所需的全局状态配置。重要的是,通过遵循QD器件初始化的启发式方法,并将高效的基于光线的测量与快速的射频反射测量相结合,所提出的算法建立了一个性能、可靠性和效率的参考标准,可用于对比其他算法。

更新时间: 2024-07-29 14:47:46

领域: cond-mat.mes-hall,cs.ET,cs.LG,quant-ph

下载: http://arxiv.org/abs/2407.20061v1

RelBench: A Benchmark for Deep Learning on Relational Databases

We present RelBench, a public benchmark for solving predictive tasks over relational databases with graph neural networks. RelBench provides databases and tasks spanning diverse domains and scales, and is intended to be a foundational infrastructure for future research. We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL) (Fey et al., 2024), which combines graph neural network predictive models with (deep) tabular models that extract initial entity-level representations from raw tables. End-to-end learned RDL models fully exploit the predictive signal encoded in primary-foreign key links, marking a significant shift away from the dominant paradigm of manual feature engineering combined with tabular models. To thoroughly evaluate RDL against this prior gold-standard, we conduct an in-depth user study where an experienced data scientist manually engineers features for each task. In this study, RDL learns better models whilst reducing human work needed by more than an order of magnitude. This demonstrates the power of deep learning for solving predictive tasks over relational databases, opening up many new research opportunities enabled by RelBench.

Updated: 2024-07-29 14:46:13

标题: RelBench:关系数据库上深度学习的基准测试

摘要: 我们提出了RelBench,一个用图神经网络解决关系数据库中预测任务的公共基准。RelBench提供了跨越不同领域和规模的数据库和任务,并旨在成为未来研究的基础设施。我们使用RelBench进行了对关系深度学习(RDL)(Fey等人,2024年)的第一次全面研究,该研究将图神经网络预测模型与(深度)表格模型相结合,从原始表格中提取初始实体级表示。端到端学习的RDL模型充分利用了编码在主键-外键链接中的预测信号,标志着从手工特征工程结合表格模型的主导范式明显转变。为了彻底评估RDL与这一先前的金标准,我们进行了一项深入的用户研究,其中一位经验丰富的数据科学家为每个任务手动工程化特征。在这项研究中,RDL学到了更好的模型,同时将人工工作量减少了一个数量级以上。这证明了深度学习在解决关系数据库中的预测任务方面的能力,为通过RelBench实现的许多新研究机会打开了大门。

更新时间: 2024-07-29 14:46:13

领域: cs.LG,cs.AI,cs.DB

下载: http://arxiv.org/abs/2407.20060v1

Shapley Value Computation in Ontology-Mediated Query Answering

The Shapley value, originally introduced in cooperative game theory for wealth distribution, has found use in KR and databases for the purpose of assigning scores to formulas and database tuples based upon their contribution to obtaining a query result or inconsistency. In the present paper, we explore the use of Shapley values in ontology-mediated query answering (OMQA) and present a detailed complexity analysis of Shapley value computation (SVC) in the OMQA setting. In particular, we establish a PF/#P-hard dichotomy for SVC for ontology-mediated queries (T,q) composed of an ontology T formulated in the description logic ELHI_\bot and a connected constant-free homomorphism-closed query q. We further show that the #P-hardness side of the dichotomy can be strengthened to cover possibly disconnected queries with constants. Our results exploit recently discovered connections between SVC and probabilistic query evaluation and allow us to generalize existing results on probabilistic OMQA.
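
For readers unfamiliar with the Shapley value in this setting, a toy sketch (not the paper's OMQA algorithm, which handles ontologies and avoids brute-force enumeration) scores each database fact by its average marginal contribution to a Boolean query over all orderings:

```python
from itertools import permutations

# Toy illustration of Shapley-based fact attribution: the query below and the
# facts are hypothetical; exact enumeration is exponential and only viable for
# tiny instances.

facts = ["r(a,b)", "r(b,c)", "s(c)"]

def query_holds(subset):
    # Hypothetical Boolean query: a path r(a,b), r(b,c) ending in s(c).
    return {"r(a,b)", "r(b,c)", "s(c)"} <= set(subset)

def shapley(fact):
    total, n_perms = 0.0, 0
    for perm in permutations(facts):
        i = perm.index(fact)
        before = perm[:i]
        total += query_holds(before + (fact,)) - query_holds(before)
        n_perms += 1
    return total / n_perms

for f in facts:
    print(f, shapley(f))   # symmetric facts get equal scores (1/3 each here)
```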

Updated: 2024-07-29 14:45:14

标题: 本体中介查询回答中的Shapley值计算

摘要: Shapley值最初在合作博弈理论中用于财富分配,现已在知识表示和数据库中找到应用,目的是基于公式和数据库元组对获得查询结果或不一致性的贡献来为其分配分数。在本文中,我们探讨了Shapley值在本体中介查询回答(OMQA)中的应用,并对在OMQA设置中计算Shapley值(SVC)的复杂性进行了详细分析。特别地,对于由描述逻辑ELHI_⊥中的本体T和一个连通的、不含常量的同态封闭查询q组成的本体中介查询(T,q),我们建立了SVC的PF/#P-难二分法。我们进一步表明,该二分法的#P-难一侧可以被加强,以涵盖可能不连通且含常量的查询。我们的结果利用了最近发现的SVC与概率查询评估之间的联系,并使我们能够推广现有的关于概率OMQA的结果。

更新时间: 2024-07-29 14:45:14

领域: cs.AI,cs.DB

下载: http://arxiv.org/abs/2407.20058v1

Reconstructing Global Daily CO2 Emissions via Machine Learning

High temporal resolution CO2 emission data are crucial for understanding the drivers of emission changes; however, current emission datasets are only available on a yearly basis. Here, we extended a global daily CO2 emissions dataset backwards in time to 1970 using a machine learning algorithm, which was trained to predict historical daily emissions on national scales based on relationships between daily emission variations and predictors established for the period since 2019. Variation in daily CO2 emissions far exceeded the smoothed seasonal variations; for example, in 2022 the range of daily CO2 emissions was equivalent to 31% of the average daily emissions in China and 46% in India. We identified a critical emission-climate temperature (Tc) of 16.5 degrees Celsius for the global average (18.7 degrees Celsius for China, 14.9 degrees Celsius for the U.S., and 18.4 degrees Celsius for Japan), below which daily CO2 emissions correlate negatively with ambient temperature and above which they correlate positively, demonstrating increased emissions associated with higher ambient temperature. The long-term time series of global daily CO2 emissions, spanning over fifty years, reveals an increasing trend in emissions due to extreme temperature events, driven by the rising frequency of these occurrences. This work suggests that, due to climate change, greater efforts may be needed to reduce CO2 emissions.
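
A minimal sketch of the backcasting idea, with entirely synthetic stand-ins for the predictors (the paper's actual feature set and model are not specified here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Sketch of the backcasting idea: learn daily-emission variations from the
# post-2019 period, then apply the model to earlier years where only the
# predictors are available. Features here are synthetic placeholders.

rng = np.random.default_rng(0)
n = 1500
X_recent = rng.normal(size=(n, 3))        # e.g. temperature, day-of-week, holiday
y_recent = 1.0 + 0.3 * X_recent[:, 0] + rng.normal(scale=0.05, size=n)

model = GradientBoostingRegressor().fit(X_recent, y_recent)

X_historical = rng.normal(size=(365, 3))  # predictors for one pre-2019 year
daily_emissions = model.predict(X_historical)
print(daily_emissions[:5])
```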

Updated: 2024-07-29 14:44:14

标题: 利用机器学习重建全球每日二氧化碳排放量

摘要: 高时空分辨率的二氧化碳排放数据对于理解排放变化的驱动因素至关重要,然而,目前的排放数据集仅以年度为基础。在这里,我们利用机器学习算法将全球每日二氧化碳排放数据向后延伸至1970年,该算法经过训练,根据自2019年以来建立的每日排放变化与预测因子之间的关系,来预测历史上的每日排放量。每日二氧化碳排放量的变化远远超过了平滑的季节性变化。例如,2022年中国的每日二氧化碳排放量相当于年均每日排放量的31%,印度则为46%。我们确定了全球平均关键的排放-气候温度(Tc)为16.5摄氏度(中国为18.7摄氏度,美国为14.9摄氏度,日本为18.4摄氏度),在该温度以下每日二氧化碳排放与环境温度呈负相关,而在该温度以上则呈正相关,表明与更高环境温度相关的排放增加。跨越五十多年的全球每日二氧化碳排放长期时间序列显示了排放量的增长趋势,这是由于极端温度事件的频率上升所驱动的。这项研究表明,由于气候变化,可能需要更大的努力来减少二氧化碳排放。

更新时间: 2024-07-29 14:44:14

领域: physics.ao-ph,cs.LG,stat.AP

下载: http://arxiv.org/abs/2407.20057v1

Multi-fidelity Gaussian process surrogate modeling for regression problems in physics

One of the main challenges in surrogate modeling is the limited availability of data due to resource constraints associated with computationally expensive simulations. Multi-fidelity methods provide a solution by chaining models in a hierarchy with increasing fidelity, associated with lower error, but increasing cost. In this paper, we compare different multi-fidelity methods employed in constructing Gaussian process surrogates for regression. Non-linear autoregressive methods in the existing literature are primarily confined to two-fidelity models, and we extend these methods to handle more than two levels of fidelity. Additionally, we propose enhancements for an existing method incorporating delay terms by introducing a structured kernel. We demonstrate the performance of these methods across various academic and real-world scenarios. Our findings reveal that multi-fidelity methods generally have a smaller prediction error for the same computational cost as compared to the single-fidelity method, although their effectiveness varies across different scenarios.
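
As a concrete reference point, a simple two-fidelity autoregressive scheme in the Kennedy-O'Hagan style (a baseline that the paper generalizes to more than two levels, not the authors' extended method) can be sketched with scikit-learn:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Two-fidelity autoregressive sketch: f_high(x) ~ rho * f_low(x) + delta(x).
# The test functions below are illustrative stand-ins for cheap/expensive codes.

f_low = lambda x: np.sin(8 * x)                  # cheap, plentiful model
f_high = lambda x: np.sin(8 * x) + 0.2 * x       # expensive, scarce model

X_lo = np.linspace(0, 1, 30)[:, None]            # many low-fidelity samples
X_hi = np.linspace(0, 1, 6)[:, None]             # few high-fidelity samples

gp_lo = GaussianProcessRegressor(RBF(0.1)).fit(X_lo, f_low(X_lo.ravel()))

# Estimate the scaling rho, then fit a GP to the residual (the "delta" term).
mu_lo = gp_lo.predict(X_hi)
rho = np.polyfit(mu_lo, f_high(X_hi.ravel()), 1)[0]
gp_delta = GaussianProcessRegressor(RBF(0.1)).fit(
    X_hi, f_high(X_hi.ravel()) - rho * mu_lo)

X_test = np.linspace(0, 1, 5)[:, None]
pred = rho * gp_lo.predict(X_test) + gp_delta.predict(X_test)
print(pred)
```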

Updated: 2024-07-29 14:43:48

标题: 物理学中用于回归问题的多保真度高斯过程替代建模

摘要: 在代理建模中的一个主要挑战是由于与计算昂贵的模拟相关的资源约束而导致数据的有限可用性。多保真度方法通过在具有不断增加保真度的层次结构中链接模型来提供解决方案,低误差但成本增加。在本文中,我们比较了用于构建高斯过程代理的回归的不同多保真度方法。现有文献中的非线性自回归方法主要局限于两保真度模型,我们将这些方法扩展到处理两个以上的保真度等级。此外,我们提出了对现有方法进行增强,通过引入结构化核来包含延迟项。我们展示了这些方法在各种学术和现实场景中的性能。我们的研究结果表明,与单一保真度方法相比,多保真度方法通常在相同的计算成本下具有更小的预测误差,尽管它们的有效性在不同场景中有所不同。

更新时间: 2024-07-29 14:43:48

领域: stat.ML,cs.LG,physics.data-an

下载: http://arxiv.org/abs/2404.11965v2

Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays

We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing and online advertising, where feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays where the feedback delays are independent of the rewards, and reward-dependent delays where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.
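
A toy rendering of the setting (not the paper's algorithms): rewards arrive after random delays via a queue, and arms are drawn with probabilities proportional to a merit function of their optimistic estimates:

```python
import heapq, random, math

# Toy semi-bandit loop with delayed feedback. The merit function, delay
# distribution, and arm means are all illustrative choices.

K, T, k_select = 5, 2000, 2
true_means = [0.2, 0.4, 0.5, 0.6, 0.8]
counts, sums = [0] * K, [0.0] * K
pending = []  # min-heap of (arrival_time, arm, reward)

for t in range(1, T + 1):
    while pending and pending[0][0] <= t:            # absorb arrived feedback
        _, a, r = heapq.heappop(pending)
        counts[a] += 1
        sums[a] += r

    ucb = [(sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]))
           if counts[a] else 10.0 for a in range(K)]  # large bonus if unpulled
    merit = [max(u, 0.01) for u in ucb]               # merit ~ estimated quality
    chosen = random.choices(range(K), weights=merit, k=k_select)

    for a in set(chosen):                             # play a set of arms
        r = float(random.random() < true_means[a])
        heapq.heappush(pending, (t + random.randint(1, 50), a, r))

print("pulls per arm:", counts)
```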

Updated: 2024-07-29 14:42:20

标题: 基于功绩的公平组合半强盗算法与无限制反馈延迟

摘要: 我们研究了在无限制反馈延迟下受基于功绩的公平约束的随机组合半强盗问题。这受到了众包和在线广告等应用的启发,其中反馈并不立即可用,而不同选择(或手臂)之间的公平性至关重要。我们考虑两种类型的无限制反馈延迟:独立于奖励的延迟,其中反馈延迟与奖励无关,以及依赖于奖励的延迟,其中反馈延迟与奖励相关。此外,我们引入了基于功绩的公平约束,以确保对手臂的公平选择。我们定义了奖励遗憾和公平遗憾,并提出了新的强盗算法,在无限制反馈延迟下根据手臂的功绩进行选择。我们证明我们的算法都实现了次线性的预期奖励遗憾和预期公平遗憾,其界依赖于延迟分布的分位数。我们还使用合成和真实世界数据进行了大量实验,并展示我们的算法可以公平地选择具有不同反馈延迟的手臂。

更新时间: 2024-07-29 14:42:20

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.15439v3

Analyzing User Characteristics of Hate Speech Spreaders on Social Media

Hate speech on social media threatens the mental and physical well-being of individuals and contributes to real-world violence. Resharing is an important driver behind the spread of hate speech on social media. Yet, little is known about who reshares hate speech and what their characteristics are. In this paper, we analyze the role of user characteristics in hate speech resharing across different types of hate speech (e.g., political hate). For this, we proceed as follows: First, we cluster hate speech posts using large language models to identify different types of hate speech. Then we model the effects of user attributes on users' probability to reshare hate speech using an explainable machine learning model. To do so, we apply debiasing to control for selection bias in our observational social media data and further control for the latent vulnerability of users to hate speech. We find that, all else equal, users with fewer followers, fewer friends, fewer posts, and older accounts share more hate speech. This shows that users with little social influence tend to share more hate speech. Further, we find substantial heterogeneity across different types of hate speech. For example, racist and misogynistic hate is spread mostly by users with little social influence. In contrast, political anti-Trump and anti-right-wing hate is reshared by users with larger social influence. Overall, understanding the factors that drive users to share hate speech is crucial for detecting individuals at risk of engaging in harmful behavior and for designing effective mitigation strategies.

Updated: 2024-07-29 14:40:55

标题: 分析社交媒体上仇恨言论传播者的用户特征

摘要: 社交媒体上的仇恨言论威胁着个人的心理和身体健康,并导致现实世界的暴力事件。转发是社交媒体上传播仇恨言论的重要驱动因素。然而,目前对于谁转发仇恨言论以及他们的特征了解甚少。本文分析了用户特征在不同类型仇恨言论(如政治仇恨)转发中的作用。为此,我们采取以下步骤:首先,利用大型语言模型对仇恨言论帖子进行聚类,以识别不同类型的仇恨言论。然后,我们使用可解释的机器学习模型来模拟用户属性对用户转发仇恨言论的概率的影响。为此,我们应用去偏倚来控制我们的观察性社交媒体数据中的选择偏差,并进一步控制用户对仇恨言论的潜在脆弱性。我们发现,在其他条件相等的情况下,拥有更少关注者、更少朋友、更少帖子和更老账户的用户分享更多仇恨言论。这表明社交影响力较小的用户更倾向于分享更多仇恨言论。此外,我们发现在不同类型的仇恨言论之间存在显著的异质性。例如,种族主义和厌女症的仇恨主要由社交影响力较小的用户传播。相反,政治反特朗普和反右翼的仇恨是由社交影响力较大的用户转发。总的来说,了解驱使用户分享仇恨言论的因素对于检测潜在从事有害行为的个人并设计有效的缓解策略至关重要。

更新时间: 2024-07-29 14:40:55

领域: cs.SI,cs.AI,cs.CY

下载: http://arxiv.org/abs/2310.15772v2

Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models

Significant wave height (SWH) is a vital metric in marine science, and accurate SWH estimation is crucial for various applications, e.g., marine energy development, fishery, early warning systems for potential risks, etc. Traditional SWH estimation methods that are based on numerical models and physical theories are hindered by computational inefficiencies. Recently, machine learning has emerged as an appealing alternative to improve accuracy and reduce computational time. However, due to limited observational technology and high costs, the scarcity of real-world data restricts the potential of machine learning models. To overcome these limitations, we propose an ocean SWH estimation framework, namely Orca. Specifically, Orca enhances the limited spatio-temporal reasoning abilities of classic LLMs with a novel spatiotemporal aware encoding module. By segmenting the limited buoy observational data temporally, encoding the buoys' locations spatially, and designing prompt templates, Orca capitalizes on the robust generalization ability of LLMs to estimate significant wave height effectively with limited data. Experimental results on the Gulf of Mexico demonstrate that Orca achieves state-of-the-art performance in SWH estimation.

Updated: 2024-07-29 14:40:07

标题: 虎鲸:具有时空感知的大型语言模型对海洋有效波高的估计

摘要: 显著波高(SWH)是海洋科学中的一个重要指标,准确的SWH估计对各种应用至关重要,例如海洋能源开发、渔业、潜在风险的预警系统等。传统的基于数值模型和物理理论的SWH估计方法受到计算效率的限制。最近,机器学习已经成为提高准确性并减少计算时间的有吸引力的替代方法。然而,由于观测技术有限和成本高昂,现实世界数据的稀缺限制了机器学习模型的潜力。为了克服这些限制,我们提出了一种海洋SWH估计框架,名为Orca。具体而言,Orca通过一种新颖的时空感知编码模块增强了经典LLM的有限时空推理能力。通过在时间上对有限的浮标观测数据进行分割,空间上对浮标位置进行编码,并设计提示模板,Orca利用LLM的强大泛化能力有效地估计显著波高。墨西哥湾的实验结果表明,Orca在SWH估计方面取得了最先进的性能。

更新时间: 2024-07-29 14:40:07

领域: cs.LG,physics.ao-ph

下载: http://arxiv.org/abs/2407.20053v1

Leveraging Time-Series Foundation Models in Smart Agriculture for Soil Moisture Forecasting

The recent surge in foundation models for natural language processing and computer vision has fueled innovation across various domains. Inspired by this progress, we explore the potential of foundation models for time-series forecasting in smart agriculture, a field often plagued by limited data availability. Specifically, this work presents a novel application of $\texttt{TimeGPT}$, a state-of-the-art (SOTA) time-series foundation model, to predict soil water potential ($\psi_\mathrm{soil}$), a key indicator of field water status that is typically used for irrigation advice. Traditionally, this task relies on a wide array of input variables. We explore $\texttt{TimeGPT}$'s ability to forecast $\psi_\mathrm{soil}$ in: ($i$) a zero-shot setting, ($ii$) a fine-tuned setting relying solely on historic $\psi_\mathrm{soil}$ measurements, and ($iii$) a fine-tuned setting where we also add exogenous variables to the model. We compare $\texttt{TimeGPT}$'s performance to established SOTA baseline models for forecasting $\psi_\mathrm{soil}$. Our results demonstrate that $\texttt{TimeGPT}$ achieves competitive forecasting accuracy using only historical $\psi_\mathrm{soil}$ data, highlighting its remarkable potential for agricultural applications. This research paves the way for foundation time-series models for sustainable development in agriculture by enabling forecasting tasks that were traditionally reliant on extensive data collection and domain expertise.
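
Assuming the nixtla Python SDK (whose exact interface may differ by version), the zero-shot and fine-tuned settings roughly correspond to the following calls; the file name and API key are placeholders:

```python
import pandas as pd
from nixtla import NixtlaClient  # assumes the `nixtla` SDK; API may differ by version

client = NixtlaClient(api_key="YOUR_API_KEY")  # placeholder key

# Long-format series of soil water potential, following the SDK's
# (time column, target column) convention.
df = pd.read_csv("soil_water_potential.csv")   # hypothetical file with ds, y columns

# (i) zero-shot forecast of the next 7 days
zero_shot = client.forecast(df=df, h=7, time_col="ds", target_col="y")

# (ii) light fine-tuning on historic psi_soil before forecasting
fine_tuned = client.forecast(df=df, h=7, time_col="ds", target_col="y",
                             finetune_steps=50)
print(fine_tuned.head())
```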

Updated: 2024-07-29 14:36:16

标题: 利用时间序列基础模型在智能农业中进行土壤湿度预测

摘要: 最近,基于自然语言处理和计算机视觉的基础模型激发了各个领域的创新。受到这一进展的启发,我们探索了基础模型在智能农业时间序列预测中的潜力,这个领域通常受限于数据的有限性。具体来说,本文介绍了$\texttt{TimeGPT}$的一种新颖应用,这是一种最先进的时间序列基础模型,用于预测土壤水势($\psi_\mathrm{soil}$),这是田间水分状态的关键指标,通常用于灌溉建议。传统上,这项任务依赖于各种输入变量。我们探讨了$\texttt{TimeGPT}$在以下情况下预测$\psi_\mathrm{soil}$的能力:(i)零样本设置,(ii)仅依赖历史$\psi_\mathrm{soil}$测量的微调设置,以及(iii)在模型中还添加外生变量的微调设置。我们将$\texttt{TimeGPT}$的性能与已建立的最先进基线模型进行了比较,用于预测$\psi_\mathrm{soil}$。我们的结果表明,$\texttt{TimeGPT}$仅使用历史$\psi_\mathrm{soil}$数据就能达到竞争力的预测准确度,突显了它在农业应用中的巨大潜力。这项研究为农业可持续发展铺平了基础时间序列模型的道路,使得传统上依赖于大量数据收集和领域专业知识的预测任务成为可能。

更新时间: 2024-07-29 14:36:16

领域: cs.LG

下载: http://arxiv.org/abs/2405.18913v2

A Knowledge Enhanced Learning and Semantic Composition Model for Multi-Claim Fact Checking

To inhibit the spread of rumorous information and its severe consequences, traditional fact checking aims at retrieving relevant evidence to verify the veracity of a given claim. Fact checking methods typically use knowledge graphs (KGs) as external repositories and develop reasoning mechanism to retrieve evidence for verifying the triple claim. However, existing methods only focus on verifying a single claim. As real-world rumorous information is more complex and a textual statement is often composed of multiple clauses (i.e. represented as multiple claims instead of a single one), multiclaim fact checking is not only necessary but more important for practical applications. Although previous methods for verifying a single triple can be applied repeatedly to verify multiple triples one by one, they ignore the contextual information implied in a multi-claim statement and could not learn the rich semantic information in the statement as a whole. In this paper, we propose an end-to-end knowledge enhanced learning and verification method for multi-claim fact checking. Our method consists of two modules, KG-based learning enhancement and multi-claim semantic composition. To fully utilize the contextual information, the KG-based learning enhancement module learns the dynamic context-specific representations via selectively aggregating relevant attributes of entities. To capture the compositional semantics of multiple triples, the multi-claim semantic composition module constructs the graph structure to model claim-level interactions, and integrates global and salient local semantics with multi-head attention. Experimental results on a real-world dataset and two benchmark datasets show the effectiveness of our method for multi-claim fact checking over KG.

Updated: 2024-07-29 14:33:52

标题: 一个基于知识增强学习和语义组合的多主张事实检查模型

摘要: 为了抑制谣言信息的传播及其严重后果,传统事实核查旨在检索相关证据以验证给定主张的真实性。事实核查方法通常使用知识图谱(KG)作为外部存储库,并开发推理机制来检索证据以验证三元主张。然而,现有方法仅专注于验证单个主张。由于现实世界中的谣言信息更加复杂,文本陈述通常由多个子句组成(即表示为多个主张而不是单个),多主张事实核查不仅是必要的,而且对实际应用更为重要。虽然先前用于验证单个三元组的方法可以重复应用以逐个验证多个三元组,但它们忽略了多主张陈述中隐含的上下文信息,并且无法学习陈述作为整体的丰富语义信息。在本文中,我们提出了一种端到端的知识增强学习和验证方法,用于多主张事实核查。我们的方法包含两个模块,基于知识图谱的学习增强和多主张语义组合。为了充分利用上下文信息,基于知识图谱的学习增强模块通过有选择地汇聚实体的相关属性来学习动态的特定上下文表示。为了捕获多个三元组的组合语义,多主张语义组合模块构建图结构来模拟主张级别的互动,并使用多头注意力集成全局和突出的局部语义。在一个真实数据集和两个基准数据集上的实验结果显示了我们的方法在多主张事实核查上的有效性。

更新时间: 2024-07-29 14:33:52

领域: cs.AI,68T50,I.2.7

下载: http://arxiv.org/abs/2104.13046v2

Dynamic Spiking Graph Neural Networks

The integration of Spiking Neural Networks (SNNs) and Graph Neural Networks (GNNs) is gradually attracting attention due to the low power consumption and high efficiency in processing the non-Euclidean data represented by graphs. However, as a common problem, dynamic graph representation learning faces challenges such as high complexity and large memory overheads. Current work often uses SNNs instead of Recurrent Neural Networks (RNNs) by using binary features instead of continuous ones for efficient training, which overlooks graph structure information and leads to the loss of details during propagation. Additionally, optimizing dynamic spiking models typically requires propagation of information across time steps, which increases memory requirements. To address these challenges, we present the Dynamic Spiking Graph Neural Network framework. To mitigate the information loss problem, the framework propagates early-layer information directly to the last layer for information compensation. To accommodate the memory requirements, we apply implicit differentiation on the equilibrium state, which does not rely on the exact reverse of the forward computation. While traditional implicit differentiation methods are usually used for static situations, our framework extends them to the dynamic graph setting. Extensive experiments on three large-scale real-world dynamic graph datasets validate the effectiveness of the framework on dynamic node classification tasks with lower computational costs.

Updated: 2024-07-29 14:33:02

标题: 动态脉冲图神经网络

摘要: 脉冲神经网络(SNNs)和图神经网络(GNNs)的集成逐渐引起关注,因为在处理由图表示的非欧几里德数据时具有低功耗和高效率。然而,作为一个普遍问题,动态图表示学习面临诸如高复杂性和大内存开销等挑战。目前的工作通常使用SNNs而不是循环神经网络(RNNs),通过使用二进制特征而不是连续特征进行高效训练,这会忽略图结构信息,并在传播过程中导致细节丢失。此外,优化动态脉冲模型通常需要在时间步之间传播信息,这会增加内存需求。为了解决这些挑战,我们提出了"动态脉冲图神经网络"框架。为了缓解信息丢失问题,该框架直接将早期层的信息传播到最后一层进行信息补偿。为了适应内存需求,我们在平衡状态上应用了隐式微分,这不依赖于正向计算的精确反向计算。虽然传统的隐式微分方法通常用于静态情况,但该框架将其扩展到了动态图设置。对三个大规模真实世界动态图数据集进行的大量实验验证了该框架在动态节点分类任务上的有效性,同时降低了计算成本。

更新时间: 2024-07-29 14:33:02

领域: cs.NE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.05373v2

Differentiable Gaussianization Layers for Inverse Problems Regularized by Deep Generative Models

Deep generative models such as GANs, normalizing flows, and diffusion models are powerful regularizers for inverse problems. They exhibit great potential for helping reduce ill-posedness and attain high-quality results. However, the latent tensors of such deep generative models can fall out of the desired high-dimensional standard Gaussian distribution during inversion, particularly in the presence of data noise and inaccurate forward models, leading to low-fidelity solutions. To address this issue, we propose to reparameterize and Gaussianize the latent tensors using novel differentiable data-dependent layers wherein custom operators are defined by solving optimization problems. These proposed layers constrain inverse problems to obtain high-fidelity in-distribution solutions. We validate our technique on three inversion tasks: compressive-sensing MRI, image deblurring, and eikonal tomography (a nonlinear PDE-constrained inverse problem) using two representative deep generative models: StyleGAN2 and Glow. Our approach achieves state-of-the-art performance in terms of accuracy and consistency.

Updated: 2024-07-29 14:31:47

标题: 深度生成模型正则化的反问题的可微高斯化层

摘要: 深度生成模型,如GANs、归一化流和扩散模型,是逆问题的强大正则化器。它们展示了帮助减少不适定性和获得高质量结果的巨大潜力。然而,这些深度生成模型的潜在张量在反演过程中可能会脱离所需的高维标准高斯分布,特别是在存在数据噪声和不准确的正向模型的情况下,导致低保真度解决方案。为了解决这个问题,我们提出重新参数化和高斯化潜在张量,使用新颖的可微数据相关层,在其中通过解决优化问题定义自定义运算符。这些提出的层约束逆问题以获得高保真度的分布内解决方案。我们在三个反演任务上验证了我们的技术:压缩感知MRI、图像去模糊和程函(eikonal)层析成像(非线性PDE约束的逆问题),使用两个代表性的深度生成模型:StyleGAN2和Glow。我们的方法在准确性和一致性方面实现了最先进的性能。

更新时间: 2024-07-29 14:31:47

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2112.03860v5

Denoising ESG: quantifying data uncertainty from missing data with Machine Learning and prediction intervals

Environmental, Social, and Governance (ESG) datasets are frequently plagued by significant data gaps, leading to inconsistencies in ESG ratings due to varying imputation methods. This paper explores the application of established machine learning techniques for imputing missing data in a real-world ESG dataset, emphasizing the quantification of uncertainty through prediction intervals. By employing multiple imputation strategies, this study assesses the robustness of imputation methods and quantifies the uncertainty associated with missing data. The findings highlight the importance of probabilistic machine learning models in providing better understanding of ESG scores, thereby addressing the inherent risks of wrong ratings due to incomplete data. This approach improves imputation practices to enhance the reliability of ESG ratings.
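
One common way to attach prediction intervals to imputed values, sketched here on synthetic data with scikit-learn's quantile loss (the paper's exact models may differ):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Sketch: quantile regression gives a prediction interval around each imputed
# ESG indicator, quantifying the uncertainty due to missingness. Features and
# targets here are synthetic stand-ins.

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                  # observed ESG indicators
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500)  # indicator to impute

models = {q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
          for q in (0.05, 0.5, 0.95)}

X_missing = rng.normal(size=(3, 4))            # rows where the indicator is missing
lo, mid, hi = (models[q].predict(X_missing) for q in (0.05, 0.5, 0.95))
for l, m, h in zip(lo, mid, hi):
    print(f"imputed {m:.2f}, 90% interval [{l:.2f}, {h:.2f}]")
```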

Updated: 2024-07-29 14:31:44

标题: 去噪ESG:利用机器学习和预测区间量化缺失数据的数据不确定性

摘要: 环境、社会和治理(ESG)数据集经常存在重大数据缺失问题,导致ESG评级不一致,因为不同的填充方法。本文探讨了在现实世界的ESG数据集中应用建立的机器学习技术来填补缺失数据,强调通过预测区间来量化不确定性。通过采用多重填充策略,本研究评估了填充方法的稳健性,并量化了与缺失数据相关的不确定性。研究结果强调了概率机器学习模型在提供对ESG评分更好理解方面的重要性,从而解决由于数据不完整而导致错误评级的固有风险。这种方法改进了填充实践,增强了ESG评级的可靠性。

更新时间: 2024-07-29 14:31:44

领域: cs.LG

下载: http://arxiv.org/abs/2407.20047v1

Application of Unsupervised Artificial Neural Network (ANN) Self-Organizing Map (SOM) in Identifying Main Car Sales Factors

The factors that attract customers and persuade them to buy a new car vary with consumer tastes. Several methods exist for extracting patterns from mass data. In this case, we first asked passenger-car marketing experts to rank the most important factors affecting customer decision-making behavior using the fuzzy Delphi technique. We then built a sample set from questionnaires and applied an unsupervised artificial neural network method, the self-organizing map (SOM), to find out which factors have more effect on Iranian customers' buying decisions. Fuzzy tools were applied to make the study more realistic. MATLAB software was used for developing and training the network. The results show that four factors are more important than the others, and these rankings differ somewhat from those of the marketing experts. Such results can help manufacturers focus on the more important factors and increase company sales.
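
Assuming the minisom package, the SOM step might look like the following sketch on synthetic questionnaire data (the factor count and grid size are illustrative):

```python
import numpy as np
from minisom import MiniSom  # assumes the `minisom` package

# Sketch: cluster questionnaire responses (one row per respondent, one column
# per rated purchase factor) and inspect which factors dominate each map unit.

rng = np.random.default_rng(0)
data = rng.random((200, 8))                 # 200 respondents, 8 rated factors

som = MiniSom(6, 6, input_len=8, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, num_iteration=1000)

# Map a respondent to its best-matching unit, then look at that unit's weight
# vector to see which factors it emphasises.
bmu = som.winner(data[0])
print("best-matching unit:", bmu)
print("factor weights at that unit:", som.get_weights()[bmu])
```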

Updated: 2024-07-29 14:24:16

标题: 无监督人工神经网络(ANN)自组织映射(SOM)在识别主要汽车销售因素中的应用

摘要: Factors that attract customers and persuade them to buy a new car vary depending on different consumer preferences. There are various methods to extract patterns from large amounts of data. In this study, passenger car marketing experts were asked to rank the most important factors affecting customer decision-making behavior using the fuzzy Delphi technique. A sample set was then provided from questionnaires, and an artificial neural network method called self-organizing map (SOM) was applied to determine which factors had a greater impact on Iranian customers' buying decisions. Fuzzy tools were used to make the study more realistic. MATLAB software was utilized for developing and training the network. The results indicated that four factors were more important than others, which differed from the rankings provided by marketing experts. These findings can help manufacturers focus on key factors and improve company sales levels.

更新时间: 2024-07-29 14:24:16

领域: cs.AI,cs.LG,cs.NE

下载: http://arxiv.org/abs/2408.05110v1

Privacy-preserving data release leveraging optimal transport and particle gradient descent

We present a novel approach for differentially private data synthesis of protected tabular datasets, a relevant task in highly sensitive domains such as healthcare and government. Current state-of-the-art methods predominantly use marginal-based approaches, where a dataset is generated from private estimates of the marginals. In this paper, we introduce PrivPGD, a new generation method for marginal-based private data synthesis, leveraging tools from optimal transport and particle gradient descent. Our algorithm outperforms existing methods on a large range of datasets while being highly scalable and offering the flexibility to incorporate additional domain-specific constraints.
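
A stripped-down illustration of the particle-gradient-descent idea (not the PrivPGD implementation, which uses optimal-transport machinery): synthetic records are moved by gradient descent until their differentiable marginals match noisy target marginals:

```python
import torch

# Sketch: move synthetic records so their soft 1-d marginals match noisy,
# privately released marginals. One column, ten bins; all values illustrative.

torch.manual_seed(0)
bins = torch.linspace(0, 1, 11)

def soft_marginal(x, temp=100.0):
    # Differentiable histogram over `bins` for one column.
    centers = (bins[:-1] + bins[1:]) / 2
    w = torch.softmax(-temp * (x[:, None] - centers[None, :]) ** 2, dim=1)
    return w.mean(dim=0)

# "Private" target marginal (in practice: true marginal + calibrated noise).
target = torch.tensor([0.05, 0.05, 0.1, 0.2, 0.2, 0.15, 0.1, 0.1, 0.03, 0.02])

particles = torch.rand(1000, requires_grad=True)   # the synthetic column
opt = torch.optim.Adam([particles], lr=0.01)
for _ in range(500):
    opt.zero_grad()
    loss = ((soft_marginal(particles) - target) ** 2).sum()
    loss.backward()
    opt.step()

print("final marginal:", soft_marginal(particles).detach().round(decimals=2))
```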

Updated: 2024-07-29 14:12:50

标题: 隐私保护数据发布:利用最优输运和粒子梯度下降

摘要: 我们提出了一种新颖的方法,用于差分私密数据合成受保护的表格数据集,这是高度敏感领域(如医疗保健和政府)中的一个相关任务。当前最先进的方法主要使用基于边际的方法,其中数据集是从边际的私密估计生成的。在本文中,我们介绍了PrivPGD,一种基于边际的私密数据合成的新一代方法,利用了最优传输和粒子梯度下降的工具。我们的算法在大范围数据集上表现优于现有方法,同时具有高度可扩展性,并提供灵活性以纳入额外的领域特定约束。

更新时间: 2024-07-29 14:12:50

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2401.17823v3

Non-Clashing Teaching Maps for Balls in Graphs

Recently, Kirkpatrick et al. [ALT 2019] and Fallat et al. [JMLR 2023] introduced non-clashing teaching and showed it is the most efficient machine teaching model satisfying the Goldman-Mathias collusion-avoidance criterion. A teaching map $T$ for a concept class $\mathcal{C}$ assigns a (teaching) set $T(C)$ of examples to each concept $C \in \mathcal{C}$. A teaching map is non-clashing if no pair of concepts are consistent with the union of their teaching sets. The size of a non-clashing teaching map (NCTM) $T$ is the maximum size of a teaching set $T(C)$, $C \in \mathcal{C}$. The non-clashing teaching dimension NCTD$(\mathcal{C})$ of $\mathcal{C}$ is the minimum size of an NCTM for $\mathcal{C}$. NCTM$^+$ and NCTD$^+(\mathcal{C})$ are defined analogously, except the teacher may only use positive examples. We study NCTMs and NCTM$^+$s for the concept class $\mathcal{B}(G)$ consisting of all balls of a graph $G$. We show that the associated decision problem B-NCTD$^+$ for NCTD$^+$ is NP-complete in split, co-bipartite, and bipartite graphs. Surprisingly, we even prove that, unless the ETH fails, B-NCTD$^+$ does not admit an algorithm running in time $2^{2^{o(\text{vc})}}\cdot n^{O(1)}$, nor a kernelization algorithm outputting a kernel with $2^{o(\text{vc})}$ vertices, where vc is the vertex cover number of $G$. We complement these lower bounds with matching upper bounds. These are extremely rare results: it is only the second problem in NP to admit such a tight double-exponential lower bound parameterized by vc, and one of only a few problems to admit such an ETH-based conditional lower bound on the number of vertices in a kernel. For trees, interval graphs, cycles, and trees of cycles, we derive NCTM$^+$s or NCTMs for $\mathcal{B}(G)$ of size proportional to its VC-dimension, and for Gromov-hyperbolic graphs, we design an approximate NCTM$^+$ of size 2.

Updated: 2024-07-29 14:10:52

标题: 在图中球的非冲突教学地图

摘要: 最近,Kirkpatrick等人 [ALT 2019] 和Fallat等人 [JMLR 2023]引入了非冲突教学,并展示了它是满足Goldman-Mathias防止勾结准则的最有效的机器教学模型。一个概念类$\mathcal{C}$的教学映射$T$将一个(教学)示例集$T(C)$分配给每个概念$C \in \mathcal{C}$。如果没有一对概念与它们的教学集的并集一致,则教学映射是非冲突的。非冲突教学映射(NCTM)$T$的大小是教学集$T(C)$的最大大小,其中$C \in \mathcal{C}$。概念类$\mathcal{C}$的非冲突教学维度NCTD$(\mathcal{C})$是$\mathcal{C}$的NCTM的最小大小。NCTM$^+$和NCTD$^+(\mathcal{C})$的定义类似,只是教师只能使用正例。 我们研究了由图$G$的所有球组成的概念类$\mathcal{B}(G)$的NCTMs和NCTM$^+$s。我们证明了与NCTD$^+$相关的决策问题B-NCTD$^+$在分裂、共二部和二部图中是NP完全的。令人惊讶的是,我们甚至证明了,除非ETH失败,B-NCTD$^+$没有算法在时间$2^{2^{o(\text{vc})}}\cdot n^{O(1)}$内运行,也没有核化算法输出一个具有$2^{o(\text{vc})}$个顶点的核。其中vc是图$G$的顶点覆盖数。我们用匹配的上界补充这些下界。这些结果非常罕见:它是NP中仅有的第二个问题,其参数化为vc的双指数下界非常紧密,而且仅有极少数问题具有这样基于ETH的条件下界,限制了核中顶点数量。对于树、区间图、循环和循环树,我们推导出与其VC维度成比例的$\mathcal{B}(G)$的NCTM$^+$s或NCTMs,对于Gromov-双曲图,我们设计了一个大小为2的近似NCTM$^+$。

更新时间: 2024-07-29 14:10:52

领域: cs.CC,cs.DM,cs.DS,cs.LG,math.CO

下载: http://arxiv.org/abs/2309.02876v2

Aircraft Trajectory Segmentation-based Contrastive Coding: A Framework for Self-supervised Trajectory Representation

Air traffic trajectory recognition has gained significant interest within the air traffic management community, particularly for fundamental tasks such as classification and clustering. This paper introduces Aircraft Trajectory Segmentation-based Contrastive Coding (ATSCC), a novel self-supervised time series representation learning framework designed to capture semantic information in air traffic trajectory data. The framework leverages the segmentable characteristic of trajectories and ensures consistency within the self-assigned segments. Intensive experiments were conducted on datasets from three different airports, totaling four datasets, comparing the learned representation's performance of downstream classification and clustering with other state-of-the-art representation learning techniques. The results show that ATSCC outperforms these methods by aligning with the labels defined by aeronautical procedures. ATSCC is adaptable to various airport configurations and scalable to incomplete trajectories. This research has expanded upon existing capabilities, achieving these improvements independently without predefined inputs such as airport configurations, maneuvering procedures, or labeled data.
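
The segment-consistency objective can be approximated by a standard InfoNCE loss over segment embeddings; the sketch below uses a generic GRU encoder and synthetic trajectories, not the ATSCC architecture:

```python
import torch
import torch.nn.functional as F

# Generic InfoNCE sketch: two embeddings of the same trajectory segment are
# positives, other segments in the batch are negatives. Dimensions are
# illustrative.

def info_nce(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                 # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))        # positives on the diagonal
    return F.cross_entropy(logits, labels)

B, d = 32, 64
enc = torch.nn.GRU(input_size=4, hidden_size=d, batch_first=True)

segments = torch.randn(B, 50, 4)             # 50 points of (lat, lon, alt, t)
views = segments + 0.01 * torch.randn_like(segments)   # augmented second view

_, h1 = enc(segments)
_, h2 = enc(views)
loss = info_nce(h1[-1], h2[-1])
print(float(loss))
```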

Updated: 2024-07-29 14:04:46

标题: 飞机轨迹分割的对比编码:一种用于自监督轨迹表示的框架

摘要: 空中交通轨迹识别在空中交通管理社区中引起了极大的关注,特别是对于基本任务如分类和聚类。本文介绍了基于飞机轨迹分割的对比编码(ATSCC),这是一种新颖的自监督时间序列表示学习框架,旨在捕捉空中交通轨迹数据中的语义信息。该框架利用轨迹的可分割特性,并确保在自动分配的段内保持一致性。在来自三个不同机场的数据集上进行了大量实验,总共有四个数据集,将学习到的表示性能与其他最先进的表示学习技术进行了比较,用于下游分类和聚类。结果表明,ATSCC通过与航空程序定义的标签对齐,优于这些方法。ATSCC适用于各种机场配置,并可扩展到不完整的轨迹。这项研究扩展了现有的能力,独立实现了这些改进,而无需预定义的输入,如机场配置、操纵程序或标记数据。

更新时间: 2024-07-29 14:04:46

领域: cs.LG

下载: http://arxiv.org/abs/2407.20028v1

Identifiable latent bandits: Combining observational data and exploration for personalized healthcare

Bandit algorithms hold great promise for improving personalized decision-making but are notoriously sample-hungry. In most health applications, it is infeasible to fit a new bandit for each patient, and observable variables are often insufficient to determine optimal treatments, ruling out applying contextual bandits learned from multiple patients. Latent bandits offer both rapid exploration and personalization beyond what context variables can reveal but require that a latent variable model can be learned consistently. In this work, we propose bandit algorithms based on nonlinear independent component analysis that can be provably identified from observational data to a degree sufficient to infer the optimal action in a new bandit instance consistently. We verify this strategy in simulated data, showing substantial improvement over learning independent multi-armed bandits for every instance.

Updated: 2024-07-29 14:04:20

标题: 可识别的潜在匪徒:结合观察数据和探索以个性化医疗

摘要: 赌博算法在改善个性化决策方面具有巨大潜力,但却以样本需求量大而著称。在大多数健康应用中,为每个患者拟合一个新的赌博算法是不可行的,观测变量通常不足以确定最佳治疗方案,因此无法应用从多个患者学习的情境赌博算法。潜在赌博算法在快速探索和个性化方面提供了超越情境变量所能揭示的内容,但需要能够一致地学习潜在变量模型。在这项工作中,我们提出了基于非线性独立成分分析的赌博算法,可以从观测数据中得到可证明的识别,以足以一致地推断新赌博实例中的最佳行动。我们在模拟数据中验证了这一策略,显示出相对于为每个实例学习独立多臂赌博算法的显著改进。

更新时间: 2024-07-29 14:04:20

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.16239v2

Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay

Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data. Existing works use a validation set to monitor the accuracy of the student over real data and report the highest performance throughout the entire process. However, validation data may not be available at distillation time either, making it infeasible to record the student snapshot that achieved the peak accuracy. Therefore, a practical data-free KD method should be robust and ideally provide monotonically increasing student accuracy during distillation. This is challenging because the student experiences knowledge degradation due to the distribution shift of the synthetic data. A straightforward approach to overcome this issue is to store and rehearse the generated samples periodically, which increases the memory footprint and creates privacy concerns. We propose to model the distribution of the previously observed synthetic samples with a generative network. In particular, we design a Variational Autoencoder (VAE) with a training objective that is customized to learn the synthetic data representations optimally. The student is rehearsed by the generative pseudo replay technique, with samples produced by the VAE. Hence knowledge degradation can be prevented without storing any samples. Experiments on image classification benchmarks show that our method optimizes the expected value of the distilled model accuracy while eliminating the large memory overhead incurred by the sample-storing methods.
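
One rehearsal step of generative pseudo replay might look like the following sketch, with tiny stand-in networks (the paper's custom VAE objective is omitted):

```python
import torch
import torch.nn.functional as F

# Sketch of one pseudo-replay step: the student is rehearsed on samples drawn
# from a small VAE decoder instead of a stored sample buffer. The networks
# below are stand-ins; shapes and the KD temperature are illustrative.

latent_dim, img_dim, T = 16, 784, 4.0
decoder = torch.nn.Sequential(                  # VAE decoder, assumed trained
    torch.nn.Linear(latent_dim, 256), torch.nn.ReLU(),   # on past synthetic data
    torch.nn.Linear(256, img_dim), torch.nn.Sigmoid())
teacher = torch.nn.Linear(img_dim, 10)          # stand-in for the frozen teacher
student = torch.nn.Linear(img_dim, 10)          # stand-in for the student
opt = torch.optim.SGD(student.parameters(), lr=0.01)

# Replay: sample latents from the standard normal prior and decode.
z = torch.randn(64, latent_dim)
replayed = decoder(z).detach()

with torch.no_grad():
    t_logits = teacher(replayed)
kd_loss = F.kl_div(F.log_softmax(student(replayed) / T, dim=1),
                   F.softmax(t_logits / T, dim=1),
                   reduction="batchmean") * T * T
opt.zero_grad(); kd_loss.backward(); opt.step()
print(float(kd_loss))
```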

Updated: 2024-07-29 13:57:56

标题: 生成式伪回放的稳健和资源高效的无数据知识蒸馏

摘要: 无数据知识蒸馏(KD)允许在没有原始训练数据的情况下,将训练过的神经网络(教师)的知识转移给更紧凑的神经网络(学生)。现有的研究使用验证集来监控学生在真实数据上的准确性,并报告整个过程中的最佳性能。然而,在蒸馏时可能无法获得验证数据,因此无法记录达到峰值准确性的学生快照。因此,一个实用的无数据知识蒸馏方法应该是稳健的,并且在蒸馏过程中最好能提供单调增加的学生准确性。这是具有挑战性的,因为学生经历了由于合成数据的分布偏移而导致的知识退化。克服这一问题的直接方法是定期存储和重复生成的样本,这会增加内存占用,并引发隐私问题。我们提出用生成网络对先前观察到的合成样本的分布进行建模。具体而言,我们设计了一个具有定制化训练目标的变分自动编码器(VAE),以最佳方式学习合成数据表示。学生通过生成伪重放技术进行复习,使用VAE生成的样本。因此,可以防止知识退化,而无需存储任何样本。在图像分类基准测试上的实验表明,我们的方法优化了蒸馏模型准确性的预期值,同时消除了通过存储样本方法带来的大量内存开销。

更新时间: 2024-07-29 13:57:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2201.03019v3

MimiQ: Low-Bit Data-Free Quantization of Vision Transformers

Data-free quantization (DFQ) is a technique that creates a lightweight network from its full-precision counterpart without the original training data, often through a synthetic dataset. Although several DFQ methods have been proposed for vision transformer (ViT) architectures, they fail to achieve efficacy in low-bit settings. Examining the existing methods, we identify that their synthetic data produce misaligned attention maps, while those of the real samples are highly aligned. From the observation of aligned attention, we find that aligning attention maps of synthetic data helps to improve the overall performance of quantized ViTs. Motivated by this finding, we devise MimiQ, a novel DFQ method designed for ViTs that focuses on inter-head attention similarity. First, we generate synthetic data by aligning head-wise attention responses in relation to spatial query patches. Then, we apply head-wise structural attention distillation to align the attention maps of the quantized network to those of the full-precision teacher. The experimental results show that the proposed method significantly outperforms baselines, setting a new state-of-the-art performance for data-free ViT quantization.
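
The head-wise alignment objective can be sketched as a KL divergence between per-head attention distributions; this is only the distillation loss, not the full MimiQ pipeline with synthetic data generation:

```python
import torch
import torch.nn.functional as F

# Sketch of head-wise structural attention distillation: align each head's
# attention map in the quantized student with the full-precision teacher's.
# The random tensors stand in for real ViT attention maps.

def attn_distill_loss(student_attn, teacher_attn):
    # attention tensors: (batch, heads, tokens, tokens); rows are distributions
    s = torch.log(student_attn.clamp_min(1e-8))
    return F.kl_div(s, teacher_attn, reduction="batchmean")

B, H, N = 2, 12, 197
teacher_attn = torch.softmax(torch.randn(B, H, N, N), dim=-1)
student_attn = torch.softmax(torch.randn(B, H, N, N), dim=-1)
print(float(attn_distill_loss(student_attn, teacher_attn)))
```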

Updated: 2024-07-29 13:57:40

标题: MimiQ:视觉Transformer的低比特数据无损量化

摘要: 无数据量化(DFQ)是一种技术,它从完整精度的网络中创建一个轻量级网络,而无需原始训练数据,通常通过合成数据集实现。尽管已经提出了几种用于视觉转换器(ViT)架构的DFQ方法,但它们在低位设置下未能实现有效性。通过检查现有方法,我们发现它们的合成数据产生了不对齐的注意力图,而真实样本的注意力图高度对齐。通过观察对齐的注意力,我们发现对齐合成数据的注意力图有助于提高量化ViTs的整体性能。受到这一发现的启发,我们设计了一种新颖的DFQ方法MimiQ,专为ViTs而设计,重点关注头部注意力的相似性。首先,我们通过将头部关注响应与空间查询补丁对齐来生成合成数据。然后,我们应用头部结构关注蒸馏,将量化网络的注意力图与完整精度教师的注意力图进行对齐。实验结果表明,所提出的方法明显优于基线,为无数据ViT量化设定了新的技术水平。

更新时间: 2024-07-29 13:57:40

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.20021v1

ImagiNet: A Multi-Content Dataset for Generalizable Synthetic Image Detection via Contrastive Learning

Generative models, such as diffusion models (DMs), variational autoencoders (VAEs), and generative adversarial networks (GANs), produce images with a level of authenticity that makes them nearly indistinguishable from real photos and artwork. While this capability is beneficial for many industries, the difficulty of identifying synthetic images leaves online media platforms vulnerable to impersonation and misinformation attempts. To support the development of defensive methods, we introduce ImagiNet, a high-resolution and balanced dataset for synthetic image detection, designed to mitigate potential biases in existing resources. It contains 200K examples, spanning four content categories: photos, paintings, faces, and uncategorized. Synthetic images are produced with open-source and proprietary generators, whereas real counterparts of the same content type are collected from public datasets. The structure of ImagiNet allows for a two-track evaluation system: i) classification as real or synthetic and ii) identification of the generative model. To establish a baseline, we train a ResNet-50 model using a self-supervised contrastive objective (SelfCon) for each track. The model demonstrates state-of-the-art performance and high inference speed across established benchmarks, achieving an AUC of up to 0.99 and balanced accuracy ranging from 86% to 95%, even under social network conditions that involve compression and resizing. Our data and code are available at https://github.com/delyan-boychev/imaginet.

Updated: 2024-07-29 13:57:24

标题: ImagiNet: 一种用于通过对比学习进行通用合成图像检测的多内容数据集

摘要: 生成模型,如扩散模型(DMs)、变分自动编码器(VAEs)和生成对抗网络(GANs),能够生成具有接近真实照片和艺术作品水平真实性的图像。虽然这种能力对许多行业有益,但鉴别合成图像的困难使在线媒体平台容易受到冒名顶替和误导尝试的威胁。为支持防御方法的发展,我们引入了ImagiNet,一个用于合成图像检测的高分辨率和平衡数据集,旨在减轻现有资源中潜在的偏见。它包含20万个示例,涵盖四个内容类别:照片、绘画、面孔和未分类。合成图像是使用开源和专有生成器生成的,而相同内容类型的真实对应物则来自公共数据集。ImagiNet的结构允许使用两种评估系统:i)分类为真实或合成和ii)识别生成模型。为建立基线,我们使用自监督对比目标(SelfCon)为每个跟踪训练了一个ResNet-50模型。该模型在已建立的基准测试中表现出最先进的性能和高推理速度,实现了高达0.99的AUC和在86%至95%之间的平衡准确度,即使在涉及压缩和调整大小的社交网络条件下也是如此。我们的数据和代码可在https://github.com/delyan-boychev/imaginet上获得。

更新时间: 2024-07-29 13:57:24

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.20020v1

Classification of freshwater snails of the genus Radomaniola with multimodal triplet networks

In this paper, we present our first proposal of a machine learning system for the classification of freshwater snails of the genus Radomaniola. We elaborate on the specific challenges encountered during system design, and how we tackled them; namely a small, very imbalanced dataset with a high number of classes and high visual similarity between classes. We then show how we employed triplet networks and the multiple input modalities of images, measurements, and genetic information to overcome these challenges and reach a performance comparable to that of a trained domain expert.
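
A minimal sketch of the multimodal triplet setup, with illustrative embedding dimensions and synthetic tensors in place of real specimen data:

```python
import torch
import torch.nn as nn

# Sketch: fuse image, measurement, and genetic embeddings into one vector,
# then train with a triplet margin loss so that same-species specimens sit
# closer than different-species ones. All dimensions are illustrative.

class MultimodalEncoder(nn.Module):
    def __init__(self, d_img=512, d_meas=8, d_gen=64, d_out=128):
        super().__init__()
        self.img = nn.Linear(d_img, d_out)
        self.meas = nn.Linear(d_meas, d_out)
        self.gen = nn.Linear(d_gen, d_out)
        self.head = nn.Linear(3 * d_out, d_out)

    def forward(self, img, meas, gen):
        z = torch.cat([self.img(img), self.meas(meas), self.gen(gen)], dim=1)
        return nn.functional.normalize(self.head(torch.relu(z)), dim=1)

enc = MultimodalEncoder()
loss_fn = nn.TripletMarginLoss(margin=0.5)

def batch():  # synthetic stand-in for a specimen batch
    return torch.randn(16, 512), torch.randn(16, 8), torch.randn(16, 64)

anchor, positive, negative = (enc(*batch()) for _ in range(3))
print(float(loss_fn(anchor, positive, negative)))
```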

Updated: 2024-07-29 13:45:23

标题: 使用多模态三元组网络对Radomaniola属淡水蜗牛进行分类

摘要: 在这篇论文中,我们提出了一个用于分类Radomaniola属淡水蜗牛的机器学习系统的初步方案。我们详细阐述了系统设计过程中遇到的具体挑战,以及我们是如何解决这些挑战的;即一个小型、非常不平衡的数据集,包含大量类别,以及类别之间的视觉相似性很高。然后我们展示了如何利用三元组网络以及图像、测量和遗传信息等多种输入模态来克服这些挑战,并达到与经过培训的领域专家相媲美的性能水平。

更新时间: 2024-07-29 13:45:23

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.20013v1

Agent-OM: Leveraging LLM Agents for Ontology Matching

Ontology matching (OM) enables semantic interoperability between different ontologies and resolves their conceptual heterogeneity by aligning related entities. OM systems currently have two prevailing design paradigms: conventional knowledge-based expert systems and newer machine learning-based predictive systems. While large language models (LLMs) and LLM agents have revolutionised data engineering and have been applied creatively in many domains, their potential for OM remains underexplored. This study introduces a novel agent-powered LLM-based design paradigm for OM systems. With consideration of several specific challenges in leveraging LLM agents for OM, we propose a generic framework, namely Agent-OM (w.r.t. Agent for Ontology Matching), consisting of two Siamese agents for retrieval and matching, with a set of simple OM tools. Our framework is implemented in a proof-of-concept system. Evaluations of three Ontology Alignment Evaluation Initiative (OAEI) tracks over state-of-the-art OM systems show that our system can achieve results very close to the long-standing best performance on simple OM tasks and can significantly improve the performance on complex and few-shot OM tasks.

Updated: 2024-07-29 13:40:11

标题: Agent-OM:利用LLM代理进行本体匹配

摘要: 本体匹配(OM)通过对齐相关实体,实现不同本体之间的语义互操作性并解决其概念异质性。目前OM系统有两种主流设计范式:传统的基于知识的专家系统和较新的基于机器学习的预测系统。尽管大语言模型(LLMs)和LLM代理已经彻底改变了数据工程,并在许多领域得到了创造性应用,但它们在OM领域的潜力仍未充分挖掘。本研究引入了一种新颖的基于LLM代理的OM系统设计范式。考虑到利用LLM代理进行OM面临的一些具体挑战,我们提出了一个通用框架,即Agent-OM(Agent for Ontology Matching),包括两个用于检索和匹配的Siamese代理,以及一组简单的OM工具。我们的框架在一个概念验证系统中实现。在三个本体对齐评估倡议(OAEI)赛道上与最先进OM系统的评估显示,我们的系统在简单OM任务上可以取得非常接近长期最佳性能的结果,并且可以显著提高复杂和少样本OM任务的性能。

更新时间: 2024-07-29 13:40:11

领域: cs.AI,cs.CL,cs.IR

下载: http://arxiv.org/abs/2312.00326v3

On the Effects of Irrelevant Variables in Treatment Effect Estimation with Deep Disentanglement

Estimating treatment effects from observational data is paramount in healthcare, education, and economics, but current deep disentanglement-based methods for addressing selection bias do not adequately handle irrelevant variables. We demonstrate in experiments that this leads to prediction errors. We disentangle pre-treatment variables with a deep embedding method and explicitly identify and represent irrelevant variables, additionally to instrumental, confounding and adjustment latent factors. To this end, we introduce a reconstruction objective and create an embedding space for irrelevant variables using an attached autoencoder. Instead of relying on serendipitous suppression of irrelevant variables as in previous deep disentanglement approaches, we explicitly force irrelevant variables into this embedding space and employ orthogonalization to prevent irrelevant information from leaking into the latent space representations of the other factors. Our experiments with synthetic and real-world benchmark datasets show that we can better identify irrelevant variables and more precisely predict treatment effects than previous methods, while prediction quality degrades less when additional irrelevant variables are introduced.

Updated: 2024-07-29 13:34:34

标题: 关于使用深度解缠进行治疗效果估计时无关变量的影响

摘要: 从观察数据中估计治疗效果在医疗保健、教育和经济领域至关重要,但目前基于深度解缠的方法来解决选择偏倚不足以处理无关变量。我们在实验中展示了这导致了预测错误。我们使用深度嵌入方法解除预处理变量,并明确识别和表示无关变量,除了工具、混淆和调整潜在因素。为此,我们引入了一个重建目标,并使用附加的自动编码器为无关变量创建一个嵌入空间。与以往深度解缠方法中依赖偶然抑制无关变量不同,我们明确地将无关变量强制输入到这个嵌入空间中,并采用正交化方法防止无关信息渗入其他因素的潜在空间表示中。我们使用合成和真实世界基准数据集的实验表明,我们能够更好地识别无关变量,并比以前的方法更精确地预测治疗效果,当引入额外的无关变量时,预测质量下降的程度也较小。

更新时间: 2024-07-29 13:34:34

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.20003v1

Collision Probability Distribution Estimation via Temporal Difference Learning

We introduce CollisionPro, a pioneering framework designed to estimate cumulative collision probability distributions using temporal difference learning, specifically tailored to applications in robotics, with a particular emphasis on autonomous driving. This approach addresses the demand for explainable artificial intelligence (XAI) and seeks to overcome limitations imposed by model-based approaches and conservative constraints. We formulate our framework within the context of reinforcement learning to pave the way for safety-aware agents. Nevertheless, we assert that our approach could prove beneficial in various contexts, including a safety alert system or analytical purposes. A comprehensive examination of our framework is conducted using a realistic autonomous driving simulator, illustrating its high sample efficiency and reliable prediction capabilities for previously unseen collision events. The source code is publicly available.
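
The core idea of learning cumulative collision probabilities by temporal-difference updates can be shown in a tabular toy (single state, i.i.d. per-step collision risk; the paper's function-approximation setting is more general):

```python
import numpy as np

# Toy TD(0) sketch: learn P[h] = probability of colliding within the next h
# steps, for a single state with an i.i.d. per-step collision probability.
# All constants are illustrative.

H, alpha, episodes = 5, 0.05, 5000
P = np.zeros(H + 1)          # P[0] = 0 by definition
rng = np.random.default_rng(0)
p_step = 0.1                 # true per-step collision probability

for _ in range(episodes):
    collided = rng.random() < p_step
    for h in range(1, H + 1):
        # TD target: collide now, or survive and collide within h-1 steps.
        target = 1.0 if collided else P[h - 1]
        P[h] += alpha * (target - P[h])

truth = 1 - (1 - p_step) ** np.arange(H + 1)
print("learned:", P.round(3))
print("true:   ", truth.round(3))
```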

Updated: 2024-07-29 13:32:42

标题: 通过时间差分学习估计碰撞概率分布

摘要: 我们介绍CollisionPro,这是一个开创性的框架,旨在使用时间差分学习来估计累积碰撞概率分布,特别适用于机器人应用,尤其是自动驾驶。这种方法解决了对可解释人工智能(XAI)的需求,并试图克服模型为基础的方法和保守约束所带来的局限性。我们在强化学习的背景下构建了我们的框架,为安全意识代理铺平道路。然而,我们断言我们的方法可能在各种情境中都有益,包括安全警报系统或分析目的。我们使用一个逼真的自动驾驶模拟器对我们的框架进行了全面检查,展示了其对以前未见碰撞事件的高样本效率和可靠的预测能力。源代码已公开可用。

更新时间: 2024-07-29 13:32:42

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2407.20000v1

Generalization Error Bounds for Learning under Censored Feedback

Generalization error bounds from learning theory provide statistical guarantees on how well an algorithm will perform on previously unseen data. In this paper, we characterize the impacts of data non-IIDness due to censored feedback (a.k.a. selective labeling bias) on such bounds. We first derive an extension of the well-known Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which characterizes the gap between empirical and theoretical CDFs given IID data, to problems with non-IID data due to censored feedback. We then use this CDF error bound to provide a bound on the generalization error guarantees of a classifier trained on such non-IID data. We show that existing generalization error bounds (which do not account for censored feedback) fail to correctly capture the model's generalization guarantees, verifying the need for our bounds. We further analyze the effectiveness of (pure and bounded) exploration techniques, proposed by recent literature as a way to alleviate censored feedback, on improving our error bounds. Together, our findings illustrate how a decision maker should account for the trade-off between strengthening the generalization guarantees of an algorithm and the costs incurred in data collection when future data availability is limited by censored feedback.
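
The classical IID baseline the paper extends is easy to state in code: the DKW inequality gives a uniform confidence band around the empirical CDF:

```python
import numpy as np

# DKW inequality for IID samples: with probability 1 - delta,
#   sup_x |F_n(x) - F(x)| <= sqrt(ln(2/delta) / (2n)).
# The paper's contribution is extending such a band to censored (non-IID)
# feedback; below is only the IID baseline band on synthetic data.

rng = np.random.default_rng(0)
n, delta = 500, 0.05
samples = np.sort(rng.normal(size=n))
eps = np.sqrt(np.log(2 / delta) / (2 * n))

F_n = np.arange(1, n + 1) / n          # empirical CDF at the sorted points
lower, upper = np.clip(F_n - eps, 0, 1), np.clip(F_n + eps, 0, 1)
print(f"DKW half-width with n={n}, delta={delta}: {eps:.3f}")
print("band at smallest sample:", lower[0], "to", upper[0])
```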

Updated: 2024-07-29 13:32:39

标题: 有审查反馈下学习的泛化误差界限

摘要: 学习理论中的泛化误差界限提供了关于算法在以前未见数据上表现如何的统计保证。本文研究了由于被审查反馈(也称选择性标记偏差)导致的数据非独立同分布性对这些界限的影响。我们首先推导了著名的Dvoretzky-Kiefer-Wolfowitz(DKW)不等式的一个扩展,该不等式描述了由于被审查反馈导致的非独立同分布数据中经验和理论CDF之间的差距。然后,我们使用这个CDF误差界限来为在这种非独立同分布数据上训练的分类器的泛化误差保证提供一个界限。我们表明,现有的泛化误差界限(不考虑被审查反馈)无法正确捕捉模型的泛化保证,验证了我们界限的必要性。我们进一步分析了最近文献提出的(纯和有界的)探索技术的有效性,作为一种缓解被审查反馈的方法,以改善我们的错误界限。总之,我们的研究结果说明了决策者应该如何权衡在未来数据受到被审查反馈限制时,加强算法的泛化保证和数据收集所造成的成本之间的折衷。

更新时间: 2024-07-29 13:32:39

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.09247v2

Do LLMs Really Adapt to Domains? An Ontology Learning Perspective

Large Language Models (LLMs) have demonstrated unprecedented prowess across various natural language processing tasks in various application domains. Recent studies show that LLMs can be leveraged to perform lexical semantic tasks, such as Knowledge Base Completion (KBC) or Ontology Learning (OL). However, it has not effectively been verified whether their success is due to their ability to reason over unstructured or semi-structured data, or their effective learning of linguistic patterns and senses alone. This unresolved question is particularly crucial when dealing with domain-specific data, where the lexical senses and their meaning can completely differ from what a LLM has learned during its training stage. This paper investigates the following question: Do LLMs really adapt to domains and remain consistent in the extraction of structured knowledge, or do they only learn lexical senses instead of reasoning? To answer this question, we devise a controlled experiment setup that uses WordNet to synthesize parallel corpora, with English and gibberish terms. We examine the differences in the outputs of LLMs for each corpus in two OL tasks: relation extraction and taxonomy discovery. Empirical results show that, while adapting to the gibberish corpora, off-the-shelf LLMs do not consistently reason over semantic relationships between concepts, and instead leverage senses and their frame. However, fine-tuning improves the performance of LLMs on lexical semantic tasks even when the domain-specific terms are arbitrary and unseen during pre-training, hinting at the applicability of pre-trained LLMs for OL.
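
Assuming NLTK's WordNet interface, constructing such a parallel English/gibberish corpus of is-a pairs could look like this sketch (pseudoword length and sample size are arbitrary):

```python
import random, string
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

# Sketch: take hypernym pairs from WordNet and re-emit them with consistent
# gibberish pseudowords, so correct extraction must rely on structure rather
# than memorised lexical senses.

random.seed(0)
gibberish = {}

def pseudo(word):
    if word not in gibberish:
        gibberish[word] = "".join(random.choices(string.ascii_lowercase, k=8))
    return gibberish[word]

pairs = []
for syn in list(wn.all_synsets("n"))[:200]:
    for hyper in syn.hypernyms():
        a, b = syn.lemmas()[0].name(), hyper.lemmas()[0].name()
        pairs.append((a, b, pseudo(a), pseudo(b)))

for a, b, ga, gb in pairs[:3]:
    print(f"English: {a} is-a {b}   |   gibberish: {ga} is-a {gb}")
```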

Updated: 2024-07-29 13:29:43

标题: LLMs是否真的适应领域?一个本体学习的视角

摘要: 大型语言模型(LLMs)已在各种应用领域的各种自然语言处理任务中展示了前所未有的能力。最近的研究表明,LLMs可以用于执行词汇语义任务,如知识库补全(KBC)或本体学习(OL)。然而,尚未有效验证它们的成功是因为它们能够推理未结构化或半结构化数据,还是仅仅是因为它们有效地学习了语言模式和意义。在处理特定领域数据时,这个未解决的问题尤为关键,因为词汇意义及其含义可能与LLM在训练阶段学到的完全不同。本文探讨以下问题:LLMs是否真的适应领域并在提取结构化知识方面保持一致,还是仅仅学习了词汇意义而没有推理能力?为了回答这个问题,我们设计了一个受控实验设置,使用WordNet合成平行语料库,包含英语和胡言乱语术语。我们检查LLMs在两个OL任务中每个语料库的输出差异:关系提取和分类发现。实证结果显示,虽然适应于胡言乱语语料库,现成的LLMs并不一致地推理概念之间的语义关系,而是利用意义及其框架。然而,微调改善了LLMs在词汇语义任务上的性能,即使在预训练期间未见过任意领域特定术语,暗示了预训练的LLMs在OL中的适用性。

更新时间: 2024-07-29 13:29:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.19998v1

Reproducibility Study of "ITI-GEN: Inclusive Text-to-Image Generation"

Text-to-image generative models often present issues regarding fairness with respect to certain sensitive attributes, such as gender or skin tone. This study aims to reproduce the results presented in "ITI-GEN: Inclusive Text-to-Image Generation" by Zhang et al. (2023a), which introduces a model to improve inclusiveness in these kinds of models. We show that most of the claims made by the authors about ITI-GEN hold: it improves the diversity and quality of generated images, it is scalable to different domains, it has plug-and-play capabilities, and it is efficient from a computational point of view. However, ITI-GEN sometimes uses undesired attributes as proxy features and it is unable to disentangle some pairs of (correlated) attributes such as gender and baldness. In addition, when the number of considered attributes increases, the training time grows exponentially and ITI-GEN struggles to generate inclusive images for all elements in the joint distribution. To solve these issues, we propose using Hard Prompt Search with negative prompting, a method that does not require training and that handles negation better than vanilla Hard Prompt Search. Nonetheless, Hard Prompt Search (with or without negative prompting) cannot be used for continuous attributes that are hard to express in natural language, an area where ITI-GEN excels as it is guided by images during training. Finally, we propose combining ITI-GEN and Hard Prompt Search with negative prompting.
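
With the diffusers library, negative prompting is a single argument on the pipeline call; the prompts and checkpoint below are illustrative, not the paper's exact search procedure:

```python
import torch
from diffusers import StableDiffusionPipeline

# Sketch of negative prompting in Hard Prompt Search: a candidate
# positive/negative prompt pair steers generation toward or away from an
# attribute. Model choice and prompts are illustrative.

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

image = pipe(
    prompt="a headshot of a person with eyeglasses",
    negative_prompt="without eyeglasses, no glasses",
    num_inference_steps=30,
).images[0]
image.save("with_glasses.png")
```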

Updated: 2024-07-29 13:27:44

标题: 《"ITI-GEN:包容性文本到图像生成"的可重复性研究》

摘要: 文本到图像生成模型在某些敏感属性,如性别或肤色方面往往存在公平性问题。本研究旨在复现张等人(2023a)提出的“ITI-GEN:包容性文本到图像生成”中所呈现的结果,该研究介绍了一种改善这类模型包容性的模型。我们展示了作者关于ITI-GEN的大部分声明是成立的:它提高了生成图像的多样性和质量,适用于不同领域,具有即插即用的功能,并且在计算方面高效。然而,ITI-GEN有时会使用不希望的属性作为代理特征,且无法分离一些(相关)属性对,如性别和秃头。此外,当考虑的属性数量增加时,训练时间呈指数增长,ITI-GEN难以为联合分布中的所有元素生成具有包容性的图像。为解决这些问题,我们提出使用带有负提示的Hard Prompt Search,这是一种无需训练且更好处理否定的方法。然而,Hard Prompt Search(无论是否带有负提示)无法用于难以用自然语言表达的连续属性,而在这一领域,ITI-GEN表现出色,因为它在训练时受到图像的指导。最后,我们提出将ITI-GEN和带有负提示的Hard Prompt Search结合使用。

更新时间: 2024-07-29 13:27:44

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.19996v1

A Study on the Implementation Method of an Agent-Based Advanced RAG System Using Graph

This study aims to improve knowledge-based question-answering (QA) systems by overcoming the limitations of existing Retrieval-Augmented Generation (RAG) models and implementing an advanced RAG system based on Graph technology to develop high-quality generative AI services. While existing RAG models demonstrate high accuracy and fluency by utilizing retrieved information, they may suffer from accuracy degradation as they generate responses using pre-loaded knowledge without reprocessing. Additionally, they cannot incorporate real-time data after the RAG configuration stage, leading to issues with contextual understanding and biased information. To address these limitations, this study implemented an enhanced RAG system utilizing Graph technology. This system is designed to efficiently search and utilize information. Specifically, it employs LangGraph to evaluate the reliability of retrieved information and synthesizes diverse data to generate more accurate and enhanced responses. Furthermore, the study provides a detailed explanation of the system's operation, key implementation steps, and examples through implementation code and validation results, thereby enhancing the understanding of advanced RAG technology. This approach offers practical guidelines for implementing advanced RAG systems in corporate services, making it a valuable resource for practical application.
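
Assuming the langgraph package, the retrieve-grade-generate loop described above can be wired up roughly as follows, with stubbed node logic in place of real retriever and LLM calls:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END  # assumes the langgraph package

# Sketch: a graph-orchestrated RAG flow that retrieves, grades the retrieved
# context for reliability, then either answers or re-retrieves. Node bodies
# are stubs standing in for a real retriever and LLM.

class RAGState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: RAGState) -> dict:
    return {"context": f"docs about: {state['question']}"}      # stub retriever

def grade(state: RAGState) -> str:
    return "generate" if state["context"] else "retrieve"       # stub reliability check

def generate(state: RAGState) -> dict:
    return {"answer": f"answer based on [{state['context']}]"}  # stub LLM call

g = StateGraph(RAGState)
g.add_node("retrieve", retrieve)
g.add_node("generate", generate)
g.add_edge(START, "retrieve")
g.add_conditional_edges("retrieve", grade, {"generate": "generate", "retrieve": "retrieve"})
g.add_edge("generate", END)

app = g.compile()
print(app.invoke({"question": "What is agent-based RAG?"}))
```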

Updated: 2024-07-29 13:26:43

标题: 使用图形的基于代理的先进RAG系统的实施方法研究

摘要: 这项研究旨在通过克服现有检索增强生成(RAG)模型的局限性,并基于图技术实施先进的RAG系统,以开发高质量的生成式人工智能服务,从而改进基于知识的问答(QA)系统。尽管现有的RAG模型通过利用检索到的信息展现出高准确性和流畅性,但由于它们在生成响应时使用预加载的知识而没有重新处理,可能会遭受准确性下降的问题。此外,它们无法在RAG配置阶段后融合实时数据,导致上下文理解和信息偏见方面存在问题。为了解决这些限制,本研究实施了一种利用图技术的增强型RAG系统。该系统旨在高效地搜索和利用信息。具体来说,它利用LangGraph评估检索到的信息的可靠性,并综合各种数据以生成更准确和增强的响应。此外,该研究通过实施代码和验证结果提供了系统运行、关键实施步骤和示例的详细解释,从而增进对先进RAG技术的理解。这种方法为在企业服务中实施先进RAG系统提供了实用指导,使其成为实际应用的宝贵资源。

更新时间: 2024-07-29 13:26:43

领域: cs.AI

下载: http://arxiv.org/abs/2407.19994v1

Introducing δ-XAI: a novel sensitivity-based method for local AI explanations

Explainable Artificial Intelligence (XAI) is central to the debate on integrating Artificial Intelligence (AI) and Machine Learning (ML) algorithms into clinical practice. High-performing AI/ML models, such as ensemble learners and deep neural networks, often lack interpretability, hampering clinicians' trust in their predictions. To address this, XAI techniques are being developed to describe AI/ML predictions in human-understandable terms. One promising direction is the adaptation of sensitivity analysis (SA) and global sensitivity analysis (GSA), which inherently rank model inputs by their impact on predictions. Here, we introduce a novel delta-XAI method that provides local explanations of ML model predictions by extending the delta index, a GSA metric. The delta-XAI index assesses the impact of each feature's value on the predicted output for individual instances in both regression and classification problems. We formalize the delta-XAI index and provide code for its implementation. The delta-XAI method was evaluated on simulated scenarios using linear regression models, with Shapley values serving as a benchmark. Results showed that the delta-XAI index is generally consistent with Shapley values, with notable discrepancies in models with highly impactful or extreme feature values. The delta-XAI index demonstrated higher sensitivity in detecting dominant features and handling extreme feature values. Qualitatively, the delta-XAI provides intuitive explanations by leveraging probability density functions, making feature rankings clearer and more explainable for practitioners. Overall, the delta-XAI method appears promising for robustly obtaining local explanations of ML model predictions. Further investigations in real-world clinical settings will be conducted to evaluate its impact on AI-assisted clinical workflows.
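
A generic sensitivity-style local explanation in the spirit of the delta index (not the authors' exact formula) can be sketched by resampling one feature at a time from its marginal and measuring the movement of the prediction:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sketch: for one instance, measure how much the prediction moves when each
# feature is resampled from its marginal distribution. The model and data are
# synthetic; features 0 and 2 carry the signal.

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = 3 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=1000)
model = LinearRegression().fit(X, y)

x0 = X[0].copy()
base = model.predict(x0[None])[0]
for j in range(4):
    perturbed = np.tile(x0, (200, 1))
    perturbed[:, j] = rng.choice(X[:, j], size=200)   # resample feature j
    delta_j = np.abs(model.predict(perturbed) - base).mean()
    print(f"feature {j}: local sensitivity {delta_j:.3f}")
```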

Updated: 2024-07-29 13:25:41

标题: 引入δ-XAI:一种基于敏感性的新型局部人工智能解释方法

摘要: 可解释的人工智能(XAI)是关于将人工智能(AI)和机器学习(ML)算法整合到临床实践中的辩论的核心。高性能的AI/ML模型,如集成学习器和深度神经网络,通常缺乏可解释性,使临床医生对它们的预测缺乏信任。为了解决这个问题,正在开发XAI技术来用人类可理解的术语描述AI/ML的预测。一个有前途的方向是适应敏感性分析(SA)和全局敏感性分析(GSA),这些方法本质上通过它们对预测的影响对模型输入进行排序。在这里,我们介绍了一种新颖的delta-XAI方法,通过扩展delta指数(一个GSA指标)来提供ML模型预测的局部解释。delta-XAI指数评估了每个特征值对回归和分类问题中个别实例的预测输出的影响。我们将delta-XAI指数形式化,并提供其实施代码。使用线性回归模型对模拟场景评估了delta-XAI方法,Shapley值作为基准。结果显示,delta-XAI指数通常与Shapley值一致,在具有高度影响或极端特征值的模型中存在明显的差异。delta-XAI指数在检测主导特征和处理极端特征值方面表现出更高的敏感性。从定性上看,通过利用概率密度函数,delta-XAI提供了直观的解释,使特征排名对从业者更清晰和更易理解。总的来说,delta-XAI方法似乎有望稳健地获得ML模型预测的局部解释。将在实际临床环境中进行进一步研究,以评估其对AI辅助临床工作流程的影响。

更新时间: 2024-07-29 13:25:41

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.18343v2

Classification of Alzheimer's Dementia vs. Healthy subjects by studying structural disparities in fMRI Time-Series of DMN

Time series from different regions of interest (ROI) of the default mode network (DMN), obtained from functional Magnetic Resonance Imaging (fMRI), can reveal significant differences between healthy and unhealthy people. Here, we propose the utility of an existing metric quantifying the lack/presence of structure in a signal, called the "deviation from stochasticity" (DS) measure, to characterize resting-state fMRI time series. The hypothesis is that differences in the level of structure in the time series can lead to discrimination between the subject groups. In this work, an autoencoder-based model is utilized to learn efficient representations of data by training the network to reconstruct its input data. The proposed methodology is applied on fMRI time series of 50 healthy individuals and 50 subjects with Alzheimer's Disease (AD), obtained from the publicly available ADNI database. As expected, the DS measure for healthy fMRI turns out to differ from that for AD. A peak classification accuracy of 95% was obtained using a Gradient Boosting classifier with the DS measure applied to 100 subjects.

Updated: 2024-07-29 13:22:49

标题: 通过研究DMN在fMRI时间序列中的结构差异,对阿尔茨海默病痴呆和健康受试者进行分类

摘要: 来自不同感兴趣区域(ROI)的默认模式网络(DMN)的时间序列可以通过功能磁共振成像(fMRI)揭示健康和不健康人群之间的显著差异。在这里,我们提出利用一种现有的度量来量化信号中结构缺失/存在的程度,称为“偏离随机性”(DS)度量,来表征静息状态fMRI时间序列的效用。假设时间序列中结构水平的差异可以导致对受试群体进行区分。在这项工作中,利用自动编码器模型通过训练网络重构其输入数据来学习数据的高效表示。建议的方法应用于从公开可用的ADNI数据库获取的50名健康个体和50名患有阿尔茨海默病(AD)的受试者的fMRI时间序列。预期的健康fMRI的DS度量结果与AD的不同。使用Gradient Boosting分类器在100名受试者上应用DS度量获得了95%的峰值分类准确度。

更新时间: 2024-07-29 13:22:49

领域: cs.LG,eess.IV,q-bio.NC

下载: http://arxiv.org/abs/2407.19990v1

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

The visual medium (images and videos) naturally contains a large amount of information redundancy, thereby providing a great opportunity for leveraging efficiency in processing. While Vision Transformer (ViT) based models scale effectively to large data regimes, they fail to capitalize on this inherent redundancy, leading to higher computational costs. Mixture of Experts (MoE) networks demonstrate scalability while maintaining same inference-time costs, but they come with a larger parameter footprint. We present Mixture of Nested Experts (MoNE), which utilizes a nested structure for experts, wherein individual experts fall on an increasing compute-accuracy curve. Given a compute budget, MoNE learns to dynamically choose tokens in a priority order, and thus redundant tokens are processed through cheaper nested experts. Using this framework, we achieve equivalent performance as the baseline models, while reducing inference time compute by over two-fold. We validate our approach on standard image and video datasets - ImageNet-21K, Kinetics400, and Something-Something-v2. We further highlight MoNE's adaptability by showcasing its ability to maintain strong performance across different inference-time compute budgets on videos, using only a single trained model.

Updated: 2024-07-29 13:19:31

标题: 混合嵌套专家:对视觉令牌的自适应处理

摘要: 视觉媒体(图片和视频)自然包含大量信息冗余,因此提供了在处理中利用效率的绝佳机会。虽然基于视觉变换器(ViT)的模型能够有效地扩展到大数据范围,但它们未能充分利用这种固有冗余,导致更高的计算成本。专家混合(MoE)网络展示了可伸缩性,同时保持相同的推断时间成本,但它们具有更大的参数占用。我们提出了嵌套专家混合(MoNE),它利用专家的嵌套结构,其中各个专家均处于逐渐增加的计算-准确性曲线上。在给定计算预算的情况下,MoNE学习按优先顺序动态选择令牌,因此冗余令牌通过更便宜的嵌套专家进行处理。利用这一框架,我们实现了与基线模型相当的性能,同时将推断时间计算减少了两倍以上。我们在标准图像和视频数据集(ImageNet-21K,Kinetics400和Something-Something-v2)上验证了我们的方法。我们进一步突出MoNE的适应性,展示了它在视频上跨不同推断时间计算预算中保持强大性能的能力,仅使用一个训练模型。

更新时间: 2024-07-29 13:19:31

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.19985v1

On the Fly Detection of Root Causes from Observed Data with Application to IT Systems

This paper introduces a new structural causal model tailored for representing threshold-based IT systems and presents a new algorithm designed to rapidly detect root causes of anomalies in such systems. When root causes are not causally related, the method is proven to be correct; while an extension is proposed based on the intervention of an agent to relax this assumption. Our algorithm and its agent-based extension leverage causal discovery from offline data and engage in subgraph traversal when encountering new anomalies in online data. Our extensive experiments demonstrate the superior performance of our methods, even when applied to data generated from alternative structural causal models or real IT monitoring data.
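
An illustrative sketch (not the paper's algorithm verbatim) of the core intuition for the case where root causes are not causally related: given a causal graph learned offline and the set of currently anomalous nodes, the candidate root causes are the anomalous nodes none of whose causal ancestors are also anomalous. The example graph is hypothetical.

```python
# Toy root-cause report over a hypothetical IT-monitoring causal DAG.
import networkx as nx

causal_dag = nx.DiGraph([
    ("db_latency", "api_latency"),
    ("api_latency", "checkout_errors"),
    ("cache_miss", "api_latency"),
])

def root_causes(dag, anomalous):
    anomalous = set(anomalous)
    # An anomalous node is a root cause if no ancestor is also anomalous.
    return [n for n in anomalous
            if not (nx.ancestors(dag, n) & anomalous)]

print(root_causes(causal_dag, {"db_latency", "api_latency", "checkout_errors"}))
# -> ['db_latency']
```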

Updated: 2024-07-29 13:13:30

标题: 基于观测数据的根本原因实时检测及其在IT系统中的应用

摘要: 这篇论文介绍了一种专门用于表示基于阈值的IT系统的新结构因果模型,并提出了一种旨在快速检测这类系统中异常根本原因的新算法。当根本原因不具有因果关系时,该方法被证明是正确的;同时,提出了基于代理干预以放宽这一假设的扩展。我们的算法及其基于代理的扩展利用离线数据中的因果发现,并在在线数据中遇到新异常时进行子图遍历。我们的广泛实验证明了我们的方法的卓越性能,即使应用于来自替代结构因果模型或真实IT监控数据生成的数据时也是如此。

更新时间: 2024-07-29 13:13:30

领域: cs.AI

下载: http://arxiv.org/abs/2402.06500v2

Private and Secure Fuzzy Name Matching

Modern financial institutions rely on data for many operations, including a need to drive efficiency, enhance services and prevent financial crime. Data sharing across an organisation or between institutions can facilitate rapid, evidence-based decision making, including identifying money laundering and fraud. However, data privacy regulations impose restrictions on data sharing. Privacy-enhancing technologies are being increasingly employed to allow organisations to derive shared intelligence while ensuring regulatory compliance. This paper examines the case in which regulatory restrictions mean a party cannot share data on accounts of interest with another (internal or external) party to identify people that hold an account in each dataset. We observe that the names of account holders may be recorded differently in each data set. We introduce a novel privacy-preserving approach for fuzzy name matching across institutions, employing fully homomorphic encryption with locality-sensitive hashing. The efficiency of the approach is enhanced using a clustering mechanism. The practicality and effectiveness of the proposed approach are evaluated using different datasets. Experimental results demonstrate it takes around 100 and 1000 seconds to search 1000 names from 10k and 100k names, respectively. Moreover, the proposed approach exhibits significant improvement in reducing communication overhead by 30-300 times, using clustering.
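
A plaintext sketch of the locality-sensitive-hashing half of such a pipeline; the fully homomorphic encryption layer described above is deliberately omitted here. Similar names share character trigrams, so their MinHash signatures agree in a fraction of positions that estimates their Jaccard similarity.

```python
# MinHash over character trigrams: a standard LSH ingredient for fuzzy
# name matching (the encryption layer is out of scope for this sketch).
import hashlib

def trigrams(name):
    s = f"  {name.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def minhash(shingles, n_hashes=64):
    sig = []
    for seed in range(n_hashes):
        sig.append(min(
            int(hashlib.sha1(f"{seed}:{sh}".encode()).hexdigest(), 16)
            for sh in shingles))
    return sig

def similarity(a, b):
    sa, sb = minhash(trigrams(a)), minhash(trigrams(b))
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

print(similarity("Jonathan Smith", "Jonathon Smyth"))  # high
print(similarity("Jonathan Smith", "Maria Garcia"))    # near zero
```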

Updated: 2024-07-29 13:11:53

标题: 私密且安全的模糊名称匹配

摘要: 现代金融机构依赖数据进行许多操作,包括提高效率、增强服务和预防金融犯罪。组织内部或机构之间的数据共享可以促进快速、基于证据的决策制定,包括识别洗钱和欺诈行为。然而,数据隐私法规对数据共享施加了限制。隐私增强技术正在越来越多地被使用,以允许组织在确保合规性的同时获得共享智能。本文研究了一种情况,即监管限制意味着一方无法与另一方(内部或外部)共享感兴趣的账户数据,以识别每个数据集中持有账户的人员。我们观察到,每个数据集中账户持有人的姓名可能记录不同。我们介绍了一种新颖的用于跨机构模糊姓名匹配的隐私保护方法,采用全同态加密和局部敏感哈希。通过使用聚类机制增强了该方法的效率。提出的方法的实用性和有效性通过使用不同数据集进行评估。实验结果表明,分别从10k和100k个姓名中搜索1000个姓名需要大约100和1000秒。此外,提出的方法通过使用聚类在减少通信开销方面表现出显著的改进,可减少30-300倍。

更新时间: 2024-07-29 13:11:53

领域: cs.CR

下载: http://arxiv.org/abs/2407.19979v1

Generalized Groves of Neural Additive Models: Pursuing transparent and accurate machine learning models in finance

While machine learning methods have significantly improved model performance over traditional methods, their black-box structure makes it difficult for researchers to interpret results. For highly regulated financial industries, model transparency is as important as accuracy. Without understanding how models work, even highly accurate machine learning methods are unlikely to be accepted. We address this issue by introducing a novel class of transparent machine learning models known as generalized groves of neural additive models. The generalized groves of neural additive models separate features into three categories: linear features, individual nonlinear features, and interacted nonlinear features. Additionally, interactions in the last category are only local. A stepwise selection algorithm distinguishes the linear and nonlinear components, and interacted groups are carefully verified by applying additive separation criteria. Through some empirical examples in finance, we demonstrate that generalized groves of neural additive models exhibit high accuracy and transparency with predominantly linear terms and only sparse nonlinear ones.

Updated: 2024-07-29 13:04:16

标题: 神经加性模型的广义丛(Generalized Groves):追求金融领域透明且准确的机器学习模型

摘要: 尽管机器学习方法在模型性能方面显著优于传统方法,但其黑箱结构使研究人员难以解释结果。对于受到严格监管的金融行业,模型透明度与准确性同样重要。如果不了解模型的工作原理,即使是高度准确的机器学习方法也不太可能被接受。我们通过引入一类称为神经加性模型广义丛(generalized groves of neural additive models)的透明机器学习模型来解决这个问题。该模型将特征分为三类:线性特征、单个非线性特征和交互非线性特征。此外,最后一类中的交互作用仅是局部的。逐步选择算法区分线性和非线性组件,并通过应用可加性分离标准仔细验证交互组。通过金融领域的一些实证示例,我们展示了神经加性模型广义丛具有高准确性和透明度,其组成以线性项为主,非线性项则是稀疏的。

更新时间: 2024-07-29 13:04:16

领域: cs.LG,cs.AI,q-fin.CP

下载: http://arxiv.org/abs/2209.10082v2

Simply Trainable Nearest Neighbour Machine Translation with GPU Inference

Nearest neighbor machine translation is a successful approach for fast domain adaptation, which interpolates the pre-trained transformers with domain-specific token-level k-nearest-neighbor (kNN) retrieval without retraining. Despite kNN MT's success, searching a large reference corpus and the fixed interpolation between the kNN and pre-trained model lead to computational complexity and translation quality challenges. Among other papers, Dai et al. proposed methods to obtain a small number of reference samples dynamically, for which they introduced a distance-aware interpolation method using an equation that includes free parameters. This paper proposes a simply trainable nearest neighbor machine translation and carries out inference experiments on GPU. Similar to Dai et al., we first adaptively construct a small datastore for each input sentence. Second, we train a single-layer network for the interpolation coefficient between the kNN MT and pre-trained results to automatically interpolate in different domains. Experimental results on different domains show that our proposed method either improves on or maintains the translation quality of the methods in Dai et al. while being automatic. In addition, our GPU inference results demonstrate that kNN MT can be integrated into GPUs with a drop of only 5% in terms of speed.
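
A minimal sketch of the trainable interpolation described above: a single linear layer maps simple retrieval statistics to a coefficient that mixes the kNN distribution with the pre-trained model's distribution. The choice of input features (mean and minimum retrieval distance) is an assumption, not necessarily the paper's exact design.

```python
# Single-layer interpolation network: p = lam * p_knn + (1 - lam) * p_model.
import torch
import torch.nn as nn

class Interpolator(nn.Module):
    def __init__(self, n_features=2):
        super().__init__()
        self.lin = nn.Linear(n_features, 1)  # the "single-layer network"

    def forward(self, p_model, p_knn, knn_dists):
        # Retrieval statistics (assumed features): mean and min kNN distance.
        feats = torch.stack([knn_dists.mean(-1), knn_dists.min(-1).values], -1)
        lam = torch.sigmoid(self.lin(feats))          # (batch, 1)
        return lam * p_knn + (1.0 - lam) * p_model    # mixed distribution

vocab, batch, k = 100, 4, 8
p_model = torch.softmax(torch.randn(batch, vocab), -1)
p_knn = torch.softmax(torch.randn(batch, vocab), -1)
dists = torch.rand(batch, k)
mix = Interpolator()(p_model, p_knn, dists)
print(mix.sum(-1))  # each row still sums to 1
```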

Updated: 2024-07-29 12:55:40

标题: 使用GPU推断的简单可训练最近邻机器翻译

摘要: 最近邻机器翻译是一种成功的快速领域适应方法,它在不重新训练的情况下,将预训练的transformer与特定领域的令牌级k最近邻(kNN)检索进行插值。尽管kNN MT取得了成功,但在大型参考语料库中进行搜索,以及kNN与预训练模型之间的固定插值,带来了计算复杂度和翻译质量方面的挑战。其中,Dai等人提出了动态获取少量参考样本的方法,并为此引入了一种使用含自由参数方程的距离感知插值方法。本文提出了一种简单可训练的最近邻机器翻译,并在GPU上进行推理实验。与Dai等人类似,我们首先为每个输入句子自适应构建一个小型数据存储。其次,我们训练一个单层网络来学习kNN MT结果与预训练结果之间的插值系数,以便在不同领域自动插值。在不同领域上的实验结果表明,我们提出的方法在实现自动化的同时,提高了或保持了Dai等人方法的翻译质量。此外,我们的GPU推理结果表明,kNN MT可以集成到GPU中,速度仅下降5%。

更新时间: 2024-07-29 12:55:40

领域: cs.AI

下载: http://arxiv.org/abs/2407.19965v1

Quasi-Framelets: Robust Graph Neural Networks via Adaptive Framelet Convolution

This paper aims to provide a novel design of a multiscale framelet convolution for spectral graph neural networks (GNNs). While current spectral methods excel in various graph learning tasks, they often lack the flexibility to adapt to noisy, incomplete, or perturbed graph signals, making them fragile in such conditions. Our newly proposed framelet convolution addresses these limitations by decomposing graph data into low-pass and high-pass spectra through a finely-tuned multiscale approach. Our approach directly designs filtering functions within the spectral domain, allowing for precise control over the spectral components. The proposed design excels in filtering out unwanted spectral information and significantly reduces the adverse effects of noisy graph signals. Our approach not only enhances the robustness of GNNs but also preserves crucial graph features and structures. Through extensive experiments on diverse, real-world graph datasets, we demonstrate that our framelet convolution achieves superior performance in node classification tasks. It exhibits remarkable resilience to noisy data and adversarial attacks, highlighting its potential as a robust solution for real-world graph applications. This advancement opens new avenues for more adaptive and reliable spectral GNN architectures.
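
A toy illustration of designing filters directly in the spectral domain: eigendecompose the normalized graph Laplacian, apply a low-pass and a high-pass function to the eigenvalues, and filter a node signal. The specific filter pair below is an assumption for illustration, not the paper's quasi-framelet construction.

```python
# Spectral low-/high-pass decomposition of a graph signal.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)      # small example graph
d = A.sum(1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))    # normalized Laplacian
lam, U = np.linalg.eigh(L)

g_low = np.exp(-2.0 * lam)                     # keeps smooth components
g_high = 1.0 - g_low                           # keeps oscillatory components

x = np.array([1.0, 0.9, 1.1, -2.0])            # node signal with an outlier
x_low = U @ (g_low * (U.T @ x))                # denoised part
x_high = U @ (g_high * (U.T @ x))              # residual/high-frequency part
print(x_low + x_high - x)                      # ~0: perfect reconstruction
```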

Updated: 2024-07-29 12:54:16

标题: 准框架:通过自适应框架卷积实现鲁棒的图神经网络

摘要: 本文旨在提供一种新颖的多尺度框架卷积设计,用于谱图神经网络(GNNs)。尽管当前的谱方法在各种图学习任务中表现出色,但它们通常缺乏适应嘈杂、不完整或受干扰的图信号的灵活性,使它们在这些条件下变得脆弱。我们新提出的框架卷积通过精心调整的多尺度方法将图数据分解为低通和高通频谱,从而解决了这些限制。我们的方法直接在谱域内设计滤波函数,允许对谱分量进行精确控制。所提出的设计在过滤掉不需要的谱信息方面表现出色,并显著减少嘈杂图信号的不利影响。我们的方法不仅增强了GNNs的鲁棒性,还保留了关键的图特征和结构。通过在多样化的真实世界图数据集上进行大量实验,我们证明了我们的框架卷积在节点分类任务中实现了优越的性能。它表现出对嘈杂数据和对抗性攻击的显著韧性,突显了其作为真实世界图应用的稳健解决方案的潜力。这一进展为更具适应性和可靠性的谱GNN架构开辟了新的途径。

更新时间: 2024-07-29 12:54:16

领域: cs.LG,cs.AI,cs.NA,math.NA

下载: http://arxiv.org/abs/2201.04728v2

Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning

High utility and rigorous data privacy are among the main goals of a federated learning (FL) system, which learns a model from the data distributed among some clients. The latter is typically pursued by using differential privacy in FL (DPFL). There is often heterogeneity in clients' privacy requirements, and existing DPFL works either assume uniform privacy requirements for clients or are not applicable when the server is not fully trusted (our setting). Furthermore, there is often heterogeneity in the batch and/or dataset sizes of clients, which, as shown, results in extra variation in the DP noise level across clients' model updates. With these sources of heterogeneity, straightforward aggregation strategies, e.g., assigning clients aggregation weights proportional to their privacy parameters, will lead to lower utility. We propose Robust-HDP, which efficiently estimates the true noise level in clients' model updates and reduces the noise level in the aggregated model updates considerably. Robust-HDP improves utility and convergence speed, while remaining safe against clients that may maliciously send falsified privacy parameters to the server. Extensive experimental results on multiple datasets and our theoretical analysis confirm the effectiveness of Robust-HDP. Our code can be found here.
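
A sketch of the noise-aware aggregation idea: weight each client's update inversely to its estimated noise variance rather than by its reported privacy parameters. The variance estimator below (coordinate-wise sample variance of the update) is a crude stand-in for the paper's estimator.

```python
# Noise-aware aggregation: w_i proportional to 1 / estimated sigma_i^2.
import numpy as np

rng = np.random.default_rng(0)
true_update = rng.standard_normal(1000)

# Three clients add DP noise of different (unknown to the server) scales.
sigmas = [0.1, 1.0, 3.0]
updates = [true_update + rng.normal(0, s, 1000) for s in sigmas]

var_hat = np.array([u.var() for u in updates])     # crude noise-level proxy
weights = (1.0 / var_hat) / (1.0 / var_hat).sum()  # inverse-variance weighting

aggregated = sum(w * u for w, u in zip(weights, updates))
print(weights)                                   # low-noise client dominates
print(np.mean((aggregated - true_update) ** 2))  # beats uniform averaging
```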

Updated: 2024-07-29 12:53:53

标题: 噪声感知算法用于异构差分隐私联邦学习

摘要: 高效用和严格的数据隐私是联邦学习(FL)系统的主要目标,该系统从分布在若干客户端之间的数据中学习模型。后者通常通过在FL中使用差分隐私(DPFL)来实现。客户端的隐私要求通常存在异质性,而现有的DPFL工作要么假设客户端具有统一的隐私要求,要么在服务器不完全可信的情况下(即我们的设置)不适用。此外,客户端的批大小和/或数据集大小通常存在异质性,如文中所示,这会导致各客户端模型更新中DP噪声水平的额外差异。在这些异质性来源下,直接的聚合策略(例如按隐私参数为客户端分配成比例的聚合权重)将导致效用下降。我们提出了Robust-HDP,它能够有效估计客户端模型更新中的真实噪声水平,并显著降低聚合模型更新中的噪声水平。Robust-HDP提高了效用和收敛速度,同时对可能恶意向服务器发送伪造隐私参数的客户端保持安全。在多个数据集上的大量实验结果和我们的理论分析证实了Robust-HDP的有效性。我们的代码可以在此处找到。

更新时间: 2024-07-29 12:53:53

领域: cs.LG,cs.CR,cs.DC

下载: http://arxiv.org/abs/2406.03519v2

Prichain II: CloudGuardian Cloud Security Proposal with Blockchain

With the advancement of cloud computing, data storage and security have become crucial. The growing adoption of cloud services by companies, accompanied by increased cybersecurity threats, highlights the importance of privacy and ownership of user data. Between 2022 and 2023, there has been an increase of around 48% in cloud security threats, emphasizing the urgent need for strong security solutions. To face these challenges, in this project, we propose integrating the Ethereum network's blockchain technology with a database located in the PostgreSQL cloud. The proposed solution aims to provide bidirectional data synchronization and strict control of access mechanisms. Blockchain technology ensures immutability and transparency of transactions, while PostgreSQL provides efficient and scalable storage. Through rigorous testing in an adaptive traffic control scenario, the results obtained indicate that this solution offers a significantly high level of security due to the decentralization of data, confirming its effectiveness and making it a powerful new option for improving security in cloud environments. In conclusion, the solution proposed in this project not only increases information security but also demonstrates the practical feasibility of integrating blockchain with cloud relational databases. This two-way alignment improves protection against cyberattacks and ensures that user data is protected from unauthorized access and malicious changes.

Updated: 2024-07-29 12:52:27

标题: Prichain II: 使用区块链的CloudGuardian云安全提案

摘要: 随着云计算的发展,数据存储和安全变得至关重要。伴随着公司对云服务的日益采用,以及网络安全威胁的增加,突出了用户数据隐私和所有权的重要性。在2022年至2023年间,云安全威胁增长约48%,强调了强大安全解决方案的紧迫性。为了应对这些挑战,在本项目中,我们提出将以太坊网络的区块链技术与位于PostgreSQL云中的数据库进行集成。所提出的解决方案旨在提供双向数据同步和严格的访问机制控制。区块链技术确保交易的不可变性和透明性,而PostgreSQL提供高效且可扩展的存储。通过在自适应交通控制场景中的严格测试,所得到的结果表明,这个解决方案由于数据的分散化提供了显著高水平的安全性,证实了这个解决方案的有效性,并使其成为改善云环境安全性的强大新选择。总之,本项目提出的解决方案不仅提高了信息安全性,还展示了将区块链与云关系数据库集成的实际可行性。这种双向对齐提高了对网络攻击的保护,并确保用户数据免受未经授权的访问和恶意更改。

更新时间: 2024-07-29 12:52:27

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2407.19961v1

Integrated Communications and Security: RIS-Assisted Simultaneous Transmission and Generation of Secret Keys

We develop a new integrated communications and security (ICAS) design paradigm by leveraging the concept of reconfigurable intelligent surfaces (RISs). In particular, we propose RIS-assisted simultaneous transmission and secret key generation by sharing the RIS for these two tasks. Specifically, the legitimate transceivers intend to jointly optimize the data transmission rate and the key generation rate by configuring the phase-shift of the RIS in the presence of a smart attacker. We first derive the key generation rate of the RIS-assisted physical layer key generation (PLKG). Then, to obtain the optimal RIS configuration, we formulate the problem as a secure transmission (ST) game and prove the existence of the Nash equilibrium (NE), and then derive the NE point of the static game. For the dynamic ST game, we model the problem as a finite Markov decision process and propose a model-free reinforcement learning approach to obtain the NE point. Particularly, considering that the legitimate transceivers cannot obtain the channel state information (CSI) of the attacker in real-world conditions, we develop a deep recurrent Q-network (DRQN) based dynamic ST strategy to learn the optimal RIS configuration. The details of the algorithm are provided, and then, the system complexity is analyzed. Our simulation results show that the proposed DRQN based dynamic ST strategy has a better performance than the benchmarks even with a partial observation information, and achieves "one time pad" communication by allocating a suitable weight factor for data transmission and PLKG.

Updated: 2024-07-29 12:51:26

标题: 集成通信与安全:RIS辅助的密钥同时传输和生成

摘要: 我们通过利用可重构智能表面(RIS)的概念,开发了一种新的集成通信和安全(ICAS)设计范式。具体而言,我们提出了利用RIS辅助的同时传输和秘钥生成,通过共享RIS来完成这两项任务。具体来说,合法的收发器打算在存在智能攻击者的情况下,通过配置RIS的相位移动来共同优化数据传输速率和秘钥生成速率。我们首先推导了RIS辅助物理层秘钥生成(PLKG)的秘钥生成速率。然后,为了获得最佳的RIS配置,我们将问题形式化为一个安全传输(ST)博弈,并证明了纳什均衡(NE)的存在,并推导了静态博弈的NE点。对于动态ST博弈,我们将问题建模为有限马尔可夫决策过程,并提出了一种无模型强化学习方法来获得NE点。特别地,考虑到合法的收发器在现实条件下无法获得攻击者的信道状态信息(CSI),我们开发了基于深度循环Q网络(DRQN)的动态ST策略来学习最佳的RIS配置。算法的细节被提供,并且系统复杂性进行了分析。我们的仿真结果显示,提出的基于DRQN的动态ST策略比基准具有更好的性能,即使只有部分观察信息,也能通过为数据传输和PLKG分配适当的权重因子,实现“一次性密码”通信。

更新时间: 2024-07-29 12:51:26

领域: cs.CR

下载: http://arxiv.org/abs/2407.19960v1

Deep-ELA: Deep Exploratory Landscape Analysis with Self-Supervised Pretrained Transformers for Single- and Multi-Objective Continuous Optimization Problems

In many recent works, the potential of Exploratory Landscape Analysis (ELA) features to numerically characterize, in particular, single-objective continuous optimization problems has been demonstrated. These numerical features provide the input for all kinds of machine learning tasks on continuous optimization problems, ranging, i.a., from High-level Property Prediction to Automated Algorithm Selection and Automated Algorithm Configuration. Without ELA features, analyzing and understanding the characteristics of single-objective continuous optimization problems is -- to the best of our knowledge -- very limited. Yet, despite their usefulness, as demonstrated in several past works, ELA features suffer from several drawbacks. These include, in particular, (1.) a strong correlation between multiple features, as well as (2.) its very limited applicability to multi-objective continuous optimization problems. As a remedy, recent works proposed deep learning-based approaches as alternatives to ELA. In these works, e.g., point-cloud transformers were used to characterize an optimization problem's fitness landscape. However, these approaches require a large amount of labeled training data. Within this work, we propose a hybrid approach, Deep-ELA, which combines (the benefits of) deep learning and ELA features. Specifically, we pre-trained four transformers on millions of randomly generated optimization problems to learn deep representations of the landscapes of continuous single- and multi-objective optimization problems. Our proposed framework can either be used out-of-the-box for analyzing single- and multi-objective continuous optimization problems, or subsequently fine-tuned to various tasks focussing on algorithm behavior and problem understanding.

Updated: 2024-07-29 12:45:40

标题: Deep-ELA:使用自监督预训练的Transformer进行单目标和多目标连续优化问题的深度探索性景观分析

摘要: 在许多最近的研究中,探索性景观分析(ELA)特征在特别是单目标连续优化问题的数字表征方面的潜力已经得到证明。这些数字特征为连续优化问题的各种机器学习任务提供输入,包括从高级属性预测到自动算法选择和自动算法配置等。没有ELA特征,分析和理解单目标连续优化问题的特征是非常有限的。 然而,尽管在过去的几项研究中证明了它们的有用性,ELA特征仍然存在一些缺点。其中包括多个特征之间的强相关性,以及其对多目标连续优化问题的适用性非常有限。作为补救措施,最近的研究提出了基于深度学习的方法作为ELA的替代方案。在这些研究中,例如使用点云变换器来表征优化问题的适应性景观。然而,这些方法需要大量有标记的训练数据。 在本研究中,我们提出了一种混合方法Deep-ELA,结合了深度学习和ELA特征的优势。具体来说,我们预先在数百万个随机生成的优化问题上对四个变换器进行预训练,以学习连续单目标和多目标优化问题的景观的深度表征。我们提出的框架可以直接用于分析单目标和多目标连续优化问题,或者随后对算法行为和问题理解等各种任务进行微调。

更新时间: 2024-07-29 12:45:40

领域: cs.LG

下载: http://arxiv.org/abs/2401.01192v2

Can I trust my anomaly detection system? A case study based on explainable AI

Generative models based on variational autoencoders are a popular technique for detecting anomalies in images in a semi-supervised context. A common approach employs the anomaly score to detect the presence of anomalies, and it is known to reach a high level of accuracy on benchmark datasets. However, since anomaly scores are computed from reconstruction disparities, they often obscure the detection of various spurious features, raising concerns regarding their actual efficacy. This case study explores the robustness of an anomaly detection system based on variational autoencoder generative models through the use of eXplainable AI methods. The goal is to get a different perspective on the real performance of anomaly detectors that use reconstruction differences. In our case study, we discovered that, in many cases, samples are detected as anomalous for the wrong or misleading factors.
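
A minimal sketch of the kind of anomaly score under scrutiny in this case study: a variational autoencoder scores an image by reconstruction error plus KL divergence, and high scores are flagged. Architecture sizes are illustrative; note that the scalar score says nothing about why an input looks anomalous, which is exactly the gap the XAI analysis probes.

```python
# Illustrative sizes only; not the case study's exact architecture.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, d_in=784, d_z=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU())
        self.mu = nn.Linear(128, d_z)
        self.logvar = nn.Linear(128, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(),
                                 nn.Linear(128, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def anomaly_score(model, x):
    recon, mu, logvar = model(x)
    rec = ((x - recon) ** 2).sum(-1)                           # reconstruction
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1)  # KL term
    return rec + kl  # a single scalar: says nothing about *why* x is anomalous

print(anomaly_score(VAE(), torch.rand(8, 784)).shape)  # one score per image
```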

Updated: 2024-07-29 12:39:07

标题: 我可以相信我的异常检测系统吗?基于可解释人工智能的案例研究

摘要: 基于变分自动编码器的生成模型是在半监督环境中检测图像异常的一种流行技术。一种常见的方法是使用异常分数来检测异常的存在,并且已知在基准数据集上能够达到高水平的准确性。然而,由于异常分数是从重建差异计算得出的,它们经常会掩盖各种虚假特征的检测,引发对其实际功效的担忧。本案例研究通过可解释人工智能方法探讨了基于变分自动编码器生成模型的异常检测系统的稳健性。目标是从不同的角度获取使用重建差异的异常检测器的真实性能。在我们的案例研究中,我们发现在许多情况下,样本被检测为异常是因为错误或误导性因素。

更新时间: 2024-07-29 12:39:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.19951v1

Inference acceleration for large language models using "stairs" assisted greedy generation

Large Language Models (LLMs) with billions of parameters are known for their impressive predictive capabilities but require lots of resources to run. With their massive rise in popularity, even a small reduction in required resources could have an impact on the environment. On the other hand, smaller models require fewer resources but may sacrifice accuracy. In this work, we propose an implementation of "stairs" assisted greedy generation. It is a modified assisted generation methodology that makes use of a smaller model's fast generation, the large model's batch prediction, and "stairs" validation in order to achieve a speed-up in prediction generation. Results show between 9.58 and 17.24 percent inference time reduction compared to a stand-alone large LLM prediction in a text generation task without a loss in accuracy.
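
A hedged sketch of the assisted-generation loop: a small model drafts several tokens cheaply, the large model checks all drafted positions in one batched pass, and the longest agreeing prefix is accepted. The paper's "stairs" validation is approximated here by this prefix-acceptance rule, under which the output matches pure greedy decoding with the large model.

```python
# Toy, model-agnostic sketch: `small_next(ctx)` returns the small model's
# greedy next token, `large_next_batch(ctxs)` returns the large model's
# greedy next token for a batch of contexts in one pass.
def assisted_generate(small_next, large_next_batch, prompt, n_new, draft_len=4):
    out = list(prompt)
    while len(out) < len(prompt) + n_new:
        draft, ctx = [], list(out)
        for _ in range(draft_len):                  # cheap greedy drafting
            t = small_next(ctx)
            draft.append(t)
            ctx.append(t)
        # One batched large-model pass validates every drafted position.
        checks = large_next_batch([out + draft[:i] for i in range(draft_len)])
        for t_small, t_large in zip(draft, checks):
            if t_small == t_large:
                out.append(t_small)                 # accepted for free
            else:
                out.append(t_large)                 # large model overrides
                break                               # redraft from here
    return out[: len(prompt) + n_new]

# Toy "models": next token = sum of context mod 7; small model sometimes errs.
large = lambda ctx: sum(ctx) % 7
small = lambda ctx: sum(ctx) % 7 if sum(ctx) % 5 else 0
print(assisted_generate(small, lambda b: [large(c) for c in b], [1, 2], 8))
```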

Updated: 2024-07-29 12:29:29

标题: 大规模语言模型的推理加速:使用“楼梯”辅助贪婪生成

摘要: 拥有数十亿参数的大型语言模型(LLMs)以其出色的预测能力而闻名,但运行这些模型需要大量资源。随着它们在流行度上的迅速增长,即使对所需资源进行小幅减少也可能对环境产生影响。另一方面,较小的模型需要更少的资源,但可能会牺牲准确性。在这项工作中,我们提出了一种“楼梯”辅助贪婪生成的实现方法。这是一种修改过的辅助生成方法,利用较小模型的快速生成、大型模型的批量预测和“楼梯”验证,以加快预测生成速度。结果显示,在文本生成任务中,与独立的大型LLM预测相比,推断时间减少了9.58至17.24%,而准确性没有损失。

更新时间: 2024-07-29 12:29:29

领域: cs.CL,cs.LG,68T07, 68T50, 68T05,,I.2.6; I.2.7

下载: http://arxiv.org/abs/2407.19947v1

Noise-Resilient Unsupervised Graph Representation Learning via Multi-Hop Feature Quality Estimation

Unsupervised graph representation learning (UGRL) based on graph neural networks (GNNs) has received increasing attention owing to its efficacy in handling graph-structured data. However, existing UGRL methods ideally assume that the node features are noise-free, which makes them fail to distinguish between useful information and noise when applied to real data with noisy features, thus affecting the quality of learned representations. This urges us to take node noisy features into account in real-world UGRL. With empirical analysis, we reveal that feature propagation, the essential operation in GNNs, acts as a "double-edged sword" in handling noisy features - it can both denoise and diffuse noise, leading to varying feature quality across nodes, even within the same node at different hops. Building on this insight, we propose a novel UGRL method based on Multi-hop feature Quality Estimation (MQE for short). Unlike most UGRL models that directly utilize propagation-based GNNs to generate representations, our approach aims to learn representations through estimating the quality of propagated features at different hops. Specifically, we introduce a Gaussian model that utilizes a learnable "meta-representation" as a condition to estimate the expectation and variance of multi-hop propagated features via neural networks. In this way, the "meta representation" captures the semantic and structural information underlying multiple propagated features but is naturally less susceptible to interference by noise, thereby serving as high-quality node representations beneficial for downstream tasks. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of MQE in learning reliable node representations in scenarios with diverse types of feature noise.

Updated: 2024-07-29 12:24:28

标题: 通过多跳特征质量估计实现抗噪声的无监督图表示学习

摘要: 基于图神经网络(GNN)的无监督图表示学习(UGRL)因其在处理图结构数据方面的高效性而受到越来越多的关注。然而,现有的UGRL方法理想地假设节点特征是无噪声的,这使它们在应用于具有嘈杂特征的真实数据时无法区分有用信息和噪声,从而影响了学习表示的质量。这促使我们在现实世界的UGRL中考虑节点嘈杂特征。通过实证分析,我们揭示了特征传播,在GNN中的基本操作,作为处理嘈杂特征的“双刃剑” - 它既可以去噪又可以扩散噪声,导致节点之间甚至同一节点在不同跳数时的特征质量变化。基于这一洞察,我们提出了一种基于多跳特征质量估计(简称MQE)的新型UGRL方法。与大多数直接利用传播型GNN生成表示的UGRL模型不同,我们的方法旨在通过估计不同跳数传播特征的质量来学习表示。具体而言,我们引入了一个高斯模型,利用可学习的“元表示”作为条件,通过神经网络估计多跳传播特征的期望值和方差。通过这种方式,“元表示”捕捉了多个传播特征底层的语义和结构信息,但自然不太容易受到噪声的干扰,因此可作为有益于下游任务的高质量节点表示。对多个真实数据集的广泛实验表明,在具有不同类型特征噪声的场景中,MQE能够学习可靠的节点表示。

更新时间: 2024-07-29 12:24:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.19944v1

Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank

Counterfactual learning to rank (CLTR) can be risky; various circumstances can cause it to produce sub-optimal models that hurt performance when deployed. Safe CLTR was introduced to mitigate these risks when using inverse propensity scoring to correct for position bias. However, the existing safety measure for CLTR is not applicable to state-of-the-art CLTR, it cannot handle trust bias, and its guarantees rely on specific assumptions about user behavior. Our contributions are two-fold. First, we generalize the existing safe CLTR approach to make it applicable to state-of-the-art doubly robust (DR) CLTR and trust bias. Second, we propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior. PRPO removes incentives for learning ranking behavior that is too dissimilar to a safe ranking model. Thereby, PRPO imposes a limit on how much learned models can degrade performance metrics, without relying on any specific user assumptions. Our experiments show that both our novel safe doubly robust method and PRPO provide higher performance than the existing safe inverse propensity scoring approach. However, when circumstances are unexpected, the safe doubly robust approach can become unsafe and bring detrimental performance. In contrast, PRPO always maintains safety, even in maximally adversarial situations. By avoiding assumptions, PRPO is the first method with unconditional safety in deployment that translates to robust safety for real-world applications.
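
An illustrative sketch of the proximal idea behind PRPO: the surrogate objective stops rewarding the new ranking policy once its propensity ratio to the safe policy leaves a trust region, which caps how dissimilar the learned behavior can become. The PPO-style clipped form below is an assumption about the exact objective, not the paper's definition.

```python
# Trust-region-style clipping of the ratio between new and safe policies.
import torch

def prpo_loss(log_p_new, log_p_safe, utility, eps=0.2):
    ratio = torch.exp(log_p_new - log_p_safe)          # pi_new / pi_safe
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    # Pessimistic surrogate: take the worse of the two terms, PPO-style.
    return -torch.min(ratio * utility, clipped * utility).mean()

log_p_safe = torch.log(torch.tensor([0.5, 0.3, 0.2]))
log_p_new = torch.log(torch.tensor([0.7, 0.2, 0.1])).requires_grad_()
utility = torch.tensor([1.0, 0.5, 0.1])               # estimated doc utility
loss = prpo_loss(log_p_new, log_p_safe, utility)
loss.backward()
print(loss.item(), log_p_new.grad)  # no gradient reward beyond the clip range
```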

Updated: 2024-07-29 12:23:59

标题: 高级反事实学习排名的实用和稳健安全保证

摘要: 反事实学习排序(CLTR)可能存在风险;各种情况可能导致其产生次优模型,从而在部署时影响性能。安全CLTR被引入以减轻在使用逆向倾向评分纠正位置偏见时的风险。然而,现有的CLTR安全度量不适用于最先进的CLTR,无法处理信任偏见,并且其保证依赖于关于用户行为的特定假设。我们的贡献有两个方面。首先,我们将现有的安全CLTR方法推广,使其适用于最先进的双重稳健(DR)CLTR和信任偏见。其次,我们提出了一种新颖的方法,近端排序策略优化(PRPO),在部署中提供安全性,而无需对用户行为做出假设。PRPO消除了学习与安全排序模型过于不相似的排序行为的激励。因此,PRPO对学习模型可在多大程度上降低性能指标施加了限制,而无需依赖于任何特定的用户假设。我们的实验表明,我们的新颖安全双重稳健方法和PRPO都比现有的安全逆向倾向评分方法提供更高的性能。然而,当情况出乎意料时,安全双重稳健方法可能变得不安全,并带来有害性能。相反,PRPO始终保持安全性,即使在极端对抗情况下也是如此。通过避免假设,PRPO是第一种在部署中具有无条件安全性的方法,可转化为真实世界应用的稳健安全性。

更新时间: 2024-07-29 12:23:59

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2407.19943v1

Boosting Graph Foundation Model from Structural Perspective

Graph foundation models have recently attracted significant attention due to its strong generalizability. Although existing methods resort to language models to learn unified semantic representations across domains, they disregard the unique structural characteristics of graphs from different domains. To address the problem, in this paper, we boost graph foundation model from structural perspective and propose BooG. The model constructs virtual super nodes to unify structural characteristics of graph data from different domains. Specifically, the super nodes fuse the information of anchor nodes and class labels, where each anchor node captures the information of a node or a graph instance to be classified. Instead of using the raw graph structure, we connect super nodes to all nodes within their neighborhood by virtual edges. This new structure allows for effective information aggregation while unifying cross-domain structural characteristics. Additionally, we propose a novel pre-training objective based on contrastive learning, which learns more expressive representations for graph data and generalizes effectively to different domains and downstream tasks. Experimental results on various datasets and tasks demonstrate the superior performance of BooG. We provide our code and data here: https://anonymous.4open.science/r/BooG-EE42/.

Updated: 2024-07-29 12:22:16

标题: 从结构角度提升图基础模型

摘要: 最近,由于其强大的泛化能力,图基础模型引起了广泛关注。尽管现有方法借助语言模型学习跨领域统一的语义表示,但却忽略了不同领域图的独特结构特征。为了解决这个问题,在本文中,我们从结构角度提升了图基础模型,并提出了BooG。该模型构建虚拟超级节点,以统一不同领域图数据的结构特征。具体来说,超级节点融合了锚定节点和类标签的信息,其中每个锚定节点捕获了待分类节点或图实例的信息。我们没有使用原始图结构,而是通过虚拟边将超级节点连接到其邻域内的所有节点。这种新结构允许有效地聚合信息,同时统一跨领域的结构特征。此外,我们提出了一种基于对比学习的新型预训练目标,该目标学习了更具表现力的图数据表示,并有效地泛化到不同领域和下游任务。在各种数据集和任务上的实验结果显示了BooG的卓越性能。我们在此提供我们的代码和数据:https://anonymous.4open.science/r/BooG-EE42/。

更新时间: 2024-07-29 12:22:16

领域: cs.LG

下载: http://arxiv.org/abs/2407.19941v1

MSegRNN: Enhanced SegRNN Model with Mamba for Long-Term Time Series Forecasting

Long time series forecasting aims to utilize historical information to forecast future states over extended horizons. Traditional RNN-based series forecasting methods struggle to effectively address long-term dependencies and gradient issues in long time series problems. Recently, SegRNN has emerged as a leading RNN-based model tailored for long-term series forecasting, demonstrating state-of-the-art performance while maintaining a streamlined architecture through innovative segmentation and parallel decoding techniques. Nevertheless, SegRNN has several limitations: its fixed segmentation disrupts data continuity and fails to effectively leverage information across different segments, the segmentation strategy employed by SegRNN does not fundamentally address the issue of information loss within the recurrent structure. To address these issues, we propose the MSegRNN method with three key enhancements: we introduce an implicit segmentation structure to decompose the time series and map it to segmented hidden states, resulting in denser information exchange during the segmentation phase. Additionally, we incorporate residual structures in the encoding layer to mitigate information loss within the recurrent structure. To extract information more effectively, we further integrate the Mamba architecture to enhance time series information extraction. Experiments on several real-world long time series forecasting datasets demonstrate that our model surpasses the performance of current state-of-the-art models.

Updated: 2024-07-29 12:19:10

标题: MSegRNN:使用Mamba增强SegRNN模型进行长期时间序列预测

摘要: 长时间序列预测旨在利用历史信息来预测未来状态,覆盖较长时间范围。传统的基于RNN的时间序列预测方法在处理长期依赖性和梯度问题时往往效果不佳。最近,SegRNN作为一种专门用于长期时间序列预测的领先RNN模型出现,展现出最先进的性能,同时通过创新的分割和并行解码技术保持了简洁的架构。然而,SegRNN存在一些局限性:其固定的分割方式破坏了数据的连续性,未能有效地利用不同分段之间的信息,SegRNN采用的分割策略并没有从根本上解决循环结构内信息丢失的问题。为了解决这些问题,我们提出了MSegRNN方法,具有三个关键增强点:我们引入了隐式分割结构来分解时间序列并将其映射到分段隐藏状态,从而在分割阶段实现更密集的信息交换。此外,我们在编码层中引入残差结构来减轻循环结构内的信息丢失。为了更有效地提取信息,我们进一步整合了Mamba架构来增强时间序列信息的提取。在几个真实世界的长时间序列预测数据集上的实验证明,我们的模型超越了当前最先进模型的性能。

更新时间: 2024-07-29 12:19:10

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.10768v3

RadioGAT: A Joint Model-based and Data-driven Framework for Multi-band Radiomap Reconstruction via Graph Attention Networks

Multi-band radiomap reconstruction (MB-RMR) is a key component in wireless communications for tasks such as spectrum management and network planning. However, traditional machine-learning-based MB-RMR methods, which rely heavily on simulated data or complete structured ground truth, face significant deployment challenges. These challenges stem from the differences between simulated and actual data, as well as the scarcity of real-world measurements. To address these challenges, our study presents RadioGAT, a novel framework based on Graph Attention Network (GAT) tailored for MB-RMR within a single area, eliminating the need for multi-region datasets. RadioGAT innovatively merges model-based spatial-spectral correlation encoding with data-driven radiomap generalization, thus minimizing the reliance on extensive data sources. The framework begins by transforming sparse multi-band data into a graph structure through an innovative encoding strategy that leverages radio propagation models to capture the spatial-spectral correlation inherent in the data. This graph-based representation not only simplifies data handling but also enables tailored label sampling during training, significantly enhancing the framework's adaptability for deployment. Subsequently, The GAT is employed to generalize the radiomap information across various frequency bands. Extensive experiments using raytracing datasets based on real-world environments have demonstrated RadioGAT's enhanced accuracy in supervised learning settings and its robustness in semi-supervised scenarios. These results underscore RadioGAT's effectiveness and practicality for MB-RMR in environments with limited data availability.

Updated: 2024-07-29 12:18:15

标题: RadioGAT:一种基于图注意力网络的、联合模型与数据驱动的多频段无线电地图重建框架

摘要: 多频段无线电地图重建(MB-RMR)是无线通信中的关键组成部分,用于诸如频谱管理和网络规划等任务。然而,传统基于机器学习的MB-RMR方法往往依赖于模拟数据或完整的结构化地面真实数据,面临着重要的部署挑战。这些挑战源自模拟数据和实际数据之间的差异,以及真实世界测量数据的稀缺性。为了解决这些挑战,我们的研究提出了RadioGAT,这是一个基于图注意力网络(GAT)的新框架,专为在单个区域内进行MB-RMR而设计,消除了多区域数据集的需求。RadioGAT创新地将基于模型的空间-频谱相关性编码与数据驱动的无线电地图泛化相结合,从而最大程度地减少对大量数据源的依赖。该框架通过一种创新的编码策略,将稀疏的多频段数据转换为图结构,利用无线传播模型捕捉数据中固有的空间-频谱相关性。这基于图的表示不仅简化了数据处理,还使得在训练过程中能够定制标签采样,显著增强了框架在部署中的适应性。随后,采用GAT来泛化跨各种频段的无线电地图信息。基于真实世界环境的射线跟踪数据集进行的广泛实验表明,RadioGAT在监督学习环境中具有更高的准确性,并且在半监督场景中具有更强的稳健性。这些结果强调了RadioGAT在数据稀缺环境中进行MB-RMR的有效性和实用性。

更新时间: 2024-07-29 12:18:15

领域: eess.SP,cs.AI

下载: http://arxiv.org/abs/2403.16397v2

Robust Conformal Volume Estimation in 3D Medical Images

Volumetry is one of the principal downstream applications of 3D medical image segmentation, for example, to detect abnormal tissue growth or for surgery planning. Conformal Prediction is a promising framework for uncertainty quantification, providing calibrated predictive intervals associated with automatic volume measurements. However, this methodology is based on the hypothesis that calibration and test samples are exchangeable, an assumption that is in practice often violated in medical image applications. A weighted formulation of Conformal Prediction can be framed to mitigate this issue, but its empirical investigation in the medical domain is still lacking. A potential reason is that it relies on the estimation of the density ratio between the calibration and test distributions, which is likely to be intractable in scenarios involving high-dimensional data. To circumvent this, we propose an efficient approach for density ratio estimation relying on the compressed latent representations generated by the segmentation model. Our experiments demonstrate the efficiency of our approach to reduce the coverage error in the presence of covariate shifts, in both synthetic and real-world settings. Our implementation is available at https://github.com/benolmbrt/wcp_miccai
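
A sketch of the two ingredients described above: the calibration/test density ratio is estimated with a probabilistic classifier fit on the segmentation model's compressed latents (the standard classifier trick), and the ratios then weight the conformal quantile. The normalization below is a simplified weighted-CP recipe and may differ in detail from the paper's.

```python
# Density-ratio-weighted conformal quantile from compressed latents.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
z_cal = rng.normal(0.0, 1.0, (500, 8))    # latents, calibration scans
z_test = rng.normal(0.5, 1.0, (200, 8))   # latents under covariate shift

# Classifier trick: r(z) = p_test(z) / p_cal(z) ~ c(z) / (1 - c(z)).
clf = LogisticRegression().fit(
    np.vstack([z_cal, z_test]),
    np.concatenate([np.zeros(500), np.ones(200)]))
w_cal = clf.predict_proba(z_cal)[:, 1] / clf.predict_proba(z_cal)[:, 0]

# Weighted conformal quantile of calibration nonconformity scores
# (stand-in scores here; the mean weight approximates the test-point weight).
scores = np.abs(rng.normal(0, 1, 500))
order = np.argsort(scores)
cum = np.cumsum(w_cal[order]) / (w_cal.sum() + w_cal.mean())
q_hat = scores[order][np.searchsorted(cum, 0.9)]
print(q_hat)  # interval half-width targeting ~90% coverage under shift
```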

Updated: 2024-07-29 12:18:07

标题: 3D医学图像中的稳健的符合体积估计

摘要: 体积测量是3D医学图像分割的主要下游应用之一,例如用于检测异常组织生长或手术规划。符合性预测是一种有希望的不确定性量化框架,提供与自动体积测量相关的校准预测区间。然而,该方法基于校准和测试样本可交换的假设,这一假设在医学图像应用中实际上经常被违反。可以构建符合性预测的加权形式来缓解这个问题,但在医学领域中对其进行的经验研究仍然缺乏。一个潜在的原因是它依赖于校准和测试分布之间的密度比率的估计,这在涉及高维数据的场景中可能是难以处理的。为了避免这一问题,我们提出了一种依赖于分割模型生成的压缩潜在表示的密度比率估计的高效方法。我们的实验证明了我们的方法在合成和真实世界环境中降低协变量偏移下的覆盖误差的效率。我们的实现可在https://github.com/benolmbrt/wcp_miccai找到。

更新时间: 2024-07-29 12:18:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.19938v1

AOTree: Aspect Order Tree-based Model for Explainable Recommendation

Recent recommender systems aim to provide not only accurate recommendations but also explanations that help users understand them better. However, most existing explainable recommendations only consider the importance of content in reviews, such as words or aspects, and ignore the ordering relationship among them. This oversight neglects crucial ordering dimensions in the human decision-making process, leading to suboptimal performance. Therefore, in this paper, we propose the Aspect Order Tree-based (AOTree) explainable recommendation method, inspired by the Order Effects Theory from cognitive and decision psychology, in order to capture the dependency relationships among decisive factors. We first validate the theory in the recommendation scenario by analyzing the reviews of the users. Then, according to the theory, the proposed AOTree expands the construction of the decision tree to capture aspect orders in users' decision-making processes, and uses attention mechanisms to make predictions based on the aspect orders. Extensive experiments demonstrate our method's effectiveness on rating predictions, and our approach aligns more consistently with the user's decision-making process by displaying explanations in a particular order, thereby enhancing interpretability.

Updated: 2024-07-29 12:17:48

标题: AOTree:基于Aspect Order Tree的可解释推荐模型

摘要: 最近的推荐系统旨在提供不仅准确的推荐,还有帮助用户更好理解推荐的解释。然而,大多数现有的可解释推荐只考虑评论中内容的重要性,如词语或方面,而忽略它们之间的排序关系。这种疏忽忽视了人类决策过程中关键的排序维度,导致表现不佳。因此,在本文中,我们提出了基于Aspect Order Tree(AOTree)的可解释推荐方法,受到认知和决策心理学中的Order Effects Theory的启发,以捕捉决定性因素之间的依赖关系。我们首先通过分析用户的评论验证了该理论在推荐场景中的有效性。然后,根据该理论,所提出的AOTree扩展了决策树的构建,以捕捉用户决策过程中的方面顺序,并使用注意机制基于方面顺序进行预测。广泛的实验证明了我们的方法在评分预测上的有效性,我们的方法通过按特定顺序显示解释,更加一致地与用户的决策过程相一致,从而增强了可解释性。

更新时间: 2024-07-29 12:17:48

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2407.19937v1

AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning

Large Language Models (LLM) based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understanding through interaction and adapt to new environments. AutoManual categorizes environmental knowledge into diverse rules and optimizes them in an online fashion by two agents: 1) The Planner codes actionable plans based on current rules for interacting with the environment. 2) The Builder updates the rules through a well-structured rule system that facilitates online rule management and essential detail retention. To mitigate hallucinations in managing rules, we introduce a case-conditioned prompting strategy for the Builder. Finally, the Formulator agent compiles these rules into a comprehensive manual. The self-generated manual can not only improve the adaptability but also guide the planning of smaller LLMs while being human-readable. Given only one simple demonstration, AutoManual significantly improves task success rates, achieving 97.4% with GPT-4-turbo and 86.2% with GPT-3.5-turbo on ALFWorld benchmark tasks. The code is available at https://github.com/minghchen/automanual.

Updated: 2024-07-29 12:16:56

标题: AutoManual:通过LLM代理通过交互式环境学习生成说明手册

摘要: 基于大型语言模型(LLM)的代理在自主完成各种领域的任务中表现出潜力,例如机器人技术、游戏和网络导航。然而,这些代理通常需要精心设计和专家提示来解决特定领域的任务,这限制了它们的适应性。我们引入了AutoManual,这是一个框架,使LLM代理能够通过互动自主地建立对环境的理解,并适应新的环境。AutoManual将环境知识分类为不同的规则,并通过两个代理以在线方式优化它们:1)规划者基于当前规则编码可操作的计划以与环境互动。2)构建者通过一个结构良好的规则系统更新规则,促进在线规则管理和关键细节保留。为了减轻管理规则中的幻觉,我们为构建者引入了一种基于案例条件的提示策略。最后,形式化代理将这些规则编译成一本全面的手册。自动生成的手册不仅可以提高适应性,还可以指导规划较小的LLM,同时具有人类可读性。仅通过一个简单的演示,AutoManual显著提高了任务成功率,在ALFWorld基准任务上,GPT-4-turbo实现了97.4%,GPT-3.5-turbo实现了86.2%。该代码可在https://github.com/minghchen/automanual 上找到。

更新时间: 2024-07-29 12:16:56

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.16247v2

Unsupervised Training of Convex Regularizers using Maximum Likelihood Estimation

Imaging is a standard example of an inverse problem, where the task of reconstructing a ground truth from a noisy measurement is ill-posed. Recent state-of-the-art approaches for imaging use deep learning, spearheaded by unrolled and end-to-end models and trained on various image datasets. However, many such methods require the availability of ground truth data, which may be unavailable or expensive, leading to a fundamental barrier that can not be bypassed by choice of architecture. Unsupervised learning presents an alternative paradigm that bypasses this requirement, as they can be learned directly on noisy data and do not require any ground truths. A principled Bayesian approach to unsupervised learning is to maximize the marginal likelihood with respect to the given noisy measurements, which is intrinsically linked to classical variational regularization. We propose an unsupervised approach using maximum marginal likelihood estimation to train a convex neural network-based image regularization term directly on noisy measurements, improving upon previous work in both model expressiveness and dataset size. Experiments demonstrate that the proposed method produces priors that are near competitive when compared to the analogous supervised training method for various image corruption operators, maintaining significantly better generalization properties when compared to end-to-end methods. Moreover, we provide a detailed theoretical analysis of the convergence properties of our proposed algorithm.

Updated: 2024-07-29 12:10:27

标题: 使用最大似然估计对凸正则化器进行无监督训练

摘要: 成像是反问题的一个标准示例,其中从嘈杂的测量中重建真实情况是不适定的。最新的最先进成像方法采用深度学习,以展开式模型和端到端模型为代表,并在各种图像数据集上进行训练。然而,许多这类方法需要真实数据的可用性,而真实数据可能无法获得或代价高昂,这构成了一个无法通过架构选择绕过的根本障碍。无监督学习提供了一种绕过这一要求的替代范式,因为它可以直接在嘈杂的数据上学习,不需要任何真实数据。一种有原则的贝叶斯无监督学习方法是针对给定的嘈杂测量最大化边际似然,这与经典变分正则化有着内在联系。我们提出了一种使用最大边际似然估计的无监督方法,直接在嘈杂的测量上训练基于凸神经网络的图像正则化项,在模型表达能力和数据集规模两方面均改进了以前的工作。实验证明,对于各种图像损坏算子,所提出的方法产生的先验与类似的监督训练方法相比几乎具有竞争力,并且相比端到端方法具有明显更好的泛化性能。此外,我们提供了对我们提出的算法的收敛特性的详细理论分析。

更新时间: 2024-07-29 12:10:27

领域: stat.ME,cs.LG,stat.CO,62C12, 62F15, 65C40, 65J22

下载: http://arxiv.org/abs/2404.05445v2

HealMe: Harnessing Cognitive Reframing in Large Language Models for Psychotherapy

Large Language Models (LLMs) can play a vital role in psychotherapy by adeptly handling the crucial task of cognitive reframing and overcoming challenges such as shame, distrust, therapist skill variability, and resource scarcity. Previous LLMs in cognitive reframing mainly converted negative emotions to positive ones, but these approaches have limited efficacy, often not promoting clients' self-discovery of alternative perspectives. In this paper, we unveil the Helping and Empowering through Adaptive Language in Mental Enhancement (HealMe) model. This novel cognitive reframing therapy method effectively addresses deep-rooted negative thoughts and fosters rational, balanced perspectives. Diverging from traditional LLM methods, HealMe employs empathetic dialogue based on psychotherapeutic frameworks. It systematically guides clients through distinguishing circumstances from feelings, brainstorming alternative viewpoints, and developing empathetic, actionable suggestions. Moreover, we adopt the first comprehensive and expertly crafted psychological evaluation metrics, specifically designed to rigorously assess the performance of cognitive reframing, in both AI-simulated dialogues and real-world therapeutic conversations. Experimental results show that our model outperforms others in terms of empathy, guidance, and logical coherence, demonstrating its effectiveness and potential positive impact on psychotherapy.

Updated: 2024-07-29 12:05:36

标题: HealMe:利用大型语言模型中的认知重构进行心理治疗

摘要: 大型语言模型(LLMs)在心理治疗中可以发挥重要作用,熟练处理认知重构等关键任务,并克服羞耻、不信任、治疗师技能差异和资源匮乏等挑战。先前的认知重构LLMs主要将负面情绪转化为正面情绪,但这些方法的疗效有限,往往不能促进客户自我发现替代观点。在本文中,我们揭示了Helping and Empowering through Adaptive Language in Mental Enhancement(HealMe)模型。这种新颖的认知重构疗法方法有效地解决了根深蒂固的负面思维,并促进了理性、平衡的观点。与传统的LLM方法不同,HealMe采用基于心理治疗框架的共情对话。它系统地引导客户区分情境和感受,集思广益地探讨替代观点,并制定富有同理心、可操作的建议。此外,我们采用了第一个全面且专业设计的心理评估指标,专门用于严格评估认知重构在AI模拟对话和现实世界治疗对话中的表现。实验结果显示,我们的模型在共情、指导和逻辑连贯性方面优于其他模型,显示了其在心理治疗中的有效性和潜在积极影响。

更新时间: 2024-07-29 12:05:36

领域: cs.HC,cs.AI,cs.CL,J.4

下载: http://arxiv.org/abs/2403.05574v3

Lightweight Dataset for Decoy Development to Improve IoT Security

In this paper, the authors introduce a lightweight dataset for interpreting IoT (Internet of Things) activity in preparation for creating decoys that replicate known data traffic patterns. The dataset comprises different scenarios in a real network setting. This paper also surveys information related to other IoT datasets, along with the characteristics that make our data valuable. Many of the available datasets are synthesized (simulated) or often address industrial applications, while the IoT dataset we present is based on likely smart home scenarios. Further, there are only a limited number of IoT datasets that contain both normal operation and attack scenarios. A discussion of the network configuration and the steps taken to prepare this dataset is presented as we prepare to create replicative patterns for decoy purposes. The dataset, which we refer to as IoT Flex Data, consists of four categories, namely, IoT benign idle, IoT benign active, IoT setup, and malicious (attack) traffic, associating the IoT devices with the scenarios under consideration.

Updated: 2024-07-29 12:01:50

标题: 轻量级数据集用于诱饵开发以提高物联网安全性

摘要: 在本文中,作者介绍了一个轻量级数据集,用于解释物联网(IoT)活动,为了创建模拟已知数据流量模式的诱饵。该数据集包含了在真实网络环境中的不同场景。本文还调查了与其他IoT数据集相关的信息,以及使我们的数据有价值的特征。许多可用的数据集是合成的(模拟的),或者通常涉及工业应用,而我们提供的IoT数据集是基于可能的智能家居场景的。此外,只有少数IoT数据集包含正常操作和攻击场景。在准备为诱饵目的创建复制模式时,我们展示了网络配置和准备该数据集所采取的步骤。这个数据集,我们称之为IoT Flex Data,包括四个类别,即IoT良性空闲、IoT良性活跃、IoT设置和恶意(攻击)流量,将IoT设备与考虑中的场景相关联。

更新时间: 2024-07-29 12:01:50

领域: cs.CR

下载: http://arxiv.org/abs/2407.19926v1

Monetizing Currency Pair Sentiments through LLM Explainability

Large language models (LLMs) play a vital role in almost every domain in today's organizations. In the context of this work, we highlight the use of LLMs for sentiment analysis (SA) and explainability. Specifically, we contribute a novel technique to leverage LLMs as a post-hoc model-independent tool for the explainability of SA. We applied our technique in the financial domain for currency-pair price predictions using open news feed data merged with market prices. Our application shows that the developed technique is not only a viable alternative to using conventional eXplainable AI but can also be fed back to enrich the input to the machine learning (ML) model to better predict future currency-pair values. We envision our results could be generalized to employing explainability as a conventional enrichment for ML input for better ML predictions in general.

Updated: 2024-07-29 11:58:54

标题: 通过LLM可解释性实现货币对情绪的货币化

摘要: 大型语言模型(LLMs)在当今组织的几乎每个领域中起着至关重要的作用。在这项工作的背景下,我们强调了LLMs在情感分析(SA)和可解释性方面的应用。具体而言,我们提出了一种新颖的技术,利用LLMs作为后续独立于模型的工具,用于解释SA。我们在金融领域应用了我们的技术,利用合并了市场价格的开放新闻源数据进行货币对价格预测。我们的应用显示,开发的技术不仅是使用传统的可解释人工智能的可行替代方案,而且还可以反馈到机器学习(ML)模型的输入中,以更好地预测未来的货币对价值。我们设想我们的结果可以推广到将可解释性作为机器学习输入的传统丰富化,以便更好地进行一般的机器学习预测。

更新时间: 2024-07-29 11:58:54

领域: cs.AI,68T50

下载: http://arxiv.org/abs/2407.19922v1

Invariance of deep image quality metrics to affine transformations

Deep architectures are the current state-of-the-art in predicting subjective image quality. Usually, these models are evaluated according to their ability to correlate with human opinion in databases with a range of distortions that may appear in digital media. However, these evaluations overlook affine transformations, which may better represent the changes that actually happen to images in natural conditions. Humans can be particularly invariant to these natural transformations, as opposed to the digital ones. In this work, we evaluate state-of-the-art deep image quality metrics by assessing their invariance to affine transformations, specifically: rotation, translation, scaling, and changes in spectral illumination. Here, invariance of a metric refers to the fact that certain distances should be neglected (considered to be zero) if their values are below a threshold. This is what we call the invisibility threshold of a metric. We propose a methodology to assign such invisibility thresholds for any perceptual metric. This methodology involves transformations to a distance space common to any metric, and psychophysical measurements of thresholds in this common space. By doing so, we allow the analyzed metrics to be directly comparable with actual human thresholds. We find that none of the state-of-the-art metrics shows human-like results under this strong test based on invisibility thresholds. This means that tuning the models exclusively to predict the visibility of generic distortions may disregard other properties of human vision, such as invariances or invisibility thresholds.
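
A sketch of the invariance probe this abstract describes: apply a small affine transformation that a human would barely notice and check whether the metric's distance stays below its invisibility threshold. Plain RMSE stands in for a deep quality metric, and the threshold value is illustrative rather than psychophysically measured.

```python
# Probe a metric's response to mild rotations against a fixed threshold.
import numpy as np
from scipy.ndimage import rotate

def metric(a, b):                       # stand-in for a deep quality metric
    return np.sqrt(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
img = rng.random((64, 64))
invisibility_threshold = 0.05           # would come from psychophysics

for angle in (0.5, 2.0, 5.0):           # degrees; all mild for a human
    d = metric(img, rotate(img, angle, reshape=False, mode="nearest"))
    verdict = "treated as visible" if d > invisibility_threshold else "ignored"
    print(f"{angle:>4} deg: d={d:.3f} -> {verdict}")
```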

Updated: 2024-07-29 11:55:53

标题: 深度图像质量度量对仿射变换的不变性

摘要: 深度架构是当前在预测主观图像质量方面的最先进技术。通常,这些模型根据它们与人类意见在可能出现在数字媒体中的一系列失真的相关性来进行评估。然而,这些模型忽视了可能更好地代表实际发生在自然条件下的图像变化的仿射变换。与数字变换相反,人类可能对这些自然变换特别不变。在这项工作中,我们通过评估最先进的深度图像质量度量来评估它们对仿射变换的不变性,具体包括:旋转、平移、缩放和光谱照明变化。在这里,度量的不变性指的是如果其值低于阈值,则应忽略(视为零)某些距离。这就是我们所说的度量的不可见阈值。我们提出一种方法来为任何感知度量分配这种不可见阈值。这种方法涉及将转换到任何度量的共同距离空间,并在这个共同空间中对阈值进行心理物理测量。通过这样做,我们允许分析的度量与实际人类阈值直接进行比较。我们发现,在这种基于不可见阈值的强测试下,没有任何最先进的度量显示出类似于人类的结果。这意味着,将模型调整为专门预测通用失真的可见性可能会忽视人类视觉的其他属性,例如不变性或不可见阈值。

更新时间: 2024-07-29 11:55:53

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17927v2

Deep NURBS -- Admissible Physics-informed Neural Networks

In this study, we propose a new numerical scheme for physics-informed neural networks (PINNs) that enables precise and inexpensive solution for partial differential equations (PDEs) in case of arbitrary geometries while strictly enforcing Dirichlet boundary conditions. The proposed approach combines admissible NURBS parametrizations required to define the physical domain and the Dirichlet boundary conditions with a PINN solver. The fundamental boundary conditions are automatically satisfied in this novel Deep NURBS framework. We verified our new approach using two-dimensional elliptic PDEs when considering arbitrary geometries, including non-Lipschitz domains. Compared to the classical PINN solver, the Deep NURBS estimator has a remarkably high convergence rate for all the studied problems. Moreover, a desirable accuracy was realized for most of the studied PDEs using only one hidden layer of neural networks. This novel approach is considered to pave the way for more effective solutions for high-dimensional problems by allowing for more realistic physics-informed statistical learning to solve PDE-based variational problems.

Updated: 2024-07-29 11:53:06

标题: 深度NURBS-可接受的物理启发神经网络

摘要: 在这项研究中,我们提出了一种新的数值方案,用于物理信息神经网络(PINNs),可以在任意几何情况下精确且廉价地解决偏微分方程(PDEs),同时严格执行Dirichlet边界条件。所提出的方法将用于定义物理域和Dirichlet边界条件所需的可容许NURBS参数化与PINN求解器相结合。基本边界条件在这种新型深度NURBS框架中自动满足。我们使用二维椭圆PDEs验证了我们的新方法,考虑了任意几何情况,包括非利普希茨域。与传统的PINN求解器相比,深度NURBS估计器对所有研究的问题具有显著高的收敛速度。此外,对于大多数研究的PDEs,仅使用一个隐藏层的神经网络就可以实现理想的精度。这种新颖的方法被认为为通过允许更现实的物理信息统计学习来解决基于PDE的变分问题,为高维问题提供更有效的解决方案铺平了道路。

更新时间: 2024-07-29 11:53:06

领域: math.NA,cs.LG,cs.NA,stat.ML

下载: http://arxiv.org/abs/2210.13900v2

Robust Fully-Asynchronous Methods for Distributed Training over General Architecture

Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, package losses and stragglers. We propose a Robust Fully-Asynchronous Stochastic Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace without any form of synchronization. Different from existing asynchronous distributed algorithms, R-FAST can eliminate the impact of data heterogeneity across devices and allow for packet losses by employing a robust gradient tracking strategy that relies on properly designed auxiliary variables for tracking and buffering the overall gradient vector. More importantly, the proposed method utilizes two spanning-tree graphs for communication so long as both share at least one common root, enabling flexible designs in communication architectures. We show that R-FAST converges in expectation to a neighborhood of the optimum with a geometric rate for smooth and strongly convex objectives; and to a stationary point with a sublinear rate for general non-convex settings. Extensive experiments demonstrate that R-FAST runs 1.5-2 times faster than synchronous benchmark algorithms, such as Ring-AllReduce and D-PSGD, while still achieving comparable accuracy, and outperforms existing asynchronous SOTA algorithms, such as AD-PSGD and OSGP, especially in the presence of stragglers.

Updated: 2024-07-29 11:51:40

标题: 强大的全异步方法用于一般架构上的分布式训练

摘要: 在分布式机器学习问题中,完美的同步是低效甚至不可能的,这是由于延迟、数据包丢失和慢速设备的存在。我们提出了一种鲁棒的完全异步随机梯度跟踪方法(R-FAST),其中每个设备以自己的速度进行本地计算和通信,而无需任何形式的同步。与现有的异步分布式算法不同,R-FAST可以通过采用一种依赖于适当设计的辅助变量的鲁棒梯度跟踪策略来消除跨设备的数据异质性的影响,并允许数据包丢失。更重要的是,所提出的方法利用两个覆盖树图进行通信,只要两者至少有一个共同的根,就可以在通信架构中实现灵活设计。我们展示了R-FAST对于平滑且强凸目标,能够以几何速率收敛到最优解的邻域;对于一般的非凸设置,能够以次线性速率收敛到一个稳定点。大量实验证明,R-FAST比同步基准算法(如Ring-AllReduce和D-PSGD)运行速度快1.5-2倍,同时仍然达到可比较的精度,并且在存在慢速设备时表现优于现有的异步SOTA算法(如AD-PSGD和OSGP)。

更新时间: 2024-07-29 11:51:40

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2307.11617v2

Aero-Nef: Neural Fields for Rapid Aircraft Aerodynamics Simulations

This paper presents a methodology to learn surrogate models of steady state fluid dynamics simulations on meshed domains, based on Implicit Neural Representations (INRs). The proposed models can be applied directly to unstructured domains for different flow conditions, handle non-parametric 3D geometric variations, and generalize to unseen shapes at test time. The coordinate-based formulation naturally leads to robustness with respect to discretization, allowing an excellent trade-off between computational cost (memory footprint and training time) and accuracy. The method is demonstrated on two industrially relevant applications: a RANS dataset of the two-dimensional compressible flow over a transonic airfoil and a dataset of the surface pressure distribution over 3D wings, including shape, inflow condition, and control surface deflection variations. On the considered test cases, our approach achieves a more than three times lower test error and significantly improves generalization error on unseen geometries compared to state-of-the-art Graph Neural Network architectures. Remarkably, the method can perform inference five orders of magnitude faster than the high fidelity solver on the RANS transonic airfoil dataset. Code is available at https://gitlab.isae-supaero.fr/gi.catalani/aero-nepf
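
A minimal coordinate-based surrogate in this spirit: an MLP maps a surface coordinate plus flow-condition parameters to a pressure value, so it can be queried on any unstructured point set. Layer sizes and the conditioning scheme are illustrative, not the paper's architecture.

```python
# Coordinate-based (INR-style) surrogate for a surface pressure field.
import torch
import torch.nn as nn

class PressureINR(nn.Module):
    def __init__(self, d_coord=3, d_cond=2, d_hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_coord + d_cond, d_hidden), nn.SiLU(),
            nn.Linear(d_hidden, d_hidden), nn.SiLU(),
            nn.Linear(d_hidden, 1))

    def forward(self, xyz, cond):
        # cond (e.g., Mach number, angle of attack) is broadcast per point.
        return self.net(torch.cat([xyz, cond.expand(xyz.shape[0], -1)], -1))

model = PressureINR()
pts = torch.rand(10_000, 3)                 # unstructured surface points
cond = torch.tensor([[0.72, 2.5]])          # hypothetical Mach, AoA
p = model(pts, cond)                        # pressure at every queried point
print(p.shape)                              # resolution-free querying
```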

Updated: 2024-07-29 11:48:44

标题: Aero-Nef:用于快速飞机空气动力学模拟的神经场

摘要: 这篇论文提出了一种基于隐式神经表示(INRs)的方法,用于在网格域上学习稳态流体动力学模拟的代理模型。所提出的模型可以直接应用于不同流动条件下的非结构域,处理非参数化的3D几何变化,并在测试时推广到未见过的形状。基于坐标的公式自然地带来了对离散化的鲁棒性,允许在计算成本(内存占用和训练时间)和准确性之间取得良好的平衡。该方法在两个工业相关应用上进行了演示:一个是二维可压缩流经跨声速翼型的RANS数据集,另一个是包括形状、入流条件和控制面偏转变化的3D机翼表面压力分布数据集。在考虑的测试案例中,与最先进的图神经网络架构相比,我们的方法的测试误差低三倍以上,并且在未见几何形状上的泛化误差显著更低。值得注意的是,在RANS跨声速翼型数据集上,该方法的推理速度比高保真求解器快五个数量级。代码可在以下链接获取:https://gitlab.isae-supaero.fr/gi.catalani/aero-nepf

更新时间: 2024-07-29 11:48:44

领域: cs.CE,cs.LG,cs.NA,math.NA,physics.flu-dyn

下载: http://arxiv.org/abs/2407.19916v1

Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models

Sentiment analysis is a widely researched area within Natural Language Processing (NLP), attracting significant interest due to the advent of automated solutions. Despite this, the task remains challenging because of the inherent complexity of languages and the subjective nature of sentiments. It is even more challenging for less-studied and less-resourced languages such as Lithuanian. Our review of existing Lithuanian NLP research reveals that traditional machine learning methods and classification algorithms have limited effectiveness for the task. In this work, we address sentiment analysis of Lithuanian five-star-based online reviews from multiple domains that we collect and clean. We apply transformer models to this task for the first time, exploring the capabilities of pre-trained multilingual Large Language Models (LLMs), specifically focusing on fine-tuning BERT and T5 models. Given the inherent difficulty of the task, the fine-tuned models perform quite well, especially when the sentiments themselves are less ambiguous: 80.74% and 89.61% testing recognition accuracy of the most popular one- and five-star reviews respectively. They significantly outperform current commercial state-of-the-art general-purpose LLM GPT-4. We openly share our fine-tuned LLMs online.
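
A hedged sketch of a fine-tuning setup of this kind using the Hugging Face transformers library: a multilingual BERT checkpoint gets a 5-way head for 1-5 star reviews. The two-example dataset is a placeholder; the authors' review corpus and exact hyperparameters are not reproduced here.

```python
# Fine-tuning a multilingual checkpoint for 5-way star-rating classification.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
import torch

ckpt = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=5)

texts = ["Puiki preke, rekomenduoju!", "Labai nusivyliau pirkiniu."]
labels = [4, 0]  # 5-star and 1-star reviews, zero-indexed

enc = tok(texts, truncation=True, padding=True, return_tensors="pt")

class ReviewSet(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        return {**{k: v[i] for k, v in enc.items()},
                "labels": torch.tensor(labels[i])}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ReviewSet())
trainer.train()
```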

Updated: 2024-07-29 11:44:21

标题: 使用大型语言模型对立陶宛语在线评论进行情感分析

摘要: 情感分析是自然语言处理(NLP)领域内广泛研究的一个领域,由于自动化解决方案的出现而受到了极大的关注。尽管如此,由于语言的固有复杂性和情感的主观性,任务仍然具有挑战性。对于立陶宛语这样研究较少、资源较少的语言来说,情况更加困难。我们对现有的立陶宛语NLP研究进行了回顾,发现传统的机器学习方法和分类算法对于这一任务的效果有限。在这项工作中,我们针对从多个领域收集并清理的基于五星制的立陶宛语在线评论进行情感分析。我们首次将Transformer模型应用于这一任务,探究预训练的多语言大型语言模型(LLMs)的能力,特别关注BERT和T5模型的微调。鉴于任务的困难性,经过微调的模型表现相当不错,尤其是当情感本身不太模糊时:最受欢迎的一星评论和五星评论的测试识别准确率分别为80.74%和89.61%。它们明显优于当前商业最先进的通用LLM GPT-4。我们在网上公开分享我们的经过微调的LLMs。

更新时间: 2024-07-29 11:44:21

领域: cs.CL,cs.IR,cs.LG,68T07, 68T50, 68T05,,I.2.6; I.2.7

下载: http://arxiv.org/abs/2407.19914v1

Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs

The utilization of Large Language Models (LLMs) within the realm of reinforcement learning, particularly as planners, has garnered a significant degree of attention in recent scholarly literature. However, a substantial proportion of existing research predominantly focuses on planning models for robotics that transmute the outputs derived from perception models into linguistic forms, thus adopting a 'pure-language' strategy. In this research, we propose a hybrid End-to-End learning framework for autonomous driving by combining basic driving imitation learning with LLMs based on multi-modality prompt tokens. Instead of simply converting perception results from a separately trained model into pure language input, our novelty lies in two aspects. 1) The end-to-end integration of visual and LiDAR sensory input into learnable multi-modality tokens, thereby intrinsically alleviating the description bias of separately pre-trained perception models. 2) Instead of directly letting LLMs drive, this paper explores a hybrid setting of letting LLMs help the driving model correct mistakes and handle complicated scenarios. The results of our experiments suggest that the proposed methodology can attain driving scores of 49.21%, coupled with an impressive route completion rate of 91.34% in the offline evaluation conducted via CARLA. These performance metrics are comparable to the most advanced driving models.

Updated: 2024-07-29 11:43:31

标题: 提示多模态令牌以增强基于LLMs的端到端自动驾驶模仿学习

摘要: 在最近的学术文献中,大型语言模型(LLMs)在强化学习领域的应用,特别是作为规划者,引起了相当大的关注。然而,现有研究的相当一部分主要集中在为机器人设计规划模型,将从感知模型中获取的输出转化为语言形式,从而采用了“纯语言”策略。在这项研究中,我们提出了一个融合基本驾驶模仿学习和基于多模态提示令牌的LLMs的混合端到端学习框架,用于自动驾驶。我们的创新点在于两个方面,而不仅仅是将分离的训练模型中的感知结果简单转换为纯语言输入。第一,将视觉和LiDAR传感器输入端到端集成为可学习的多模态令牌,从而本质上减轻了由分离的预训练感知模型引起的描述偏见。第二,本文探讨了一种混合设置,即让LLMs帮助驾驶模型纠正错误和复杂情景,而不是直接让LLMs驾驶。我们的实验结果表明,所提出的方法可以在CARLA进行的离线评估中实现49.21%的驾驶分数,以及令人印象深刻的91.34%的路线完成率。这些性能指标与最先进的驾驶模型相媲美。

更新时间: 2024-07-29 11:43:31

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.04869v2

Efficient Shield Synthesis via State-Space Transformation

We consider the problem of synthesizing safety strategies for control systems, also known as shields. Since the state space is infinite, shields are typically computed over a finite-state abstraction, with the most common abstraction being a rectangular grid. However, for many systems, such a grid does not align well with the safety property or the system dynamics. That is why a coarse grid is rarely sufficient, but a fine grid is typically computationally infeasible to obtain. In this paper, we show that appropriate state-space transformations can still allow to use a coarse grid at almost no computational overhead. We demonstrate in three case studies that our transformation-based synthesis outperforms a standard synthesis by several orders of magnitude. In the first two case studies, we use domain knowledge to select a suitable transformation. In the third case study, we instead report on results in engineering a transformation without domain knowledge.
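
The core idea above, discretizing in transformed coordinates so that a coarse rectangular grid aligns with the safety property, can be sketched in a few lines; the pendulum-like state and the energy-style transform below are illustrative assumptions, not the paper's case studies.

    # Sketch of transformation-based gridding: rectangular discretization is
    # applied in a transformed state space that follows the safety property.
    import numpy as np

    def transform(state):
        # Map (angle, angular velocity) into coordinates where the safe set is
        # roughly axis-aligned, e.g. angle and an energy-like quantity.
        theta, omega = state
        return np.array([theta, 0.5 * omega ** 2 + (1.0 - np.cos(theta))])

    def cell_index(state, lo, hi, bins):
        # Gridding happens in the *transformed* space, so a coarse grid can
        # align with the safety property at almost no extra synthesis cost.
        z = transform(state)
        frac = (z - lo) / (hi - lo)
        return tuple(np.clip((frac * bins).astype(int), 0, bins - 1))

    lo, hi = np.array([-np.pi, 0.0]), np.array([np.pi, 4.0])
    bins = np.array([16, 16])                      # a coarse 16x16 grid
    print(cell_index(np.array([0.3, 1.2]), lo, hi, bins))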

Updated: 2024-07-29 11:39:22

标题: 通过状态空间转换实现高效的护盾合成

摘要: 我们考虑合成控制系统的安全策略的问题,也称为护盾。由于状态空间是无限的,护盾通常是在有限状态抽象上计算的,最常见的抽象是矩形网格。然而,对于许多系统,这样的网格与安全性质或系统动态不太匹配。因此,粗网格很少足够,而精细网格通常在计算上不可行。在本文中,我们展示适当的状态空间转换仍然可以允许在几乎没有计算开销的情况下使用粗网格。我们在三个案例研究中展示,我们基于转换的合成比标准合成效果好几个数量级。在前两个案例研究中,我们使用领域知识来选择适当的转换。在第三个案例研究中,我们报告了在没有领域知识的情况下构造转换的结果。

更新时间: 2024-07-29 11:39:22

领域: cs.LO,cs.AI,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2407.19911v1

Reverse Map Projections as Equivariant Quantum Embeddings

We introduce the novel class $(E_\alpha)_{\alpha \in [-\infty,1)}$ of reverse map projection embeddings, each one defining a unique new method of encoding classical data into quantum states. Inspired by well-known map projections from the unit sphere onto its tangent planes, used in practice in cartography, these embeddings address the common drawback of the amplitude embedding method, wherein scalar multiples of data points are identified and information about the norm of data is lost. We show how reverse map projections can be utilised as equivariant embeddings for quantum machine learning. Using these methods, we can leverage symmetries in classical datasets to significantly strengthen performance on quantum machine learning tasks. Finally, we select four values of $\alpha$ with which to perform a simple classification task, taking $E_\alpha$ as the embedding and experimenting with both equivariant and non-equivariant setups. We compare their results alongside those of standard amplitude embedding.
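
The abstract does not spell out the $E_\alpha$ family, but the flavour of a norm-preserving spherical encoding can be seen in the classical inverse stereographic projection, sketched below purely as a stand-in: unlike amplitude encoding, scalar multiples of a data vector land on distinct points of the sphere.

    # Illustrative inverse stereographic projection onto the unit sphere S^n.
    # This is NOT the paper's E_alpha construction (not given in the abstract),
    # only an example of an embedding that retains norm information.
    import numpy as np

    def inverse_stereographic(x):
        # Maps x in R^n to a unit vector in R^(n+1); injective, so ||x|| is
        # recoverable, unlike plain amplitude embedding.
        n2 = float(np.dot(x, x))
        return np.concatenate([2.0 * x, [n2 - 1.0]]) / (n2 + 1.0)

    x = np.array([0.5, -1.0, 2.0])
    for scale in (1.0, 2.0):
        p = inverse_stereographic(scale * x)
        # Scalar multiples map to distinct unit vectors (norm is always 1).
        print(scale, np.round(p, 3), np.linalg.norm(p))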

Updated: 2024-07-29 11:31:24

标题: 反向地图投影作为等变量子嵌入

摘要: 我们介绍了一种新颖的类$(E_\alpha)_{\alpha \in [-\infty,1)}$的逆映射投影嵌入,每个嵌入都定义了一种独特的将经典数据编码为量子态的方法。受到在实际制图中使用的将单位球投影到其切平面的著名地图投影的启发,这些嵌入解决了振幅嵌入方法的常见缺点,即数据点的标量倍数被映射为同一量子态,从而丢失数据范数的信息。 我们展示了逆地图投影如何被用作量子机器学习的等变嵌入。利用这些方法,我们可以利用经典数据集中的对称性显著增强量子机器学习任务的性能。 最后,我们选择了四个$\alpha$值来执行一个简单的分类任务,将$E_\alpha$作为嵌入,并尝试使用等变和非等变设置。我们将它们的结果与标准振幅嵌入进行比较。

更新时间: 2024-07-29 11:31:24

领域: quant-ph,cs.AI,cs.ET,math-ph,math.MP

下载: http://arxiv.org/abs/2407.19906v1

Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings

Music generation introduces challenging complexities to large language models. Symbolic structures of music often include vertical harmonization as well as horizontal counterpoint, urging various adaptations and enhancements for large-scale Transformers. However, existing works share three major drawbacks: 1) their tokenization requires domain-specific annotations, such as bars and beats, that are typically missing in raw MIDI data; 2) the pure impact of enhancing token embedding methods is hardly examined without domain-specific annotations; and 3) existing works to overcome the aforementioned drawbacks, such as MuseNet, lack reproducibility. To tackle such limitations, we develop a MIDI-based music generation framework inspired by MuseNet, empirically studying two structural embeddings that do not rely on domain-specific annotations. We provide various metrics and insights that can guide suitable encoding to deploy. We also verify that multiple embedding configurations can selectively boost certain musical aspects. By providing open-source implementations via HuggingFace, our findings shed light on leveraging large language models toward practical and reproducible music generation.

Updated: 2024-07-29 11:24:10

标题: 基于结构嵌入的大型语言模型实现实用且可复现的符号音乐生成

摘要: 音乐生成为大型语言模型引入了挑战性的复杂性。音乐的符号结构通常包括垂直和谐以及水平对位,促使大规模Transformer进行各种适应和增强。然而,现有作品存在三个主要缺点:1)它们的标记化需要领域特定的注释,例如小节和拍子,在原始MIDI数据中通常缺失;2)增强标记嵌入方法的纯影响很少在没有领域特定注释的情况下进行检查;3)现有作品克服上述缺点的努力,如MuseNet,缺乏可重现性。为了解决这些限制,我们开发了一种基于MIDI的音乐生成框架,灵感来自于MuseNet,并对不依赖于领域特定注释的两种结构嵌入进行了实证研究。我们提供了各种指标和见解,可以指导适当的编码部署。我们还验证了多种嵌入配置可以选择性地提升某些音乐方面。通过HuggingFace提供开源实现,我们的发现启示了如何利用大型语言模型实现实用和可重现的音乐生成。

更新时间: 2024-07-29 11:24:10

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2407.19900v1

BEExAI: Benchmark to Evaluate Explainable AI

Recent research in explainability has given rise to numerous post-hoc attribution methods aimed at enhancing our comprehension of the outputs of black-box machine learning models. However, evaluating the quality of explanations lacks a cohesive approach and a consensus on the methodology for deriving quantitative metrics that gauge the efficacy of explainability post-hoc attribution methods. Furthermore, with the development of increasingly complex deep learning models for diverse data applications, the need for a reliable way of measuring the quality and correctness of explanations is becoming critical. We address this by proposing BEExAI, a benchmark tool that allows large-scale comparison of different post-hoc XAI methods, employing a set of selected evaluation metrics.
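
As an example of the kind of quantitative metric such a benchmark computes, the sketch below implements a generic perturbation-based faithfulness score (mask the most-attributed features and measure the confidence drop); it is not BEExAI's actual API, and the toy model and attribution used are placeholders.

    # Generic perturbation-based faithfulness score, of the kind XAI benchmarks
    # include: masking the top-k attributed features should reduce confidence.
    import numpy as np

    def faithfulness(predict_proba, x, attributions, target, k=5, baseline=0.0):
        # Drop in the target-class probability after masking the k features
        # with the largest absolute attribution; a larger drop = more faithful.
        order = np.argsort(-np.abs(attributions))
        x_masked = x.copy()
        x_masked[order[:k]] = baseline
        return predict_proba(x)[target] - predict_proba(x_masked)[target]

    rng = np.random.default_rng(0)
    w = rng.normal(size=10)                        # toy logistic model
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    predict = lambda x: np.array([1.0 - sigmoid(w @ x), sigmoid(w @ x)])
    x = rng.normal(size=10)
    print(faithfulness(predict, x, attributions=w * x, target=1))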

Updated: 2024-07-29 11:21:17

标题: BEExAI: 评估可解释人工智能的基准

摘要: 最近对可解释性的研究已经产生了许多针对黑盒机器学习模型输出理解的事后归因方法。然而,评估解释质量缺乏统一的方法,并且对于衡量可解释性事后归因方法效力的定量指标的推导方法缺乏共识。此外,随着越来越复杂的深度学习模型用于各种数据应用的发展,衡量解释质量和正确性的可靠方法变得至关重要。我们通过提出BEExAI这一基准工具来解决这个问题,该工具采用一组精选的评估指标,允许对不同事后XAI方法进行大规模比较。

更新时间: 2024-07-29 11:21:17

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.19897v1

On the (In)Security of LLM App Stores

LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we collected 786,036 LLM apps from six major app stores: GPT Store, FlowGPT, Poe, Coze, Cici, and Character.AI. Our research integrates static and dynamic analysis, the development of a large-scale toxic word dictionary (i.e., ToxicDict) comprising over 31,783 entries, and automated monitoring tools to identify and mitigate threats. We uncovered that 15,146 apps had misleading descriptions, 1,366 collected sensitive personal information against their privacy policies, and 15,996 generated harmful content such as hate speech, self-harm, extremism, etc. Additionally, we evaluated the potential for LLM apps to facilitate malicious activities, finding that 616 apps could be used for malware generation, phishing, etc. Our findings highlight the urgent need for robust regulatory frameworks and enhanced enforcement mechanisms.

Updated: 2024-07-29 11:18:57

标题: 关于LLM应用商店的(不)安全性

摘要: LLM 应用商店迅速增长,导致了大量定制的 LLM 应用的泛滥。然而,这种扩张引发了安全担忧。在本研究中,我们提出了一个三层次的关注框架,用于识别LLM 应用的潜在安全风险,即具有滥用潜力的LLM 应用,恶意意图的LLM 应用和具有可利用漏洞的LLM 应用。在五个月的时间内,我们从六个主要应用商店:GPT Store、FlowGPT、Poe、Coze、Cici 和Character.AI 收集了 786,036 个LLM 应用。我们的研究整合了静态和动态分析,开发了一个包含 31,783 个词条的大规模有毒词典(即 ToxicDict),以及自动化监测工具来识别和减轻威胁。我们发现,有 15,146 个应用具有误导性描述,1,366 个应用违反其隐私政策收集了敏感个人信息,15,996 个应用生成了有害内容,如仇恨言论、自残、极端主义等。此外,我们评估了LLM 应用促进恶意活动的潜力,发现有 616 个应用可用于生成恶意软件、钓鱼等。我们的研究结果突出了强化监管框架和加强执法机制的紧急需求。

更新时间: 2024-07-29 11:18:57

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2407.08422v2

Leveraging Foundation Models for Zero-Shot IoT Sensing

Deep learning models are increasingly deployed on edge Internet of Things (IoT) devices. However, these models typically operate under supervised conditions and fail to recognize unseen classes different from training. To address this, zero-shot learning (ZSL) aims to classify data of unseen classes with the help of semantic information. Foundation models (FMs) trained on web-scale data have shown impressive ZSL capability in natural language processing and visual understanding. However, leveraging FMs' generalized knowledge for zero-shot IoT sensing using signals such as mmWave, IMU, and Wi-Fi has not been fully investigated. In this work, we align the IoT data embeddings with the semantic embeddings generated by an FM's text encoder for zero-shot IoT sensing. To utilize the physics principles governing the generation of IoT sensor signals to derive more effective prompts for semantic embedding extraction, we propose to use cross-attention to combine a learnable soft prompt that is optimized automatically on training data and an auxiliary hard prompt that encodes domain knowledge of the IoT sensing task. To address the problem of IoT embeddings biasing to seen classes due to the lack of unseen class data during training, we propose using data augmentation to synthesize unseen class IoT data for fine-tuning the IoT feature extractor and embedding projector. We evaluate our approach on multiple IoT sensing tasks. Results show that our approach achieves superior open-set detection and generalized zero-shot learning performance compared with various baselines. Our code is available at https://github.com/schrodingho/FM_ZSL_IoT.
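
A minimal sketch of the alignment idea, projecting IoT signal embeddings into the text encoder's space and training with a CLIP-style symmetric contrastive loss, is given below; the dimensions, the linear projector, and the temperature are illustrative assumptions rather than the paper's architecture.

    # Sketch: align IoT embeddings with a frozen text encoder's embeddings via
    # a symmetric contrastive loss. Only the projector is trained here.
    import torch
    import torch.nn.functional as F

    iot_dim, text_dim, batch = 128, 512, 32
    projector = torch.nn.Linear(iot_dim, text_dim)

    iot_feats = torch.randn(batch, iot_dim)      # from an IoT feature extractor
    text_embeds = torch.randn(batch, text_dim)   # from the frozen FM text encoder

    z = F.normalize(projector(iot_feats), dim=-1)
    t = F.normalize(text_embeds, dim=-1)
    logits = z @ t.T / 0.07                      # temperature-scaled similarities
    labels = torch.arange(batch)                 # matched (IoT, text) pairs
    loss = (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2
    loss.backward()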

Updated: 2024-07-29 11:16:48

标题: 利用基础模型进行零样本物联网感知

摘要: 深度学习模型越来越多地部署在边缘物联网(IoT)设备上。然而,这些模型通常在监督条件下运行,无法识别与训练时不同的未知类别。为了解决这个问题,零样本学习(ZSL)旨在利用语义信息对未知类别的数据进行分类。在大规模网络数据上训练的基础模型(FMs)在自然语言处理和视觉理解方面展示了令人印象深刻的ZSL能力。然而,利用FMs的泛化知识来进行零样本IoT传感(如毫米波、IMU和Wi-Fi)的研究尚未得到充分探讨。在这项工作中,我们将IoT数据嵌入与FM文本编码器生成的语义嵌入进行对齐,以进行零样本IoT传感。为了利用支配IoT传感器信号生成的物理原理来推导更有效的语义嵌入提取提示,我们提出使用交叉注意力来结合一个在训练数据上自动优化的可学习软提示和一个编码IoT传感任务领域知识的辅助硬提示。为了解决由于训练过程中缺乏未知类别数据导致IoT嵌入偏向于已知类别的问题,我们提出使用数据增强来合成未知类别的IoT数据,以对IoT特征提取器和嵌入投影器进行微调。我们在多个IoT传感任务上评估了我们的方法。结果显示,与各种基线相比,我们的方法实现了更优越的开放式检测和泛化零样本学习性能。我们的代码可在https://github.com/schrodingho/FM_ZSL_IoT 上找到。

更新时间: 2024-07-29 11:16:48

领域: cs.AI,cs.HC

下载: http://arxiv.org/abs/2407.19893v1

Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Samples and Features

Gaussian graphical models can be used to extract conditional dependencies between the features of the dataset. This is often done by making an independence assumption about the samples, but this assumption is rarely satisfied in reality. However, state-of-the-art approaches that avoid this assumption are not scalable, with $O(n^3)$ runtime and $O(n^2)$ space complexity. In this paper, we introduce a method that has $O(n^2)$ runtime and $O(n)$ space complexity, without assuming independence. We validate our model on both synthetic and real-world datasets, showing that our method's accuracy is comparable to that of prior work. We demonstrate that our approach can be used on unprecedentedly large datasets, such as a real-world 1,000,000-cell scRNA-seq dataset; this was impossible with previous approaches. Our method maintains the flexibility of prior work, such as the ability to handle multi-modal tensor-variate datasets and the ability to work with data of arbitrary marginal distributions. An additional advantage of our method is that, unlike prior work, our hyperparameters are easily interpretable.

Updated: 2024-07-29 11:15:25

标题: 使多轴高斯图形模型可扩展到数百万个样本和特征

摘要: 高斯图模型可以用于提取数据集特征之间的条件依赖关系。通常情况下,这是通过对样本做独立性假设来实现的,但这个假设在现实中很少成立。然而,避免这一假设的最新方法并不具备可扩展性,其运行时间为$O(n^3)$,空间复杂度为$O(n^2)$。在本文中,我们介绍了一种方法,其运行时间为$O(n^2)$,空间复杂度为$O(n)$,而不需要假设独立性。 我们在合成和真实数据集上验证了我们的模型,表明我们的方法的准确性与先前的工作相当。我们展示了我们的方法可以应用于前所未有的大型数据集,如真实的100万细胞scRNA-seq数据集;这在以前的方法中是不可能的。我们的方法保持了先前工作的灵活性,如处理多模态张量变量数据集的能力以及处理任意边缘分布数据的能力。我们方法的另一个优势是,与先前的工作不同,我们的超参数是易于解释的。

更新时间: 2024-07-29 11:15:25

领域: stat.ML,cs.LG,q-bio.GN

下载: http://arxiv.org/abs/2407.19892v1

Yucca: A Deep Learning Framework For Medical Image Analysis

Medical image analysis using deep learning frameworks has advanced healthcare by automating complex tasks, but many existing frameworks lack flexibility, modularity, and user-friendliness. To address these challenges, we introduce Yucca, an open-source AI framework available at https://github.com/Sllambias/yucca, designed specifically for medical imaging applications and built on PyTorch and PyTorch Lightning. Yucca features a three-tiered architecture: Functional, Modules, and Pipeline, providing a comprehensive and customizable solution. Evaluated across diverse tasks such as cerebral microbleeds detection, white matter hyperintensity segmentation, and hippocampus segmentation, Yucca achieves state-of-the-art results, demonstrating its robustness and versatility. Yucca offers a powerful, flexible, and user-friendly platform for medical image analysis, inviting community contributions to advance its capabilities and impact.

Updated: 2024-07-29 11:09:10

标题: Yucca:医学图像分析的深度学习框架

摘要: 使用深度学习框架进行医学图像分析已经推动了医疗保健的发展,通过自动化复杂任务,但许多现有框架缺乏灵活性、模块化和用户友好性。为了解决这些挑战,我们介绍了Yucca,一个开源的人工智能框架,可在https://github.com/Sllambias/yucca 上找到,专门设计用于医学成像应用,并建立在PyTorch和PyTorch Lightning之上。Yucca具有三层架构:功能、模块和流水线,提供了全面和可定制的解决方案。在各种任务中进行评估,如脑微出血检测、白质高信号分割和海马体分割,Yucca实现了最先进的结果,展示了其稳健性和多功能性。Yucca为医学图像分析提供了强大、灵活和用户友好的平台,鼓励社区贡献以推动其能力和影响力。

更新时间: 2024-07-29 11:09:10

领域: cs.CV,cs.AI,cs.LG,eess.IV

下载: http://arxiv.org/abs/2407.19888v1

A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation

With the rapid development of online multimedia services, especially in e-commerce platforms, there is a pressing need for personalised recommendation systems that can effectively encode the diverse multi-modal content associated with each item. However, we argue that existing multi-modal recommender systems typically use isolated processes for both feature extraction and modality modelling. Such isolated processes can harm the recommendation performance. Firstly, an isolated extraction process underestimates the importance of effective feature extraction in multi-modal recommendations, potentially incorporating non-relevant information, which is harmful to item representations. Second, an isolated modality modelling process produces disjointed embeddings for item modalities due to the individual processing of each modality, which leads to a suboptimal fusion of user/item representations for effective user preferences prediction. We hypothesise that the use of a unified model for addressing both aforementioned isolated processes will enable the consistent extraction and cohesive fusion of joint multi-modal features, thereby enhancing the effectiveness of multi-modal recommender systems. In this paper, we propose a novel model, called Unified Multi-modal Graph Transformer (UGT), which firstly leverages a multi-way transformer to extract aligned multi-modal features from raw data for top-k recommendation. Subsequently, we build a unified graph neural network in our UGT model to jointly fuse the user/item representations with their corresponding multi-modal features. Using the graph transformer architecture of our UGT model, we show that the UGT model can achieve significant effectiveness gains, especially when jointly optimised with the commonly-used multi-modal recommendation losses.

Updated: 2024-07-29 11:04:31

标题: 一个统一的图变换器,用于克服多模态推荐中的隔离问题

摘要: 随着在线多媒体服务的快速发展,特别是在电子商务平台上,迫切需要个性化推荐系统,能够有效地编码与每个项目相关的多样化多模态内容。然而,我们认为现有的多模态推荐系统通常使用孤立的过程进行特征提取和模态建模。这种孤立的过程可能会损害推荐性能。首先,孤立的提取过程低估了多模态推荐中有效特征提取的重要性,可能会包含与项目表示不相关的信息,这对项目表示有害。其次,孤立的模态建模过程由于对每种模态的单独处理,导致项目模态产生不连贯的嵌入,这会导致用户/项目表示的亚优化融合,影响有效的用户偏好预测。我们假设使用一个统一的模型来处理上述两个孤立过程将使联合多模态特征的一致提取和凝聚融合成为可能,从而增强多模态推荐系统的效力。在本文中,我们提出了一种新颖的模型,称为统一多模态图变换器(UGT),首先利用多路变换器从原始数据中提取对齐的多模态特征进行top-k推荐。随后,在我们的UGT模型中建立一个统一的图神经网络,以联合融合用户/项目表示与其对应的多模态特征。使用我们UGT模型的图变换器架构,我们展示了UGT模型可以取得显著的效力增益,特别是在与常用的多模态推荐损失联合优化时。

更新时间: 2024-07-29 11:04:31

领域: cs.AI

下载: http://arxiv.org/abs/2407.19886v1

The Shape of Money Laundering: Subgraph Representation Learning on the Blockchain with the Elliptic2 Dataset

Subgraph representation learning is a technique for analyzing local structures (or shapes) within complex networks. Enabled by recent developments in scalable Graph Neural Networks (GNNs), this approach encodes relational information at a subgroup level (multiple connected nodes) rather than at a node level of abstraction. We posit that certain domain applications, such as anti-money laundering (AML), are inherently subgraph problems and mainstream graph techniques have been operating at a suboptimal level of abstraction. This is due in part to the scarcity of annotated datasets of real-world size and complexity, as well as the lack of software tools for managing subgraph GNN workflows at scale. To enable work in fundamental algorithms as well as domain applications in AML and beyond, we introduce Elliptic2, a large graph dataset containing 122K labeled subgraphs of Bitcoin clusters within a background graph consisting of 49M node clusters and 196M edge transactions. The dataset provides subgraphs known to be linked to illicit activity for learning the set of "shapes" that money laundering exhibits in cryptocurrency and accurately classifying new criminal activity. Along with the dataset we share our graph techniques, software tooling, promising early experimental results, and new domain insights already gleaned from this approach. Taken together, we find immediate practical value in this approach and the potential for a new standard in anti-money laundering and forensic analytics in cryptocurrencies and other financial networks.

Updated: 2024-07-29 10:53:34

标题: 洗钱的形式:在区块链上使用Elliptic2数据集进行子图表示学习

摘要: 子图表示学习是一种分析复杂网络中局部结构(或形状)的技术。借助最近可扩展的图神经网络(GNN)的发展,这种方法在子组水平(多个连接的节点)而不是在节点抽象水平上编码关系信息。我们认为,某些领域应用,如反洗钱(AML),本质上是子图问题,主流图技术一直在以次优的抽象级别运作。部分原因是由于实际规模和复杂性的标注数据集的稀缺,以及缺乏用于管理大规模子图GNN工作流的软件工具。为了使基本算法以及AML等领域应用能够工作,我们介绍了Elliptic2,一个大型图数据集,其中包含122K个比特币集群的标记子图,背景图由4900万个节点集群和1.96亿个边交易组成。该数据集提供了已知与非法活动相关的子图,用于学习在加密货币中洗钱所表现出的“形状”,并准确分类新的犯罪活动。除了数据集之外,我们还分享了我们的图技术,软件工具,有前途的早期实验结果以及从这种方法中已经得出的新领域见解。综合起来,我们发现这种方法具有即时的实际价值,以及在反洗钱和法医分析中建立新标准的潜力,包括加密货币和其他金融网络。

更新时间: 2024-07-29 10:53:34

领域: cs.LG,q-fin.GN

下载: http://arxiv.org/abs/2404.19109v3

Unfolding Time: Generative Modeling for Turbulent Flows in 4D

A recent study in turbulent flow simulation demonstrated the potential of generative diffusion models for fast 3D surrogate modeling. This approach eliminates the need for specifying initial states or performing lengthy simulations, significantly accelerating the process. While adept at sampling individual frames from the learned manifold of turbulent flow states, the previous model lacks the capability to generate sequences, hindering analysis of dynamic phenomena. This work addresses this limitation by introducing a 4D generative diffusion model and a physics-informed guidance technique that enables the generation of realistic sequences of flow states. Our findings indicate that the proposed method can successfully sample entire subsequences from the turbulent manifold, even though generalizing from individual frames to sequences remains a challenging task. This advancement opens doors for the application of generative modeling in analyzing the temporal evolution of turbulent flows, providing valuable insights into their complex dynamics.

Updated: 2024-07-29 10:43:46

标题: 揭示时间:4D湍流流动的生成建模

摘要: 最近一项关于湍流流动模拟的研究展示了生成扩散模型在快速三维代理建模中的潜力。这种方法消除了指定初始状态或进行漫长模拟的需要,显著加速了过程。虽然擅长从学习的湍流流动状态流形中采样单帧,但先前的模型缺乏生成序列的能力,阻碍了动态现象的分析。本研究通过引入4D生成扩散模型和一种物理信息引导技术来解决这一限制,使得能够生成逼真的流动状态序列。我们的发现表明,提出的方法可以成功地从湍流流动流形中采样整个子序列,尽管从单帧到序列的泛化仍然是一个具有挑战性的任务。这一进展为利用生成建模分析湍流流动的时间演变打开了大门,为了解其复杂动态提供了宝贵的见解。

更新时间: 2024-07-29 10:43:46

领域: physics.flu-dyn,cs.LG

下载: http://arxiv.org/abs/2406.11390v2

OpenUAS: Embeddings of Cities in Japan with Anchor Data for Cross-city Analysis of Area Usage Patterns

We publicly release OpenUAS, a dataset of area embeddings based on urban usage patterns, including embeddings for over 1.3 million 50-meter square meshes covering a total area of 3,300 square kilometers. This dataset is valuable for analyzing area functions in fields such as market analysis, urban planning, transportation infrastructure, and infection prediction. It captures the characteristics of each area in the city, such as office districts and residential areas, by employing an area embedding technique that utilizes location information typically obtained by GPS. Numerous area embedding techniques have been proposed, and while the public release of such embedding datasets is technically feasible, it has not been realized. One of the obstacles has been the integration of data from different cities and periods into a unified space without sharing raw location data. We address this issue by developing an anchoring method that establishes anchors within a shared embedding space. We publicly release this anchor dataset along with area embedding datasets from several periods in eight major Japanese cities. This dataset allows users to analyze urban usage patterns in Japanese cities and embed their urban dataset into the same embedding space using the anchoring method. Our key contributions include the development of the anchoring method, releasing area embedding datasets for Japanese cities, and providing tools for effective data utilization.
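
The abstract does not specify the anchoring algorithm; one standard way to place independently trained embedding sets into a shared space using common anchor points is orthogonal Procrustes alignment, sketched below purely as an illustration of the concept.

    # Illustrative anchor-based alignment: given the same anchor points seen in
    # two independently trained embedding spaces, solve orthogonal Procrustes
    # to map space B into space A. A generic stand-in, not the dataset's method.
    import numpy as np

    def procrustes_align(anchors_src, anchors_tgt):
        # Orthogonal R minimizing ||anchors_src @ R - anchors_tgt||_F.
        u, _, vt = np.linalg.svd(anchors_src.T @ anchors_tgt)
        return u @ vt

    rng = np.random.default_rng(0)
    R_true, _ = np.linalg.qr(rng.normal(size=(16, 16)))
    anchors_a = rng.normal(size=(100, 16))          # anchors in space A
    anchors_b = anchors_a @ R_true.T                # same anchors in space B
    R = procrustes_align(anchors_b, anchors_a)      # learned map B -> A
    print(np.abs(anchors_b @ R - anchors_a).max())  # ~1e-15: spaces coincide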

Updated: 2024-07-29 10:43:15

标题: OpenUAS:带有锚点数据的日本城市区域嵌入,用于区域使用模式的跨城市分析

摘要: 我们公开发布了OpenUAS数据集,这是一个基于城市使用模式的区域嵌入数据集,包括超过130万个50米方块网格的嵌入数据,总覆盖面积为3300平方公里。该数据集对于分析市场分析、城市规划、交通基础设施和感染预测等领域的区域功能非常有价值。通过利用通常由GPS获取的位置信息,它捕捉了城市中每个区域的特征,如办公区和住宅区,采用了一种区域嵌入技术。已经提出了许多区域嵌入技术,虽然从技术上讲公开发布这种嵌入数据集是可行的,但尚未实现。其中一个障碍是在不共享原始位置数据的情况下将来自不同城市和时期的数据集集成到统一空间中。我们通过开发一种在共享嵌入空间内建立锚点的方法来解决这个问题。我们公开发布了这个锚点数据集以及来自八个主要日本城市的多个时期的区域嵌入数据集。该数据集允许用户分析日本城市的城市使用模式,并使用锚定方法将其城市数据集嵌入到相同的嵌入空间中。我们的关键贡献包括开发锚定方法、发布日本城市的区域嵌入数据集以及提供有效数据利用工具。

更新时间: 2024-07-29 10:43:15

领域: cs.LG

下载: http://arxiv.org/abs/2407.19872v1

Fast Private Location-based Information Retrieval Over the Torus

Location-based services offer immense utility, but also pose significant privacy risks. In response, we propose LocPIR, a novel framework using homomorphic encryption (HE), specifically the TFHE scheme, to preserve user location privacy when retrieving data from public clouds. Our system employs TFHE's expertise in non-polynomial evaluations, crucial for comparison operations. LocPIR showcases minimal client-server interaction, reduced memory overhead, and efficient throughput. Performance tests confirm its computational speed, making it a viable solution for practical scenarios, demonstrated via application to a COVID-19 alert model. Thus, LocPIR effectively addresses privacy concerns in location-based services, enabling secure data sharing from the public cloud.

Updated: 2024-07-29 10:42:17

标题: 基于环面的快速私密位置信息检索

摘要: 基于位置的服务提供了巨大的实用性,但也带来了显著的隐私风险。为此,我们提出了LocPIR,这是一个使用同态加密(HE)的新颖框架,具体采用TFHE方案,用于在从公共云检索数据时保护用户位置隐私。我们的系统利用TFHE在非多项式评估方面的专长,这对比较操作至关重要。LocPIR展示了最小化的客户端-服务器交互、减少的内存开销和高效的吞吐量。性能测试证实了它的计算速度,使其成为实际场景的可行解决方案,通过应用于COVID-19警报模型进行了演示。因此,LocPIR有效地解决了基于位置的服务中的隐私问题,实现了从公共云安全共享数据的可能。

更新时间: 2024-07-29 10:42:17

领域: cs.CR,cs.NI

下载: http://arxiv.org/abs/2407.19871v1

Distances Between Partial Preference Orderings

This paper proposes to establish the distance between partial preference orderings based on two very different approaches. The first approach corresponds to the brute force method based on combinatorics. It generates all possible complete preference orderings compatible with the partial preference orderings and calculates the Frobenius distance between all fully compatible preference orderings. Unfortunately, this first method is not very efficient in solving high-dimensional problems because of its big combinatorial complexity. That is why we propose to circumvent this problem by using a second approach based on belief functions, which can adequately model the missing information of partial preference orderings. This second approach to the calculation of distance does not suffer from combinatorial complexity limitation. We show through simple examples how these two theoretical methods work.
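
The brute-force approach translates directly into code: enumerate the complete orderings compatible with each partial ordering, encode each as a pairwise preference matrix, and compare with the Frobenius norm. In the sketch below, averaging over compatible pairs is an assumption; the paper may aggregate the pairwise distances differently.

    # Brute-force distance between partial preference orderings (small n only,
    # illustrating exactly why the combinatorial method does not scale).
    from itertools import permutations
    import numpy as np

    ITEMS = ["a", "b", "c"]

    def compatible_orders(partial):
        # partial: set of (x, y) pairs meaning "x is preferred to y".
        for perm in permutations(ITEMS):
            rank = {x: i for i, x in enumerate(perm)}
            if all(rank[x] < rank[y] for x, y in partial):
                yield perm

    def order_matrix(perm):
        # Pairwise preference matrix: m[i, j] = 1 iff item i precedes item j.
        rank = {x: i for i, x in enumerate(perm)}
        n = len(ITEMS)
        m = np.zeros((n, n))
        for i, x in enumerate(ITEMS):
            for j, y in enumerate(ITEMS):
                m[i, j] = 1.0 if rank[x] < rank[y] else 0.0
        return m

    def bruteforce_distance(p1, p2):
        dists = [np.linalg.norm(order_matrix(o1) - order_matrix(o2))
                 for o1 in compatible_orders(p1) for o2 in compatible_orders(p2)]
        return np.mean(dists)   # aggregation choice is an assumption

    print(bruteforce_distance({("a", "b")}, {("b", "c")}))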

Updated: 2024-07-29 10:39:40

标题: 部分偏好排序之间的距离

摘要: 本文提出建立基于两种非常不同的方法的部分偏好排序之间的距离。第一种方法对应于基于组合学的蛮力方法。它生成与部分偏好排序兼容的所有可能的完整偏好排序,并计算所有完全兼容偏好排序之间的Frobenius距离。不幸的是,由于其巨大的组合复杂性,这种第一种方法在解决高维问题时效率不高。这就是为什么我们提出通过使用基于信念函数的第二种方法来规避这个问题,信念函数可以充分模拟部分偏好排序的缺失信息。这种计算距离的第二种方法不受组合复杂性限制。我们通过简单的例子展示这两种理论方法如何工作。

更新时间: 2024-07-29 10:39:40

领域: cs.AI

下载: http://arxiv.org/abs/2407.19869v1

MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create MMEarth, a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks including image classification and semantic segmentation. We find that pretraining with multi-modal pretext tasks notably improves the linear probing performance compared to pretraining on optical satellite images only. This also leads to better label efficiency and parameter efficiency which are crucial aspects in global scale applications.

Updated: 2024-07-29 10:35:50

标题: MMEarth:探索用于地理空间表示学习的多模态前置任务

摘要: 未标记的地球观测(EO)数据量巨大,但许多重要应用缺乏标记的训练数据。然而,EO数据提供了独特的机会,可以根据地理位置和时间自动配对来自不同模态和传感器的数据,几乎不需要人力成本。我们抓住这个机会创建了MMEarth,一个全球范围的多样化多模态预训练数据集。利用这个新的包含120万个位置的语料库,我们提出了一种多前置任务掩码自动编码器(MP-MAE)方法,用于学习光学卫星图像的通用表示。我们的方法基于ConvNeXt V2架构,一个完全卷积掩码自动编码器(MAE)。借助一系列多模态前置任务,我们证明了我们的MP-MAE方法在性能上优于在ImageNet上预训练的MAEs和在特定领域卫星图像上预训练的MAEs。这在包括图像分类和语义分割在内的几个下游任务上得到了证明。我们发现,用多模态前置任务进行预训练显著提高了与仅在光学卫星图像上进行预训练相比的线性探测性能。这也导致更好的标签效率和参数效率,这是全球规模应用中关键的方面。

更新时间: 2024-07-29 10:35:50

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.02771v2

Deep Image Priors for Magnetic Resonance Fingerprinting with pretrained Bloch-consistent denoising autoencoders

The estimation of multi-parametric quantitative maps from Magnetic Resonance Fingerprinting (MRF) compressed sampled acquisitions, albeit successful, remains a challenge due to the high undersampling rate and artifacts naturally occurring during image reconstruction. Whilst state-of-the-art DL methods can successfully address the task, to fully exploit their capabilities they often require training on a paired dataset, in an area where ground truth is seldom available. In this work, we propose a method that combines a deep image prior (DIP) module that, without ground truth and in conjunction with a Bloch consistency enforcing autoencoder, can tackle the problem, resulting in a method faster and of equivalent or better accuracy than DIP-MRF.

Updated: 2024-07-29 10:35:39

标题: 深度图像先验在具有预训练布洛赫一致去噪自动编码器的磁共振指纹图像处理中的应用

摘要: 从磁共振指纹(MRF)压缩采样获取的多参数定量图的估计尽管成功,仍然是一个挑战,这是由于高欠采样率和图像重建过程中自然产生的伪影。尽管最先进的深度学习方法可以成功地解决这个任务,但要充分利用它们的能力,通常需要在真实标签很少可得的领域进行配对数据集的训练。在这项工作中,我们提出了一种方法,它结合了一个深度图像先验(DIP)模块,该模块无需真实标签,并与一个强制布洛赫一致性的自动编码器协同工作来处理这一问题,从而得到一种比DIP-MRF更快且精度相当或更好的方法。

更新时间: 2024-07-29 10:35:39

领域: eess.IV,cs.LG

下载: http://arxiv.org/abs/2407.19866v1

Imitation Learning for Intra-Day Power Grid Operation through Topology Actions

Power grid operation is becoming increasingly complex due to the increase in generation of renewable energy. The recent series of Learning To Run a Power Network (L2RPN) competitions have encouraged the use of artificial agents to assist human dispatchers in operating power grids. In this paper we study the performance of imitation learning for day-ahead power grid operation through topology actions. In particular, we consider two rule-based expert agents: a greedy agent and a N-1 agent. While the latter is more computationally expensive since it takes N-1 safety considerations into account, it exhibits a much higher operational performance. We train a fully-connected neural network (FCNN) on expert state-action pairs and evaluate it in two ways. First, we find that classification accuracy is limited despite extensive hyperparameter tuning, due to class imbalance and class overlap. Second, as a power system agent, the FCNN performs only slightly worse than expert agents. Furthermore, hybrid agents, which incorporate minimal additional simulations, match expert agents' performance with significantly lower computational cost. Consequently, imitation learning shows promise for developing fast, high-performing power grid agents, motivating its further exploration in future L2RPN studies.
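
The supervised core of this setup is plain behaviour cloning: a fully-connected network fit to expert state-action pairs with cross-entropy. A minimal sketch follows; the state and action dimensions are placeholders, not the L2RPN environment's actual sizes.

    # Behaviour-cloning sketch: FCNN trained on expert (state, action) pairs.
    import torch
    import torch.nn as nn

    state_dim, n_actions = 467, 256                  # placeholder sizes
    net = nn.Sequential(nn.Linear(state_dim, 512), nn.ReLU(),
                        nn.Linear(512, 512), nn.ReLU(),
                        nn.Linear(512, n_actions))
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)

    states = torch.randn(1024, state_dim)            # stand-in for expert states
    actions = torch.randint(0, n_actions, (1024,))   # stand-in for expert actions

    for _ in range(10):                              # training epochs
        loss = nn.functional.cross_entropy(net(states), actions)
        opt.zero_grad(); loss.backward(); opt.step()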

Updated: 2024-07-29 10:34:19

标题: 通过拓扑动作进行日内电网运行的模仿学习

摘要: 电网运行由于可再生能源发电量的增加而变得越来越复杂。最近一系列的"学习运行电力网络"(L2RPN)比赛鼓励使用人工智能代理来辅助人类调度员操作电网。本文研究了模仿学习在日前电网运行中通过拓扑操作的表现。具体来说,我们考虑了两种基于规则的专家代理:贪婪代理和N-1代理。尽管后者在计算方面更昂贵,因为它考虑了N-1安全因素,但它表现出更高的运行性能。我们在专家状态-动作对上训练了一个全连接神经网络(FCNN),并进行了两种评估。首先,我们发现尽管进行了广泛的超参数调整,但由于类别不平衡和类别重叠,分类准确性受到限制。其次,作为一个电力系统代理,FCNN的表现仅略逊于专家代理。此外,仅包含少量额外模拟的混合代理能够以显著更低的计算成本达到专家代理的表现。因此,模仿学习在开发快速、高性能的电网代理方面表现出潜力,激发了在未来L2RPN研究中进一步探索的动力。

更新时间: 2024-07-29 10:34:19

领域: cs.AI,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2407.19865v1

Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning

The deployment of artificial intelligence (AI) in decision-making applications requires ensuring an appropriate level of safety and reliability, particularly in changing environments that contain a large number of unknown observations. To address this challenge, we propose a novel safe reinforcement learning (RL) approach that utilizes an anomalous state sequence to enhance RL safety. Our proposed solution Safe Reinforcement Learning with Anomalous State Sequences (AnoSeqs) consists of two stages. First, we train an agent in a non-safety-critical offline 'source' environment to collect safe state sequences. Next, we use these safe sequences to build an anomaly detection model that can detect potentially unsafe state sequences in a 'target' safety-critical environment where failures can have high costs. The estimated risk from the anomaly detection model is utilized to train a risk-averse RL policy in the target environment; this involves adjusting the reward function to penalize the agent for visiting anomalous states deemed unsafe by our anomaly model. In experiments on multiple safety-critical benchmarking environments including self-driving cars, our solution approach successfully learns safer policies and proves that sequential anomaly detection can provide an effective supervisory signal for training safety-aware RL agents.
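
The reward-shaping step can be sketched compactly: subtract a penalty proportional to the anomaly score of the recent state sequence. The anomaly model, penalty weight, and window length below are illustrative assumptions.

    # Sketch of risk-averse reward shaping driven by a sequence anomaly score.
    from collections import deque
    import numpy as np

    class RiskAverseReward:
        def __init__(self, anomaly_score, penalty=1.0, window=8):
            self.anomaly_score = anomaly_score  # state sequence -> score in [0, 1]
            self.penalty = penalty
            self.buffer = deque(maxlen=window)

        def shape(self, state, reward):
            self.buffer.append(state)
            risk = self.anomaly_score(np.stack(self.buffer))
            return reward - self.penalty * risk  # penalise risky sequences

    # Toy anomaly model: distance of the latest state from the safe-data mean.
    safe_mean = np.zeros(4)
    score = lambda seq: float(np.tanh(np.linalg.norm(seq[-1] - safe_mean)))
    shaper = RiskAverseReward(score)
    print(shaper.shape(np.ones(4), reward=1.0))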

Updated: 2024-07-29 10:30:07

标题: 异常状态序列建模以增强强化学习中的安全性

摘要: 人工智能在决策应用中的部署需要确保适当的安全性和可靠性,特别是在包含大量未知观察的变化环境中。为了解决这一挑战,我们提出了一种新颖的安全强化学习(RL)方法,利用异常状态序列来增强RL的安全性。我们提出的解决方案安全强化学习与异常状态序列(AnoSeqs)包括两个阶段。首先,我们在一个非安全关键的离线“源”环境中训练一个代理程序来收集安全状态序列。接下来,我们使用这些安全序列来构建一个异常检测模型,可以在“目标”安全关键环境中检测潜在的不安全状态序列,失败可能造成高昂的代价。从异常检测模型中估计的风险被用来训练目标环境中的风险回避RL策略;这涉及调整奖励函数,惩罚代理程序访问被我们的异常模型视为不安全的异常状态。在包括自动驾驶汽车在内的多个安全关键基准环境的实验中,我们的解决方案成功学习到更安全的策略,并证明序列异常检测可以为训练安全意识的RL代理提供有效的监督信号。

更新时间: 2024-07-29 10:30:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.19860v1

AI-Powered Energy algorithmic Trading: Integrating Hidden Markov Models with Neural Networks

In the field of quantitative finance, machine learning methods have become essential for alpha generation. This paper presents a pioneering method that uniquely combines Hidden Markov Models (HMM) and neural networks, creating a dual-model alpha generation system integrated with Black-Litterman portfolio optimization. The methodology, implemented on the QuantConnect platform, aims to predict future price movements and optimize trading strategies. Specifically, it filters for highly liquid, top-cap energy stocks to ensure stable and predictable performance while also accounting for broker payments. QuantConnect was selected because of its robust framework and to guarantee experimental reproducibility. The algorithm achieved a 31% return between June 1, 2023, and January 1, 2024, with a Sharpe ratio of 1.669, demonstrating its potential. The findings suggest significant improvements in trading strategy performance through the combined use of the HMM and neural networks. This study explores the architecture of the algorithm, data pre-processing techniques, model training procedures, and performance evaluation, highlighting its practical applicability and effectiveness in real-world trading environments. The full code and backtesting data are available under the MIT license.
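
A sketch of the HMM half of such a dual-model system is shown below, fitting a Gaussian HMM on returns and reading off the inferred regime with hmmlearn; the number of regimes and the synthetic data are illustrative, not the paper's QuantConnect setup.

    # Regime detection with a Gaussian HMM on daily returns (hmmlearn).
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)
    # Synthetic daily returns: a calm regime followed by a volatile one.
    returns = np.concatenate([rng.normal(0.0005, 0.005, 250),
                              rng.normal(-0.001, 0.02, 250)]).reshape(-1, 1)

    hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=200)
    hmm.fit(returns)
    regimes = hmm.predict(returns)
    print("inferred regime of the last day:", regimes[-1])
    # A downstream neural network can consume the regime labels, or the state
    # posteriors hmm.predict_proba(returns), as extra features for alpha signals.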

Updated: 2024-07-29 10:26:52

标题: AI驱动的能源算法交易:融合隐马尔可夫模型与神经网络

摘要: 在量化金融领域,机器学习方法已成为Alpha生成的必要工具。本文提出了一种开创性的方法,独特地结合了隐马尔可夫模型(HMM)和神经网络,创建了一个与Black-Litterman投资组合优化集成的双模型Alpha生成系统。该方法在QuantConnect平台上实现,旨在预测未来价格走势并优化交易策略。具体而言,它通过过滤高流动性、市值排名靠前的能源股票,以确保稳定和可预测的表现,同时考虑经纪人支付。选择QuantConnect是因为其健壮的框架并保证实验可重复性。该算法在2023年6月1日至2024年1月1日期间取得了31%的回报,夏普比率为1.669,展示了其潜力。研究结果表明,通过HMM和神经网络的结合使用,交易策略性能取得了显著改善。本研究探讨了算法的架构、数据预处理技术、模型训练程序和绩效评估,突出了其在真实交易环境中的实际适用性和有效性。完整的代码和回测数据可在MIT许可下获得。

更新时间: 2024-07-29 10:26:52

领域: q-fin.PM,cs.LG,q-fin.GN,stat.AP

下载: http://arxiv.org/abs/2407.19858v1

A data balancing approach towards design of an expert system for Heart Disease Prediction

Heart disease is a serious global health issue that claims millions of lives every year. Early detection and precise prediction are critical to the prevention and successful treatment of heart related issues. A lot of research utilizes machine learning (ML) models to forecast cardiac disease and obtain early detection. To perform predictive analysis on the "Heart disease health indicators" dataset, we employed five machine learning methods in this paper: Decision Tree (DT), Random Forest (RF), Linear Discriminant Analysis, Extra Tree Classifier, and AdaBoost. The model is further examined using various feature selection (FS) techniques. To enhance the baseline model, we have separately applied four FS techniques: Sequential Forward FS, Sequential Backward FS, Correlation Matrix, and Chi2. Lastly, K-means SMOTE oversampling is applied to the models to enable additional analysis. The findings show that when it came to predicting heart disease, ensemble approaches, in particular random forests, performed better than individual classifiers. Smoking, blood pressure, cholesterol, and physical inactivity were among the major predictors identified. The accuracy of the Random Forest and Decision Tree model was 99.83%. This paper demonstrates how machine learning models can improve the accuracy of heart disease prediction, especially when using ensemble methodologies. The models provide a more accurate risk assessment than traditional methods since they incorporate a large number of factors and complex algorithms.
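
The balancing-plus-ensemble pipeline can be sketched as follows; plain SMOTE is used here for robustness (the paper's K-means variant is available as imblearn.over_sampling.KMeansSMOTE), and the synthetic data stands in for the real health-indicators dataset.

    # Sketch: oversample the minority class, then fit a random forest.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_bal, y_bal)
    print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))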

Updated: 2024-07-29 10:22:00

标题: 一个数据平衡方法用于心脏病预测专家系统的设计

摘要: 心脏病是一个严重的全球健康问题,每年夺走数百万人的生命。早期检测和准确预测对预防和成功治疗心脏相关问题至关重要。许多研究利用机器学习(ML)模型来预测心脏疾病并实现早期检测。为了对“心脏病健康指标”数据集进行预测分析,本文采用了五种机器学习方法:决策树(DT)、随机森林(RF)、线性判别分析、额外树分类器和AdaBoost。该模型进一步使用了各种特征选择(FS)技术进行检验。为了增强基线模型,我们分别应用了四种FS技术:顺序前向FS、顺序后向FS、相关矩阵和Chi2。最后,K均值SMOTE过采样应用于模型中,以实现额外的分析。研究结果表明,当涉及预测心脏病时,特别是在使用集成方法时,随机森林的表现优于单个分类器。吸烟、血压、胆固醇和体力不足等因素是主要的预测因子之一。随机森林和决策树模型的准确率为99.83%。本文演示了机器学习模型如何提高心脏病预测的准确性,特别是在使用集成方法时。这些模型提供比传统方法更准确的风险评估,因为它们包含大量因素和复杂算法。

更新时间: 2024-07-29 10:22:00

领域: cs.LG

下载: http://arxiv.org/abs/2407.18606v2

Response Theory via Generative Score Modeling

We introduce an approach for analyzing the responses of dynamical systems to external perturbations that combines score-based generative modeling with the Generalized Fluctuation-Dissipation Theorem (GFDT). The methodology enables accurate estimation of system responses, including those with non-Gaussian statistics. We numerically validate our approach using time-series data from three different stochastic partial differential equations of increasing complexity: an Ornstein-Uhlenbeck process with spatially correlated noise, a modified stochastic Allen-Cahn equation, and the 2D Navier-Stokes equations. We demonstrate the improved accuracy of the methodology over conventional methods and discuss its potential as a versatile tool for predicting the statistical behavior of complex dynamical systems.
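
For orientation, the generalized fluctuation-dissipation theorem invoked here is commonly stated in a score-based form; the display below is that reference form (the paper's precise formulation may differ), and it makes clear why an estimated score is exactly the missing ingredient:

$$ R_{ij}(t) \;=\; -\,\big\langle\, A_i(x(t))\;\partial_{x_j}\log\rho(x(0)) \,\big\rangle \;=\; -\,\big\langle\, A_i(x(t))\, s_j(x(0)) \,\big\rangle, $$

where $\rho$ is the stationary density, $A$ the observable of interest, and $s = \nabla \log \rho$ the score. A generative model trained to approximate $s$ therefore yields response functions directly from unperturbed time-series statistics, including for non-Gaussian $\rho$.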

Updated: 2024-07-29 10:18:23

标题: 基于生成式得分建模的响应理论

摘要: 我们介绍了一种分析动力系统对外部扰动响应的方法,该方法将基于得分的生成建模与广义波动-耗散定理(GFDT)结合在一起。该方法使得能够准确估计系统响应,包括那些具有非高斯统计特性的响应。我们利用来自三种不同随机偏微分方程的时间序列数据对我们的方法进行了数值验证,这些方程的复杂性逐渐增加:一个具有空间相关噪声的Ornstein-Uhlenbeck过程,一个修改后的随机Allen-Cahn方程和二维Navier-Stokes方程。我们展示了该方法相对于传统方法的改进准确性,并讨论其作为预测复杂动力系统统计行为的多功能工具的潜力。

更新时间: 2024-07-29 10:18:23

领域: physics.data-an,cs.LG

下载: http://arxiv.org/abs/2402.01029v2

Online Multi-Source Domain Adaptation through Gaussian Mixtures and Dataset Dictionary Learning

This paper addresses the challenge of online multi-source domain adaptation (MSDA) in transfer learning, a scenario where one needs to adapt multiple, heterogeneous source domains towards a target domain that comes in a stream. We introduce a novel approach for the online fit of a Gaussian Mixture Model (GMM), based on the Wasserstein geometry of Gaussian measures. We build upon this method and recent developments in dataset dictionary learning for proposing a novel strategy in online MSDA. Experiments on the challenging Tennessee Eastman Process benchmark demonstrate that our approach is able to adapt on the fly to the stream of target domain data. Furthermore, our online GMM serves as a memory, representing the whole stream of data.
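
For reference, the Wasserstein geometry of Gaussian measures mentioned above rests on the closed-form 2-Wasserstein (Bures) distance between Gaussians, which is what makes working in this geometry tractable:

$$ W_2^2\big(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\big) \;=\; \lVert m_1 - m_2 \rVert^2 \;+\; \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\big)^{1/2}\Big). $$

How the paper exploits this metric for the online GMM fit is not detailed in the abstract; the formula is included only as background.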

Updated: 2024-07-29 10:10:40

标题: 通过高斯混合和数据集字典学习进行在线多源域自适应

摘要: 这篇论文讨论了在线多源域自适应(MSDA)在迁移学习中的挑战,这是一个需要将多个异构源域适应到一个以流形式出现的目标域的情景。我们介绍了一种基于高斯混合模型(GMM)的在线拟合方法,基于高斯测度的Wasserstein几何。我们基于这种方法和最近的数据集字典学习发展,提出了一种新颖的在线MSDA策略。在具有挑战性的Tennessee Eastman Process基准测试上的实验表明,我们的方法能够实时适应目标域数据流。此外,我们的在线GMM充当记忆,代表了整个数据流。

更新时间: 2024-07-29 10:10:40

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.19853v1

Quantum Long Short-Term Memory for Drug Discovery

Quantum computing combined with machine learning (ML) is an extremely promising research area, with numerous studies demonstrating that quantum machine learning (QML) is expected to solve scientific problems more effectively than classical ML. In this work, we successfully apply QML to drug discovery, showing that QML can significantly improve model performance and achieve faster convergence compared to classical ML. Moreover, we demonstrate that the model accuracy of the QML improves as the number of qubits increases. We also introduce noise to the QML model and find that it has little effect on our experimental conclusions, illustrating the high robustness of the QML model. This work highlights the potential application of quantum computing to yield significant benefits for scientific advancement as the qubit quantity increase and quality improvement in the future.

Updated: 2024-07-29 10:10:03

标题: 量子长短期记忆用于药物发现

摘要: 量子计算与机器学习相结合是一个非常有前途的研究领域,许多研究表明,量子机器学习(QML)预计能够比经典机器学习更有效地解决科学问题。在这项工作中,我们成功地将QML应用于药物发现,显示出QML可以显著提高模型性能,并相对于经典机器学习实现更快的收敛。此外,我们证明了随着量子比特数量增加,QML模型的准确性也会提高。我们还向QML模型引入噪声,并发现对我们的实验结论几乎没有影响,说明了QML模型的高鲁棒性。这项工作突出了量子计算潜在应用于科学进步的优势,随着未来量子比特数量的增加和质量的改善。

更新时间: 2024-07-29 10:10:03

领域: quant-ph,cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2407.19852v1

BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

As an emerging approach to explore the vulnerability of deep neural networks (DNNs), backdoor learning has attracted increasing interest in recent years, and many seminal backdoor attack and defense algorithms are being developed successively or concurrently, in the status of a rapid arms race. However, mainly due to the diverse settings, and the difficulties of implementation and reproducibility of existing works, there is a lack of a unified and standardized benchmark of backdoor learning, causing unfair comparisons or unreliable conclusions (e.g., misleading, biased or even false conclusions). Consequently, it is difficult to evaluate the current progress and design the future development roadmap of this literature. To alleviate this dilemma, we build a comprehensive benchmark of backdoor learning called BackdoorBench. Our benchmark makes three valuable contributions to the research community. 1) We provide an integrated implementation of state-of-the-art (SOTA) backdoor learning algorithms (currently including 20 attack and 32 defense algorithms), based on an extensible modular-based codebase. 2) We conduct comprehensive evaluations with 5 poisoning ratios, based on 4 models and 4 datasets, leading to 11,492 pairs of attack-against-defense evaluations in total. 3) Based on above evaluations, we present abundant analysis from 10 perspectives via 18 useful analysis tools, and provide several inspiring insights about backdoor learning. We hope that our efforts could build a solid foundation of backdoor learning to facilitate researchers to investigate existing algorithms, develop more innovative algorithms, and explore the intrinsic mechanism of backdoor learning. Finally, we have created a user-friendly website at http://backdoorbench.com, which collects all important information of BackdoorBench, including codebase, docs, leaderboard, and model Zoo.

Updated: 2024-07-29 09:57:03

标题: BackdoorBench:一个全面的后门学习基准和分析

摘要: 作为一种探索深度神经网络(DNNs)脆弱性的新兴方法,后门学习近年来引起了越来越多的关注,并且许多开创性的后门攻击和防御算法正在迅速发展中。然而,由于现有作品的多样性设置和实施难度,以及难以再现性,缺乏一个统一和标准化的后门学习基准,导致不公平的比较或不可靠的结论(例如,误导性、偏见性甚至虚假结论)。因此,评估当前进展并设计未来发展路线图变得困难。为了缓解这一困境,我们建立了一个名为BackdoorBench的全面后门学习基准。我们的基准为研究社区提供了三项有价值的贡献。1)我们提供了一体化实施最先进的(SOTA)后门学习算法(目前包括20种攻击和32种防御算法),基于可扩展的基于模块的代码库。2)我们进行了基于4个模型和4个数据集的5个毒害比例的全面评估,总共导致了11,492对攻击-防御评估。3)基于上述评估,我们通过18个有用的分析工具从10个角度提供了丰富的分析,并提供了关于后门学习的一些启发性见解。我们希望我们的努力可以为后门学习建立坚实的基础,以便研究人员调查现有算法、开发更具创新性的算法,并探索后门学习的内在机制。最后,我们在http://backdoorbench.com创建了一个用户友好的网站,收集了所有关于BackdoorBench的重要信息,包括代码库、文档、排行榜和模型动物园。

更新时间: 2024-07-29 09:57:03

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2407.19845v1

Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability

Large Language Models (LLMs), characterized by being trained on broad amounts of data in a self-supervised manner, have shown impressive performance across a wide range of tasks. Indeed, their generative abilities have aroused interest on the application of LLMs across a wide range of contexts. However, neural networks in general, and LLMs in particular, are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the output of the model. This is a serious concern that impedes the use of LLMs on high-stakes applications, such as healthcare, where a wrong prediction can imply serious consequences. Even though there are many efforts on making LLMs more robust to adversarial attacks, there are almost no works that study how and where the vulnerabilities that make LLMs prone to adversarial attacks arise. Motivated by these facts, we explore how to localize and understand vulnerabilities, and propose a method, based on Mechanistic Interpretability (MI) techniques, to guide this process. Specifically, this method enables us to detect vulnerabilities related to a concrete task by (i) obtaining the subset of the model that is responsible for that task, (ii) generating adversarial samples for that task, and (iii) using MI techniques together with the previous samples to discover and understand the possible vulnerabilities. We showcase our method on a pretrained GPT-2 Small model carrying out the task of predicting 3-letter acronyms to demonstrate its effectiveness on locating and understanding concrete vulnerabilities of the model.

Updated: 2024-07-29 09:55:34

标题: 通过机制可解释性检测和理解语言模型中的漏洞

摘要: 大型语言模型(LLMs)以在自我监督方式下训练的大量数据为特征,已在广泛的任务中展现出令人印象深刻的性能。事实上,它们的生成能力引起了人们对在各种情境中应用LLMs的兴趣。然而,神经网络总体上,特别是LLMs,被认为容易受到对抗性攻击的影响,即对输入进行微不可察觉的改变可能误导模型的输出。这是一个严重的问题,阻碍了在高风险应用中使用LLMs,如医疗保健,错误的预测可能导致严重后果。尽管有许多努力使LLMs更加抗击对抗性攻击,但几乎没有研究探讨使LLMs容易受到对抗性攻击的漏洞是如何以及在何处产生的。受这些事实的激励,我们探讨如何定位和理解漏洞,并提出了一种基于机制可解释性(MI)技术的方法来指导这一过程。具体来说,这种方法使我们能够通过(i)获取负责该任务的模型子集,(ii)为该任务生成对抗样本,以及(iii)使用MI技术和先前的样本来发现和理解可能的漏洞,从而检测与具体任务相关的漏洞。我们展示了我们的方法在一个预训练的GPT-2 Small模型上执行预测3个字母缩写的任务,以展示其在定位和理解模型的具体漏洞方面的有效性。

更新时间: 2024-07-29 09:55:34

领域: cs.LG,cs.CL,cs.CR

下载: http://arxiv.org/abs/2407.19842v1

Statistical Test on Diffusion Model-based Generated Images by Selective Inference

AI technology for generating images, such as diffusion models, has advanced rapidly. However, there is no established framework for quantifying the reliability of AI-generated images, which hinders their use in critical decision-making tasks, such as medical image diagnosis. In this study, we propose a method to quantify the reliability of decision-making tasks that rely on images produced by diffusion models within a statistical testing framework. The core concept of our statistical test involves using a selective inference framework, in which the statistical test is conducted under the condition that the images are produced by a trained diffusion model. As a case study, we study a diffusion model-based anomaly detection task for medical images. With our approach, the statistical significance of medical image diagnostic outcomes can be quantified in terms of a p-value, enabling decision-making with a controlled error rate. We demonstrate the theoretical soundness and practical effectiveness of our statistical test through numerical experiments on both synthetic and brain image datasets.

Updated: 2024-07-29 09:51:35

标题: 在选择性推断下对基于扩散模型生成的图像进行统计测试

摘要: 人工智能技术用于生成图像,如扩散模型,已经快速发展。然而,目前还没有建立一个确定人工智能生成图像可靠性的框架,这阻碍了它们在关键决策任务中的应用,比如医学图像诊断。在本研究中,我们提出了一种方法,在统计测试框架内量化依赖扩散模型生成的图像的决策任务的可靠性。我们的统计检验的核心概念涉及使用选择性推理框架,在这个框架下,统计检验是在图像由训练有素的扩散模型生成的条件下进行的。作为一个案例研究,我们研究了基于扩散模型的医学图像异常检测任务。通过我们的方法,医学图像诊断结果的统计显著性可以用p值来量化,从而实现具有控制误差率的决策。我们通过对合成和脑图像数据集进行的数值实验,展示了我们统计检验的理论合理性和实际有效性。

更新时间: 2024-07-29 09:51:35

领域: stat.ML,cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.11789v2

RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching

RNA plays a crucial role in diverse life processes. In contrast to the rapid advancement of protein design methods, the work related to RNA is more demanding. Most current RNA design approaches concentrate on specified target attributes and rely on extensive experimental searches. However, these methods remain costly and inefficient due to practical limitations. In this paper, we characterize all sequence design issues as conditional generation tasks and offer parameterized representations for multiple problems. For these problems, we have developed a universal RNA sequence generation model based on flow matching, namely RNACG. RNACG can accommodate various conditional inputs and is portable, enabling users to customize the encoding network for conditional inputs as per their requirements and integrate it into the generation network. We evaluated RNACG in RNA 3D structure inverse folding, 2D structure inverse folding, family-specific sequence generation, and 5'UTR translation efficiency prediction. RNACG attains superior or competitive performance on these tasks compared with other methods. RNACG exhibits extensive applicability in sequence generation and property prediction tasks, providing a novel approach to RNA sequence design and potential methods for simulation experiments with large-scale RNA sequence data.
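
The abstract does not detail the training objective; a standard conditional flow-matching loss with linear interpolation paths is sketched below as background. The network, the condition encoding, and the continuous sequence representation are illustrative assumptions, not RNACG's actual architecture.

    # Standard conditional flow-matching objective (linear paths): regress a
    # velocity field v_theta(x_t, t, c) onto the constant path velocity x1 - x0.
    import torch
    import torch.nn as nn

    dim, cond_dim, batch = 64, 16, 32
    v_theta = nn.Sequential(nn.Linear(dim + cond_dim + 1, 256), nn.SiLU(),
                            nn.Linear(256, dim))

    x1 = torch.randn(batch, dim)      # data (e.g. embedded RNA sequences)
    c = torch.randn(batch, cond_dim)  # conditional input (e.g. structure features)
    x0 = torch.randn(batch, dim)      # noise sample
    t = torch.rand(batch, 1)

    x_t = (1 - t) * x0 + t * x1       # point on the linear probability path
    target = x1 - x0                  # its constant velocity
    pred = v_theta(torch.cat([x_t, c, t], dim=-1))
    loss = ((pred - target) ** 2).mean()
    loss.backward()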

Updated: 2024-07-29 09:46:46

标题: RNACG:基于流匹配的通用RNA序列条件生成模型

摘要: RNA在多种生命过程中发挥着至关重要的作用。与蛋白质设计方法迅速发展相比,与RNA相关的工作更具挑战性。大多数当前的RNA设计方法集中在特定目标属性上,并依赖于广泛的实验搜索。然而,由于实际限制,这些方法仍然昂贵且低效。在本文中,我们将所有序列设计问题描述为条件生成任务,并为多个问题提供参数化表示。针对这些问题,我们开发了一种基于流匹配的通用RNA序列生成模型,即RNACG。RNACG可以适应各种条件输入,并具有可移植性,使用户可以根据自己的需求定制编码网络以及将其整合到生成网络中。我们在RNA三维结构反向折叠、2D结构反向折叠、家族特异性序列生成和5'UTR翻译效率预测方面评估了RNACG。与其他方法相比,RNACG在这些任务中表现出卓越或具有竞争力的性能。RNACG在序列生成和属性预测任务中展示了广泛的适用性,为RNA序列设计提供了一种新颖的方法,并为大规模RNA序列数据的模拟实验提供了潜在方法。

更新时间: 2024-07-29 09:46:46

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2407.19838v1

ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation

Classical Arabic represents a significant era, encompassing the golden age of Arab culture, philosophy, and scientific literature. With a broad consensus on the importance of translating these literatures to enrich knowledge dissemination across communities, the advent of large language models (LLMs) and translation systems offers promising tools to facilitate this goal. However, we have identified a scarcity of translation datasets in Classical Arabic, which are often limited in scope and topics, hindering the development of high-quality translation systems. In response, we present the ATHAR dataset, comprising 66,000 high-quality Classical Arabic to English translation samples that cover a wide array of subjects including science, culture, and philosophy. Furthermore, we assess the performance of current state-of-the-art LLMs under various settings, concluding that there is a need for such datasets in current systems. Our findings highlight how models can benefit from fine-tuning or incorporating this dataset into their pretraining pipelines. The dataset is publicly available on the HuggingFace Data Hub at https://huggingface.co/datasets/mohamed-khalil/ATHAR.
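
Since the dataset is hosted on the HuggingFace Hub, it can be pulled with the datasets library as sketched below; the dataset id comes from the link above, while the splits and column names should be inspected rather than assumed.

    # Loading the ATHAR dataset from the HuggingFace Hub.
    from datasets import load_dataset

    dataset = load_dataset("mohamed-khalil/ATHAR")
    print(dataset)                                    # splits and sizes
    first_split = list(dataset.keys())[0]
    print(next(iter(dataset[first_split])))           # one translation pair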

Updated: 2024-07-29 09:45:34

标题: ATHAR:一个用于古典阿拉伯语到英语翻译的高质量且多样化的数据集

摘要: 古典阿拉伯语代表了一个重要的时代,包括阿拉伯文化、哲学和科学文学的黄金时期。关于将这些文学作品翻译成丰富知识传播的重要性有着广泛的共识,大型语言模型(LLMs)和翻译系统的出现为实现这一目标提供了有希望的工具。然而,我们发现古典阿拉伯语的翻译数据集稀缺,通常范围和主题有限,阻碍了高质量翻译系统的发展。作为回应,我们提出了ATHAR数据集,包括6.6万个高质量的古典阿拉伯语到英语的翻译样本,涵盖了广泛的主题,包括科学、文化和哲学。此外,我们评估了当前最先进的LLMs在不同设置下的性能,得出结论当前系统需要这样的数据集。我们的发现突显了模型如何能够从微调或将该数据集纳入其预训练流程中受益。该数据集可以在HuggingFace Data Hub上公开获取:https://huggingface.co/datasets/mohamed-khalil/ATHAR。

更新时间: 2024-07-29 09:45:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.19835v1

ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2

Multimodal Large Language Models (MLLMs) have attracted much attention due to their multifunctionality. However, traditional Transformer architectures incur significant overhead due to their secondary computational complexity. To address this issue, we introduce ML-Mamba, a multimodal language model that utilizes the latest and efficient Mamba-2 model for inference. Mamba-2 is known for its linear extension and fast processing of long sequences. We replace the Transformer based backbone with a pre-trained Mamba-2 model and explore methods for integrating 2D visual selective scanning mechanisms into multimodal learning. We also try various visual encoders and Mamba-2 model variants. Our extensive experiments conducted in various multimodal benchmark tests have demonstrated the competitive performance of ML-Mamba and highlighted the potential of state space models in multimodal tasks. The experimental results show that: (1) ML-Mamba achieves performance comparable to state-of-the-art methods such as TinyLaVA and MobileVLM v2 through its linear sequential modeling, while also having faster inference speed; (2) ML-Mamba performs well in visual hallucinations and spatial relationship judgment in closed set benchmark tests; (3) ML-Mamba achieves performance comparable to LLaVA while reducing the number of parameters by 40\%.(4) Compared to the multimodal model using the original Mamba model, the Mamba-2 based large-scale multimodal language model has stronger inference performance and effectiveness.

Updated: 2024-07-29 09:38:15

标题: ML-Mamba:利用Mamba-2的高效多模态大型语言模型

摘要: 多模态大型语言模型(MLLMs)由于其多功能性而受到广泛关注。然而,传统的Transformer架构由于其二次方的计算复杂度而产生了显著的开销。为了解决这个问题,我们引入了ML-Mamba,这是一个利用最新和高效的Mamba-2模型进行推断的多模态语言模型。Mamba-2以其线性扩展和对长序列的快速处理而闻名。我们将基于Transformer的骨干替换为预训练的Mamba-2模型,并探索将二维视觉选择性扫描机制整合到多模态学习中的方法。我们还尝试了各种视觉编码器和Mamba-2模型变体。我们在各种多模态基准测试中进行的大量实验已经证明了ML-Mamba的竞争性能,并突显了状态空间模型在多模态任务中的潜力。实验结果显示:(1)通过其线性顺序建模,ML-Mamba在实现与TinyLaVA和MobileVLM v2等最先进方法可比性能的同时,推断速度更快;(2)ML-Mamba在封闭集基准测试中的视觉幻觉与空间关系判断方面表现良好;(3)ML-Mamba在减少40%参数数量的同时实现了与LLaVA可比的性能;(4)与使用原始Mamba模型的多模态模型相比,基于Mamba-2的大规模多模态语言模型具有更强的推断性能和有效性。

更新时间: 2024-07-29 09:38:15

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.19832v1

The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer

Lymph node metastasis (LNM) is a crucial factor in determining the initial treatment for patients with lung cancer, yet accurate preoperative diagnosis of LNM remains challenging. Recently, large language models (LLMs) have garnered significant attention due to their remarkable text generation capabilities. Leveraging the extensive medical knowledge learned from vast corpora, LLMs can estimate probabilities for clinical problems, though their performance has historically been inferior to data-driven machine learning models. In this paper, we propose a novel ensemble method that combines the medical knowledge acquired by LLMs with the latent patterns identified by machine learning models to enhance LNM prediction performance. Initially, we developed machine learning models using patient data. We then designed a prompt template to integrate the patient data with the predicted probability from the machine learning model. Subsequently, we instructed GPT-4o, the most advanced LLM developed by OpenAI, to estimate the likelihood of LNM based on patient data and then adjust the estimate using the machine learning output. Finally, we collected three outputs from the GPT-4o using the same prompt and ensembled these results as the final prediction. Using the proposed method, our models achieved an AUC value of 0.765 and an AP value of 0.415 for LNM prediction, significantly improving predictive performance compared to baseline machine learning models. The experimental results indicate that GPT-4o can effectively leverage its medical knowledge and the probabilities predicted by machine learning models to achieve more accurate LNM predictions. These findings demonstrate that LLMs can perform well in clinical risk prediction tasks, offering a new paradigm for integrating medical knowledge and patient data in clinical predictions.
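
The ensembling logic can be sketched independently of any particular LLM client: format the patient record and the ML probability into a prompt, query the model several times, and average the parsed estimates. In the sketch below, query_llm is a placeholder client and the prompt wording is illustrative, not the authors' template.

    # Sketch of the knowledge+data ensemble: prompt an LLM with patient data
    # plus the ML model's probability, query n times, and average the answers.
    import re
    import statistics

    PROMPT = (
        "A lung-cancer patient has these characteristics: {features}.\n"
        "A machine learning model estimates the probability of lymph node "
        "metastasis as {ml_prob:.2f}.\n"
        "Using your medical knowledge and this estimate, answer with a single "
        "probability between 0 and 1."
    )

    def ensemble_lnm_probability(query_llm, features, ml_prob, n_queries=3):
        estimates = []
        for _ in range(n_queries):
            reply = query_llm(PROMPT.format(features=features, ml_prob=ml_prob))
            match = re.search(r"0?\.\d+|[01]", reply)
            if match:
                estimates.append(float(match.group()))
        return statistics.mean(estimates)

    # Toy stand-in client that always answers "0.42":
    print(ensemble_lnm_probability(lambda p: "0.42",
                                   "tumor size 2.1 cm, ...", 0.37))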

Updated: 2024-07-29 09:33:01

标题: 数据和知识的结合力量:GPT-4o在预测肺癌淋巴结转移中是一种有效的机器学习模型解释器

摘要: 淋巴结转移(LNM)是决定肺癌患者初始治疗的关键因素,然而准确的术前诊断LNM仍具有挑战性。最近,大型语言模型(LLMs)由于其出色的文本生成能力而受到重视。利用从大量语料库中学习的广泛医学知识,LLMs可以估计临床问题的概率,尽管其性能在历史上一直不如数据驱动的机器学习模型。在本文中,我们提出了一种新颖的集成方法,将LLMs获得的医学知识与机器学习模型识别的潜在模式相结合,以增强LNM预测性能。首先,我们利用患者数据开发了机器学习模型。然后,我们设计了一个提示模板,将患者数据与机器学习模型的预测概率整合在一起。随后,我们指示OpenAI开发的最先进的LLM——GPT-4o,根据患者数据估计LNM的可能性,然后使用机器学习输出调整估计值。最后,我们使用相同提示从GPT-4o收集了三个输出,并将这些结果合并为最终预测。使用提出的方法,我们的模型在LNM预测上实现了AUC值为0.765和AP值为0.415,相较于基线机器学习模型显著提升了预测性能。实验结果表明,GPT-4o能够有效利用其医学知识和机器学习模型预测的概率,实现更准确的LNM预测。这些发现表明,LLMs在临床风险预测任务中表现良好,为在临床预测中整合医学知识和患者数据提供了新的范式。

更新时间: 2024-07-29 09:33:01

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.17900v2

Generative Retrieval with Preference Optimization for E-commerce Search

Generative retrieval introduces a groundbreaking paradigm to document retrieval by directly generating the identifier of a pertinent document in response to a specific query. This paradigm has demonstrated considerable benefits and potential, particularly in representation and generalization capabilities, within the context of large language models. However, it faces significant challenges in E-commerce search scenarios, including the complexity of generating detailed item titles from brief queries, the presence of noise in item titles with weak language order, issues with long-tail queries, and the interpretability of results. To address these challenges, we have developed an innovative framework for E-commerce search, called generative retrieval with preference optimization. This framework is designed to effectively learn and align an autoregressive model with target data, subsequently generating the final item through constraint-based beam search. By employing multi-span identifiers to represent raw item titles and transforming the task of generating titles from queries into the task of generating multi-span identifiers from queries, we aim to simplify the generation process. The framework further aligns with human preferences using click data and employs a constrained search method to identify key spans for retrieving the final item, thereby enhancing result interpretability. Our extensive experiments show that this framework achieves competitive performance on a real-world dataset, and online A/B tests demonstrate the superiority and effectiveness in improving conversion gains.

Updated: 2024-07-29 09:31:19

标题: 在电子商务搜索中基于偏好优化的生成式检索

摘要: 生成式检索引入了一种开创性的文档检索范式,通过直接生成与特定查询相应的相关文档的标识符。这种范式在大型语言模型的背景下展现出了相当大的好处和潜力,特别是在表示和泛化能力方面。然而,在电子商务搜索场景中,它面临着重大挑战,包括从简短查询中生成详细商品标题的复杂性,商品标题中存在语言次序较弱的噪声,长尾查询问题,以及结果的可解释性。为了解决这些挑战,我们开发了一种创新的电子商务搜索框架,称为具有偏好优化的生成式检索。该框架旨在有效地学习和与目标数据对齐自回归模型,随后通过基于约束的波束搜索生成最终商品。通过使用多跨度标识符来表示原始商品标题,并将从查询生成标题的任务转化为从查询生成多跨度标识符的任务,我们旨在简化生成过程。该框架进一步通过使用点击数据与人类偏好对齐,并采用约束搜索方法来识别用于检索最终商品的关键跨度,从而增强结果的可解释性。我们的广泛实验表明,这一框架在真实数据集上取得了竞争性表现,在线A/B测试证明了其改进转化率的卓越性和有效性。

更新时间: 2024-07-29 09:31:19

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2407.19829v1

Federated Learning based Latent Factorization of Tensors for Privacy-Preserving QoS Prediction

In applications related to big data and service computing, dynamic connections are frequently encountered, especially in the dynamic user-perspective quality-of-service (QoS) data of Web services. Such data can be transformed into high-dimensional and incomplete (HDI) tensors that carry abundant temporal pattern information. Latent factorization of tensors (LFT) is an extremely efficient and typical approach for extracting such patterns from an HDI tensor. However, current LFT models require the QoS data to be maintained in a central place (e.g., a central server), which is untenable for increasingly privacy-sensitive users. To address this problem, this article designs a federated learning approach based on latent factorization of tensors (FL-LFT). It builds a data-density-oriented federated learning model that enables isolated users to collaboratively train a global LFT model while protecting users' privacy. Extensive experiments on a QoS dataset collected from the real world verify that FL-LFT achieves a remarkable increase in prediction accuracy compared to state-of-the-art federated learning (FL) approaches.
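
A minimal sketch of one federated round under this scheme; the CP-style factorization, the SGD update, and density-weighted averaging below are illustrative assumptions rather than the paper's exact algorithm.

```python
# Minimal sketch of one FL-LFT-style round. The CP factorization, SGD update,
# and density-weighted averaging are illustrative assumptions, not the
# paper's exact algorithm.
import numpy as np

def local_update(U, S, T, entries, lr=0.01, lam=0.05):
    """One local epoch of SGD on a client's observed (user, service, time, qos) entries."""
    U, S, T = U.copy(), S.copy(), T.copy()
    for i, j, k, y in entries:
        u, s, t = U[i].copy(), S[j].copy(), T[k].copy()
        e = y - np.sum(u * s * t)            # error on one observed tensor cell
        U[i] += lr * (e * s * t - lam * u)
        S[j] += lr * (e * u * t - lam * s)
        T[k] += lr * (e * u * s - lam * t)
    return U, S, T

def federated_round(global_factors, client_entries):
    """Clients train locally; the server averages factors weighted by data density."""
    weights = np.array([len(ent) for ent in client_entries], dtype=float)
    weights /= weights.sum()
    updates = [local_update(*global_factors, ent) for ent in client_entries]
    return tuple(sum(w * upd[m] for w, upd in zip(weights, updates)) for m in range(3))

rng = np.random.default_rng(0)
n_users, n_services, n_times, R = 50, 20, 10, 8
factors = tuple(0.1 * rng.standard_normal((n, R)) for n in (n_users, n_services, n_times))
clients = [[(rng.integers(n_users), rng.integers(n_services), rng.integers(n_times),
             rng.random()) for _ in range(200)] for _ in range(5)]
factors = federated_round(factors, clients)
```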

Updated: 2024-07-29 09:30:00

标题: 联邦学习基于张量的潜在因子分解,用于隐私保护的QoS预测

摘要: 在与大数据和服务计算相关的应用中,经常会遇到动态连接,特别是Web服务中用户视角的服务质量(QoS)的动态数据。它们被转化为包含丰富时间模式信息的高维不完整(HDI)张量。张量的潜在因子分解(LFT)是一种极其高效和典型的方法,用于从HDI张量中提取这些模式。然而,当前的LFT模型要求QoS数据在一个中心位置(例如中央服务器)进行维护,这对于越来越注重隐私的用户来说是不可能的。为解决这一问题,本文创造性地设计了基于张量潜在因子分解的联邦学习(FL-LFT)。它构建了一个以数据密度为导向的联邦学习模型,使得孤立用户能够共同训练一个全局LFT模型,同时保护用户的隐私。从真实世界收集的QoS数据集上进行的广泛实验验证了,与最先进的联邦学习(FL)方法相比,FL-LFT在预测精度方面表现出显著的提高。

更新时间: 2024-07-29 09:30:00

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2407.19828v1

Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process

In system monitoring, automatic fault diagnosis seeks to infer the system's state from sensor readings, e.g., through machine learning models. In this context, it is of key importance that these systems, trained on historical data, are able to generalize to incoming data. In parallel, many factors may induce changes in the data probability distribution, hindering the ability of such models to generalize. In this sense, domain adaptation is an important framework for adapting models to different probability distributions. In this paper, we propose a new benchmark, based on the Tennessee Eastman Process of Downs and Vogel (1993), for evaluating domain adaptation methods in the context of chemical processes. Besides describing the process and its relevance for domain adaptation, we detail a series of data processing steps for reproducing our benchmark. We then test 11 domain adaptation strategies on this novel benchmark, showing that optimal-transport-based techniques outperform the other strategies.
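
As one concrete example of the optimal-transport family that performed best, here is a minimal sketch using the POT library; the specific transport method, regularization strength, and downstream classifier are illustrative choices, not necessarily those benchmarked.

```python
# Minimal sketch of an optimal-transport domain adaptation baseline using
# the POT library (illustrative; the benchmark's exact methods and
# hyperparameters may differ).
import numpy as np
import ot
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
Xs, ys = rng.normal(0, 1, (200, 10)), rng.integers(0, 2, 200)  # source operating mode
Xt = rng.normal(0.5, 1.2, (150, 10))                           # shifted target mode

# Transport source samples onto the target distribution, then train on them.
transport = ot.da.SinkhornTransport(reg_e=1.0)
transport.fit(Xs=Xs, Xt=Xt)
Xs_mapped = transport.transform(Xs=Xs)

clf = LogisticRegression(max_iter=1000).fit(Xs_mapped, ys)
print(clf.predict(Xt)[:10])  # fault predictions in the target domain
```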

Updated: 2024-07-29 09:22:04

标题: 在田纳西伊斯曼过程中的化学过程领域自适应基准测试

摘要: 在系统监控中,自动故障诊断旨在基于传感器读数推断系统的状态,例如通过机器学习模型。在这种情况下,基于历史数据,这些系统能够推广到传入数据是非常重要的。同时,许多因素可能导致数据概率分布发生变化,阻碍这些模型推广的可能性。在这种意义上,领域自适应是一种重要的框架,用于将模型适应不同的概率分布。在本文中,我们提出了一个基于Downs和Vogel(1993年)的田纳西伊斯曼工艺的新基准,用于在化学过程的背景下对领域自适应方法进行基准测试。除了描述该过程及其与领域自适应的相关性外,我们还描述了一系列数据处理步骤来复制我们的基准。然后我们在这个新的基准上测试了11种领域自适应策略,结果显示基于最优输运的技术优于其他策略。

更新时间: 2024-07-29 09:22:04

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2308.11247v2

Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost

Today's large language models (LLMs) can solve challenging question-answering tasks, and prompt engineering techniques, such as chain-of-thought (CoT), have gained attention for enhancing the explanation and correctness of outputs. Nevertheless, models require significant time to generate answers augmented with lengthy reasoning details. To address this issue, this paper analyzes the impact of output lengths on LLM inference pipelines and proposes novel metrics to evaluate them in terms of correct conciseness. It also examines a refined prompt engineering strategy, Constrained-CoT (CCoT), which explicitly encourages the model to limit its output length. Experiments on pre-trained LLMs demonstrate the benefit of the proposed metrics and the effectiveness of CCoT across different models. For instance, constraining the reasoning of LLaMA2-70b to 100 words improves the accuracy from 36.01% (CoT) to 41.07% (CCoT) on the GSM8K dataset, while reducing the average output length by 28 words.
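
The abstract does not give CCoT's exact wording, so the templates below are assumptions; they only illustrate the contrast between an unconstrained CoT prompt and a length-constrained one.

```python
# Illustrative CoT vs. Constrained-CoT (CCoT) prompts. The paper's exact
# CCoT wording is not given in the abstract, so the phrasing is assumed.
QUESTION = "A farmer has 17 sheep and buys 5 more. How many sheep are there now?"

cot_prompt = f"{QUESTION}\nLet's think step by step."

ccot_prompt = (
    f"{QUESTION}\n"
    "Let's think step by step and limit the answer to at most 100 words."
)
```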

Updated: 2024-07-29 09:21:52

标题: 简洁思考:输出长度对LLM推理和成本的影响

摘要: 今天的大型语言模型(LLMs)可以解决具有挑战性的问答任务,并且工程技术,如思维链(CoT),已经引起了人们的关注,以增强输出的解释和正确性。然而,模型需要大量时间来生成附加了长篇推理细节的答案。为了解决这个问题,本文分析了输出长度对LLM推理流水线的影响,并提出了用于评估它们的新颖指标,即“正确简洁性”。它还通过一个精细的提示工程策略Constrained-CoT(CCoT)来控制输出长度的影响。CCoT鼓励模型限制输出长度。在预训练的LLMs上进行的实验显示了所提出的指标的好处,以及CCoT在不同模型上的有效性。例如,将LLaMA2-70b的推理限制在100个单词内,将GSM8K数据集上的准确度从36.01\%(CoT)提高到41.07\%(CCoT),同时减少平均输出长度28个单词。

更新时间: 2024-07-29 09:21:52

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.19825v1

Analyzing and reducing the synthetic-to-real transfer gap in Music Information Retrieval: the task of automatic drum transcription

Automatic drum transcription is a critical tool in Music Information Retrieval for extracting and analyzing the rhythm of a music track, but it is limited by the size of the datasets available for training. A popular way to increase the amount of data is to generate it synthetically from music scores rendered with virtual instruments. This method can produce a virtually infinite quantity of tracks, but empirical evidence shows that models trained on previously created synthetic datasets do not transfer well to real tracks. In this work, besides increasing the amount of data, we identify and evaluate three more strategies that practitioners can use to improve the realism of the generated data and, thus, narrow the synthetic-to-real transfer gap. To explore their efficacy, we used them to build a new synthetic dataset and then measured how the performance of a model scales with the number of training tracks and, specifically, at what value it stagnates for different datasets. By doing this, we were able to show that the aforementioned strategies contribute to making our dataset the one with the most realistic data distribution and the lowest synthetic-to-real transfer gap among the synthetic datasets we evaluated. We conclude by highlighting the limits of training with infinite data in drum transcription and show how they can be overcome.

Updated: 2024-07-29 09:17:16

标题: 分析和减少音乐信息检索中合成到真实转移差距:自动鼓转录任务

摘要: 自动鼓谱转录是音乐信息检索中的关键工具,用于提取和分析音乐曲目的节奏,但受限于可用于训练的数据集大小。一种常用的方法是通过使用虚拟乐器渲染的乐谱生成数据,以增加数据量。这种方法可以产生几乎无限数量的曲目,但经验证,之前创建的合成数据集训练的模型在真实曲目上表现不佳。在本研究中,除了增加数据量外,我们还确定并评估了从业者可以使用的三种策略,以提高生成数据的真实性,从而缩小合成到真实转移差距。为了探究它们的有效性,我们使用它们构建了一个新的合成数据集,然后测量了模型性能如何扩展,特别是在增加不同数据集的训练曲目时会在何值处停滞。通过这样做,我们能够证明上述策略有助于使我们的数据集成为具有最真实数据分布和最低合成到真实转移差距的合成数据集之一。最后,我们强调了在鼓谱转录中使用无限数据的局限性,并展示了如何克服这些限制。

更新时间: 2024-07-29 09:17:16

领域: cs.SD,cs.IR,cs.LG,eess.AS

下载: http://arxiv.org/abs/2407.19823v1

Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

Partially observable Markov decision processes (POMDPs) rely on the key assumption that probability distributions are precisely known. Robust POMDPs (RPOMDPs) alleviate this concern by defining imprecise probabilities, referred to as uncertainty sets. While robust MDPs have been studied extensively, work on RPOMDPs is limited and primarily focuses on algorithmic solution methods. We expand the theoretical understanding of RPOMDPs by showing that 1) different assumptions on the uncertainty sets affect optimal policies and values; 2) RPOMDPs have a partially observable stochastic game (POSG) semantics; and 3) the same RPOMDP with different assumptions leads to semantically different POSGs and, thus, different policies and values. These novel semantics for RPOMDPs give access to results for POSGs studied in game theory; concretely, we show the existence of a Nash equilibrium. Finally, we classify the existing RPOMDP literature using our semantics, clarifying under which uncertainty assumptions these existing works operate.

Updated: 2024-07-29 09:15:29

标题: 不精确概率遇上部分可观测性:鲁棒POMDPs的游戏语义

摘要: 部分可观察的马尔可夫决策过程(POMDPs)依赖于一个关键假设,即概率分布是精确已知的。鲁棒POMDPs(RPOMDPs)通过定义不精确的概率,即不确定性集合,来减轻这种担忧。虽然鲁棒MDPs已经得到广泛研究,但对RPOMDPs的研究有限,并主要集中在算法解决方法上。我们通过展示以下内容扩展了对RPOMDPs的理论理解:1)对不确定性集合的不同假设会影响最优策略和价值;2)RPOMDPs具有部分可观察的随机博弈(POSG)语义;3)相同的RPOMDP在不同假设下会导致语义上不同的POSGs,从而产生不同的策略和价值。这些新颖的RPOMDP语义为POSGs的研究提供了结果;具体来说,我们展示了纳什均衡的存在。最后,我们使用我们的语义对现有的RPOMDP文献进行分类,澄清了这些现有作品所操作的不确定性假设。

更新时间: 2024-07-29 09:15:29

领域: cs.AI,cs.GT

下载: http://arxiv.org/abs/2405.04941v2

DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training

Accurate real-time object detection is vital across numerous industrial applications, from safety monitoring to quality control. Traditional approaches, however, are hindered by arduous manual annotation and data collection, struggling to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an innovative automated end-to-end pipeline that revolutionizes object detection workflows from data collection to model evaluation. It eliminates the need for laborious human labeling and extensive data collection while achieving outstanding accuracy across diverse scenarios. DART encompasses four key stages: (1) Data Diversification using subject-driven image generation (DreamBooth with SDXL), (2) Annotation via open-vocabulary object detection (Grounding DINO) to generate bounding box and class labels, (3) Review of generated images and pseudo-labels by large multimodal models (InternVL-1.5 and GPT-4o) to guarantee credibility, and (4) Training of real-time object detectors (YOLOv8 and YOLOv10) using the verified data. We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current instantiation of DART significantly increases average precision (AP) from 0.064 to 0.832. Its modular design ensures easy exchangeability and extensibility, allowing for future algorithm upgrades, seamless integration of new object categories, and adaptability to customized environments without manual labeling and additional data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
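
A high-level sketch of how the four stages compose; each function is a hypothetical wrapper around the named component, not the released implementation (see the linked repository for the real code).

```python
# High-level sketch of how the four DART stages compose. Each function is a
# hypothetical wrapper around the named component, not the released
# implementation (see https://github.com/chen-xin-94/DART for the real code).

def diversify(seed_images):            # Stage 1: DreamBooth + SDXL
    """Generate subject-driven image variations."""
    ...

def annotate(images, class_names):     # Stage 2: Grounding DINO
    """Produce open-vocabulary bounding boxes and class labels."""
    ...

def review(images, pseudo_labels):     # Stage 3: InternVL-1.5 / GPT-4o
    """Keep only image-label pairs the multimodal reviewers accept."""
    ...

def train_detector(verified_dataset):  # Stage 4: YOLOv8 / YOLOv10
    """Fine-tune a real-time detector on the verified data."""
    ...

def dart(seed_images, class_names):
    images = diversify(seed_images)
    pseudo_labels = annotate(images, class_names)
    verified = review(images, pseudo_labels)
    return train_detector(verified)
```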

Updated: 2024-07-29 09:14:07

标题: DART:具有数据多样性、开放词汇边界框标注、伪标签审查和模型训练的自动化端到端目标检测管道

摘要: 准确的实时物体检测在许多工业应用中至关重要,从安全监控到质量控制。然而,传统方法受到费力的手动标注和数据收集的阻碍,难以适应不断变化的环境和新颖的目标物体。为了解决这些限制,本文提出了DART,这是一种革新的自动端到端流水线,从数据收集到模型评估彻底改变了物体检测工作流程。它消除了繁琐的人工标注和广泛数据收集的需要,同时在各种场景中取得了出色的准确性。DART包括四个关键阶段:(1)使用主体驱动图像生成(DreamBooth with SDXL)进行数据多样化,(2)通过开放词汇物体检测(Grounding DINO)生成边界框和类标签进行注释,(3)通过大型多模型(InternVL-1.5和GPT-4o)审核生成的图像和伪标签以确保可靠性,以及(4)使用经过验证的数据训练实时物体检测器(YOLOv8和YOLOv10)。我们将DART应用于一个名为Liebherr Product的自采集数据集,其中包含23个类别的超过15K张高质量图像。DART的当前实例将平均精度(AP)从0.064提高到0.832。其模块化设计确保易于交换和扩展,允许未来算法升级,无缝集成新的物体类别,以及适应定制环境而无需手动标注和额外数据收集。代码和数据集已发布在https://github.com/chen-xin-94/DART。

更新时间: 2024-07-29 09:14:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.09174v3

Enhancing Adversarial Text Attacks on BERT Models with Projected Gradient Descent

Adversarial attacks against deep learning models represent a major threat to the security and reliability of natural language processing (NLP) systems. In this paper, we propose a modification to the BERT-Attack framework, integrating Projected Gradient Descent (PGD) to enhance its effectiveness and robustness. The original BERT-Attack, designed for generating adversarial examples against BERT-based models, suffers from limitations such as a fixed perturbation budget and a lack of consideration for semantic similarity. The proposed approach in this work, PGD-BERT-Attack, addresses these limitations by leveraging PGD to iteratively generate adversarial examples while ensuring both imperceptibility and semantic similarity to the original input. Extensive experiments are conducted to evaluate the performance of PGD-BERT-Attack compared to the original BERT-Attack and other baseline methods. The results demonstrate that PGD-BERT-Attack achieves higher success rates in causing misclassification while maintaining low perceptual changes. Furthermore, PGD-BERT-Attack produces adversarial instances that exhibit greater semantic resemblance to the initial input, enhancing their applicability in real-world scenarios. Overall, the proposed modification offers a more effective and robust approach to adversarial attacks on BERT-based models, thus contributing to the advancement of defense against attacks on NLP systems.
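
A minimal sketch of the PGD core in embedding space, assuming a Hugging Face-style classifier that accepts `inputs_embeds`; the full attack additionally maps perturbed embeddings back to discrete token substitutions and enforces the semantic-similarity constraint, both omitted here.

```python
# Minimal sketch of the PGD core in embedding space (illustrative). The full
# PGD-BERT-Attack also maps perturbed embeddings back to discrete token
# substitutions and enforces a semantic-similarity constraint, both omitted.
import torch
import torch.nn.functional as F

def pgd_perturb(model, embeds, attention_mask, labels, eps=0.5, alpha=0.1, steps=10):
    """Maximize the classification loss while projecting onto an L2 eps-ball."""
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):
        logits = model(inputs_embeds=embeds + delta,
                       attention_mask=attention_mask).logits
        loss = F.cross_entropy(logits, labels)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad / (delta.grad.norm() + 1e-12)  # ascent step
            norm = delta.norm()
            if norm > eps:
                delta *= eps / norm       # projection onto the eps-ball
        delta.grad.zero_()
    return (embeds + delta).detach()
```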

Updated: 2024-07-29 09:07:29

标题: 用投影梯度下降增强对BERT模型的对抗文本攻击

摘要: 对深度学习模型的对抗性攻击对自然语言处理(NLP)系统的安全性和可靠性构成了重大威胁。本文提出了对BERT-Attack框架的修改,将投影梯度下降(PGD)集成进来,以增强其效果和鲁棒性。原始的BERT-Attack旨在针对基于BERT的模型生成对抗性示例,但存在固定扰动预算和对语义相似性考虑不足等限制。本文提出的方法PGD-BERT-Attack通过利用PGD来迭代生成对抗性示例,同时确保不可察觉性和与原始输入的语义相似性,以解决这些限制。进行了大量实验来评估PGD-BERT-Attack与原始BERT-Attack和其他基准方法的性能。结果表明,PGD-BERT-Attack在导致错误分类的成功率方面更高,同时保持低感知变化。此外,PGD-BERT-Attack生成的对抗实例与初始输入具有更大的语义相似性,增强了它们在现实场景中的适用性。总体而言,提出的修改为基于BERT的模型的对抗攻击提供了更有效和鲁棒的方法,从而促进了对NLP系统攻击的防御的进步。

更新时间: 2024-07-29 09:07:29

领域: cs.LG,cs.CL,cs.CR

下载: http://arxiv.org/abs/2407.21073v1

Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging

Medical imaging cohorts are often confounded by factors such as acquisition devices, hospital sites, patient backgrounds, and many more. As a result, deep learning models tend to learn spurious correlations instead of causally related features, limiting their generalizability to new and unseen data. This problem can be addressed by minimizing dependence measures between intermediate representations of task-related and non-task-related variables. These measures include mutual information, distance correlation, and the performance of adversarial classifiers. Here, we benchmark such dependence measures for the task of preventing shortcut learning. We study a simplified setting using Morpho-MNIST and a medical imaging task with CheXpert chest radiographs. Our results provide insights into how to mitigate confounding factors in medical imaging.
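
For concreteness, here is a minimal numpy implementation of one of the benchmarked measures, the (biased) sample distance correlation, applied to a toy confounded representation.

```python
# Minimal numpy implementation of the (biased) sample distance correlation,
# one of the benchmarked dependence measures, on a toy confounded example.
import numpy as np

def distance_correlation(x, y):
    """x, y: (n, d) arrays of paired samples; returns dCor in [0, 1]."""
    def centered(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = max((A * B).mean(), 0.0)            # squared distance covariance
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / dvar) if dvar > 0 else 0.0

rng = np.random.default_rng(0)
task = rng.normal(size=(500, 8))                           # task representation
confound = task[:, :1] + 0.1 * rng.normal(size=(500, 1))   # spurious correlate
print(distance_correlation(task, confound))                # high -> strong dependence
```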

Updated: 2024-07-29 09:05:17

标题: 基准依赖度量标准:防止医学影像中的快捷学习

摘要: 医学影像队列经常受到因素的干扰,如获取设备、医院地点、患者背景等等。因此,深度学习模型往往会学习到虚假相关性,而不是因果相关的特征,从而限制了它们对新数据的泛化能力。这个问题可以通过最小化任务相关和非任务相关变量之间的中间表示的依赖性度量来解决。这些度量包括互信息、距离相关性和对抗分类器的性能。在这里,我们对防止快捷学习的任务进行了这些依赖性度量的基准测试。我们使用Morpho-MNIST和CheXpert胸部放射影像任务进行了简化设置的研究。我们的结果为如何减轻医学影像中的混杂因素提供了见解。

更新时间: 2024-07-29 09:05:17

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.18792v2

Improving Retrieval Augmented Language Model with Self-Reasoning

The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly concerning their reliability and traceability. To be specific, retrieving irrelevant documents may result in unhelpful response generation or even deteriorate the performance of LLMs, while the lack of proper citations in generated outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework constructs self-reasoning trajectories through three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. We have evaluated our framework across four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset) to demonstrate the superiority of our method, which outperforms existing state-of-the-art models and achieves performance comparable to GPT-4 while using only 2,000 training samples.
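
A minimal sketch of the three processes as chained LLM calls; `llm(prompt)` is a hypothetical completion function and the prompt wording is illustrative, not the paper's templates.

```python
# Minimal sketch of the three self-reasoning processes as chained LLM calls.
# `llm(prompt)` is a hypothetical completion function; the prompt wording is
# illustrative, not the paper's templates.

def self_reasoning_answer(llm, question, documents):
    # 1) Relevance-aware process: judge which documents matter and why.
    relevance = llm(
        f"Question: {question}\nDocuments: {documents}\n"
        "For each document, state whether it is relevant and explain why."
    )
    # 2) Evidence-aware selective process: quote and cite key sentences.
    evidence = llm(
        f"Question: {question}\n{relevance}\n"
        "Select the key sentences from the relevant documents and cite them."
    )
    # 3) Trajectory analysis process: reason over the trajectory, then answer.
    return llm(
        f"Question: {question}\n{relevance}\n{evidence}\n"
        "Analyze the reasoning trajectory above and give a concise, cited answer."
    )
```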

Updated: 2024-07-29 09:05:10

标题: 使用自我推理改进检索增强语言模型

摘要: 检索增强语言模型(RALM)通过在推理过程中整合外部知识,展现出在知识密集型任务上出色的表现,从而减轻了大型语言模型(LLMs)中遗传的事实幻觉。尽管取得了这些进展,但在RALM的实施方面仍存在挑战,特别是与其可靠性和可追溯性有关的问题。具体而言,不相关的文档检索可能导致无用的响应生成,甚至降低LLMs的性能,而在生成的输出中缺乏适当的引文会使验证模型的可信度变得更加复杂。为此,我们提出了一个旨在提高RALMs可靠性和可追溯性的新型自我推理框架,其核心思想是利用LLM本身生成的推理轨迹。该框架涉及三个过程:一个关注相关性的过程,一个关注证据的选择性过程,以及一个轨迹分析过程。我们在四个公共数据集上评估了我们的框架(两个短形QA数据集,一个长形QA数据集和一个事实验证数据集),以展示我们的方法的优越性,可以胜过现有的最先进模型,并且可以在仅使用2,000个训练样本的情况下达到与GPT-4可比的性能。

更新时间: 2024-07-29 09:05:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.19813v1

Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture

Pain assessment is essential in developing optimal pain management protocols to alleviate suffering and prevent functional decline in patients. Consequently, reliable and accurate automatic pain assessment systems are essential for continuous and effective patient monitoring. This study integrates synthetic thermal videos, generated by Generative Adversarial Networks, into the pain recognition pipeline and evaluates their efficacy. A framework consisting of a Vision-MLP and a Transformer-based module is utilized, employing RGB and synthetic thermal videos in unimodal and multimodal settings. Experiments conducted on facial videos from the BioVid database demonstrate the effectiveness of synthetic thermal videos and underline their potential advantages.

Updated: 2024-07-29 09:04:11

标题: 使用视觉-MLP架构的合成热像和RGB视频进行自动疼痛评估

摘要: 疼痛评估对于制定最佳疼痛管理方案以减轻痛苦并预防患者功能下降至关重要。因此,可靠准确的自动疼痛评估系统对于持续有效地监测患者至关重要。本研究介绍了由生成对抗网络生成的合成热视频,并将其整合到疼痛识别流程中,并评估了其有效性。该框架由一个Vision-MLP和一个基于Transformer的模块组成,利用RGB和合成热视频在单模态和多模态设置中。在来自BioVid数据库的面部视频上进行的实验展示了合成热视频的有效性,并强调了它的潜在优势。

更新时间: 2024-07-29 09:04:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.19811v1

Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS

Automatic pain assessment plays a critical role in advancing healthcare and optimizing pain management strategies. This study has been submitted to the First Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed multimodal framework utilizes facial videos and fNIRS and presents a modality-agnostic approach, alleviating the need for domain-specific models. Employing a dual ViT configuration and adopting waveform representations for the fNIRS signals, as well as for the embeddings extracted from the two modalities, demonstrates the efficacy of the proposed method, which achieves an accuracy of 46.76% in the multilevel pain assessment task.

Updated: 2024-07-29 09:02:43

标题: 双胞胎-疼痛ViT:面向跨模态视觉变压器框架的多模态自动疼痛评估,使用面部视频和fNIRS

摘要: 自动疼痛评估在推进医疗保健和优化疼痛管理策略方面起着关键作用。这项研究已提交给下一代疼痛评估的第一届多模态感知大挑战(AI4PAIN)。所提出的多模态框架利用面部视频和fNIRS,并提出了一种模态无关的方法,减轻了对领域特定模型的需求。采用双ViT配置并采用fNIRS的波形表示,以及从两种模态中提取的嵌入,展示了所提出方法的有效性,在多级疼痛评估任务中实现了46.76%的准确率。

更新时间: 2024-07-29 09:02:43

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.19809v1

Imputation for prediction: beware of diminishing returns

Missing values are prevalent across various fields, posing challenges for training and deploying predictive models. In this context, imputation is a common practice, driven by the hope that accurate imputations will enhance predictions. However, recent theoretical and empirical studies indicate that simple constant imputation can be consistent and competitive. This empirical study aims at clarifying if and when investing in advanced imputation methods yields significantly better predictions. Relating imputation and predictive accuracies across combinations of imputation and predictive models on 20 datasets, we show that imputation accuracy matters less (i) when using expressive models and (ii) when incorporating missingness indicators as complementary inputs, and (iii) that it matters much more for generated linear outcomes than for real-data outcomes. Interestingly, we also show that the use of the missingness indicator is beneficial to prediction performance, even in MCAR scenarios. Overall, on real data with powerful models, improving imputation has only a minor effect on prediction performance. Thus, investing in better imputations for improved predictions often offers limited benefits.
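
Takeaway (ii) and the MCAR finding translate directly into practice; a minimal scikit-learn sketch follows, where the constant fill value and the gradient-boosting model are illustrative choices.

```python
# Minimal sklearn sketch of the practical takeaways: constant imputation plus
# missingness-indicator columns feeding an expressive downstream model.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan   # 20% MCAR missingness

model = make_pipeline(
    # Simple constant imputation + indicator columns marking what was missing.
    SimpleImputer(strategy="constant", fill_value=0.0, add_indicator=True),
    HistGradientBoostingClassifier(),    # expressive model: imputation quality matters less
)
model.fit(X, y)
print(model.score(X, y))
```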

Updated: 2024-07-29 09:01:06

标题: 预测中的插补:小心递减收益

摘要: 缺失值在各个领域中普遍存在,给训练和部署预测模型带来挑战。在这种情况下,填补是一种常见的做法,希望准确的填补可以提高预测。然而,最近的理论和实证研究表明,简单的常数填补可以保持一致性且具有竞争力。这项实证研究旨在澄清,投资于先进的填补方法何时会产生显著更好的预测。通过在20个数据集上比较填补和预测模型组合之间的填补精度和预测准确度,我们发现:在使用表现型模型时,填补准确度不那么重要;在将缺失指标作为补充输入时,填补准确度也不那么重要;对于生成的线性结果而言,填补的重要性要比真实数据结果更大。有趣的是,我们还发现,在MCAR场景中,使用缺失指标对预测性能有益。总的来说,在具有强大模型的真实数据上,改进填补仅对预测性能产生轻微影响。因此,为了改进预测,投资于更好的填补通常提供有限的益处。

更新时间: 2024-07-29 09:01:06

领域: cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.19804v1

Decision Machines: Enhanced Decision Trees

This paper presents Decision Machines (DMs), an innovative evolution of traditional binary decision trees, which leverages matrix computations to significantly enhance both computational efficiency and interpretability. By explicitly mapping the dependencies between predictions and binary tests within a vector space, DMs offer a streamlined approach to navigating decision paths. We integrate decision trees with kernel methods, ensemble methods and attention mechanisms. The integration of these elements not only bolsters the hierarchical structure of decision trees but also aligns with the computational efficiency of matrix computations. Our work bridges the gap between traditional machine learning algorithms and modern deep learning techniques, providing a novel foundation for further research and application in the field of machine learning.

Updated: 2024-07-29 08:56:31

标题: 决策机器:增强决策树

摘要: 本文介绍了决策机器(DMs),这是传统二叉决策树的创新演变,利用矩阵计算显著增强了计算效率和可解释性。通过在向量空间中明确映射预测和二元测试之间的依赖关系,DMs提供了一种简化的决策路径导航方法。我们将决策树与核方法、集成方法和注意机制相结合。这些元素的整合不仅增强了决策树的层次结构,还与矩阵计算的计算效率相一致。我们的工作弥合了传统机器学习算法和现代深度学习技术之间的差距,在机器学习领域提供了进一步研究和应用的新基础。

更新时间: 2024-07-29 08:56:31

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2101.11347v6

VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

Domain generalizability is a crucial aspect of a deep learning model since it determines the capability of the model to perform well on data from unseen domains. However, research on the domain generalizability of deep learning models for vision-language tasks remains limited, primarily because of the lack of required datasets. To address these challenges, we propose VolDoGer: Vision-Language Dataset for Domain Generalization, a dedicated dataset designed for domain generalization that addresses three vision-language tasks: image captioning, visual question answering, and visual entailment. We constructed VolDoGer by extending LLM-based data annotation techniques to vision-language tasks, thereby alleviating the burden of recruiting human annotators. We evaluated the domain generalizability of various models, ranging from fine-tuned models to a recent multimodal large language model, through VolDoGer.

Updated: 2024-07-29 08:38:46

标题: VolDoGer:领域泛化视觉语言任务中LLM辅助数据集

摘要: 域泛化是深度学习模型的一个关键方面,因为它决定了模型在未见领域数据上表现良好的能力。然而,针对视觉-语言任务的深度学习模型的域泛化研究仍然有限,主要是因为缺乏必要的数据集。为了解决这些挑战,我们提出了VolDoGer:用于域泛化的视觉-语言数据集,这是一个专门设计的数据集,用于解决三个视觉-语言任务:图像字幕,视觉问题回答和视觉蕴涵。我们通过扩展基于LLM的数据注释技术到视觉-语言任务,构建了VolDoGer,从而减轻了招募人类注释者的负担。我们通过VolDoGer评估了各种模型的域泛化能力,从微调模型到最近的多模态大型语言模型。

更新时间: 2024-07-29 08:38:46

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.19795v1

A learning theory for quantum photonic processors and beyond

We consider the tasks of learning quantum states, measurements and channels generated by continuous-variable (CV) quantum circuits. This family of circuits is suited to describe optical quantum technologies and in particular it includes state-of-the-art photonic processors capable of showing quantum advantage. We define classes of functions that map classical variables, encoded into the CV circuit parameters, to outcome probabilities evaluated on those circuits. We then establish efficient learnability guarantees for such classes, by computing bounds on their pseudo-dimension or covering numbers, showing that CV quantum circuits can be learned with a sample complexity that scales polynomially with the circuit's size, i.e., the number of modes. Our results show that CV circuits can be trained efficiently using a number of training samples that, unlike their finite-dimensional counterpart, does not scale with the circuit depth.

Updated: 2024-07-29 08:38:25

标题: 一个关于量子光子处理器及其进一步发展的学习理论

摘要: 我们考虑学习由连续变量(CV)量子电路产生的量子态、测量和通道的任务。这类电路适合描述光学量子技术,特别是包括能够展示量子优势的最新光子处理器。我们定义了将经典变量映射到在这些电路上评估的结果概率的函数类。然后,我们通过计算它们的伪维数或覆盖数的边界,为这些类建立了高效的可学习性保证,表明CV量子电路可以以与电路规模(即模式数量)多项式比例的样本复杂度学习。我们的结果表明,CV电路可以在训练样本数量上高效地训练,与其有限维度对应物不同,不会随电路深度增加而增加。

更新时间: 2024-07-29 08:38:25

领域: quant-ph,cs.CC,cs.IT,cs.LG,math-ph,math.IT,math.MP

下载: http://arxiv.org/abs/2209.03075v3

Hashing based Contrastive Learning for Virtual Screening

Virtual screening (VS) is a critical step in computer-aided drug discovery, aiming to identify molecules that bind to a specific target receptor like protein. Traditional VS methods, such as docking, are often too time-consuming for screening large-scale molecular databases. Recent advances in deep learning have demonstrated that learning vector representations for both proteins and molecules using contrastive learning can outperform traditional docking methods. However, given that target databases often contain billions of molecules, real-valued vector representations adopted by existing methods can still incur significant memory and time costs in VS. To address this problem, in this paper we propose a hashing-based contrastive learning method, called DrugHash, for VS. DrugHash treats VS as a retrieval task that uses efficient binary hash codes for retrieval. In particular, DrugHash designs a simple yet effective hashing strategy to enable end-to-end learning of binary hash codes for both protein and molecule modalities, which can dramatically reduce the memory and time costs with higher accuracy compared with existing methods. Experimental results show that DrugHash can outperform existing methods to achieve state-of-the-art accuracy, with a memory saving of 32$\times$ and a speed improvement of 3.5$\times$.
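
A minimal sketch of the retrieval side of the idea: sign-binarized embeddings (stand-ins for the learned protein and molecule encoders) compared by Hamming distance over packed bits; the hashing network and contrastive training are omitted.

```python
# Minimal sketch of the retrieval side of hashing-based VS: sign-binarize
# embeddings (stand-ins for learned protein/molecule encoders) and rank
# molecules by Hamming distance on packed bits.
import numpy as np

def binarize(embeddings):
    """Sign hash: float vectors -> packed uint8 bit codes."""
    return np.packbits(embeddings > 0, axis=1)

def hamming_rank(query_code, db_codes, top_k=5):
    """Popcount of XOR gives the Hamming distance; smaller is better."""
    dists = np.unpackbits(query_code ^ db_codes, axis=1).sum(axis=1)
    return np.argsort(dists)[:top_k]

rng = np.random.default_rng(0)
protein_emb = rng.normal(size=(1, 256))          # hypothetical encoder output
molecule_embs = rng.normal(size=(100_000, 256))
q, db = binarize(protein_emb), binarize(molecule_embs)
print(hamming_rank(q, db))  # indices of the best-matching molecules
```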

Updated: 2024-07-29 08:33:49

标题: 基于哈希的对比学习在虚拟筛选中的应用

摘要: 虚拟筛选(VS)是计算辅助药物发现中的关键步骤,旨在识别与特定靶点受体(如蛋白质)结合的分子。传统的VS方法,如对接,通常对于筛选大规模分子数据库来说耗时过长。最近深度学习的进展表明,利用对比学习为蛋白质和分子学习向量表示可以超越传统的对接方法。然而,考虑到目标数据库通常包含数十亿分子,现有方法采用的实数向量表示仍然可能在VS中产生显著的内存和时间成本。为解决这一问题,本文提出了一种基于哈希的对比学习方法,称为DrugHash,用于VS。DrugHash将VS视为一种使用高效二进制哈希码进行检索的任务。具体而言,DrugHash设计了一种简单而有效的哈希策略,以实现蛋白质和分子模态的端到端学习二进制哈希码,可与现有方法相比显著降低内存和时间成本,并具有更高的准确性。实验结果表明,DrugHash可以超越现有方法,实现最先进的准确性,内存节省率达到32倍,速度提高率为3.5倍。

更新时间: 2024-07-29 08:33:49

领域: cs.AI

下载: http://arxiv.org/abs/2407.19790v1

Technical Report on the Pangram AI-Generated Text Classifier

We present Pangram Text, a transformer-based neural network trained to distinguish text written by large language models from text written by humans. Pangram Text outperforms zero-shot methods such as DetectGPT as well as leading commercial AI detection tools with over 38 times lower error rates on a comprehensive benchmark comprised of 10 text domains (student writing, creative writing, scientific writing, books, encyclopedias, news, email, scientific papers, short-form Q&A) and 8 open- and closed-source large language models. We propose a training algorithm, hard negative mining with synthetic mirrors, that enables our classifier to achieve orders of magnitude lower false positive rates on high-data domains such as reviews. Finally, we show that Pangram Text is not biased against nonnative English speakers and generalizes to domains and models unseen during training.

Updated: 2024-07-29 08:27:34

标题: 关于Pangram人工智能生成的文本分类器的技术报告

摘要: 我们提出了Pangram Text,这是一个基于transformer的神经网络,经过训练可以区分由大型语言模型编写的文本和由人类编写的文本。Pangram Text在由10个文本领域(学生写作、创意写作、科学写作、书籍、百科全书、新闻、电子邮件、科学论文、简短问答)和8个开源和闭源大型语言模型组成的全面基准测试中,表现优于零样本方法(如DetectGPT)以及主流商业AI检测工具,其错误率降低了38倍以上。我们提出了一种训练算法,即使用合成镜像进行难负样本挖掘,使我们的分类器在高数据领域(如评论)上实现了数量级别的更低误报率。最后,我们展示了Pangram Text不对非英语为母语的人存在偏见,并且在训练期间未见过的领域和模型上也能泛化。

更新时间: 2024-07-29 08:27:34

领域: cs.CL,cs.AI,68T50,I.2.7

下载: http://arxiv.org/abs/2402.14873v3

Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery

Causal discovery aims to estimate causal structures among variables based on observational data. Large Language Models (LLMs) offer a fresh perspective to tackle the causal discovery problem by reasoning on the metadata associated with variables rather than their actual data values, an approach referred to as knowledge-based causal discovery. In this paper, we investigate the capabilities of Small Language Models (SLMs, defined as LLMs with fewer than 1 billion parameters) with prompt-based learning for knowledge-based causal discovery. Specifically, we present KG Structure as Prompt, a novel approach for integrating structural information from a knowledge graph, such as common neighbor nodes and metapaths, into prompt-based learning to enhance the capabilities of SLMs. Experimental results on three types of biomedical and open-domain datasets under few-shot settings demonstrate the effectiveness of our approach, surpassing most baselines and even conventional fine-tuning approaches trained on full datasets. Our findings further highlight the strong capabilities of SLMs: in combination with knowledge graphs and prompt-based learning, SLMs demonstrate the potential to surpass LLMs with larger number of parameters. Our code and datasets are available on GitHub.
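
A minimal sketch of serializing graph structure into a prompt, assuming a networkx graph; the toy edges and the verbalization are illustrative, not the paper's template.

```python
# Minimal sketch of serializing knowledge-graph structure into a prompt
# (illustrative verbalization; the paper's exact template may differ).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("smoking", "lung_cancer"), ("smoking", "inflammation"),
    ("inflammation", "lung_cancer"), ("asbestos", "lung_cancer"),
])

def kg_structure_prompt(G, a, b):
    neighbors = sorted(nx.common_neighbors(G, a, b))
    path = nx.shortest_path(G, a, b)  # one simple metapath-style context
    return (
        f"Variables: {a}, {b}.\n"
        f"Common neighbors in the knowledge graph: {', '.join(neighbors) or 'none'}.\n"
        f"A connecting path: {' -> '.join(path)}.\n"
        f"Question: does {a} cause {b}? Answer yes or no."
    )

print(kg_structure_prompt(G, "smoking", "lung_cancer"))
```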

Updated: 2024-07-29 08:27:33

标题: 知识图谱结构作为提示:提高基于知识的因果发现的小语言模型能力

摘要: 因果发现旨在基于观测数据估计变量之间的因果结构。大型语言模型(LLMs)为解决因果发现问题提供了全新的视角,通过对变量相关的元数据进行推理,而不是它们的实际数据值,这种方法被称为基于知识的因果发现。在本文中,我们研究了小型语言模型(SLMs,定义为参数少于10亿的LLMs)在基于提示学习的基础知识因果发现中的能力。具体而言,我们提出了以KG Structure为提示的方法,这是一种将知识图中的结构信息(如共同邻居节点和元路径)整合到基于提示学习中以增强SLMs能力的新方法。在少样本设置下对三种生物医学和开放域数据集进行的实验结果表明了我们方法的有效性,超越了大多数基线甚至是在完整数据集上进行训练的传统微调方法。我们的发现进一步强调了SLMs的强大能力:结合知识图和基于提示的学习,SLMs展现出超越具有更多参数的LLMs的潜力。我们的代码和数据集可以在GitHub上找到。

更新时间: 2024-07-29 08:27:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.18752v2

Survey and Taxonomy: The Role of Data-Centric AI in Transformer-Based Time Series Forecasting

Alongside the continuous process of improving AI performance through the development of more sophisticated models, researchers have also focused their attention on the emerging concept of data-centric AI, which emphasizes the important role of data in a systematic machine learning training process. Nonetheless, the development of models has continued apace. One result of this progress is the Transformer architecture, which possesses a high level of capability in multiple domains such as Natural Language Processing (NLP), Computer Vision (CV) and Time Series Forecasting (TSF). Its performance is, however, heavily dependent on input data preprocessing and output data evaluation, justifying a data-centric approach to future research. We argue that data-centric AI is essential for efficiently training AI models, particularly transformer-based TSF models. However, there is a gap regarding the integration of transformer-based TSF and data-centric AI. This survey aims to pin down this gap via an extensive literature review based on the proposed taxonomy. We review previous research from a data-centric AI perspective and intend to lay the foundation for the future development of transformer-based architectures and data-centric AI.

Updated: 2024-07-29 08:27:21

标题: 调查与分类:数据中心人工智能在基于Transformer的时间序列预测中的作用

摘要: 在通过开发更复杂的模型不断改进AI性能的过程中,研究人员还将注意力集中在新兴概念——数据中心的AI上,这强调了数据在系统化机器学习训练过程中的重要作用。然而,模型的发展也在不断推进。其中一个成果是Transformer架构的发展,该架构在自然语言处理(NLP)、计算机视觉(CV)和时间序列预测(TSF)等多个领域具有高水平的能力。然而,其性能在很大程度上取决于输入数据的预处理和输出数据的评估,这证明了对未来研究采取以数据为中心的方法的必要性。我们认为数据中心的AI对于有效训练AI模型至关重要,特别是对于基于Transformer的TSF模型。然而,关于Transformer-based TSF和数据中心的AI整合仍存在差距。本调查旨在通过所提出的分类法进行广泛文献综述,以确定这一差距。我们从数据中心的AI视角审查了先前的研究成果,并打算为未来Transformer-based架构和数据中心的AI的发展奠定基础。

更新时间: 2024-07-29 08:27:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.19784v1

Multimodal Large Language Models for Bioimage Analysis

Rapid advancements in imaging techniques and analytical methods over the past decade have revolutionized our ability to comprehensively probe the biological world at multiple scales, pinpointing the type, quantity, location, and even temporal dynamics of biomolecules. The surge in data complexity and volume presents significant challenges in translating this wealth of information into knowledge. The recently emerged Multimodal Large Language Models (MLLMs) exhibit strong emergent capacities, such as understanding, analyzing, reasoning, and generalization. With these capabilities, MLLMs hold promise to extract intricate information from biological images and data obtained through various modalities, thereby expediting our biological understanding and aiding in the development of novel computational frameworks. Previously, interpreting and summarizing meaningful conclusions from comprehensive observation and analysis of biological images was a task reserved for human experts. However, the current development of MLLMs shows increasing promise in serving as intelligent assistants or agents that augment human researchers in biology research.

Updated: 2024-07-29 08:21:25

标题: 多模态大语言模型用于生物图像分析

摘要: 在过去的十年里,成像技术和分析方法的快速发展彻底改变了我们全面探索生物世界的能力,在多个尺度上准确定位生物分子的类型、数量、位置,甚至时间动态。数据复杂性和量级的激增带来了将这些信息转化为知识的重大挑战。最近出现的多模态大语言模型(MLLMs)展现出强大的新兴能力,如理解、分析、推理和泛化。凭借这些能力,MLLMs有望从通过各种模态获得的生物图像和数据中提取复杂信息,从而加速我们对生物的理解,并帮助开发新的计算框架。以前,这种能力大多归功于人类对生物图像的全面观察和分析中解释和总结有意义的结论。然而,MLLMs的当前发展显示出越来越多的希望,成为生物研究中智能助手或代理,以增强人类研究者的研究能力。

更新时间: 2024-07-29 08:21:25

领域: cs.AI

下载: http://arxiv.org/abs/2407.19778v1

Revisiting Agnostic PAC Learning

PAC learning, dating back to Valiant'84 and Vapnik and Chervonenkis'64,'74, is a classic model for studying supervised learning. In the agnostic setting, we have access to a hypothesis set $\mathcal{H}$ and a training set of labeled samples $(x_1,y_1),\dots,(x_n,y_n) \in \mathcal{X} \times \{-1,1\}$ drawn i.i.d. from an unknown distribution $\mathcal{D}$. The goal is to produce a classifier $h : \mathcal{X} \to \{-1,1\}$ that is competitive with the hypothesis $h^\star_{\mathcal{D}} \in \mathcal{H}$ having the least probability of mispredicting the label $y$ of a new sample $(x,y)\sim \mathcal{D}$. Empirical Risk Minimization (ERM) is a natural learning algorithm, where one simply outputs the hypothesis from $\mathcal{H}$ making the fewest mistakes on the training data. This simple algorithm is known to have an optimal error in terms of the VC-dimension of $\mathcal{H}$ and the number of samples $n$. In this work, we revisit agnostic PAC learning and first show that ERM is in fact sub-optimal if we treat the performance of the best hypothesis, denoted $\tau:=\Pr_{\mathcal{D}}[h^\star_{\mathcal{D}}(x) \neq y]$, as a parameter. Concretely we show that ERM, and any other proper learning algorithm, is sub-optimal by a $\sqrt{\ln(1/\tau)}$ factor. We then complement this lower bound with the first learning algorithm achieving an optimal error for nearly the full range of $\tau$. Our algorithm introduces several new ideas that we hope may find further applications in learning theory.

Updated: 2024-07-29 08:20:49

标题: 《重新审视不可知PAC学习》

摘要: PAC学习始于Valiant'84和Vapnik和Chervonenkis'64,'74,是研究监督学习的经典模型。在不可知的设置中,我们可以访问一个假设集合$\mathcal{H}$和一个带标签样本的训练集$(x_1,y_1),\dots,(x_n,y_n) \in \mathcal{X} \times \{-1,1\}$,这些样本是从一个未知分布$\mathcal{D}$中独立同分布地抽取的。目标是产生一个分类器$h : \mathcal{X} \to \{-1,1\}$,它与假设$h^\star_{\mathcal{D}} \in \mathcal{H}$竞争,在最小化对新样本$(x,y)\sim \mathcal{D}$的标签$y$的错误预测概率方面。 经验风险最小化(ERM)是一种自然的学习算法,它简单地输出使训练数据上错误最少的假设。已知这种简单算法在VC维度和样本数量$n$方面具有最佳错误率。 在这项工作中,我们重新审视了不可知的PAC学习,并首先表明如果将最佳假设的性能(记为$\tau:=\Pr_{\mathcal{D}}[h^\star_{\mathcal{D}}(x) \neq y]$)视为一个参数,ERM实际上是次优的。具体地,我们展示了ERM和任何其他适当的学习算法都是次优的,差距为$\sqrt{\ln(1/\tau)}$。然后,我们通过第一个学习算法的下界补充了这个结论,该算法在几乎整个$\tau$范围内实现了最佳错误率。我们的算法引入了几个新的想法,希望这些想法在学习理论中能够找到更多应用。

更新时间: 2024-07-29 08:20:49

领域: cs.LG,cs.DS,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2407.19777v1

Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference

The rapid growth of large-scale AI models, particularly large language models, has brought significant challenges in data privacy, computational resources, and accessibility. Traditional centralized architectures often struggle to meet data security and scalability needs, which hinders the democratization of AI systems. Nesa introduces a model-agnostic sharding framework designed for decentralized AI inference. Our framework uses blockchain-based sequential deep neural network sharding to distribute computational tasks across a diverse network of nodes based on a personalized heuristic and routing mechanism. This enables efficient distributed training and inference for recent large-scale models even on consumer-grade hardware. We use compression techniques like dynamic blockwise quantization and mixed matrix decomposition to reduce data transfer and memory needs. We also integrate robust security measures, including hardware-based trusted execution environments, to ensure data integrity and confidentiality. Evaluating our system across various natural language processing and vision tasks shows that these compression strategies do not compromise model accuracy. Our results highlight the potential to democratize access to cutting-edge AI technologies by enabling secure and efficient inference on a decentralized network.

Updated: 2024-07-29 08:18:48

标题: 模型无关的异构分片用于异构分布式推理

摘要: 大规模人工智能模型的快速增长,特别是大型语言模型,带来了数据隐私、计算资源和可访问性方面的重大挑战。传统的集中式架构通常难以满足所需的数据安全性和可伸缩性需求,这阻碍了人工智能系统的民主化。Nesa引入了一个面向分散式人工智能推理的模型无关分片框架。我们的框架使用基于区块链的序列深度神经网络分片,根据个性化的启发式和路由机制,在各种节点的网络中分配计算任务。这使得即使在消费级硬件上,也能有效地进行大规模模型的分布式训练和推理。我们使用诸如动态分块量化和混合矩阵分解等压缩技术来减少数据传输和内存需求。我们还整合了强大的安全措施,包括基于硬件的可信执行环境,以确保数据的完整性和保密性。在各种自然语言处理和视觉任务中评估我们的系统表明,这些压缩策略不会影响模型的准确性。我们的结果突显了通过在分散网络上进行安全和高效的推理,实现对尖端人工智能技术的民主化访问的潜力。

更新时间: 2024-07-29 08:18:48

领域: cs.AI,cs.CL,cs.CR,cs.DC

下载: http://arxiv.org/abs/2407.19775v1

Generating Unseen Code Tests In Infinitum

Large Language Models (LLMs) are used for many tasks, including those related to coding. An important aspect of being able to utilize LLMs is the ability to assess their fitness for specific usages. The common practice is to evaluate LLMs against a set of benchmarks. While benchmarks provide a sound foundation for evaluation and comparison of alternatives, they suffer from the well-known weakness of leaking into the training data (Xu et al., 2024). We present a method for creating benchmark variations that generalize across coding tasks and programming languages, and may also be applied to in-house code bases. Our approach enables ongoing generation of test data, thus mitigating leakage into the training data. We implement one benchmark, called auto-regression, for the task of text-to-code generation in Python. Auto-regression is specifically created to aid in debugging and in tracking model generation changes as part of the LLM regression testing process.

Updated: 2024-07-29 08:11:20

标题: 在无限生成看不见的代码测试

摘要: 大型语言模型(LLMs)用于许多任务,包括与编码相关的任务。利用LLMs的一个重要方面是能够评估它们对特定用途的适应性。常见做法是针对一组基准进行LLMs评估。虽然基准提供了评估和比较替代方案的坚实基础,但它们存在已知的训练数据泄漏问题\cite{Xu2024Benchmarking}。我们提出了一种方法,用于创建泛化跨编码任务和编程语言的基准变体,也可应用于内部代码库。我们的方法使得可以持续生成测试数据,从而减轻训练数据泄漏问题。我们实现了一个名为“自回归”的基准,用于在Python中进行文本到代码生成任务。自回归特意创建用于在LLM回归测试过程中辅助调试和跟踪模型生成变化。

更新时间: 2024-07-29 08:11:20

领域: cs.AI

下载: http://arxiv.org/abs/2407.19772v1

Optimizing Cooperative path-finding: A Scalable Multi-Agent RRT* with Dynamic Potential Fields

Cooperative path-finding in multi-agent systems demands scalable solutions to navigate agents from their origins to destinations without conflict. Despite the breadth of research, scalability remains hampered by increased computational demands in complex environments. This study introduces the multi-agent RRT* potential field (MA-RRT*PF), an innovative algorithm that addresses computational efficiency and path-finding efficacy in dense scenarios. MA-RRT*PF integrates a dynamic potential field with a heuristic method, advancing obstacle avoidance and optimizing the expansion of random trees in congested spaces. The empirical evaluations highlight MA-RRT*PF's significant superiority over conventional multi-agent RRT* (MA-RRT*) in dense environments, offering enhanced performance and solution quality without compromising integrity. This work not only contributes a novel approach to the field of cooperative multi-agent path-finding but also offers a new perspective for practical applications in densely populated settings where traditional methods are less effective.
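
A minimal sketch of the core idea: a repulsive potential around obstacles deflects the RRT* steering step. The paper's dynamic field and multi-agent coupling are richer, so the gains, radii, and weights here are illustrative.

```python
# Minimal sketch of potential-field-biased steering for RRT*-style expansion
# (illustrative; the paper's dynamic field and multi-agent coupling are richer).
import numpy as np

def repulsive_force(x, obstacles, influence=1.5, gain=1.0):
    """Repulsive force (negative potential gradient) around circular obstacles."""
    force = np.zeros(2)
    for center, radius in obstacles:
        diff = x - center
        d = np.linalg.norm(diff) - radius
        if 1e-6 < d < influence:
            force += gain * (1.0 / d - 1.0 / influence) / d**2 * diff / np.linalg.norm(diff)
    return force

def steer(x_near, x_rand, obstacles, step=0.3, field_weight=0.5):
    """Step toward the random sample, deflected away from nearby obstacles."""
    direction = x_rand - x_near
    direction /= np.linalg.norm(direction) + 1e-9
    direction += field_weight * repulsive_force(x_near, obstacles)
    direction /= np.linalg.norm(direction) + 1e-9
    return x_near + step * direction

obstacles = [(np.array([1.0, 1.0]), 0.4)]
print(steer(np.array([0.0, 0.0]), np.array([2.0, 2.0]), obstacles))
```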

Updated: 2024-07-29 08:03:22

标题: 优化合作路径规划:具有动态潜在场的可扩展多智能体RRT*

摘要: 在多智能体系统中进行合作路径规划需要可扩展的解决方案,以便将智能体从起点导航至目的地而无冲突发生。尽管有大量研究,但在复杂环境中增加的计算需求仍然限制了可扩展性。本研究介绍了多智能体RRT*潜在场(MA-RRT*PF)算法,该算法创新性地解决了在密集场景中的计算效率和路径规划有效性问题。MA-RRT*PF将动态潜在场与启发式方法相结合,提升了障碍物避免和在拥挤空间中随机树扩展的优化。实证评估突显了MA-RRT*PF在密集环境中明显优于传统的多智能体RRT*(MA-RRT*),提供了提升性能和解决方案质量而不损害完整性的优势。这项工作不仅为合作多智能体路径规划领域提供了一种新颖方法,还为传统方法效果不佳的密集人口密集区域的实际应用提供了新的视角。

更新时间: 2024-07-29 08:03:22

领域: cs.MA,cs.AI,cs.RO

下载: http://arxiv.org/abs/1911.07840v4

Enhancing Training Efficiency Using Packing with Flash Attention

Padding is often used when tuning LLMs: special tokens are added to shorter training examples to match the length of the longest sequence in each batch. While this ensures uniformity for batch processing, it introduces inefficiencies by including irrelevant padding tokens in the computation and wastes GPU resources. On the other hand, the Hugging Face SFT trainer offers the option to use packing to combine multiple training examples up to the maximum sequence length. This allows for maximal utilization of GPU resources. However, without proper masking of each packed training example, attention will not be computed correctly when using the SFT trainer. We enable and then analyse packing and Flash Attention with proper attention masking of each example, and show the benefits of this training paradigm.
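
A minimal sketch of the masking that packing requires: tokens from different packed examples must not attend to one another, and position ids restart per example. A dense block-diagonal mask is shown for clarity; Flash Attention consumes equivalent variable-length metadata (cumulative sequence lengths) instead, and causal LMs additionally intersect this with the usual causal mask.

```python
# Minimal sketch of the masking packing requires: a block-diagonal attention
# mask so tokens never attend across packed-example boundaries, plus
# position ids that restart at 0 for every packed example.
import torch

def pack_metadata(example_lengths):
    seq_ids = torch.repeat_interleave(
        torch.arange(len(example_lengths)), torch.tensor(example_lengths)
    )
    attend = seq_ids[:, None] == seq_ids[None, :]   # block-diagonal mask
    position_ids = torch.cat([torch.arange(n) for n in example_lengths])
    return attend, position_ids

mask, pos = pack_metadata([3, 2, 4])
print(mask.int())
print(pos)   # tensor([0, 1, 2, 0, 1, 0, 1, 2, 3])
```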

Updated: 2024-07-29 07:58:53

标题: 利用带有闪光注意力的打包提高培训效率

摘要: 填充通常用于调整LLM模型,通过向较短的训练示例添加特殊标记,以匹配每个批次中最长序列的长度。虽然这确保了批处理的统一性,但在计算中包含了不相关的填充标记,浪费了GPU资源。另一方面,Hugging Face SFT训练器提供了使用打包的选项,将多个训练示例组合到最大序列长度。这允许最大程度地利用GPU资源。然而,如果没有正确地对每个打包的训练示例进行掩蔽,当使用SFT训练器时,注意力将无法正确计算。我们启用并分析了打包和Flash Attention,并使用适当的注意力掩蔽每个示例,展示了这种训练范式的好处。

更新时间: 2024-07-29 07:58:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.09105v3

Map2Traj: Street Map Piloted Zero-shot Trajectory Generation with Diffusion Model

User mobility modeling serves a crucial role in the analysis and optimization of contemporary wireless networks. Typical stochastic mobility models, e.g., the random waypoint model and the Gauss-Markov model, can hardly capture the distribution characteristics of users within real-world areas. State-of-the-art trace-based mobility models and existing learning-based trajectory generation methods, however, are frequently constrained by the inaccessibility of substantial real trajectories due to privacy concerns. In this paper, we harness the intrinsic correlation between street maps and trajectories and develop a novel zero-shot trajectory generation method, named Map2Traj, by exploiting the diffusion model. We incorporate street maps as a condition to consistently pilot the denoising process and train our model on diverse sets of real trajectories from various regions in Xi'an, China, and their corresponding street maps. With solely the street map of an unobserved area, Map2Traj generates synthetic trajectories that not only closely resemble the real-world mobility pattern but also offer comparable efficacy. Extensive experiments validate the efficacy of our proposed method on zero-shot trajectory generation tasks in terms of both trajectory and distribution similarities. In addition, a case study of employing Map2Traj in wireless network optimization is presented to validate its efficacy for downstream applications.

Updated: 2024-07-29 07:57:03

标题: Map2Traj: 使用扩散模型在街道地图上导航的零样本轨迹生成

摘要: 用户移动性建模在分析和优化现代无线网络中起着至关重要的作用。典型的随机移动性模型,例如随机航点模型和高斯马尔可夫模型,很难捕捉到用户在现实世界区域内的分布特征。然而,最先进的基于轨迹的移动性模型和现有的基于学习的轨迹生成方法经常受到由于隐私问题而无法获得大量真实轨迹的限制。在本文中,我们利用街道地图和轨迹之间的内在相关性,通过利用扩散模型开发了一种新颖的零样本轨迹生成方法,命名为Map2Traj。我们将街道地图作为条件,持续引导去噪过程,并在中国西安各个地区的真实轨迹和相应的街道地图上训练我们的模型。只需未观察到区域的街道地图,Map2Traj就可以生成合成轨迹,不仅紧密地模拟真实世界的移动模式,而且还具有可比性。大量实验证实了我们提出的零样本轨迹生成方法在轨迹和分布相似性方面的有效性。此外,还提供了一个将Map2Traj应用于无线网络优化的案例研究,以验证其在下游应用中的有效性。

更新时间: 2024-07-29 07:57:03

领域: cs.AI

下载: http://arxiv.org/abs/2407.19765v1

Adaptive maximization of social welfare

We consider the problem of repeatedly choosing policies to maximize social welfare. Welfare is a weighted sum of private utility and public revenue. Earlier outcomes inform later policies. Utility is not observed, but indirectly inferred. Response functions are learned through experimentation. We derive a lower bound on regret, and a matching adversarial upper bound for a variant of the Exp3 algorithm. Cumulative regret grows at a rate of $T^{2/3}$. This implies that (i) welfare maximization is harder than the multi-armed bandit problem (with a rate of $T^{1/2}$ for finite policy sets), and (ii) our algorithm achieves the optimal rate. For the stochastic setting, if social welfare is concave, we can achieve a rate of $T^{1/2}$ (for continuous policy sets), using a dyadic search algorithm. We analyze an extension to nonlinear income taxation, and sketch an extension to commodity taxation. We compare our setting to monopoly pricing (which is easier), and price setting for bilateral trade (which is harder).
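
For reference, a minimal sketch of the base Exp3 algorithm (the paper analyzes a variant adapted to welfare maximization with inferred utility, which this sketch does not reproduce).

```python
# Minimal sketch of the base Exp3 algorithm (the paper analyzes a variant
# adapted to welfare maximization with inferred utility; that adaptation is
# not reproduced here).
import numpy as np

def exp3(n_arms, horizon, reward_fn, gamma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.ones(n_arms)
    for _ in range(horizon):
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(arm)                 # observed welfare in [0, 1]
        estimate = reward / probs[arm]          # importance-weighted estimate
        weights[arm] *= np.exp(gamma * estimate / n_arms)
    return weights / weights.sum()

# Toy example: policy 2 yields the highest expected welfare.
means = np.array([0.3, 0.5, 0.8])
outcome_rng = np.random.default_rng(1)
print(exp3(3, 5000, lambda a: outcome_rng.binomial(1, means[a])))
```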

Updated: 2024-07-29 07:54:56

标题: 社会福利的自适应最大化

摘要: 我们考虑反复选择政策以最大化社会福利的问题。福利是私人效用和公共收入的加权和。先前的结果影响后续政策。效用不被观察到,但间接推断。响应函数通过实验学习。我们推导了一个关于后悔的下界,以及Exp3算法变体的匹配对抗上界。累积后悔以$T^{2/3}$的速率增长。这意味着(i)最大化福利比多臂赌博问题更难(对于有限政策集,速率为$T^{1/2}$),(ii)我们的算法实现了最佳速率。对于随机设置,如果社会福利是凹的,我们可以通过使用二进制搜索算法(针对连续政策集)实现速率为$T^{1/2}$。我们分析了非线性所得税的扩展,并草图了商品税的扩展。我们将我们的设置与垄断定价(较容易)和双边贸易定价(较难)进行了比较。

更新时间: 2024-07-29 07:54:56

领域: econ.EM,cs.LG,stat.ML

下载: http://arxiv.org/abs/2310.09597v2

FlightScope: A Deep Comprehensive Review of Aircraft Detection Algorithms in Satellite Imagery

Object detection in remotely sensed satellite imagery is fundamental in many fields, such as biophysical and environmental monitoring. While deep learning algorithms are constantly evolving, they have been mostly implemented and tested on popular ground-based photographs. This paper critically evaluates and compares a suite of advanced object detection algorithms customized for the task of identifying aircraft within satellite imagery. Using the large HRPlanesV2 dataset, together with rigorous validation on the GDIT dataset, this research encompasses an array of methodologies including YOLO versions 5 and 8, Faster RCNN, CenterNet, RetinaNet, RTMDet, and DETR, all trained from scratch. This exhaustive training and validation study reveals YOLOv5 as the preeminent model for the specific case of identifying airplanes from remote sensing data, showcasing high precision and adaptability across diverse imaging conditions. This research highlights the nuanced performance landscapes of these algorithms, with YOLOv5 emerging as a robust solution for aerial object detection, underlining its importance through superior mean average precision, Recall, and Intersection over Union scores. The findings described here underscore the fundamental role of algorithm selection aligned with the specific demands of satellite imagery analysis and provide a comprehensive framework to evaluate model efficacy. The benchmark toolkit and codes, available via https://github.com/toelt-llc/FlightScope_Bench, aim to foster further exploration and innovation in the realm of remote sensing object detection, paving the way for improved analytical methodologies in satellite imagery applications.

Updated: 2024-07-29 07:35:27

标题: FlightScope:卫星图像中飞行器检测算法的深入综述

摘要: 在遥感卫星图像中进行目标检测在许多领域(如生物物理和环境监测)中至关重要。虽然深度学习算法不断发展,但它们大多数是在流行的地面拍摄照片上实施和测试的。本文对一系列高级目标检测算法进行了批判性评估和比较,这些算法专为识别卫星图像中的飞机而定制。利用大型HRPlanesV2数据集,结合与GDIT数据集的严格验证,本研究涵盖了一系列方法,包括YOLO版本5和8、Faster RCNN、CenterNet、RetinaNet、RTMDet和DETR,全部从头开始训练。这项详尽的训练和验证研究揭示了YOLOv5作为从遥感数据中识别飞机的特定案例中的卓越模型,展示了在各种成像条件下的高精度和适应性。这项研究突显了这些算法的微妙性能景观,YOLOv5成为空中目标检测的强大解决方案,通过卓越的平均精度、召回率和联合交集分数来强调其重要性。这里描述的发现强调了与卫星图像分析的具体需求相一致的算法选择的基本作用,并扩展了一个全面的框架来评估模型的有效性。通过https://github.com/toelt-llc/FlightScope_Bench提供的基准工具包和代码,旨在进一步探索和创新遥感目标检测领域,为卫星图像应用中的改进分析方法铺平道路。

更新时间: 2024-07-29 07:35:27

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.02877v3

Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks

Speech bandwidth expansion is crucial for expanding the frequency range of low-bandwidth speech signals, thereby improving audio quality, clarity and perceptibility in digital applications. Its applications span telephony, compression, text-to-speech synthesis, and speech recognition. This paper presents a novel approach using a high-fidelity generative adversarial network; unlike cascaded systems, our system is trained end-to-end on paired narrowband and wideband speech signals. Our method integrates various bandwidth upsampling ratios into a single unified model specifically designed for speech bandwidth expansion applications. Our approach exhibits robust performance across various bandwidth expansion factors, including those not encountered during training, demonstrating zero-shot capability. To the best of our knowledge, this is the first work to showcase this capability. The experimental results demonstrate that our method outperforms previous end-to-end approaches, as well as interpolation and traditional techniques, showcasing its effectiveness in practical speech enhancement applications.

Updated: 2024-07-29 07:29:17

标题: 通过高保真度生成对抗网络进行语音带宽扩展

摘要: 语音带宽扩展对于扩展低带宽语音信号的频率范围至关重要,从而提高数字应用中的音频质量、清晰度和可感知性。其应用范围涵盖电话、压缩、文本转语音合成和语音识别。本文提出了一种新颖的方法,使用高保真度的生成对抗网络,与级联系统不同,我们的系统在配对的窄带和宽带语音信号上进行端到端训练。我们的方法将各种带宽上采样比率整合到一个单一统一模型中,专门为语音带宽扩展应用而设计。我们的方法在各种带宽扩展因素上表现出强大的性能,包括在训练期间未遇到的因素,展示了零射击能力。据我们所知,这是第一个展示这种能力的工作。实验结果表明,我们的方法优于先前的端到端方法,以及插值和传统技术,展示了其在实际语音增强应用中的有效性。

更新时间: 2024-07-29 07:29:17

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2407.18571v2

Revolutionizing Binary Decision Tree Traversals with Arithmetical Representations

This paper introduces an innovative method for traversing binary decision trees using arithmetic operations. We present a suite of binary tree traversal algorithms that leverage novel representation matrices to flatten the full binary tree structure and embed the aggregated internal node decisions into a single vector. Our approach, grounded in maximum inner product search, offers new insights into decision tree partitioning.
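
One way to realize the idea, shown on a toy depth-2 tree: evaluate all internal tests at once, then select the leaf whose signed path-pattern row attains the maximal inner product with the test vector. The matrices below only illustrate the mechanism; the paper's representation matrices may differ.

```python
# One way to realize arithmetic tree traversal (illustrative; the paper's
# representation matrices may differ): evaluate every internal test at once,
# then pick the leaf whose signed path pattern maximizes the inner product.
import numpy as np

# Full binary tree of depth 2: internal nodes 0,1,2; leaves 0..3.
thresholds = np.array([0.5, 0.2, 0.8])   # test: x[feature] > threshold -> +1
features = np.array([0, 1, 1])

# Rows: leaves; columns: internal nodes. +1 = "went right", -1 = "went left",
# 0 = node not on this leaf's path.
P = np.array([
    [-1, -1,  0],   # leaf 0: root left, node 1 left
    [-1, +1,  0],   # leaf 1: root left, node 1 right
    [+1,  0, -1],   # leaf 2: root right, node 2 left
    [+1,  0, +1],   # leaf 3: root right, node 2 right
])

def predict_leaf(x):
    t = np.where(x[features] > thresholds, 1, -1)  # all tests in one shot
    return int(np.argmax(P @ t))                   # max inner product = true path

print(predict_leaf(np.array([0.7, 0.9])))  # root right (0.7>0.5), node 2 right (0.9>0.8) -> leaf 3
```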

Updated: 2024-07-29 07:28:24

标题: 用算术表示方法革新二叉决策树遍历

摘要: 本文介绍了一种使用算术运算来遍历二叉决策树的创新方法。我们提出了一套利用新颖的表示矩阵将完整二叉树结构展平,并将聚合的内部节点决策嵌入到单个向量中的二叉树遍历算法。我们的方法基于最大内积搜索,为决策树划分提供了新的见解。

更新时间: 2024-07-29 07:28:24

领域: cs.LG,cs.DS,cs.NA,math.NA

下载: http://arxiv.org/abs/2209.04825v6

Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes

Large Language Models (LLMs) have become very popular and are used in many domains, such as chatbots and auto-task completion agents. However, LLMs suffer from many safety vulnerabilities, which can be exploited using different types of attacks, such as jailbreaking, prompt injection attacks, and privacy leakage attacks. These attacks can disrupt the working of the LLMs and make powerful LLM systems generate malicious or unethical content, take malicious actions, or leak confidential information by bypassing the security filters and taking advantage of their access. Foundational LLMs undergo alignment training, which includes safety training. This helps the model learn how to generate outputs that are ethical and aligned with human responses. Further, to make the models even safer, guardrails are added to filter the inputs received and the outputs generated by the model. These foundational LLMs are subjected to fine-tuning, quantization, or alteration of guardrails to adapt them for specialized tasks or resource-constrained environments. So, understanding the impact of modifications such as fine-tuning, quantization, and guardrails on the safety of LLMs becomes an important question. Understanding and mitigating the consequences will help build reliable systems and effective strategies to make LLMs more secure. In this study, we tested foundational models such as Mistral, Llama, and MosaicML, and their fine-tuned versions. These comprehensive evaluations show that fine-tuning increases jailbreak attack success rates (ASR), quantization has a variable impact on the ASR, and guardrails can significantly improve jailbreak resistance.

Updated: 2024-07-29 07:24:49

标题: 微调、量化和LLMs:导航意想不到的结果

摘要: 大型语言模型(LLMs)已经变得非常流行,并在许多领域中被使用,例如聊天机器人、自动任务完成代理等等。然而,LLMs存在许多安全漏洞,可以利用不同类型的攻击来进行利用,例如越狱攻击、提示注入攻击和隐私泄露攻击。这些攻击可以干扰LLMs的工作,并使强大的LLM系统生成恶意或不道德的内容,采取恶意行动,或通过绕过安全过滤器并利用其访问权限来泄露机密信息。基础LLMs经过对齐训练,其中包括安全训练。这有助于模型学习如何生成符合道德标准并与人类回应相一致的输出。此外,为了使模型更加安全,还添加了防护栏来过滤模型接收的输入和生成的输出。这些基础LLMs经过微调、量化或修改防护栏,以便将这些模型用于专门任务或在资源受限环境中使用。因此,理解微调、量化和防护栏等修改对LLM安全性的影响成为一个重要问题。理解和减轻后果将有助于构建可靠的系统和有效的策略,使LLMs更加安全。在这项研究中,我们测试了Mistral、Llama、MosaicML等基础模型及其经过微调的版本。这些全面评估显示,微调会增加越狱攻击成功率(ASR),量化对ASR有不同影响,并且防护栏可以显著提高越狱抵抗力。

更新时间: 2024-07-29 07:24:49

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2404.04392v2

A Wasserstein perspective of Vanilla GANs

The empirical success of Generative Adversarial Networks (GANs) has caused increasing interest in theoretical research. The statistical literature is mainly focused on Wasserstein GANs and generalizations thereof, which especially allow for good dimension reduction properties. Statistical results for Vanilla GANs, the original optimization problem, are still rather limited and require assumptions such as smooth activation functions and equal dimensions of the latent space and the ambient space. To bridge this gap, we draw a connection from Vanilla GANs to the Wasserstein distance. By doing so, existing results for Wasserstein GANs can be extended to Vanilla GANs. In particular, we obtain an oracle inequality for Vanilla GANs in Wasserstein distance. The assumptions of this oracle inequality are designed to be satisfied by network architectures commonly used in practice, such as feedforward ReLU networks. By providing a quantitative result for the approximation of a Lipschitz function by a feedforward ReLU network with bounded Hölder norm, we derive a rate of convergence for Vanilla GANs as well as Wasserstein GANs as estimators of the unknown probability distribution.

Updated: 2024-07-29 07:24:12

标题: 一个Wasserstein视角下的Vanilla GANs

摘要: 生成对抗网络(GANs)的实证成功引起了对理论研究的日益关注。统计文献主要集中在Wasserstein GANs及其推广上,特别是允许良好的降维性质。对于原始优化问题Vanilla GANs的统计结果仍然相当有限,需要假设平滑激活函数和潜在空间与环境空间相等的维度。为弥补这一差距,我们从Vanilla GANs到Wasserstein距离建立了联系。通过这样做,现有的Wasserstein GANs的结果可以推广到Vanilla GANs。特别地,我们获得了Vanilla GANs在Wasserstein距离上的“神谕不等式”。这个“神谕不等式”的假设旨在满足实践中常用的网络架构,比如前向ReLU网络。通过提供一个利普希茨函数被具有有界H\"older范数的前向ReLU网络逼近的定量结果,我们得出Vanilla GANs以及Wasserstein GANs作为未知概率分布估计器的收敛速率。

更新时间: 2024-07-29 07:24:12

领域: math.ST,cs.LG,stat.ML,stat.TH,62E17, 62G05, 68T07

下载: http://arxiv.org/abs/2403.15312v2

Hierarchical Policy Blending as Inference for Reactive Robot Control

Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics, naturally framed as a multi-objective decision-making problem. Current approaches trade off safety against performance. On the one hand, reactive policies guarantee a fast response to environmental changes at the risk of suboptimal behavior. On the other hand, planning-based motion generation provides feasible trajectories, but the high computational cost may limit the control frequency and thus safety. To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method. Moreover, we adopt probabilistic inference methods to formalize the hierarchical model and stochastic optimization. We realize this approach as a weighted product of stochastic, reactive expert policies, where planning is used to adaptively compute the optimal weights over the task horizon. This stochastic optimization avoids local optima and proposes feasible reactive plans that find paths in cluttered and dense environments. Our extensive experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
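
A minimal sketch of the product-of-experts blending for the special case of Gaussian expert policies, where a weighted product is again Gaussian with a precision-weighted mean; the fixed weights below stand in for what the planner would compute online.

```python
# Minimal sketch of blending stochastic expert policies as a weighted product.
# For Gaussian experts the product is again Gaussian; the fixed weights stand
# in for what the planner would compute adaptively over the task horizon.
import numpy as np

def blend_gaussian_experts(means, covs, weights):
    """Weighted product of N(mean_i, cov_i): precisions add, means are fused."""
    precision = sum(w * np.linalg.inv(c) for w, c in zip(weights, covs))
    cov = np.linalg.inv(precision)
    mean = cov @ sum(w * np.linalg.inv(c) @ m
                     for w, m, c in zip(weights, means, covs))
    return mean, cov

# Two reactive experts: "go to goal" and "avoid obstacle", blended 70/30.
means = [np.array([1.0, 0.0]), np.array([0.2, 0.8])]
covs = [np.eye(2) * 0.1, np.eye(2) * 0.3]
mean, cov = blend_gaussian_experts(means, covs, [0.7, 0.3])
action = np.random.default_rng(0).multivariate_normal(mean, cov)
print(action)
```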

Updated: 2024-07-29 07:13:42

标题: 层级策略融合作为反应式机器人控制的推理

摘要: 在拥挤、密集和动态环境中的运动生成是机器人学中的一个核心话题,通常被表述为一个多目标决策问题。当前的方法在安全性和性能之间进行权衡。一方面,反应式策略可以在环境变化时快速响应,但可能存在次优行为的风险。另一方面,基于规划的运动生成提供可行的轨迹,但高计算成本可能限制控制频率,从而影响安全性。为了结合反应式策略和规划的优势,我们提出了一种分层的运动生成方法。此外,我们采用概率推理方法来形式化分层模型和随机优化。我们将这种方法实现为随机、反应式专家策略的加权乘积,其中规划用于在任务视野内自适应地计算最优权重。这种随机优化避免了局部最优解,并给出可行的反应式计划,能够在拥挤和密集环境中找到路径。我们在平面导航和6DoF操作上的广泛实验研究表明,我们提出的分层运动生成方法优于短视的反应式控制器和在线重新规划方法。

更新时间: 2024-07-29 07:13:42

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2210.07890v3
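
To make the weighted-product formulation above concrete, here is a minimal Python sketch (a hypothetical illustration, not the authors' code) of fusing stochastic Gaussian expert policies; in the paper the weights would be computed adaptively by the planner over the task horizon:

    import numpy as np

    def blend_gaussian_experts(means, covs, weights):
        # A weighted product of Gaussian experts prod_i N(mu_i, Sigma_i)^{w_i}
        # is again Gaussian; fuse in precision (inverse covariance) space.
        precisions = [w * np.linalg.inv(c) for w, c in zip(weights, covs)]
        Sigma = np.linalg.inv(sum(precisions))
        mu = Sigma @ sum(P @ m for P, m in zip(precisions, means))
        return mu, Sigma

    # Two reactive experts: one attracts to the goal, one steers around an obstacle.
    mu, Sigma = blend_gaussian_experts(
        means=[np.array([1.0, 0.0]), np.array([0.0, 1.0])],
        covs=[0.1 * np.eye(2), 0.3 * np.eye(2)],
        weights=[0.7, 0.3],  # illustrative; the paper optimizes these online
    )
    action = np.random.multivariate_normal(mu, Sigma)  # sampled velocity command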

KNOWCOMP POKEMON Team at DialAM-2024: A Two-Stage Pipeline for Detecting Relations in Dialogical Argument Mining

Dialogical Argument Mining (DialAM) is an important branch of Argument Mining (AM). DialAM-2024 is a shared task focusing on dialogical argument mining, which requires us to identify argumentative relations and illocutionary relations among proposition nodes and locution nodes. To accomplish this, we propose a two-stage pipeline, which includes the Two-Step S-Node Prediction Model in Stage 1 and the YA-Node Prediction Model in Stage 2. We also augment the training data in both stages and introduce context in Stage 2. We successfully completed the task and achieved good results. Our team Pokemon ranked 1st in the ARI Focused score and 4th in the Global Focused score.

Updated: 2024-07-29 07:07:37

标题: KNOWCOMP POKEMON团队在DialAM-2024:用于检测对话式论证挖掘中关系的两阶段流水线

摘要: Dialogical Argument Mining(DialAM)是Argument Mining(AM)的一个重要分支。DialAM-2024是一个关于对话式论证挖掘的共享任务,要求我们识别命题节点和言语节点之间的论证关系和言辞关系。为了完成这一任务,我们提出了一个两阶段流水线,包括第一阶段的两步S-节点预测模型和第二阶段的YA-节点预测模型。我们还在两个阶段增加了训练数据,并在第二阶段引入了上下文。我们成功完成了任务并取得了良好的结果。我们的团队Pokemon在ARI Focused得分中排名第一,在Global Focused得分中排名第四。

更新时间: 2024-07-29 07:07:37

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.19740v1

Security Analysis of Smart Contract Migration from Ethereum to Arbitrum

When migrating smart contracts from one blockchain platform to another, there are potential security risks. This is because different blockchain platforms have different environments and characteristics for executing smart contracts. The focus of this paper is to study the security risks associated with the migration of smart contracts from Ethereum to Arbitrum. We collected relevant data and analyzed smart contract migration cases to explore the differences between Ethereum and Arbitrum in areas such as Arbitrum cross-chain messaging, block properties, contract address alias, and gas fees. From the 36 types of smart contract migration cases we identified, we selected 4 typical types of cases and summarized their security risks. The research shows that smart contracts deployed on Ethereum may face certain potential security risks during migration to Arbitrum, mainly due to issues inherent in public blockchain characteristics, such as outdated off-chain data obtained by the inactive sequencer, logic errors based on time, the permission check failed, Denial of Service(DOS) attacks. To mitigate these security risks, we proposed avoidance methods and provided considerations for users and developers to ensure a secure migration process. It's worth noting that this study is the first to conduct an in-depth analysis of the secure migration of smart contracts from Ethereum to Arbitrum.

Updated: 2024-07-29 07:01:34

标题: 以太坊智能合约迁移至Arbitrum的安全性分析

摘要: 在将智能合约从一个区块链平台迁移到另一个区块链平台时,存在潜在的安全风险。这是因为不同的区块链平台在执行智能合约时具有不同的环境和特征。本文的重点是研究与智能合约从以太坊迁移到Arbitrum相关的安全风险。我们收集了相关数据,并分析了智能合约迁移案例,以探讨以太坊和Arbitrum之间在Arbitrum跨链消息传递、区块属性、合约地址别名和Gas费用等方面的差异。从我们确定的36种智能合约迁移案例中,我们选择了4种典型类型的案例,并总结了它们的安全风险。研究表明,部署在以太坊上的智能合约在迁移到Arbitrum的过程中可能面临一定的潜在安全风险,主要源于公共区块链特性固有的问题,比如不活跃的排序器(sequencer)导致获取过时的链外数据、基于时间的逻辑错误、权限检查失败、拒绝服务(DoS)攻击。为了减轻这些安全风险,我们提出了规避方法,并为用户和开发人员提供了注意事项,以确保安全的迁移过程。值得注意的是,这项研究是首次对智能合约从以太坊到Arbitrum的安全迁移进行深入分析。

更新时间: 2024-07-29 07:01:34

领域: cs.CR,68-02 (Primary),J.2; E.3

下载: http://arxiv.org/abs/2307.14773v3

Sensor Selection via GFlowNets: A Deep Generative Modeling Framework to Navigate Combinatorial Complexity

The performance of sensor arrays in sensing and wireless communications improves with more elements, but this comes at the cost of increased energy consumption and hardware expense. This work addresses the challenge of selecting $k$ sensor elements from a set of $m$ to optimize a generic Quality-of-Service metric. Evaluating all $\binom{m}{k}$ possible sensor subsets is impractical, leading to prior solutions using convex relaxations, greedy algorithms, and supervised learning approaches. The current paper proposes a new framework that employs deep generative modeling, treating sensor selection as a deterministic Markov Decision Process where sensor subsets of size $k$ arise as terminal states. Generative Flow Networks (GFlowNets) are employed to model an action distribution conditioned on the state. Sampling actions from the aforementioned distribution ensures that the probability of arriving at a terminal state is proportional to the performance of the corresponding subset. Applied to a standard sensor selection scenario, the developed approach outperforms popular methods which are based on convex optimization and greedy algorithms. Finally, a multiobjective formulation of the proposed approach is adopted and applied on the sparse antenna array design for Integrated Sensing and Communication (ISAC) systems. The multiobjective variation is shown to perform well in managing the trade-off between radar and communication performance.

Updated: 2024-07-29 06:56:57

标题: 通过GFlowNets进行传感器选择:一种应对组合复杂性的深度生成建模框架

摘要: 传感器阵列在感知和无线通信中的性能随着阵元数量的增加而提高,但这是以增加能量消耗和硬件费用为代价的。本文解决了从$m$个传感器阵元中选择$k$个以优化通用服务质量(QoS)指标的挑战。评估所有$\binom{m}{k}$个可能的传感器子集是不切实际的,因此先前的解决方案采用了凸松弛、贪婪算法和监督学习方法。本文提出了一个新框架,采用深度生成建模,将传感器选择视为一个确定性马尔可夫决策过程,其中大小为$k$的传感器子集作为终止状态出现。生成流网络(GFlowNets)被用来建模以状态为条件的动作分布。从上述分布中采样动作可以确保到达某一终止状态的概率与相应子集的性能成比例。应用于标准传感器选择场景时,所开发的方法优于基于凸优化和贪婪算法的流行方法。最后,采用所提方法的多目标形式,并将其应用于集成感知与通信(ISAC)系统的稀疏天线阵列设计。结果表明,多目标变体能够有效地权衡雷达与通信性能。

更新时间: 2024-07-29 06:56:57

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2407.19736v1
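
The subset-selection MDP described above is easy to sketch. The following hypothetical PyTorch fragment (assumed interface: a policy network mapping the binary selection mask to per-sensor logits) rolls out one trajectory; GFlowNet training, e.g. with the trajectory-balance loss, would then tune the policy so that terminal subsets are sampled with probability proportional to their QoS reward:

    import torch

    def sample_subset(policy_net, m, k):
        # States are partial selections; adding one sensor per step makes
        # every size-k mask a terminal state of the deterministic MDP.
        state = torch.zeros(m)
        log_pf = 0.0
        for _ in range(k):
            logits = policy_net(state)
            logits = logits.masked_fill(state.bool(), float("-inf"))  # no repeats
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            log_pf = log_pf + dist.log_prob(action)
            state[action] = 1.0
        return state, log_pf  # terminal subset and its forward log-probability

    # Trajectory balance would minimize (log Z + log_pf - log R(state))**2,
    # driving the sampling probability of a subset toward R(state) / Z.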

Robust Deep Hawkes Process under Label Noise of Both Event and Occurrence

Integrating deep neural networks with the Hawkes process has significantly improved predictive capabilities in finance, health informatics, and information technology. Nevertheless, these models often face challenges in real-world settings, particularly due to substantial label noise. This issue is of significant concern in the medical field, where label noise can arise from delayed updates in electronic medical records or misdiagnoses, leading to increased prediction risks. Our research indicates that deep Hawkes process models exhibit reduced robustness when dealing with label noise, particularly when it affects both event types and timing. To address these challenges, we first investigate the influence of label noise in approximated intensity functions and present a novel framework, the Robust Deep Hawkes Process (RDHP), to overcome the impact of label noise on the intensity function of Hawkes models, considering both the events and their occurrences. We tested RDHP using multiple open-source benchmarks with synthetic noise and conducted a case study on obstructive sleep apnea-hypopnea syndrome (OSAHS) in a real-world setting with inherent label noise. The results demonstrate that RDHP can effectively perform classification and regression tasks, even in the presence of noise related to events and their timing. To the best of our knowledge, this is the first study to successfully address both event and time label noise in deep Hawkes process models, offering a promising solution for medical applications, specifically in diagnosing OSAHS.

Updated: 2024-07-29 06:55:36

标题: 事件与发生时间均存在标签噪声下的鲁棒深度Hawkes过程

摘要: 将深度神经网络与Hawkes过程相结合,在金融、健康信息学和信息技术领域显著提高了预测能力。然而,这些模型在现实世界中常常面临挑战,特别是由于存在大量标签噪声。这一问题在医疗领域尤为突出:标签噪声可能来自电子医疗记录的延迟更新或误诊,导致预测风险增加。我们的研究表明,在处理标签噪声时,深度Hawkes过程模型的鲁棒性会降低,特别是当噪声同时影响事件类型和时间时。为了解决这些挑战,我们首先研究了标签噪声对近似强度函数的影响,并提出了一个新颖的框架,即鲁棒深度Hawkes过程(RDHP),以克服标签噪声对Hawkes模型强度函数的影响,同时考虑事件及其发生时间。我们在多个带有合成噪声的开源基准上测试了RDHP,并在真实世界中对存在固有标签噪声的阻塞性睡眠呼吸暂停低通气综合征(OSAHS)进行了案例研究。结果表明,即使存在与事件及其时间相关的噪声,RDHP也能有效地执行分类和回归任务。据我们所知,这是第一项成功同时解决深度Hawkes过程模型中事件和时间标签噪声的研究,为医疗应用,特别是OSAHS的诊断,提供了一个有前景的解决方案。

更新时间: 2024-07-29 06:55:36

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.17164v2
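
For readers unfamiliar with the model family, the conditional intensity that deep Hawkes networks approximate has, in its classical univariate form with an exponential kernel, the following shape (a textbook formula, not the RDHP architecture):

    import numpy as np

    def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
        # lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)).
        # Noisy event labels corrupt the event types entering this sum; noisy
        # occurrence labels corrupt the times t_i, which is what RDHP guards against.
        past = np.array([ti for ti in event_times if ti < t])
        return mu + alpha * np.exp(-beta * (t - past)).sum()

    print(hawkes_intensity(5.0, [1.0, 2.5, 4.8]))  # intensity shortly after three events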

Consistency Based Weakly Self-Supervised Learning for Human Activity Recognition with Wearables

While the widely available embedded sensors in smartphones and other wearable devices make it easier to obtain data of human activities, recognizing different types of human activities from sensor-based data remains a difficult research topic in ubiquitous computing. One reason for this is that most of the collected data is unlabeled. However, many current human activity recognition (HAR) systems are based on supervised methods, which heavily rely on the labels of the data. We describe a weakly self-supervised approach in this paper that consists of two stages: (1) In stage one, the model learns from the nature of human activities by projecting the data into an embedding space where similar activities are grouped together; (2) In stage two, the model is fine-tuned in a few-shot learning fashion using the similarity information of the data. This allows downstream classification or clustering tasks to benefit from the embeddings. Experiments on three benchmark datasets demonstrate the framework's effectiveness and show that our approach can help the clustering algorithm achieve comparable performance in identifying and categorizing the underlying human activities as pure supervised techniques applied directly to a corresponding fully labeled data set.

Updated: 2024-07-29 06:29:21

标题: 基于一致性的弱自监督学习方法用于可穿戴设备中的人体活动识别

摘要: 尽管智能手机和其他可穿戴设备中广泛可用的嵌入式传感器使获取人类活动数据变得更加容易,但从基于传感器的数据中识别不同类型的人类活动仍然是普遍计算中的一个困难研究课题。其中一个原因是大部分收集到的数据都是未标记的。然而,许多当前的人类活动识别(HAR)系统是基于监督方法的,这些方法严重依赖于数据的标签。本文描述了一种弱自监督方法,包括两个阶段:(1)第一阶段,模型通过将数据投影到一个嵌入空间中来从人类活动的本质中学习,类似的活动被分组在一起;(2)第二阶段,模型使用数据的相似性信息进行少样本学习方式的微调。这使得下游的分类或聚类任务可以从嵌入中受益。对三个基准数据集的实验证明了该框架的有效性,并表明我们的方法可以帮助聚类算法在识别和分类潜在的人类活动方面达到与直接应用于相应完全标记数据集的纯监督技术相当的性能。

更新时间: 2024-07-29 06:29:21

领域: eess.SP,cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2408.07282v1

The Democratization of Wealth Management: Hedged Mutual Fund Blockchain Protocol

We develop several innovations to bring the best practices of traditional investment funds to the blockchain landscape. Specifically, we illustrate how: 1) fund prices can be updated regularly like mutual funds; 2) performance fees can be charged like hedge funds; 3) mutually hedged blockchain investment funds can operate with investor protection schemes, such as high water marks; and 4) measures to offset trading related slippage costs when redemptions happen. Using our concepts - and blockchain technology - traditional funds can calculate performance fees in a simplified manner and alleviate several operational issues. Blockchain can solve many problems for traditional finance, while tried and tested wealth management techniques can benefit decentralization, speeding its adoption. We provide detailed steps - including mathematical formulations and instructive pointers - to implement these ideas and discuss how our designs overcome several blockchain bottlenecks, making smart contracts smarter. We provide numerical illustrations of several scenarios related to our mechanisms.

Updated: 2024-07-29 06:17:47

标题: 财富管理的民主化:对冲型互惠基金区块链协议

摘要: 我们开发了几项创新,将传统投资基金的最佳实践引入区块链领域。具体来说,我们阐明了以下几点: 1)基金价格可以像共同基金一样定期更新; 2)可以像对冲基金一样收取绩效费; 3)互相套保的区块链投资基金可以运作,并配备投资者保护计划,如高水位线; 4)可以采取措施抵消赎回时的交易滑点成本。 利用我们的概念 - 和区块链技术 - 传统基金可以简化地计算绩效费,并缓解几个运营问题。区块链可以为传统金融解决许多问题,而经过验证的财富管理技术可以使去中心化受益,并加速其采用。我们提供了详细步骤 - 包括数学公式和指导性指示 - 来实施这些想法,并讨论我们的设计如何克服几个区块链瓶颈,使智能合约更加智能。我们提供了与我们的机制相关的几种情景的数值说明。

更新时间: 2024-07-29 06:17:47

领域: cs.CR,q-fin.CP,q-fin.PM,q-fin.RM,q-fin.TR,91G15, 91G10, 62M10, 91G70, 91G45, 90B70, 97U70, 93A14, 97D10, 68T37

下载: http://arxiv.org/abs/2405.02302v2
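
Points 2 and 3 of the abstract combine naturally; a minimal sketch of a high-water-mark performance fee, the kind of investor-protection logic the paper proposes to encode on-chain (illustrative only, the paper's exact formulations differ):

    def performance_fee(nav_per_share, high_water_mark, fee_rate=0.20):
        # Charge the fee only on gains above the previous peak NAV per share,
        # so investors never pay twice for recovering earlier losses.
        gain = max(nav_per_share - high_water_mark, 0.0)
        return fee_rate * gain, max(nav_per_share, high_water_mark)

    fee, hwm = performance_fee(nav_per_share=112.0, high_water_mark=105.0)
    print(fee, hwm)  # 1.4 charged per share; the high water mark moves to 112.0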

Constructing artificial life and materials scientists with accelerated AI using Deep AndersoNN

Deep AndersoNN accelerates AI by exploiting the continuum limit as the number of explicit layers in a neural network approaches infinity and can be taken as a single implicit layer, known as a deep equilibrium model. Solving for deep equilibrium model parameters reduces to a nonlinear fixed point iteration problem, enabling the use of vector-to-vector iterative solvers and windowing techniques, such as Anderson extrapolation, for accelerating convergence to the fixed point deep equilibrium. Here we show that Deep AndersoNN achieves up to an order of magnitude of speed-up in training and inference. The method is demonstrated on density functional theory results for industrial applications by constructing artificial life and materials `scientists' capable of classifying drugs as strongly or weakly polar, metal-organic frameworks by pore size, and crystalline materials as metals, semiconductors, and insulators, using graph images of node-neighbor representations transformed from atom-bond networks. Results exhibit accuracy up to 98\% and showcase synergy between Deep AndersoNN and machine learning capabilities of modern computing architectures, such as GPUs, for accelerated computational life and materials science by quickly identifying structure-property relationships. This paves the way for saving up to 90\% of compute required for AI, reducing its carbon footprint by up to 60 gigatons per year by 2030, and scaling above memory limits of explicit neural networks in life and materials science, and beyond.

Updated: 2024-07-29 06:12:47

标题: 利用深度AndersoNN加速人工智能构建人工生命和材料科学家

摘要: Deep AndersoNN通过利用连续极限来加速人工智能:当神经网络中显式层的数量趋于无穷大时,网络可被视为单个隐式层,即深度均衡模型。求解深度均衡模型的参数可归结为一个非线性不动点迭代问题,从而可以利用向量到向量的迭代求解器和窗口化技术(如Anderson外推)来加速向深度均衡不动点的收敛。我们展示了Deep AndersoNN在训练和推断中可以实现高达一个数量级的加速。该方法在面向工业应用的密度泛函理论结果上得到了演示:通过利用由原子-键网络转换而来的节点-邻居表示的图像,构建了能够将药物分类为强极性或弱极性、按孔径大小对金属有机框架分类、以及将晶体材料分为金属、半导体和绝缘体的人工生命和材料“科学家”。结果显示准确率高达98%,并展示了Deep AndersoNN与现代计算架构(如GPU)的机器学习能力之间的协同作用,可通过快速识别结构-性质关系来加速计算生命和材料科学。这为节省高达90%的AI计算量、到2030年每年减少高达60吉吨的碳足迹,以及在生命和材料科学及其他领域突破显式神经网络的内存限制铺平了道路。

更新时间: 2024-07-29 06:12:47

领域: cs.LG,physics.app-ph

下载: http://arxiv.org/abs/2407.19724v1
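
The fixed-point machinery in the abstract is standard Anderson acceleration; a compact NumPy sketch (illustrative, using a commonly seen regularized least-squares solve for the mixing weights) shows the iteration a deep equilibrium model would run in place of stacked explicit layers:

    import numpy as np

    def anderson_fixed_point(f, x0, m=5, iters=50, lam=1e-8, tol=1e-10):
        # Solve x = f(x) by mixing the last m iterates with least-squares
        # weights computed from their residuals g_i = f(x_i) - x_i.
        X, F = [x0], [f(x0)]
        for k in range(1, iters):
            n = min(m, k)
            G = np.stack([F[-i] - X[-i] for i in range(1, n + 1)], axis=1)
            alpha = np.linalg.solve(G.T @ G + lam * np.eye(n), np.ones(n))
            alpha /= alpha.sum()
            x_new = sum(a * F[-i] for a, i in zip(alpha, range(1, n + 1)))
            X.append(x_new)
            F.append(f(x_new))
            if np.linalg.norm(F[-1] - X[-1]) < tol:
                break
        return X[-1]

    # Example: the scalar fixed point of x = cos(x), approximately 0.739085.
    print(anderson_fixed_point(np.cos, np.array([1.0])))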

Rina: Enhancing Ring-AllReduce with In-network Aggregation in Distributed Model Training

Parameter Server (PS) and Ring-AllReduce (RAR) are two widely utilized synchronization architectures in multi-worker Deep Learning (DL), also referred to as Distributed Deep Learning (DDL). However, PS encounters challenges with the ``incast'' issue, while RAR struggles with problems caused by the long dependency chain. The emerging In-network Aggregation (INA) has been proposed to integrate with PS to mitigate its incast issue. However, such PS-based INA has poor incremental deployment abilities as it requires replacing all the switches to show significant performance improvement, which is not cost-effective. In this study, we present the incorporation of INA capabilities into RAR, called RAR with In-Network Aggregation (Rina), to tackle both the problems above. Rina features its agent-worker mechanism. When an INA-capable ToR switch is deployed, all workers in this rack run as one abstracted worker with the help of the agent, resulting in both excellent incremental deployment capabilities and better throughput. We conducted extensive testbed and simulation evaluations to substantiate the throughput advantages of Rina over existing DDL training synchronization structures. Compared with ATP, the state-of-the-art PS-based INA method, Rina can achieve more than 50\% higher throughput at the same hardware cost.

Updated: 2024-07-29 06:06:10

标题: Rina: 在分布式模型训练中通过网络内聚合增强环形全局归约

摘要: 参数服务器(PS)和Ring-AllReduce(RAR)是多工作节点深度学习(即分布式深度学习,DDL)中广泛使用的两种同步架构。然而,PS面临“incast”问题,而RAR则受困于长依赖链带来的问题。新兴的网络内聚合(INA)被提出与PS集成以缓解其incast问题。然而,这种基于PS的INA增量部署能力差,因为需要更换所有交换机才能显示出显著的性能提升,这并不划算。在本研究中,我们提出将INA能力整合到RAR中,称为具有网络内聚合的RAR(Rina),以同时解决上述问题。Rina的特色是其代理-工作节点机制:当部署了支持INA的ToR交换机时,该机架中的所有工作节点借助代理作为一个抽象工作节点运行,从而兼具出色的增量部署能力和更高的吞吐量。我们进行了广泛的实测和仿真评估,以证实Rina相对于现有DDL训练同步结构的吞吐量优势。与最先进的基于PS的INA方法ATP相比,Rina可以在相同硬件成本下实现超过50%的吞吐量提升。

更新时间: 2024-07-29 06:06:10

领域: cs.NI,cs.AI,cs.DC

下载: http://arxiv.org/abs/2407.19721v1
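
For context, the baseline that Rina builds on can be written in a few lines; this toy NumPy simulation of textbook Ring-AllReduce (not Rina's implementation) makes the long send-receive dependency chain visible, since every chunk travels around the whole ring twice:

    import numpy as np

    def ring_allreduce(worker_chunks):
        # n workers, each holding n chunks. Phase 1 (scatter-reduce): n-1 steps
        # of send-and-accumulate to the right-hand neighbour. Phase 2 (all-gather):
        # n-1 steps of send-and-overwrite. Rina's agent abstraction would place a
        # whole INA-capable rack at a single position on this ring.
        n = len(worker_chunks)
        data = [[np.array(c, dtype=float) for c in w] for w in worker_chunks]
        for s in range(n - 1):                       # scatter-reduce
            for w in range(n):
                idx = (w - s) % n
                data[(w + 1) % n][idx] += data[w][idx]
        for s in range(n - 1):                       # all-gather
            for w in range(n):
                idx = (w + 1 - s) % n
                data[(w + 1) % n][idx] = data[w][idx].copy()
        return data  # every worker now holds the element-wise sum of all chunks

    out = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(out[0])  # [12.0, 15.0, 18.0] on every worker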

MVMR: A New Framework for Evaluating Faithfulness of Video Moment Retrieval against Multiple Distractors

With the explosion of multimedia content, video moment retrieval (VMR), which aims to detect a video moment that matches a given text query from a video, has been studied intensively as a critical problem. However, the existing VMR framework evaluates video moment retrieval performance, assuming that a video is given, which may not reveal whether the models exhibit overconfidence in the falsely given video. In this paper, we propose the MVMR (Massive Videos Moment Retrieval for Faithfulness Evaluation) task that aims to retrieve video moments within a massive video set, including multiple distractors, to evaluate the faithfulness of VMR models. For this task, we suggest an automated massive video pool construction framework to categorize negative (distractors) and positive (false-negative) video sets using textual and visual semantic distance verification methods. We extend existing VMR datasets using these methods and newly construct three practical MVMR datasets. To solve the task, we further propose a strong informative sample-weighted learning method, CroCs, which employs two contrastive learning mechanisms: (1) weakly-supervised potential negative learning and (2) cross-directional hard-negative learning. Experimental results on the MVMR datasets reveal that existing VMR models are easily distracted by the misinformation (distractors), whereas our model shows significantly robust performance, demonstrating that CroCs is essential to distinguishing positive moments against distractors. Our code and datasets are publicly available: https://github.com/yny0506/Massive-Videos-Moment-Retrieval.

Updated: 2024-07-29 06:03:24

标题: MVMR:一个在多个干扰项下评估视频时刻检索忠实度的新框架

摘要: 随着多媒体内容的爆炸式增长,视频时刻检索(VMR)作为一个旨在从视频中检测与给定文本查询匹配的视频时刻的关键问题,已经受到深入研究。然而,现有的VMR框架在评估视频时刻检索性能时假设视频已经给定,这可能无法揭示模型是否对错误给定的视频表现出过度自信。在本文中,我们提出了MVMR(用于忠实度评估的海量视频时刻检索)任务,旨在从包含多个干扰项的海量视频集合中检索视频时刻,以评估VMR模型的忠实度。针对这一任务,我们提出了一个自动化的海量视频池构建框架,利用文本和视觉语义距离验证方法对负例(干扰项)和正例(假阴性)视频集进行分类。我们利用这些方法扩展了现有的VMR数据集,并新构建了三个实用的MVMR数据集。为解决该任务,我们进一步提出了一种强有力的信息样本加权学习方法CroCs,它采用两种对比学习机制:(1)弱监督潜在负样本学习和(2)交叉方向难负样本学习。在MVMR数据集上的实验结果显示,现有的VMR模型很容易被错误信息(干扰项)分散注意力,而我们的模型表现出显著的鲁棒性,表明CroCs对于从干扰项中区分正确时刻至关重要。我们的代码和数据集可以在以下链接公开获取:https://github.com/yny0506/Massive-Videos-Moment-Retrieval。

更新时间: 2024-07-29 06:03:24

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2309.16701v3

How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation

LLM-based (Large Language Model) fuzz driver generation is a promising research area. Unlike traditional program analysis-based method, this text-based approach is more general and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues on this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens). Additionally, we compared the LLM-generated drivers against those utilized in industry, conducting extensive fuzzing experiments (3.75 CPU-year). Our study uncovered that: - While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications; - LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process; - While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in the industry, there are substantial opportunities for enhancement, such as extending contained API usage, or integrating semantic oracles to facilitate logical bug detection. Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.

Updated: 2024-07-29 05:59:16

标题: 它们的有效性如何?探索基于大型语言模型的模糊驱动程序生成

摘要: 基于LLM(大型语言模型)的模糊驱动程序生成是一个有前途的研究领域。与传统的基于程序分析的方法不同,这种基于文本的方法更通用,能够利用各种API使用信息,从而生成对人类读者友好的代码。然而,关于这一方向的基本问题,如其有效性和潜在挑战,仍存在一定理解上的不足。为了弥补这一差距,我们进行了首次深入研究,重点研究使用LLM生成有效模糊驱动程序的重要问题。我们的研究包括一个经过筛选的数据集,其中包含来自30个广泛使用的C项目的86个模糊驱动程序生成问题。设计和测试了六种提示策略,涉及五种最先进的LLM以及五种不同的温度设置。总共,我们的研究评估了736,430个生成的模糊驱动程序,消耗了8.5亿个令牌(计费令牌费用超过8000美元)。此外,我们将LLM生成的驱动程序与工业中使用的进行比较,并进行了大量的模糊实验(3.75个CPU年)。我们的研究发现:-虽然基于LLM的模糊驱动程序生成是一个有前途的方向,但仍然面临着实际应用中的一些障碍;-LLM在为具有复杂具体信息的API生成有效的模糊驱动程序方面遇到困难。三种特色的提示策略设计选择可能有益:发出重复查询、使用示例进行查询和采用迭代查询过程;-虽然LLM生成的驱动程序可以产生与工业中使用的相当的模糊结果,但仍存在大量提升的机会,例如扩展包含的API使用,或集成语义预言以促进逻辑错误检测。我们的见解已被应用于改进OSS-Fuzz-Gen项目,促进工业中的实际模糊驱动程序生成。

更新时间: 2024-07-29 05:59:16

领域: cs.CR,D.2.5

下载: http://arxiv.org/abs/2307.12469v5

Understanding Robust Overfitting from the Feature Generalization Perspective

Adversarial training (AT) constructs robust neural networks by incorporating adversarial perturbations into natural data. However, it is plagued by the issue of robust overfitting (RO), which severely damages the model's robustness. In this paper, we investigate RO from a novel feature generalization perspective. Specifically, we design factor ablation experiments to assess the respective impacts of natural data and adversarial perturbations on RO, identifying that the inducing factor of RO stems from natural data. Given that the only difference between adversarial and natural training lies in the inclusion of adversarial perturbations, we further hypothesize that adversarial perturbations degrade the generalization of features in natural data and verify this hypothesis through extensive experiments. Based on these findings, we provide a holistic view of RO from the feature generalization perspective and explain various empirical behaviors associated with RO. To examine our feature generalization perspective, we devise two representative methods, attack strength and data augmentation, to prevent the feature generalization degradation during AT. Extensive experiments conducted on benchmark datasets demonstrate that the proposed methods can effectively mitigate RO and enhance adversarial robustness.

Updated: 2024-07-29 05:41:05

标题: 从特征泛化角度理解鲁棒过拟合

摘要: 对抗训练(AT)通过将对抗扰动融入自然数据中来构建鲁棒的神经网络。然而,它受到鲁棒过拟合(RO)的困扰,这会严重损害模型的鲁棒性。本文从一个新颖的特征泛化角度研究了RO。具体来说,我们设计了因素消融实验来分别评估自然数据和对抗扰动对RO的影响,发现RO的诱发因素源自自然数据。鉴于对抗训练和自然训练之间唯一的区别在于是否包含对抗扰动,我们进一步假设对抗扰动会降低自然数据中特征的泛化能力,并通过大量实验验证了这一假设。基于这些发现,我们从特征泛化角度提供了RO的整体视角,并解释了与RO相关的各种经验现象。为了检验我们的特征泛化观点,我们设计了两种代表性方法,即攻击强度和数据增强,以防止在AT过程中特征泛化能力的下降。在基准数据集上进行的大量实验表明,所提出的方法可以有效缓解RO并增强对抗鲁棒性。

更新时间: 2024-07-29 05:41:05

领域: cs.LG

下载: http://arxiv.org/abs/2310.00607v2
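
Since the paper's analysis revolves around adversarial training, a standard PGD inner loop (the textbook formulation, not the paper's specific recipe) is useful context; the paper's two mitigations act on the strength of this attack and on augmentation of the natural inputs x:

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
        # Maximize the loss within an L-infinity ball of radius eps around x.
        delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = delta.detach().requires_grad_(True)
        return (x + delta).detach().clamp(0, 1)  # adversarial examples for AT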

Generalization bounds for regression and classification on adaptive covering input domains

Our main focus is on the generalization bound, which serves as an upper limit for the generalization error. Our analysis delves into regression and classification tasks separately to ensure a thorough examination. We assume the target function is real-valued and Lipschitz continuous for regression tasks. We use the 2-norm and a root-mean-square-error (RMSE) variant to measure the disparities between predictions and actual values. In the case of classification tasks, we treat the target function as a one-hot classifier, representing a piece-wise constant function, and employ 0/1 loss for error measurement. Our analysis underscores the differing sample complexity required to achieve a concentration inequality of generalization bounds, highlighting the variation in learning efficiency for regression and classification tasks. Furthermore, we demonstrate that the generalization bounds for regression and classification functions are inversely proportional to a polynomial of the number of parameters in a network, with the degree depending on the hypothesis class and the network architecture. These findings emphasize the advantages of over-parameterized networks and elucidate the conditions for benign overfitting in such systems.

Updated: 2024-07-29 05:40:08

标题: 自适应覆盖输入域上回归和分类的泛化界限

摘要: 我们的主要关注点是泛化界,它是泛化误差的上界。我们的分析分别深入探讨了回归和分类任务,以确保考察的全面性。对于回归任务,我们假设目标函数是实值且Lipschitz连续的,并使用2-范数及均方根误差(RMSE)变体来衡量预测值与实际值之间的差异。对于分类任务,我们将目标函数视为一个独热(one-hot)分类器,即一个分段常值函数,并使用0/1损失来衡量误差。我们的分析强调了实现泛化界集中不等式所需的样本复杂度的差异,突显了回归和分类任务在学习效率上的不同。此外,我们证明了回归和分类函数的泛化界与网络参数数量的某个多项式成反比,其次数取决于假设类和网络架构。这些发现强调了过参数化网络的优势,并阐明了此类系统中良性过拟合的条件。

更新时间: 2024-07-29 05:40:08

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2407.19715v1

Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets

Surgical scene understanding is a key technical component for enabling intelligent and context aware systems that can transform various aspects of surgical interventions. In this work, we focus on the semantic segmentation task, propose a simple yet effective multi-modal (RGB and depth) training framework called SurgDepth, and show state-of-the-art (SOTA) results on all publicly available datasets applicable for this task. Unlike previous approaches, which either fine-tune SOTA segmentation models trained on natural images, or encode RGB or RGB-D information using RGB only pre-trained backbones, SurgDepth, which is built on top of Vision Transformers (ViTs), is designed to encode both RGB and depth information through a simple fusion mechanism. We conduct extensive experiments on benchmark datasets including EndoVis2022, AutoLapro, LapI2I and EndoVis2017 to verify the efficacy of SurgDepth. Specifically, SurgDepth achieves a new SOTA IoU of 0.86 on EndoVis 2022 SAR-RARP50 challenge and outperforms the current best method by at least 4%, using a shallow and compute efficient decoder consisting of ConvNeXt blocks.

Updated: 2024-07-29 05:35:51

标题: 重新思考RGB-D融合在外科数据集中的语义分割

摘要: 手术场景理解是实现智能和上下文感知系统的关键技术组成部分,这类系统可以改变手术干预的各个方面。在这项工作中,我们关注语义分割任务,提出了一个简单而有效的多模态(RGB和深度)训练框架SurgDepth,并在所有适用于该任务的公开数据集上取得了最先进(SOTA)的结果。以往的方法要么微调在自然图像上训练的SOTA分割模型,要么使用仅在RGB上预训练的骨干网络编码RGB或RGB-D信息;与之不同,建立在Vision Transformers(ViTs)之上的SurgDepth被设计为通过一个简单的融合机制同时编码RGB和深度信息。我们在包括EndoVis2022、AutoLapro、LapI2I和EndoVis2017在内的基准数据集上进行了大量实验,以验证SurgDepth的有效性。具体而言,SurgDepth在EndoVis2022 SAR-RARP50挑战赛上取得了新的SOTA IoU 0.86,并且仅使用由ConvNeXt块组成的浅层且计算高效的解码器,就比当前最佳方法至少高出4%。

更新时间: 2024-07-29 05:35:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.19714v1
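
A minimal rendering of the "simple fusion mechanism" idea (hypothetical layer sizes and structure; SurgDepth's actual ViT-based encoder differs) is to patchify both modalities, concatenate channel-wise, and decode:

    import torch
    import torch.nn as nn

    class SimpleRGBDFusion(nn.Module):
        # Encode RGB and depth patch tokens separately, fuse with a 1x1
        # projection, then decode to per-pixel class logits.
        def __init__(self, dim=256, num_classes=9):
            super().__init__()
            self.rgb_proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)
            self.depth_proj = nn.Conv2d(1, dim, kernel_size=16, stride=16)
            self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)
            self.decoder = nn.Sequential(
                nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
                nn.Conv2d(dim, num_classes, 1),
            )

        def forward(self, rgb, depth):
            z = torch.cat([self.rgb_proj(rgb), self.depth_proj(depth)], dim=1)
            logits = self.decoder(self.fuse(z))
            return nn.functional.interpolate(logits, scale_factor=16, mode="bilinear")

    model = SimpleRGBDFusion()
    out = model(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
    print(out.shape)  # torch.Size([1, 9, 224, 224])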

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey for Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis firstly navigates through the forefront of representative works of embodied robots and simulators, to fully understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of embodied AI and discuss their potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.

Updated: 2024-07-29 05:26:44

标题: 将网络空间与物理世界对齐:关于具身人工智能的全面综述

摘要: 具身人工智能(Embodied AI)对于实现通用人工智能(AGI)至关重要,并为连接网络空间和物理世界的各种应用奠定了基础。最近,多模态大模型(MLMs)和世界模型(WMs)的出现引起了极大关注,因为它们具有卓越的感知、交互和推理能力,使其成为具身智能体“大脑”的有前途的架构。然而,在MLMs时代还没有关于具身人工智能的全面综述。在这项综述中,我们全面探讨了具身人工智能的最新进展。我们的分析首先梳理了具身机器人和模拟器的代表性前沿工作,以充分理解研究重点及其局限性。然后,我们分析了四个主要研究目标:1)具身感知,2)具身交互,3)具身智能体,4)模拟到现实(sim-to-real)的适配,涵盖了最先进的方法、基本范式和全面的数据集。此外,我们探讨了MLMs在虚拟和真实具身智能体中的复杂性,强调了它们在促进动态数字和物理环境中交互方面的重要性。最后,我们总结了具身人工智能的挑战和局限性,并讨论了其潜在的未来方向。我们希望这项综述能够成为研究界的基础参考,并激发持续创新。相关项目可在https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List找到。

更新时间: 2024-07-29 05:26:44

领域: cs.CV,cs.AI,cs.LG,cs.MA,cs.RO

下载: http://arxiv.org/abs/2407.06886v6

Neural networks for bifurcation and linear stability analysis of steady states in partial differential equations

This research introduces an extended application of neural networks for solving nonlinear partial differential equations (PDEs). A neural network, combined with a pseudo-arclength continuation, is proposed to construct bifurcation diagrams from parameterized nonlinear PDEs. Additionally, a neural network approach is also presented for solving eigenvalue problems to analyze solution linear stability, focusing on identifying the largest eigenvalue. The effectiveness of the proposed neural network is examined through experiments on the Bratu equation and the Burgers equation. Results from a finite difference method are also presented as comparison. Varying numbers of grid points are employed in each case to assess the behavior and accuracy of both the neural network and the finite difference method. The experimental results demonstrate that the proposed neural network produces better solutions, generates more accurate bifurcation diagrams, has reasonable computational times, and proves effective for linear stability analysis.

Updated: 2024-07-29 05:05:13

标题: 用于偏微分方程稳态分岔与线性稳定性分析的神经网络

摘要: 这项研究介绍了神经网络在解决非线性偏微分方程(PDEs)中的扩展应用。提出了将神经网络与伪弧长延续结合起来,用于构建参数化非线性PDEs的分岔图。此外,还提出了一种神经网络方法,用于解决特征值问题以分析解的线性稳定性,重点是识别最大的特征值。通过对Bratu方程和Burgers方程的实验检验了所提出的神经网络的有效性。同时还提供了有限差分方法的结果作为对比。在每种情况下使用不同数量的网格点来评估神经网络和有限差分方法的行为和准确性。实验结果表明,所提出的神经网络产生更好的解,生成更准确的分岔图,具有合理的计算时间,并且对线性稳定性分析有效。

更新时间: 2024-07-29 05:05:13

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2407.19707v1

CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare

The rapid progress in Large Language Models (LLMs) has prompted the creation of numerous benchmarks to evaluate their capabilities. This study focuses on the Comprehensive Medical Benchmark in Chinese (CMB), showcasing how dataset diversity and distribution in supervised fine-tuning (SFT) may enhance LLM performance. Remarkably, we successfully trained a smaller base model to achieve scores comparable to larger models, indicating that a diverse and well-distributed dataset can optimize performance regardless of model size. This study suggests that even smaller models may reach high performance levels with carefully curated and varied datasets. By integrating a wide range of instructional content, our approach addresses potential issues such as data quality inconsistencies. Our results imply that a broader spectrum of training data may enhance a model's ability to generalize and perform effectively across different medical scenarios, highlighting the importance of dataset quality and diversity in fine-tuning processes.

Updated: 2024-07-29 05:00:48

标题: CollectiveSFT:使用医疗保健中的集体指导为中文医学基准测试扩展大型语言模型

摘要: 大型语言模型(LLMs)的迅速发展促使了许多基准的创建,以评估它们的能力。本研究关注中文综合医学基准(CMB),展示了数据集的多样性和分布在监督微调(SFT)中如何提升LLM性能。值得注意的是,我们成功地训练了一个较小的基础模型,达到了与更大模型相当的得分,表明一个多样化和分布良好的数据集可以优化性能,不受模型大小的影响。本研究表明,即使较小的模型也可以通过精心策划和多样化的数据集达到高性能水平。通过整合各种教学内容,我们的方法解决了数据质量不一致等潜在问题。我们的结果表明,更广泛范围的训练数据可以增强模型的泛化能力,并在不同的医学场景中有效执行,突显了数据集质量和多样性在微调过程中的重要性。

更新时间: 2024-07-29 05:00:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.19705v1

Adaptive Self-training Framework for Fine-grained Scene Graph Generation

Scene graph generation (SGG) models have suffered from inherent problems regarding the benchmark datasets such as the long-tailed predicate distribution and missing annotation problems. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels for unannotated triplets based on which the SGG models are trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging due to its inherent nature such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is a model-agnostic framework that can be applied to any existing SGG models. Furthermore, we devise a graph structure learner (GSL) that is beneficial when adopting our proposed self-training framework to the state-of-the-art message-passing neural network (MPNN)-based SGG models. Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes.

Updated: 2024-07-29 04:57:16

标题: 自适应自我训练框架用于细粒度场景图生成

摘要: 场景图生成(SGG)模型在基准数据集方面存在固有问题,例如长尾谓词分布和缺失注释问题。在这项工作中,我们旨在通过利用未注释的三元组来缓解SGG的长尾问题。为此,我们引入了一种用于SGG的自训练框架(ST-SGG),该框架基于未注释的三元组为其分配伪标签,用于训练SGG模型。虽然在图像识别方面已经取得了显著进展,但设计用于SGG任务的自训练框架更具挑战性,因为其固有特性,如语义模糊和谓词类别的长尾分布。因此,我们提出了一种新颖的用于SGG的伪标记技术,称为具有动量的类别自适应阈值(CATM),这是一个与模型无关的框架,可应用于任何现有的SGG模型。此外,我们设计了一个图结构学习器(GSL),在采用我们提出的自训练框架到基于消息传递神经网络(MPNN)的SGG模型时是有益的。我们的广泛实验验证了ST-SGG在各种SGG模型上的有效性,特别是在提升细粒度谓词类别性能方面。

更新时间: 2024-07-29 04:57:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2401.09786v3
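
The class-specific thresholding idea can be sketched in a few lines (hypothetical constants and update rule, in the spirit of CATM rather than a reproduction of it): each predicate class keeps its own confidence threshold, updated with momentum, so tail classes are not starved by a single global cutoff:

    import torch

    class ClassAdaptiveThreshold:
        def __init__(self, num_classes, init=0.5, momentum=0.99):
            self.tau = torch.full((num_classes,), init)  # one threshold per class
            self.m = momentum

        def pseudo_label(self, probs):
            conf, cls = probs.max(dim=-1)      # per-triplet confidence and class
            keep = conf > self.tau[cls]        # class-specific acceptance
            for c in cls[keep].unique():       # EMA update for each accepted class
                mask = keep & (cls == c)
                self.tau[c] = self.m * self.tau[c] + (1 - self.m) * conf[mask].mean()
            return cls[keep], keep             # pseudo-labels for unannotated triplets

    labeler = ClassAdaptiveThreshold(num_classes=50)
    cls, keep = labeler.pseudo_label(torch.softmax(torch.randn(128, 50), dim=-1))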

Efficient Byzantine-Robust and Provably Privacy-Preserving Federated Learning

Federated learning (FL) is an emerging distributed learning paradigm without sharing participating clients' private data. However, existing works show that FL is vulnerable to both Byzantine (security) attacks and data reconstruction (privacy) attacks. Almost all the existing FL defenses only address one of the two attacks. A few defenses address the two attacks, but they are not efficient and effective enough. We propose BPFL, an efficient Byzantine-robust and provably privacy-preserving FL method that addresses all the issues. Specifically, we draw on state-of-the-art Byzantine-robust FL methods and use similarity metrics to measure the robustness of each participating client in FL. The validity of clients are formulated as circuit constraints on similarity metrics and verified via a zero-knowledge proof. Moreover, the client models are masked by a shared random vector, which is generated based on homomorphic encryption. In doing so, the server receives the masked client models rather than the true ones, which are proven to be private. BPFL is also efficient due to the usage of non-interactive zero-knowledge proof. Experimental results on various datasets show that our BPFL is efficient, Byzantine-robust, and privacy-preserving.

Updated: 2024-07-29 04:55:30

标题: 高效的拜占庭容错和可证明隐私保护的联邦学习

摘要: 联邦学习(FL)是一种新兴的分布式学习范式,不共享参与客户端的私人数据。然而,现有研究表明FL对拜占庭(安全)攻击和数据重建(隐私)攻击都很脆弱。几乎所有现有的FL防御措施只解决这两种攻击中的一种。少数防御措施解决了这两种攻击,但它们并不足够高效和有效。我们提出了BPFL,一种高效的拜占庭鲁棒性和可证明隐私保护的FL方法,解决了所有问题。具体来说,我们借鉴了最先进的拜占庭鲁棒性FL方法,并使用相似性度量来衡量FL中每个参与客户端的鲁棒性。客户端的有效性被制定为相似性度量的电路约束,并通过零知识证明进行验证。此外,客户端模型被一个基于同态加密生成的共享随机向量掩盖。通过这种方式,服务器接收到的是掩盖的客户端模型而不是真实的模型,这已被证明是私密的。由于使用非交互式零知识证明,BPFL也是高效的。对各种数据集的实验结果表明,我们的BPFL是高效的、拜占庭鲁棒的和隐私保护的。

更新时间: 2024-07-29 04:55:30

领域: cs.CR

下载: http://arxiv.org/abs/2407.19703v1

SAPI: Surroundings-Aware Vehicle Trajectory Prediction at Intersections

In this work we propose a deep learning model, i.e., SAPI, to predict vehicle trajectories at intersections. SAPI uses an abstract way to represent and encode surrounding environment by utilizing information from real-time map, right-of-way, and surrounding traffic. The proposed model consists of two convolutional network (CNN) and recurrent neural network (RNN)-based encoders and one decoder. A refiner is proposed to conduct a look-back operation inside the model, in order to make full use of raw history trajectory information. We evaluate SAPI on a proprietary dataset collected in real-world intersections through autonomous vehicles. It is demonstrated that SAPI shows promising performance when predicting vehicle trajectories at intersection, and outperforms benchmark methods. The average displacement error(ADE) and final displacement error(FDE) for 6-second prediction are 1.84m and 4.32m respectively. We also show that the proposed model can accurately predict vehicle trajectories in different scenarios.

Updated: 2024-07-29 04:53:18

标题: SAPI:交叉口周围环境感知的车辆轨迹预测

摘要: 在这项工作中,我们提出了一个深度学习模型,即SAPI,用于预测交叉口的车辆轨迹。SAPI利用抽象的方式表示和编码周围环境,通过利用实时地图、优先通行权和周围交通的信息。所提出的模型包括两个基于卷积神经网络(CNN)和循环神经网络(RNN)的编码器和一个解码器。提出了一个细化器,在模型内部进行回溯操作,以充分利用原始历史轨迹信息。我们在通过自动驾驶车辆在现实世界交叉口收集的专有数据集上评估了SAPI。结果表明,SAPI在预测交叉口车辆轨迹时表现出有希望的性能,并且胜过基准方法。6秒预测的平均位移误差(ADE)和最终位移误差(FDE)分别为1.84米和4.32米。我们还展示了所提出的模型可以准确预测不同场景中的车辆轨迹。

更新时间: 2024-07-29 04:53:18

领域: cs.LG

下载: http://arxiv.org/abs/2306.01812v2

LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation

Weakly-Supervised Scene Graph Generation (WSSGG) research has recently emerged as an alternative to the fully-supervised approach that heavily relies on costly annotations. In this regard, studies on WSSGG have utilized image captions to obtain unlocalized triplets while primarily focusing on grounding the unlocalized triplets over image regions. However, they have overlooked the two issues involved in the triplet formation process from the captions: 1) Semantic over-simplification issue arises when extracting triplets from captions, where fine-grained predicates in captions are undesirably converted into coarse-grained predicates, resulting in a long-tailed predicate distribution, and 2) Low-density scene graph issue arises when aligning the triplets in the caption with entity/predicate classes of interest, where many triplets are discarded and not used in training, leading to insufficient supervision. To tackle the two issues, we propose a new approach, i.e., Large Language Model for weakly-supervised SGG (LLM4SGG), where we mitigate the two issues by leveraging the LLM's in-depth understanding of language and reasoning ability during the extraction of triplets from captions and alignment of entity/predicate classes with target data. To further engage the LLM in these processes, we adopt the idea of Chain-of-Thought and the in-context few-shot learning strategy. To validate the effectiveness of LLM4SGG, we conduct extensive experiments on Visual Genome and GQA datasets, showing significant improvements in both Recall@K and mean Recall@K compared to the state-of-the-art WSSGG methods. A further appeal is that LLM4SGG is data-efficient, enabling effective model training with a small amount of training images.

Updated: 2024-07-29 04:47:04

标题: LLM4SGG:用于弱监督场景图生成的大型语言模型

摘要: 弱监督场景图生成(WSSGG)研究最近兴起,作为严重依赖昂贵标注的全监督方法的一种替代。在这方面,WSSGG的研究利用图像描述获取未定位的三元组,并主要关注如何将这些未定位的三元组在图像区域上进行定位(grounding)。然而,它们忽略了从描述中构造三元组过程中涉及的两个问题:1)语义过度简化问题:从描述中提取三元组时,描述中的细粒度谓词会被不恰当地转换为粗粒度谓词,导致长尾谓词分布;2)低密度场景图问题:在将描述中的三元组与感兴趣的实体/谓词类别对齐时,许多三元组被丢弃而未用于训练,导致监督不足。为了解决这两个问题,我们提出了一种新方法,即用于弱监督SGG的大型语言模型(LLM4SGG):在从描述中提取三元组以及将实体/谓词类别与目标数据对齐时,利用LLM对语言的深入理解和推理能力来缓解这两个问题。为了进一步让LLM参与这些过程,我们采用了思维链(Chain-of-Thought)思想和上下文内少样本学习策略。为了验证LLM4SGG的有效性,我们在Visual Genome和GQA数据集上进行了大量实验,结果显示,与最先进的WSSGG方法相比,Recall@K和mean Recall@K均有显著提升。另一个优点是LLM4SGG的数据效率高,仅用少量训练图像即可进行有效的模型训练。

更新时间: 2024-07-29 04:47:04

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.10404v8

Legal Aspects of Decentralized and Platform-Driven Economies

The sharing economy is sprawling across almost every sector and activity around the world. About a decade ago, there were only a handful of platform driven companies operating on the market. Zipcar, BlaBlaCar and Couchsurfing among them. Then Airbnb and Uber revolutionized the transportation and hospitality industries with a presence in virtually every major city. Access over ownership is the paradigm shift from the traditional business model that grants individuals the use of products or services without the necessity of buying them. Digital platforms, data and algorithm-driven companies as well as decentralized blockchain technologies have tremendous potential. But they are also changing the rules of the game. One of such technologies challenging the legal system are AI systems that will also reshape the current legal framework concerning the liability of operators, users and manufacturers. Therefore, this introductory chapter deals with explaining and describing the legal issues of some of these disruptive technologies. The chapter argues for a more forward-thinking and flexible regulatory structure.

Updated: 2024-07-29 04:42:49

标题: 去中心化和平台驱动经济的法律方面

摘要: 分享经济正在遍布世界各地的几乎所有行业和活动。大约十年前,市场上只有少数几家平台驱动的公司在运营。其中包括Zipcar、BlaBlaCar和Couchsurfing。然后,Airbnb和Uber在几乎每个主要城市都有存在,彻底改变了交通和酒店行业。使用权取代所有权是传统商业模式的范式转变,使个人可以使用产品或服务而无需购买。数字平台、数据和算法驱动的公司以及去中心化的区块链技术具有巨大潜力。但它们也正在改变游戏规则。挑战法律体系的其中一种技术是人工智能系统,这也将重塑有关运营商、用户和制造商责任的当前法律框架。因此,这个引言章节旨在解释和描述一些颠覆性技术的法律问题。该章节主张建立一个更具前瞻性和灵活性的监管结构。

更新时间: 2024-07-29 04:42:49

领域: cs.CY,cs.AI,econ.GN,q-fin.EC

下载: http://arxiv.org/abs/2407.20301v1

Multiscale Representation Enhanced Temporal Flow Fusion Model for Long-Term Workload Forecasting

Accurate workload forecasting is critical for efficient resource management in cloud computing systems, enabling effective scheduling and autoscaling. Despite recent advances with transformer-based forecasting models, challenges remain due to the non-stationary, nonlinear characteristics of workload time series and the long-term dependencies. In particular, inconsistent performance between long-term history and near-term forecasts hinders long-range predictions. This paper proposes a novel framework leveraging self-supervised multiscale representation learning to capture both long-term and near-term workload patterns. The long-term history is encoded through multiscale representations while the near-term observations are modeled via temporal flow fusion. These representations of different scales are fused using an attention mechanism and characterized with normalizing flows to handle non-Gaussian/non-linear distributions of time series. Extensive experiments on 9 benchmarks demonstrate superiority over existing methods.

Updated: 2024-07-29 04:42:18

标题: 多尺度表示增强的时间流融合模型用于长期工作量预测

摘要: 准确的工作量预测对于云计算系统的资源管理至关重要,可以实现有效的调度和自动扩展。尽管基于transformer的预测模型取得了一些进展,但由于工作负载时间序列的非平稳、非线性特征以及长期依赖性,挑战依然存在。特别是,长期历史和近期预测之间的性能不一致阻碍了长期预测。本文提出了一个新颖的框架,利用自监督多尺度表示学习来捕捉长期和近期工作负载模式。长期历史通过多尺度表示进行编码,而近期观察通过时间流融合进行建模。这些不同尺度的表示通过注意机制融合,并利用归一化流来处理时间序列的非高斯/非线性分布。对9个基准测试的大量实验证明了该方法优于现有方法。

更新时间: 2024-07-29 04:42:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.19697v1

Causal Interventional Prediction System for Robust and Explainable Effect Forecasting

Although the widespread use of AI systems in today's world is growing, many current AI systems are found vulnerable due to hidden bias and missing information, especially in the most commonly used forecasting system. In this work, we explore the robustness and explainability of AI-based forecasting systems. We provide an in-depth analysis of the underlying causality involved in the effect prediction task and further establish a causal graph based on treatment, adjustment variable, confounder, and outcome. Correspondingly, we design a causal interventional prediction system (CIPS) based on a variational autoencoder and fully conditional specification of multiple imputations. Extensive results demonstrate the superiority of our system over state-of-the-art methods and show remarkable versatility and extensibility in practice.

Updated: 2024-07-29 04:16:45

标题: 因果干预预测系统用于稳健和可解释性效果预测

摘要: 尽管当今世界中人工智能系统的广泛应用正在增长,但许多当前的人工智能系统由于隐藏的偏见和缺失信息而被发现存在脆弱性,特别是在最常用的预测系统中。在这项工作中,我们探讨了基于人工智能的预测系统的稳健性和可解释性。我们对涉及效果预测任务的潜在因果关系进行了深入分析,并进一步建立了一个基于治疗、调整变量、混杂因素和结果的因果图。相应地,我们设计了一个基于变分自动编码器和完全条件规范的多重插补的因果干预预测系统(CIPS)。广泛的结果证明了我们的系统优于最先进的方法,并在实践中表现出了显著的多功能性和可扩展性。

更新时间: 2024-07-29 04:16:45

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2407.19688v1

Efficiently and Effectively: A Two-stage Approach to Balance Plaintext and Encrypted Text for Traffic Classification

Encrypted traffic classification is the task of identifying the application or service associated with encrypted network traffic. One effective approach for this task is to use deep learning methods to encode the raw traffic bytes directly and automatically extract features for classification (byte-based models). However, current byte-based models input raw traffic bytes, whether plaintext or encrypted text, for automated feature extraction, neglecting the distinct impacts of plaintext and encrypted text on downstream tasks. Additionally, these models primarily focus on improving classification accuracy, with little emphasis on the efficiency of models. In this paper, for the first time, we analyze the impact of plaintext and encrypted text on the model's effectiveness and efficiency. Based on our observations and findings, we propose a two-phase approach to balance the trade-off between plaintext and encrypted text in traffic classification. Specifically, Stage one is to Determine whether the Plain text is enough to be accurately Classified (DPC) using the proposed DPC Selector. This stage quickly identifies samples that can be classified using plaintext, leveraging explicit byte features in plaintext to enhance model's efficiency. Stage two aims to adaptively make a classification with the result from stage one. This stage incorporates encrypted text information for samples that cannot be classified using plaintext alone, ensuring the model's effectiveness on traffic classification tasks. Experiments on two datasets demonstrate that our proposed model achieves state-of-the-art results in both effectiveness and efficiency.

Updated: 2024-07-29 04:10:13

标题: 高效和有效:一种两阶段方法来平衡明文和加密文本以用于流量分类

摘要: 加密流量分类是识别与加密网络流量相关的应用程序或服务的任务。该任务的一个有效方法是使用深度学习方法直接编码原始流量字节并自动提取特征进行分类(基于字节的模型)。然而,目前的基于字节的模型将原始流量字节(无论是明文还是加密文本)一并输入用于自动特征提取,忽略了明文和加密文本对下游任务的不同影响。此外,这些模型主要集中在提高分类准确性上,对模型的效率关注较少。在本文中,我们首次分析了明文和加密文本对模型有效性和效率的影响。基于我们的观察和发现,我们提出了一种两阶段方法来平衡流量分类中明文和加密文本之间的权衡。具体而言,第一阶段使用提出的DPC选择器判定仅凭明文是否足以准确分类(DPC)。该阶段快速识别出可以仅用明文分类的样本,利用明文中的显式字节特征来提升模型的效率。第二阶段旨在基于第一阶段的结果自适应地完成分类:对于无法仅用明文分类的样本,该阶段引入加密文本信息,确保模型在流量分类任务上的有效性。在两个数据集上的实验证明,我们提出的模型在有效性和效率上均取得了最先进的结果。

更新时间: 2024-07-29 04:10:13

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2407.19687v1
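
The two-stage routing can be sketched as follows (a hypothetical interface; the paper's DPC Selector is a learned component, whereas this sketch uses a fixed confidence gate as a stand-in):

    import torch

    def two_stage_classify(plain, encrypted, stage1, stage2, gate=0.9):
        # Stage 1: classify from plaintext bytes only; exit early when confident.
        probs = torch.softmax(stage1(plain), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= gate:                 # DPC-style early exit (cheap path)
            return pred.item(), "plaintext-only"
        # Stage 2: fall back to the full plaintext + encrypted representation.
        full = torch.cat([plain, encrypted], dim=-1)
        return stage2(full).argmax(dim=-1).item(), "plaintext+encrypted"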

Opponent Modeling in Multiplayer Imperfect-Information Games

In many real-world settings agents engage in strategic interactions with multiple opposing agents who can employ a wide variety of strategies. The standard approach for designing agents for such settings is to compute or approximate a relevant game-theoretic solution concept such as Nash equilibrium and then follow the prescribed strategy. However, such a strategy ignores any observations of opponents' play, which may indicate shortcomings that can be exploited. We present an approach for opponent modeling in multiplayer imperfect-information games where we collect observations of opponents' play through repeated interactions. We run experiments against a wide variety of real opponents and exact Nash equilibrium strategies in three-player Kuhn poker and show that our algorithm significantly outperforms all of the agents, including the exact Nash equilibrium strategies.

Updated: 2024-07-29 04:07:44

标题: 多人不完全信息博弈中的对手建模

摘要: 在许多现实世界的情境中,代理人与多个对立的代理人进行战略互动,对方可以采用各种各样的策略。为设计这种情境下的代理人的标准方法是计算或近似相关的博弈论解决概念,如纳什均衡,然后遵循指定的策略。然而,这样的策略忽略了对手玩法的任何观察,这些观察可能表明存在可以利用的缺陷。我们提出了一种在多人不完全信息游戏中对手建模的方法,通过重复互动收集对手玩法的观察。我们在三人Kuhn扑克游戏中与各种真实对手和精确的纳什均衡策略进行实验,结果显示我们的算法明显优于所有代理人,包括精确的纳什均衡策略。

更新时间: 2024-07-29 04:07:44

领域: cs.GT,cs.AI,cs.MA,econ.TH

下载: http://arxiv.org/abs/2212.06027v4
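
One simple way to realize "collecting observations of opponents' play" is a Dirichlet count model per information set (a generic sketch under that assumption; the paper's algorithm is more involved):

    import numpy as np

    class DirichletOpponentModel:
        # Keep Dirichlet counts of each opponent action at every observed
        # infoset and best-respond to the posterior-mean strategy.
        def __init__(self, n_actions=2, prior=1.0):
            self.prior = prior
            self.n_actions = n_actions
            self.counts = {}

        def observe(self, infoset, action):
            c = self.counts.setdefault(infoset, np.full(self.n_actions, self.prior))
            c[action] += 1.0

        def predict(self, infoset):
            c = self.counts.get(infoset, np.full(self.n_actions, self.prior))
            return c / c.sum()   # posterior-mean action distribution

    model = DirichletOpponentModel()
    model.observe(("K", "check"), 1)            # opponent bet after a check with K
    print(model.predict(("K", "check")))        # shifts toward the observed action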

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding

This paper presents "Predictive Pipelined Decoding (PPD)," an approach that speeds up greedy decoding in Large Language Models (LLMs) while maintaining the exact same output as the original decoding. Unlike conventional strategies, PPD employs additional compute resources to parallelize the initiation of subsequent token decoding during the current token decoding. This method reduces decoding latency and reshapes the understanding of trade-offs in LLM decoding strategies. We have developed a theoretical framework that allows us to analyze the trade-off between computation and latency. Using this framework, we can analytically estimate the potential reduction in latency associated with our proposed method, achieved through the assessment of the match rate, represented as p_correct. The results demonstrate that the use of extra computational resources has the potential to accelerate LLM decoding. Additionally, we implement PPD and conduct preliminary experiments to empirically validate its efficacy, addressing potential practical overheads not covered by theoretical analysis.

Updated: 2024-07-29 04:03:22

标题: 预测流水线解码:一种面向精确LLM解码的计算-延迟权衡

摘要: 这篇论文介绍了“预测流水线解码(PPD)”的方法,该方法可以加速大型语言模型(LLMs)中的贪婪解码,同时保持与原始解码完全相同的输出。与传统策略不同,PPD利用额外的计算资源来并行化当前令牌解码期间启动后续令牌解码。这种方法降低了解码延迟,并重新塑造了LLM解码策略中的权衡理解。我们开发了一个理论框架,使我们能够分析计算和延迟之间的权衡。利用这个框架,我们可以通过评估匹配率(表示为p_correct)来分析估计与我们提出的方法相关的潜在延迟减少。结果表明,利用额外的计算资源有可能加速LLM解码。此外,我们实现了PPD,并进行了初步实验以经验性地验证其有效性,解决了理论分析未涵盖的潜在实际开销。

更新时间: 2024-07-29 04:03:22

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2307.05908v2
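
A toy rendering of the compute-for-latency trade (hypothetical model interface, not the paper's implementation): an early predictor guesses the top-k next tokens so the following decoding step can start in parallel, and the output matches plain greedy decoding because mismatched guesses are simply discarded:

    from concurrent.futures import ThreadPoolExecutor

    def predictive_pipelined_decode(model, prefix, steps, k=3):
        # Assumed interface for this sketch:
        #   model.early_top_k(tokens, k): cheap next-token guesses from an
        #       intermediate layer, available before the full pass finishes;
        #   model.greedy(tokens): the exact greedy next token (full pass).
        tokens = list(prefix)
        with ThreadPoolExecutor(max_workers=k + 1) as pool:
            future = pool.submit(model.greedy, tokens)   # full pass for step t
            for _ in range(steps):
                guesses = model.early_top_k(tokens, k)   # early, cheap guesses
                spec = {g: pool.submit(model.greedy, tokens + [g]) for g in guesses}
                nxt = future.result()                    # exact token; output is
                tokens.append(nxt)                       # identical to plain greedy
                # Hit (rate p_correct): step t+1 has already been running in
                # parallel, saving one sequential latency step; miss: start fresh.
                future = spec.get(nxt) or pool.submit(model.greedy, tokens)
        return tokens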

Dataset Distillation for Offline Reinforcement Learning

Offline reinforcement learning often requires a quality dataset that we can train a policy on. However, in many situations, it is not possible to get such a dataset, nor is it easy to train a policy to perform well in the actual environment given the offline data. We propose using data distillation to train and distill a better dataset which can then be used for training a better policy model. We show that our method is able to synthesize a dataset where a model trained on it achieves similar performance to a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io. We also provide our implementation at this GitHub repository: https://github.com/ggflow123/DDRL.

Updated: 2024-07-29 04:02:17

标题: 数据集精炼用于离线强化学习

摘要: 离线强化学习通常需要一个高质量的数据集,我们可以在其中训练一个策略。然而,在许多情况下,很难获得这样的数据集,也很难训练一个在实际环境中表现良好的策略。我们提出使用数据蒸馏来训练和提炼一个更好的数据集,然后可以用它来训练一个更好的策略模型。我们展示了我们的方法能够合成一个数据集,该数据集上训练的模型达到了与在完整数据集上训练的模型或使用百分位行为克隆训练的模型类似的性能。我们的项目网站位于https://datasetdistillation4rl.github.io。我们还在GitHub存储库https://github.com/ggflow123/DDRL 提供了我们的实现。

更新时间: 2024-07-29 04:02:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.20299v1

Revisiting the robustness of post-hoc interpretability methods

Post-hoc interpretability methods play a critical role in explainable artificial intelligence (XAI), as they pinpoint portions of data that a trained deep learning model deemed important to make a decision. However, different post-hoc interpretability methods often provide different results, casting doubts on their accuracy. For this reason, several evaluation strategies have been proposed to understand the accuracy of post-hoc interpretability. Many of these evaluation strategies provide a coarse-grained assessment -- i.e., they evaluate how the performance of the model degrades on average by corrupting different data points across multiple samples. While these strategies are effective in selecting the post-hoc interpretability method that is most reliable on average, they fail to provide a sample-level, also referred to as fine-grained, assessment. In other words, they do not measure the robustness of post-hoc interpretability methods. We propose an approach and two new metrics to provide a fine-grained assessment of post-hoc interpretability methods. We show that the robustness is generally linked to its coarse-grained performance.

Updated: 2024-07-29 03:55:52

标题: 重新审视事后可解释性方法的稳健性

摘要: 事后解释性方法在可解释人工智能(XAI)中发挥着关键作用,因为它们能够指出训练好的深度学习模型在做出决策时认为重要的数据部分。然而,不同的事后解释性方法往往给出不同的结果,使其准确性受到质疑。为此,已经提出了若干评估策略来衡量事后解释性的准确性。其中许多评估策略提供的是粗粒度评估,即衡量在多个样本上破坏不同数据点后模型性能平均下降的程度。尽管这些策略能有效地选出平均而言最可靠的事后解释性方法,但它们无法提供样本级(即细粒度)的评估。换句话说,它们没有衡量事后解释性方法的鲁棒性。我们提出了一种方法和两个新的指标,以对事后解释性方法进行细粒度评估。我们表明,鲁棒性通常与其粗粒度性能相关联。

更新时间: 2024-07-29 03:55:52

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.19683v1
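
As a concrete instance of the coarse- versus fine-grained distinction, the usual coarse evaluation averages a corruption test over samples; keeping the per-sample values instead is what a fine-grained, robustness-oriented assessment looks at. A small hypothetical sketch (assuming a model that maps a batch to class probabilities):

    import numpy as np

    def per_sample_degradation(model, x, attribution, frac=0.1):
        # Occlude the top-attributed fraction of features of ONE sample and
        # record how much the predicted confidence drops.
        k = max(1, int(frac * x.size))
        top = np.argsort(np.abs(attribution).ravel())[-k:]
        x_corr = x.ravel().copy()
        x_corr[top] = 0.0                      # simple occlusion baseline
        return model(x[None]).max() - model(x_corr.reshape(x.shape)[None]).max()

    # Coarse-grained score: np.mean over samples. Fine-grained view: the full
    # distribution of per-sample drops, whose spread reflects robustness.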

Motion Manifold Flow Primitives for Language-Guided Trajectory Generation

Developing text-based robot trajectory generation models is made particularly difficult by the small dataset size, high dimensionality of the trajectory space, and the inherent complexity of the text-conditional motion distribution. Recent manifold learning-based methods have partially addressed the dimensionality and dataset size issues, but struggle with the complex text-conditional distribution. In this paper we propose a text-based trajectory generation model that attempts to address all three challenges while relying on only a handful of demonstration trajectory data. Our key idea is to leverage recent flow-based models capable of capturing complex conditional distributions, not directly in the high-dimensional trajectory space, but rather in the low-dimensional latent coordinate space of the motion manifold, with deliberately designed regularization terms to ensure smoothness of motions and robustness to text variations. We show that our {\it Motion Manifold Flow Primitive (MMFP)} framework can accurately generate qualitatively distinct motions for a wide range of text inputs, significantly outperforming existing methods.

Updated: 2024-07-29 03:53:14

标题: 用于语言引导轨迹生成的运动流形流基元

摘要: 开发基于文本的机器人轨迹生成模型在小数据集大小、轨迹空间的高维度和文本条件运动分布的固有复杂性方面特别困难。最近基于流形学习的方法在一定程度上解决了维度和数据集大小的问题,但在处理复杂的文本条件分布时仍然存在困难。在本文中,我们提出了一种基于文本的轨迹生成模型,试图解决这三个挑战,同时仅依赖少量演示轨迹数据。我们的关键思想是利用最近能够捕捉复杂条件分布的基于流的模型,不是直接在高维轨迹空间中,而是在运动流形的低维潜在坐标空间中,通过故意设计的正则化项来确保运动的平滑性和对文本变化的稳健性。我们展示了我们的“运动流形流原始(MMFP)”框架能够准确生成广泛文本输入的定性不同运动,明显优于现有方法。

更新时间: 2024-07-29 03:53:14

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2407.19681v1

Harnessing Large Vision and Language Models in Agriculture: A Review

Large models can play important roles in many domains. Agriculture is another key factor affecting the lives of people around the world. It provides food, fabric, and coal for humanity. However, facing many challenges such as pests and diseases, soil degradation, global warming, and food security, how to steadily increase the yield in the agricultural sector is a problem that humans still need to solve. Large models can help farmers improve production efficiency and harvest by detecting a series of agricultural production tasks such as pests and diseases, soil quality, and seed quality. It can also help farmers make wise decisions through a variety of information, such as images, text, etc. Herein, we delve into the potential applications of large models in agriculture, from large language model (LLM) and large vision model (LVM) to large vision-language models (LVLM). After gaining a deeper understanding of multimodal large language models (MLLM), it can be recognized that problems such as agricultural image processing, agricultural question answering systems, and agricultural machine automation can all be solved by large models. Large models have great potential in the field of agriculture. We outline the current applications of agricultural large models, and aim to emphasize the importance of large models in the domain of agriculture. In the end, we envisage a future in which farmers use MLLM to accomplish many tasks in agriculture, which can greatly improve agricultural production efficiency and yield.

Updated: 2024-07-29 03:47:54

标题: 利用大型视觉和语言模型在农业领域的应用:一项综述

摘要: 大型模型在许多领域中发挥着重要作用。农业是影响全球人们生活的另一个关键因素。它为人类提供食物、织物和煤炭。然而,面临着诸如病虫害、土壤退化、全球变暖和粮食安全等许多挑战,如何稳定增加农业部门的产量是人类仍然需要解决的问题。大型模型可以帮助农民通过检测一系列农业生产任务,如病虫害、土壤质量和种子质量,提高生产效率和收成。它还可以通过各种信息,如图像、文本等,帮助农民做出明智的决策。在这里,我们深入探讨了大型模型在农业中的潜在应用,从大型语言模型(LLM)和大型视觉模型(LVM)到大型视觉语言模型(LVLM)。在对多模态大型语言模型(MLLM)有了更深入了解之后,可以认识到农业图像处理、农业问答系统和农业机器自动化等问题都可以通过大型模型解决。大型模型在农业领域具有巨大潜力。我们概述了农业大型模型的当前应用,并旨在强调大型模型在农业领域的重要性。最后,我们设想一个未来,在那里农民使用MLLM来完成许多农业任务,这将极大地提高农业生产效率和产量。

更新时间: 2024-07-29 03:47:54

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.19679v1

Physics Informed Kolmogorov-Arnold Neural Networks for Dynamical Analysis via Efficent-KAN and WAV-KAN

Physics-informed neural networks have proven to be a powerful tool for solving differential equations, leveraging the principles of physics to inform the learning process. However, traditional deep neural networks often face challenges in achieving high accuracy without incurring significant computational costs. In this work, we implement the Physics-Informed Kolmogorov-Arnold Neural Networks (PIKAN) through efficient-KAN and WAV-KAN, which utilize the Kolmogorov-Arnold representation theorem. PIKAN demonstrates superior performance compared to conventional deep neural networks, achieving the same level of accuracy with fewer layers and reduced computational overhead. We explore both B-spline and wavelet-based implementations of PIKAN and benchmark their performance across various ordinary and partial differential equations using unsupervised (data-free) and supervised (data-driven) techniques. For certain differential equations, the data-free approach suffices to find accurate solutions, while in more complex scenarios, the data-driven method enhances the PIKAN's ability to converge to the correct solution. We validate our results against numerical solutions and achieve $99 \%$ accuracy in most scenarios.
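
As a concrete illustration of the data-free (unsupervised) physics-informed loss described above, the sketch below trains a network to satisfy the toy ODE u'(t) = -u(t) with u(0) = 1 via automatic differentiation. A plain MLP stands in for the efficient-KAN / WAV-KAN layers purely for brevity; swapping in a KAN block would leave the loss unchanged.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = torch.rand(64, 1, requires_grad=True)           # collocation points in [0, 1]
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    residual = (du + u).pow(2).mean()                   # enforce u' + u = 0
    ic = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()   # initial condition u(0) = 1
    loss = residual + ic                                # data-free PINN-style loss
    opt.zero_grad(); loss.backward(); opt.step()

print(net(torch.tensor([[1.0]])))  # should approach exp(-1) ≈ 0.3679

The data-driven variant mentioned in the abstract would simply add a supervised mean-squared-error term on observed solution values to the same loss.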

Updated: 2024-07-29 03:46:15

标题: 物理信息科尔莫戈洛夫-阿诺德神经网络在动力学分析中的应用:通过高效KAN和WAV-KAN

摘要: 物理信息神经网络已被证明是解决微分方程的强大工具,利用物理原理来指导学习过程。然而,传统的深度神经网络通常面临在不增加显著计算成本的情况下实现高准确度的挑战。在本研究中,我们通过高效KAN和WAV-KAN实现了基于科尔莫哥洛夫-阿诺德表示定理的物理信息科尔莫哥洛夫-阿诺德神经网络(PIKAN)。PIKAN相较于传统的深度神经网络表现出更优越的性能,能够在更少的层次和减少的计算开销下实现相同水平的准确度。我们探索了基于B样条和小波的PIKAN实现,并使用无监督(无数据)和监督(数据驱动)技术来评估它们在各种普通和偏微分方程中的表现。对于某些微分方程,无数据方法足以找到准确解,而在更复杂的场景中,数据驱动方法增强了PIKAN收敛到正确解的能力。我们将结果与数值解进行验证,在大多数情况下实现了99%的准确度。

更新时间: 2024-07-29 03:46:15

领域: cs.LG

下载: http://arxiv.org/abs/2407.18373v2

Navigating the United States Legislative Landscape on Voice Privacy: Existing Laws, Proposed Bills, Protection for Children, and Synthetic Data for AI

Privacy is a hot topic for policymakers across the globe, including the United States. Evolving advances in AI and emerging concerns about the misuse of personal data have pushed policymakers to draft legislation on trustworthy AI and privacy protection for their citizens. This paper presents the state of privacy legislation in the U.S. Congress and outlines how voice data is treated within the legislative definitions. It also reviews additional privacy protections for children, and provides a holistic review of enacted and proposed privacy laws across the fifty U.S. states, including how those laws consider voice data and guidelines for processing children's data. As a groundbreaking alternative to actual human data, ethically generated synthetic data allows much flexibility to keep AI innovation in progress. Given that policymakers' consideration of synthetic data in AI legislation is relatively new compared to that of privacy laws, this paper also reviews regulatory considerations for synthetic data.

Updated: 2024-07-29 03:43:16

标题: 在语音隐私方面的美国立法格局:现行法律、提议法案、对儿童的保护以及用于AI的合成数据

摘要: 隐私是全球各国政策制定者关注的热门话题,包括美国。人工智能的不断进步和对个人数据滥用的新兴担忧推动政策制定者起草关于可信人工智能和隐私保护的立法。本文介绍了美国国会隐私立法的现状,并概述了语音数据如何被视为立法定义的一部分。本文还审查了针对儿童的额外隐私保护措施。本文综合审查了已制定和拟议的隐私法律,并考虑了在这些法律中对语音数据的处理指导,以及考虑了在美国五十个州的这些法律中儿童数据的处理指导。作为实际人类数据的一个突破性替代方案,道德生成的合成数据允许保持人工智能创新的灵活性。鉴于政策制定者对合成数据在人工智能立法中的考虑相对较新,相比隐私法,本文审查了合成数据的监管考虑。

更新时间: 2024-07-29 03:43:16

领域: cs.CY,cs.CR,cs.SD,eess.AS,I.2; J.1

下载: http://arxiv.org/abs/2407.19677v1

InstructIE: A Bilingual Instruction-based Information Extraction Dataset

Large language models can perform well on general natural language tasks, but their effectiveness is still suboptimal for information extraction (IE). Recent works indicate that the main reason lies in the lack of extensive data on IE instructions. Note that the existing datasets on IE instructions not only have limited coverage but also involve high construction costs. To address this issue, we introduce InstructIE, a bilingual instruction-based IE dataset, which covers 12 diverse domains. We propose KG2Instruction, a framework specifically for the automatic generation of such datasets. Additionally, we manually annotate the test set. Experimental results demonstrate that large language models trained with InstructIE can not only obtain better IE capabilities but also enhance zero-shot performance compared with baselines.
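
For illustration, a single instruction-based IE training record could look like the following; the field names, relation labels, and schema here are hypothetical and are not the released InstructIE format.

record = {
    "instruction": "Extract all (head, relation, tail) triples about "
                   "organizations from the input text.",
    "input": "OpenAI was founded in San Francisco in 2015.",
    "output": [
        {"head": "OpenAI", "relation": "located_in", "tail": "San Francisco"},
        {"head": "OpenAI", "relation": "founded_in_year", "tail": "2015"},
    ],
    "domain": "organization",  # hypothetically one of the 12 domains
    "language": "en",          # the dataset is bilingual
}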

Updated: 2024-07-29 03:41:34

标题: InstructIE: 一个基于双语指令的信息提取数据集

摘要: 大型语言模型在一般自然语言任务上表现良好,但它们在信息抽取(IE)方面的效果仍然不理想。最近的研究表明,主要原因在于缺乏关于IE指令的大量数据。需要指出的是,现有的IE指令数据集不仅覆盖有限,而且涉及高昂的建设成本。为了解决这个问题,我们引入了InstructIE,一个双语指令为基础的IE数据集,涵盖了12个不同领域。我们提出了KG2Instruction,一个专门用于自动生成这类数据集的框架。此外,我们还手动注释了测试集。实验结果表明,使用InstructIE训练的大型语言模型不仅可以获得更好的IE能力,还可以与基线相比提高零样本性能。

更新时间: 2024-07-29 03:41:34

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2305.11527v4

Quantum copy-protection of compute-and-compare programs in the quantum random oracle model

Copy-protection allows a software distributor to encode a program in such a way that it can be evaluated on any input, yet it cannot be "pirated" - a notion that is impossible to achieve in a classical setting. Aaronson (CCC 2009) initiated the formal study of quantum copy-protection schemes, and speculated that quantum cryptography could offer a solution to the problem thanks to the quantum no-cloning theorem. In this work, we introduce a quantum copy-protection scheme for a large class of evasive functions known as "compute-and-compare programs" - a more expressive generalization of point functions. A compute-and-compare program $\mathsf{CC}[f,y]$ is specified by a function $f$ and a string $y$ within its range: on input $x$, $\mathsf{CC}[f,y]$ outputs $1$, if $f(x) = y$, and $0$ otherwise. We prove that our scheme achieves non-trivial security against fully malicious adversaries in the quantum random oracle model (QROM), which makes it the first copy-protection scheme to enjoy any level of provable security in a standard cryptographic model. As a complementary result, we show that the same scheme fulfils a weaker notion of software protection, called "secure software leasing", introduced very recently by Ananth and La Placa (eprint 2020), with a standard security bound in the QROM, i.e. guaranteeing negligible adversarial advantage. Finally, as a third contribution, we elucidate the relationship between unclonable encryption and copy-protection for multi-bit output point functions.
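
The compute-and-compare functionality itself is classical and stated in one line; the paper's contribution is protecting the program, not evaluating it. For reference, an unprotected $\mathsf{CC}[f,y]$ looks like this, with a hashed password check as the canonical instance:

import hashlib

def compute_and_compare(f, y):
    """Return the program CC[f, y]: on input x, output 1 iff f(x) == y."""
    return lambda x: 1 if f(x) == y else 0

# A point function is the special case where f is the identity; a hashed
# password check is CC[sha256, stored_digest]:
digest = hashlib.sha256(b"secret").hexdigest()
check = compute_and_compare(lambda s: hashlib.sha256(s).hexdigest(), digest)
assert check(b"secret") == 1 and check(b"guess") == 0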

Updated: 2024-07-29 03:37:53

标题: 在量子随机预言模型中计算和比较程序的量子复制保护

摘要: 复制保护允许软件分发商对程序进行编码,以便可以在任何输入上进行评估,但却无法被“盗版” - 这在传统环境中是不可能实现的概念。Aaronson(CCC 2009)启动了量子复制保护方案的形式研究,并推测量子密码学可以通过量子不克隆定理为问题提供解决方案。在这项工作中,我们介绍了一种针对被称为“计算与比较程序”的大类回避函数的量子复制保护方案 - 这是点函数的更具表现力的泛化。计算与比较程序$\mathsf{CC}[f,y]$由一个函数$f$和一个在其范围内的字符串$y$指定:在输入$x$时,$\mathsf{CC}[f,y]$输出$1$,如果$f(x)=y$,否则输出$0$。我们证明我们的方案在量子随机预言机模型(QROM)下对全面恶意对手实现了非平凡的安全性,这使得它成为第一个在标准加密模型中享有任何可证安全级别的复制保护方案。作为一个补充结果,我们展示了相同方案在QROM中实现了一种较弱的软件保护概念,称为“安全软件租赁”,这是由Ananth和La Placa(eprint 2020)最近引入的,具有标准的安全性边界,即保证微不足道的对抗优势。最后,作为第三项贡献,我们阐明了不可克隆加密与多比特输出点函数的复制保护之间的关系。

更新时间: 2024-07-29 03:37:53

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2009.13865v5

Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks

As large language models (LLMs) continue to evolve, the need for robust and standardized evaluation benchmarks becomes paramount. Evaluating the performance of these models is a complex challenge that requires careful consideration of various linguistic tasks, model architectures, and benchmarking methodologies. In recent years, various frameworks have emerged as noteworthy contributions to the field, offering comprehensive evaluation tests and benchmarks for assessing the capabilities of LLMs across diverse domains. This paper provides an exploration and critical analysis of some of these evaluation methodologies, shedding light on their strengths, limitations, and impact on advancing the state-of-the-art in natural language processing.

Updated: 2024-07-29 03:37:14

标题: 超越指标:对大型语言模型评估框架中的可变性进行批判性分析

摘要: 随着大型语言模型(LLMs)不断发展,对于健壮和标准化的评估基准的需求变得至关重要。评估这些模型的性能是一个复杂的挑战,需要仔细考虑各种语言任务、模型架构和基准测试方法。近年来,各种框架已经出现作为该领域的重要贡献,提供了全面的评估测试和基准,用于评估LLMs在不同领域的能力。本文探讨并对其中一些评估方法进行了批判性分析,揭示了它们的优点、局限性以及对推动自然语言处理领域最新技术的影响。

更新时间: 2024-07-29 03:37:14

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.21072v1

Multi-Agent Trajectory Prediction with Difficulty-Guided Feature Enhancement Network

Trajectory prediction is crucial for autonomous driving as it aims to forecast the future movements of traffic participants. Traditional methods usually perform holistic inference on the trajectories of agents, neglecting the differences in prediction difficulty among agents. This paper proposes a novel Difficulty-Guided Feature Enhancement Network (DGFNet), which leverages the prediction difficulty differences among agents for multi-agent trajectory prediction. Firstly, we employ spatio-temporal feature encoding and interaction to capture rich spatio-temporal features. Secondly, a difficulty-guided decoder is used to control the flow of future trajectories into subsequent modules, obtaining reliable future trajectories. Then, feature interaction and fusion are performed through the future feature interaction module. Finally, the fused agent features are fed into the final predictor to generate the predicted trajectory distributions for multiple participants. Experimental results demonstrate that our DGFNet achieves state-of-the-art performance on the Argoverse 1 & 2 motion forecasting benchmarks. Ablation studies further validate the effectiveness of each module. Moreover, compared with SOTA methods, our method balances trajectory prediction accuracy and real-time inference speed.

Updated: 2024-07-29 03:24:57

标题: 多智能体轨迹预测:基于困难引导特征增强网络

摘要: 轨迹预测对于自动驾驶至关重要,因为它旨在预测交通参与者的未来移动。传统方法通常对代理的轨迹进行整体推断,忽略了代理之间在预测困难度上的差异。本文提出了一种新颖的难度引导特征增强网络(DGFNet),利用了代理之间的预测困难度差异进行多代理轨迹预测。首先,我们采用时空特征编码和交互来捕捉丰富的时空特征。其次,使用难度引导解码器来控制未来轨迹流入后续模块,获得可靠的未来轨迹。然后,通过未来特征交互模块执行特征交互和融合。最后,将融合的代理特征输入最终预测器,为多个参与者生成预测的轨迹分布。实验结果表明,我们的DGFNet在Argoverse 1和2运动预测基准上取得了最先进的性能。消融研究进一步验证了每个模块的有效性。此外,与SOTA方法相比,我们的方法平衡了轨迹预测准确性和实时推断速度。

更新时间: 2024-07-29 03:24:57

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2407.18551v2

A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation

Ophthalmology consultations are crucial for diagnosing, treating, and preventing eye diseases. However, the growing demand for consultations exceeds the availability of ophthalmologists. By leveraging large pre-trained language models, we can design effective dialogues for specific scenarios, aiding in consultations. Traditional fine-tuning strategies for question-answering tasks are impractical due to increasing model size, and they often ignore the patient-doctor role functions during consultations. In this paper, we propose EyeDoctor, an ophthalmic medical question-answering large language model that enhances accuracy through doctor-patient role-perception guidance and a knowledge base augmented with external disease information. Experimental results show that EyeDoctor achieves higher question-answering precision in ophthalmology consultations. Notably, EyeDoctor demonstrated a 7.25% improvement in Rouge-1 scores and a 10.16% improvement in F1 scores on multi-round datasets compared to the second-best model, ChatGPT, highlighting the importance of doctor-patient role differentiation and dynamic knowledge base expansion for intelligent medical consultations. EyeDoc is also available as a free web-based service, and source code is available at https://github.com/sperfu/EyeDoc.

Updated: 2024-07-29 03:16:13

标题: 基于风格差异的眼科会诊专用引导大型语言模型角色

摘要: 眼科会诊对于诊断、治疗和预防眼部疾病至关重要。然而,对眼科医生的需求增长超过了其供给。通过利用大型预训练语言模型,我们可以为特定情境设计有效的对话,帮助会诊。传统的微调策略在问答任务中变得不切实际,因为模型规模增大,并且通常忽视了会诊过程中患者与医生的角色功能。在本文中,我们提出了EyeDoctor,一种眼科医学疑问大型语言模型,通过医患角色感知引导和增强知识库与外部疾病信息,提高了准确性。实验结果显示,EyeDoctor在眼科会诊中实现了更高的问答精度。值得注意的是,EyeDoctor在多轮数据集上的Rouge-1分数提高了7.25%,F1分数提高了10.16%,相比第二好的模型ChatGPT,突显了医患角色区分和动态知识库扩展对智能医疗会诊的重要性。EyeDoc还作为一个免费的基于网络的服务,并且源代码可在https://github.com/sperfu/EyeDoc获取。

更新时间: 2024-07-29 03:16:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.18483v2

Urban Traffic Accident Risk Prediction Revisited: Regionality, Proximity, Similarity and Sparsity

Traffic accidents pose a significant risk to human health and property safety. Therefore, to prevent traffic accidents, predicting their risks has garnered growing interest. We argue that a desired prediction solution should demonstrate resilience to the complexity of traffic accidents. In particular, it should adequately consider the regional background, accurately capture both spatial proximity and semantic similarity, and effectively address the sparsity of traffic accidents. However, these factors are often overlooked or difficult to incorporate. In this paper, we propose a novel multi-granularity hierarchical spatio-temporal network. Initially, we innovate by incorporating remote sensing data, facilitating the creation of hierarchical multi-granularity structure and the comprehension of regional background. We construct multiple high-level risk prediction tasks to enhance model's ability to cope with sparsity. Subsequently, to capture both spatial proximity and semantic similarity, region feature and multi-view graph undergo encoding processes to distill effective representations. Additionally, we propose message passing and adaptive temporal attention module that bridges different granularities and dynamically captures time correlations inherent in traffic accident patterns. At last, a multivariate hierarchical loss function is devised considering the complexity of the prediction purpose. Extensive experiments on two real datasets verify the superiority of our model against the state-of-the-art methods.

Updated: 2024-07-29 03:10:15

标题: 城市交通事故风险预测再探讨:区域性、接近性、相似性和稀疏性

摘要: 交通事故对人类健康和财产安全构成了重大风险。因此,预防交通事故,预测其风险已经引起了越来越多的关注。我们认为,一个理想的预测解决方案应该表现出对交通事故复杂性的韧性。特别是,它应该充分考虑地区背景,准确捕捉空间接近性和语义相似性,并有效解决交通事故的稀疏性。然而,这些因素往往被忽视或难以整合。在本文中,我们提出了一种新颖的多粒度层次时空网络。首先,我们通过整合遥感数据创新,促进了多粒度层次结构的建立和地区背景的理解。我们构建了多个高级风险预测任务,以增强模型应对稀疏性的能力。随后,为了捕捉空间接近性和语义相似性,地区特征和多视图图经过编码过程,提炼有效的表示。此外,我们提出了消息传递和自适应时间注意模块,桥接不同粒度,并动态捕捉交通事故模式中固有的时间相关性。最后,我们设计了一个考虑预测目的复杂性的多元层次损失函数。对两个真实数据集的大量实验验证了我们的模型优于最先进的方法。

更新时间: 2024-07-29 03:10:15

领域: cs.DB,cs.AI

下载: http://arxiv.org/abs/2407.19668v1

Mathematical models for off-ball scoring prediction in basketball

In professional basketball, the accurate prediction of scoring opportunities based on strategic decision-making is crucial for spatial and player evaluations. However, traditional models often face challenges in accounting for the complexities of off-ball movements, which are essential for comprehensive performance evaluations. In this study, we propose two mathematical models to predict off-ball scoring opportunities in basketball, considering pass-to-score and dribble-to-score sequences: the Ball Movement for Off-ball Scoring (BMOS) and the Ball Intercept and Movement for Off-ball Scoring (BIMOS) models. The BMOS model adapts principles from the Off-Ball Scoring Opportunities (OBSO) model, originally designed for soccer, to basketball, whereas the BIMOS model also incorporates the likelihood of interception during ball movements. We evaluated these models using player tracking data from 630 NBA games in the 2015-2016 regular season, demonstrating that the BIMOS model outperforms the BMOS model in terms of team scoring prediction accuracy, while also highlighting its potential for further development. Overall, the BIMOS model provides valuable insights for tactical analysis and player evaluation in basketball.

Updated: 2024-07-29 03:06:40

标题: 篮球比赛中非持球得分预测的数学模型

摘要: 在职业篮球中,基于战略决策准确预测得分机会对于空间和球员评估至关重要。然而,传统模型在考虑到必不可少的无球移动复杂性时往往面临挑战,这对于全面的表现评估至关重要。在这项研究中,我们提出了两种数学模型来预测篮球中的无球得分机会,考虑到传球得分和运球得分序列:球移动用于无球得分(BMOS)和球截拦和移动用于无球得分(BIMOS)模型。BMOS模型从最初设计用于足球的无球得分机会(OBSO)模型中借鉴原则,将其适用于篮球,而BIMOS模型还考虑了在球移动过程中拦截的可能性。我们使用2015-2016赛季630场NBA比赛的球员跟踪数据对这些模型进行了评估,结果表明BIMOS模型在团队得分预测准确性方面优于BMOS模型,同时也强调了其进一步发展的潜力。总体而言,BIMOS模型为篮球中的战术分析和球员评估提供了有价值的见解。

更新时间: 2024-07-29 03:06:40

领域: cs.LG

下载: http://arxiv.org/abs/2406.08749v2

Smart Language Agents in Real-World Planning

Comprehensive planning agents have been a long-term goal in the field of artificial intelligence. Recent innovations in Natural Language Processing have yielded success through the advent of Large Language Models (LLMs). We seek to improve the travel-planning capability of such LLMs by extending upon the work of the previous paper TravelPlanner. Our objective is to explore a new method of using LLMs to improve the travel planning experience. We focus specifically on the "sole-planning" mode of travel planning; that is, the agent is given the necessary reference information, and its goal is to create a comprehensive plan from that information. While this does not simulate the real world, we believe that optimizing the sole-planning capability of a travel-planning agent will still enhance the overall user experience. We propose a semi-automated prompt generation framework that combines LLM-automated prompts with a "human-in-the-loop" to iteratively refine the prompt and improve LLM performance. Our results show that the LLM-automated prompt has its limitations and that "human-in-the-loop" refinement greatly improves performance, by $139\%$ with a single iteration.
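
A minimal sketch of such a semi-automated refinement loop might look as follows, where the `llm`, `evaluate`, and `human_edit` callables are assumptions (the paper's actual framework and scoring are not specified here):

def refine_prompt(llm, evaluate, human_edit, prompt, iterations=3):
    """Iteratively refine a prompt: LLM proposes, a human corrects, keep improvements."""
    best_prompt, best_score = prompt, evaluate(prompt)
    for _ in range(iterations):
        critique = llm("Critique this travel-planning prompt and rewrite it:\n"
                       + best_prompt)
        candidate = human_edit(critique)   # human-in-the-loop correction
        score = evaluate(candidate)        # e.g., pass rate on validation plans
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score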

Updated: 2024-07-29 03:00:30

标题: 智能语言代理在现实世界规划中的应用

摘要: 综合规划代理人一直是人工智能领域的长期目标。最近自然语言处理领域的创新通过大型语言模型(LLMs)取得了成功。我们试图通过在前一篇论文TravelPlanner的基础上进行扩展,改进这种LLMs的旅行规划能力。我们的目标是探索一种利用LLMs改进旅行规划体验的新方法。我们特别关注旅行规划中的“单独规划”模式;即,代理人被提供必要的参考信息,其目标是根据参考信息创建综合计划。尽管这并不模拟现实世界,但我们认为优化旅行规划代理人的单独规划能力仍能提升整体用户体验。我们提出了一种半自动提示生成框架,结合LLM自动提示和“人机协作”,通过迭代改进提示来提高LLM的性能。我们的结果显示,LLM自动提示存在局限性,“人机协作”通过一次迭代使性能提高了139%。

更新时间: 2024-07-29 03:00:30

领域: cs.AI

下载: http://arxiv.org/abs/2407.19667v1

Adaptive Soft Error Protection for Deep Learning

The rising incidence of soft errors in hardware systems represents a considerable risk to the reliability of deep learning systems and can precipitate severe malfunctions. Although essential, soft error mitigation can impose substantial costs on deep learning systems that are inherently demanding in terms of computation and memory. Previous research has primarily explored variations in vulnerability among different components of computing engines or neural networks, aiming for selective protection to minimize protection overhead. Our approach diverges from these studies by recognizing that the susceptibility of deep learning tasks to soft errors is heavily input-dependent. Notably, some inputs are simpler for deep learning models and inherently exhibit greater tolerance to soft errors. Conversely, more complex inputs are prone to soft error impact. Based on these insights, we introduce an adaptive soft error protection strategy that tailors protection to the computational demands of individual inputs. To implement this strategy, we develop a metric for assessing the complexity of inputs and deploy a lightweight machine learning algorithm to gauge input difficulty. Subsequently, we employ robust protection for challenging inputs and minimal protection for simpler ones. Our experimental evaluation across diverse datasets and deep learning tasks reveals that our adaptive strategy reduces the soft error protection overhead by an average of 46.9%, without compromising system reliability.
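
A skeletal version of input-adaptive protection could look like the following, where prediction entropy from a lightweight proxy model stands in for the paper's input-complexity metric (an assumption) and dual execution with comparison stands in for the robust protection mechanism:

import numpy as np

def entropy(probs):
    """Shannon entropy of a probability vector; higher means a harder input."""
    return -np.sum(probs * np.log(probs + 1e-12))

def protected_infer(model, x, proxy, threshold=1.0):
    if entropy(proxy(x)) < threshold:   # simple input: run unprotected
        return model(x)
    out1, out2 = model(x), model(x)     # complex input: duplicate execution
    if not np.allclose(out1, out2):     # a mismatch signals a transient fault
        return model(x)                 # re-execute to recover
    return out1

On fault-free hardware the duplicated passes are identical; the comparison only pays off when a soft error corrupts one of them, which is the scenario the strategy guards against.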

Updated: 2024-07-29 02:54:52

标题: 深度学习的自适应软错误保护

摘要: 硬件系统中软错误的不断增加代表着对深度学习系统可靠性的相当大风险,并可能引发严重故障。尽管软错误的缓解是必不可少的,但可能会给本身在计算和内存方面要求很高的深度学习系统带来重大成本。先前的研究主要探讨了计算引擎或神经网络不同组件之间的脆弱性变化,旨在选择性保护以最小化保护开销。我们的方法与这些研究不同,因为我们认识到深度学习任务对软错误的敏感性在很大程度上取决于输入。值得注意的是,对于深度学习模型来说,一些输入更简单,天生具有更大的软错误容忍度。相反,更复杂的输入容易受到软错误的影响。基于这些见解,我们引入了一种自适应软错误保护策略,根据个体输入的计算需求定制保护。为了实施这一策略,我们开发了一个用于评估输入复杂性的度量标准,并部署了一个轻量级的机器学习算法来衡量输入难度。随后,我们为具有挑战性的输入提供强大的保护,并为更简单的输入提供最小的保护。我们在不同数据集和深度学习任务上进行的实验评估表明,我们的自适应策略将软错误保护开销平均降低了46.9%,而不会损害系统可靠性。

更新时间: 2024-07-29 02:54:52

领域: cs.LG

下载: http://arxiv.org/abs/2407.19664v1

Short-Term Forecasting of Photovoltaic Power Generation Based on Entropy during the Foggy Winter

Solar energy is one of the most promising renewable energy resources. Forecasting photovoltaic power generation is an important way to increase photovoltaic penetration. However, photovoltaic forecasting is complicated by its inherent uncertainty, especially in specific regions during the foggy winter. This paper proposes a novel model to address this problem. A tailored entropy measure is developed to quantify the uncertainty during the foggy winter. A clustering method and a modified retention network are applied to reduce complexity and to forecast, respectively, and an optimization procedure tunes the hyperparameters. Results are validated for the multivariate forecasting model using a dataset from a photovoltaic power station in Jiangsu Province, China. Experiments show that the proposed model improves forecasting accuracy compared to a variety of models during the foggy winter.
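
The abstract does not spell out the entropy construction; one plausible reading, shown below purely as an assumption, is the Shannon entropy of the discretized power distribution within a sliding window, with larger values flagging volatile, fog-like regimes:

import numpy as np

def window_entropy(series, bins=10):
    """Shannon entropy (bits) of the value distribution within one window."""
    hist, _ = np.histogram(series, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

power = np.random.rand(96)           # one day of 15-minute PV output (stand-in data)
uncertainty = window_entropy(power)  # larger entropy: more volatile, fog-like regime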

Updated: 2024-07-29 02:53:39

标题: 基于熵的雾冬光伏发电短期预测

摘要: 太阳能是最具潜力的可再生能源资源之一。预测光伏发电是增加光伏渗透率的重要途径。然而,由于光伏发电具有不确定性的特性,尤其是在雾霾的冬季,光伏预测任务变得复杂。本文提出了一个新颖的模型来解决这个问题。创建了一个开发的熵来评估雾霾冬季期间的不确定性。应用了聚类方法和修改的保留网络来分别降低复杂性和进行预测。我们采用优化来优化超参数。通过使用中国江苏省一个光伏电站的数据集进行多变量预测模型的验证。实验结果表明,与在雾霾冬季期间的各种模型相比,所提出的模型提高了预测准确性。

更新时间: 2024-07-29 02:53:39

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2407.19663v1

Towards Detecting IoT Event Spoofing Attacks Using Time-Series Classification

Internet of Things (IoT) devices have grown in popularity since they can directly interact with the real world. Home automation systems automate these interactions. IoT events are crucial to these systems' decision-making but are often unreliable, and security vulnerabilities allow attackers to impersonate events. Statistical machine learning over IoT event fingerprints from deployed sensors has been used to detect spoofed events, but the multivariate temporal data from these sensors has structural and temporal properties that statistical machine learning cannot learn. These schemes' accuracy also depends on the knowledge base: the larger, the more accurate. However, the lack of huge datasets with enough samples of each IoT event in the nascent field of IoT can be a bottleneck. In this work, we deploy advanced machine learning to detect event-spoofing attacks. The temporal nature of sensor data lets us discover important patterns with fewer events. Our rigorous investigation of a publicly available real-world dataset indicates that our time-series-based solution learns temporal features from sensor data faster than earlier work, even with a 100- or 500-fold smaller training sample, making it a realistic IoT solution.
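
As a shape-level sketch of the time-series approach (not the paper's architecture), a small 1-D CNN over fixed-length multivariate sensor windows can label each window genuine versus spoofed; the channel count and window length below are illustrative:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(5, 16, kernel_size=3, padding=1),  # 5 sensor channels
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                     # pool over the time axis
    nn.Flatten(),
    nn.Linear(16, 2))                            # genuine vs. spoofed

windows = torch.randn(8, 5, 30)                  # batch of 30-step event windows
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(windows), labels)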

Updated: 2024-07-29 02:52:59

标题: 朝向利用时间序列分类检测物联网事件欺骗攻击

摘要: 物联网(IoT)设备因能够直接与现实世界互动而越来越受欢迎。家庭自动化系统自动化了这些互动。对于这些系统的决策来说,物联网事件至关重要,但往往不可靠。安全漏洞允许攻击者冒充事件。利用统计机器学习,从部署传感器中提取的物联网事件指纹已被用于检测伪造事件。这些传感器的多变量时间数据具有统计机器学习无法学习的结构和时间属性。这些方案的准确性取决于知识库;其规模越大,准确性就越高。然而,在物联网新兴领域中,缺乏足够的每个IoT事件样本的庞大数据集可能成为瓶颈。在这项工作中,我们部署了先进的机器学习来检测事件伪造攻击。传感器数据的时间性质让我们能够在较少的事件中发现重要模式。我们对一个公开可用的真实世界数据集进行了严格的研究,表明我们基于时间序列的解决方案技术比以往的工作更快地从传感器数据中学习时间特征,即使是在训练样本减少100倍或500倍的情况下,这使得它成为一个现实可行的物联网解决方案。

更新时间: 2024-07-29 02:52:59

领域: cs.CR

下载: http://arxiv.org/abs/2407.19662v1

Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications

In recent years, there has been increased interest in foundation models for geoscience due to the vast amount of earth-observing satellite imagery. Existing remote sensing foundation models make use of various sources of spectral imagery to create large models pretrained on a masked reconstruction task. The embeddings from these foundation models are then used for various downstream remote sensing applications. In this paper we propose a foundational modeling framework for remote sensing geoscience applications that goes beyond this traditional single-modality masked-autoencoder family of foundation models. The framework leverages the knowledge-guided principles that spectral imagery captures the impact of physical drivers on the environmental system, and that the relationship between them is governed by the characteristics of the system. Specifically, our method, called MultiModal Variable Step Forecasting (MM-VSF), uses multimodal data (spectral imagery and weather) as its input and a variable-step forecasting task as its pretraining objective. In our evaluation we show that forecasting satellite imagery using weather can serve as an effective pretraining task for foundation models. We further show the effectiveness of the embeddings from MM-VSF on the downstream task of pixel-wise crop mapping when compared with a model trained in the traditional setting of single-modality input and masked-reconstruction-based pretraining.

Updated: 2024-07-29 02:49:55

标题: 朝向空间-时间遥感应用的知识引导多模态基础模型

摘要: 近年来,由于大量的地球观测卫星图像,人们对地质基础模型的兴趣日益增加。现有的遥感基础模型利用各种光谱图像源创建了在掩膜重建任务上预训练的大型模型。然后利用这些基础模型的嵌入用于各种下游遥感应用。本文提出了一个用于遥感地球科学应用的基础建模框架,超越了传统的单模态掩膜自编码器家族的基础模型。该框架利用知识引导的原则,即光谱图像捕捉了物理驱动因素对环境系统的影响,并且它们之间的关系受系统特性的控制。具体而言,我们的方法,称为MultiModal Variable Step Forecasting(MM-VSF),以多模态数据(光谱图像和天气)作为输入,以可变步长预测任务作为其预训练目标。在我们的评估中,我们展示了利用天气预测卫星图像可以作为基础模型的有效预训练任务。我们进一步展示了与传统设置下的单模态输入和基于掩膜重建的预训练模型相比,MM-VSF的嵌入在像素级作物映射的下游任务上的有效性。

更新时间: 2024-07-29 02:49:55

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.19660v1

AI-Driven Healthcare: A Survey on Ensuring Fairness and Mitigating Bias

Artificial intelligence (AI) is rapidly advancing in healthcare, enhancing the efficiency and effectiveness of services across various specialties, including cardiology, ophthalmology, dermatology, emergency medicine, etc. AI applications have significantly improved diagnostic accuracy, treatment personalization, and patient outcome predictions by leveraging technologies such as machine learning, neural networks, and natural language processing. However, these advancements also introduce substantial ethical and fairness challenges, particularly related to biases in data and algorithms. These biases can lead to disparities in healthcare delivery, affecting diagnostic accuracy and treatment outcomes across different demographic groups. This survey paper examines the integration of AI in healthcare, highlighting critical challenges related to bias and exploring strategies for mitigation. We emphasize the necessity of diverse datasets, fairness-aware algorithms, and regulatory frameworks to ensure equitable healthcare delivery. The paper concludes with recommendations for future research, advocating for interdisciplinary approaches, transparency in AI decision-making, and the development of innovative and inclusive AI applications.

Updated: 2024-07-29 02:39:17

标题: 人工智能驱动的医疗保健:确保公平性和减轻偏见的调查

摘要: 人工智能(AI)在医疗保健领域迅速发展,增强了各个专业的服务效率和有效性,包括心脏病学、眼科学、皮肤病学、急诊医学等。AI应用通过利用机器学习、神经网络和自然语言处理等技术,显著提高了诊断准确性、治疗个性化和患者预后预测。然而,这些进展也引入了重大的伦理和公平挑战,特别是与数据和算法中的偏见有关。这些偏见可能导致医疗保健服务的不平等,影响不同人口群体的诊断准确性和治疗结果。本调查论文研究了AI在医疗保健中的整合,突出了与偏见相关的关键挑战,并探讨了缓解策略。我们强调多样化数据集、公平感知算法和监管框架的必要性,以确保公平的医疗保健服务。论文最后提出了未来研究的建议,倡导跨学科方法、AI决策透明度和创新包容的AI应用的发展。

更新时间: 2024-07-29 02:39:17

领域: cs.AI

下载: http://arxiv.org/abs/2407.19655v1

How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?

Recent large vision-language models such as CLIP have shown remarkable out-of-distribution (OOD) detection and generalization performance. However, their zero-shot in-distribution (ID) accuracy is often limited for downstream datasets. Recent CLIP-based fine-tuning methods such as prompt learning have demonstrated significant improvements in ID classification and OOD generalization where OOD labels are available. Nonetheless, it remains unclear whether the model is reliable under semantic shifts without OOD labels. In this paper, we aim to bridge the gap and present a comprehensive study of how fine-tuning impacts OOD detection for few-shot downstream tasks. By framing OOD detection as multi-modal concept matching, we establish a connection between fine-tuning methods and various OOD scores. Our results suggest that a proper choice of OOD score is essential for CLIP-based fine-tuning. In particular, the maximum concept matching (MCM) score consistently provides a promising solution. We also show that prompt learning achieves state-of-the-art OOD detection performance over its zero-shot counterpart.
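
The MCM score referenced above is simple to state: the maximum softmax over temperature-scaled cosine similarities between the image embedding and the ID class-name text embeddings. A sketch, with shapes and the temperature value being illustrative:

import torch
import torch.nn.functional as F

def mcm_score(image_feat, text_feats, tau=1.0):
    """image_feat: (d,); text_feats: (num_classes, d). Higher means more likely ID."""
    image_feat = F.normalize(image_feat, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sims = text_feats @ image_feat               # cosine similarity to each concept
    return torch.softmax(sims / tau, dim=-1).max()

# An input is flagged OOD when mcm_score(...) falls below a chosen threshold.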

Updated: 2024-07-29 02:36:45

标题: 微调如何影响视觉语言模型的超出分布检测?

摘要: 最近大型视觉语言模型,如CLIP,展现出了非凡的超出分布(OOD)检测和泛化性能。然而,它们的零样本内分布(ID)准确性通常对下游数据集有所限制。最近基于CLIP的微调方法,如提示学习,在有OOD标签的情况下展示了对ID分类和OOD泛化的显著改进。然而,尚不清楚模型在没有OOD标签的情况下对语义转变的可靠性。在本文中,我们旨在弥合差距,并提出了一项全面研究,以了解微调对少样本下游任务的OOD检测的影响。通过将OOD检测框架化为多模态概念匹配,我们建立了微调方法与各种OOD得分之间的联系。我们的结果表明,对于基于CLIP的微调,选择合适的OOD得分是至关重要的。特别是,最大概念匹配(MCM)得分始终提供了一个有希望的解决方案。我们还展示了提示学习在零样本对应物上展示了最先进的OOD检测性能。

更新时间: 2024-07-29 02:36:45

领域: cs.CV,cs.CY,cs.LG

下载: http://arxiv.org/abs/2306.06048v3

Automated Design and Optimization of Distributed Filtering Circuits via Reinforcement Learning

Designing distributed filter circuits (DFCs) is complex and time-consuming, involving setting and optimizing multiple hyperparameters. Traditional optimization methods, such as using the commercial finite element solver HFSS (High-Frequency Structure Simulator) to enumerate all parameter combinations with fixed steps and then simulate each combination, are not only time-consuming and labor-intensive but also rely heavily on the expertise and experience of electronics engineers, making it difficult to adapt to rapidly changing design requirements. Additionally, these commercial tools struggle with precise adjustments when parameters are sensitive to numerical changes, resulting in limited optimization effectiveness. This study proposes a novel end-to-end automated method for DFC design. The proposed method harnesses reinforcement learning (RL) algorithms, eliminating the dependence on the design experience of engineers. Thus, it significantly reduces the subjectivity and constraints associated with circuit design. The experimental findings demonstrate clear improvements in design efficiency and quality when comparing the proposed method with traditional engineer-driven methods. Furthermore, the proposed method achieves superior performance when designing complex or rapidly evolving DFCs, highlighting the substantial potential of RL in circuit design automation. In particular, compared to the existing DFC automation design method CircuitGNN, our method achieves an average performance improvement of 8.72%. Additionally, the execution efficiency of our method is 2000 times higher than CircuitGNN on the CPU and 241 times higher on the GPU.

Updated: 2024-07-29 02:34:20

标题: 通过强化学习自动设计和优化分布式滤波电路

摘要: 设计分布式滤波电路(DFCs)是复杂且耗时的工作,涉及设置和优化多个超参数。传统的优化方法,如使用商业有限元求解器HFSS(高频结构模拟器)来列举所有参数组合并进行固定步长的模拟,不仅耗时费力,而且严重依赖电子工程师的专业知识和经验,使其难以适应快速变化的设计需求。此外,这些商业工具在参数对数值变化敏感时很难进行精确调整,导致优化效果有限。本研究提出了一种新颖的端到端自动化DFC设计方法。所提出的方法利用强化学习(RL)算法,消除了工程师设计经验的依赖性。因此,它显著减少了与电路设计相关的主观性和约束。实验结果表明,与传统的工程师驱动方法相比,所提出的方法在设计效率和质量方面有明显改善。此外,所提出的方法在设计复杂或快速发展的DFCs时表现出卓越性能,突显了RL在电路设计自动化中的巨大潜力。特别是,与现有的DFC自动化设计方法CircuitGNN相比,我们的方法平均性能提升了8.72%。此外,我们的方法在CPU上的执行效率比CircuitGNN高出2000倍,在GPU上高出241倍。

更新时间: 2024-07-29 02:34:20

领域: cs.LG,cs.AI,cs.AR

下载: http://arxiv.org/abs/2402.14236v2

ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck

This paper presents the first-ever study of adapting compressed image latents to suit the needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs). MLLMs have extended the success of large language models to modalities (e.g. images) beyond text, but their billion-parameter scale hinders deployment on resource-constrained end devices. While cloud-hosted MLLMs could be available, transmitting raw, uncompressed images captured by end devices to the cloud requires an efficient image compression system. To address this, we focus on emerging neural image compression and propose a novel framework with a lightweight transform-neck and a surrogate loss to adapt compressed image latents for MLLM-based vision tasks. The proposed framework is generic and applicable to multiple application scenarios, where the neural image codec can be (1) pre-trained for human perception without updating, (2) fully updated for joint human and machine perception, or (3) fully updated for machine perception only. The transform-neck trained with the surrogate loss is universal, for it can serve various downstream vision tasks enabled by a variety of MLLMs that share the same visual encoder. Our framework has the striking feature of excluding the downstream MLLMs from the training of the transform-neck, and potentially of the neural image codec as well. This stands out from most existing coding-for-machine approaches, which involve the downstream networks in training and thus can be impractical when those networks are MLLMs. Extensive experiments on different neural image codecs and various MLLM-based vision tasks show that our method achieves great rate-accuracy performance with much less complexity, demonstrating its effectiveness.

Updated: 2024-07-29 02:32:44

标题: ComNeck:通过通用变换颈部连接压缩图像潜变量和多模态LLM

摘要: 这篇论文首次研究了将压缩图像信息适应下游视觉任务的需求,这些任务采用多模态大型语言模型(MLLMs)。 MLLMs已经将大型语言模型的成功扩展到了超出文本的模态(例如图像),但其十亿级别的规模阻碍了在资源受限的终端设备上部署。虽然云托管的MLLMs可能可用,但将终端设备捕获的原始未压缩图像传输到云端需要一个高效的图像压缩系统。为了解决这个问题,我们专注于新兴的神经图像压缩,并提出了一个新颖的框架,其中包括一个轻量级的转换颈和一个替代损失,以适应压缩图像信息用于基于MLLM的视觉任务。所提出的框架是通用的,并适用于多种应用场景,其中神经图像编解码器可以(1)用于人类感知而无需更新预训练,(2)完全更新以进行联合人类和机器感知,或(3)仅用于机器感知的完全更新。使用替代损失训练的转换颈是通用的,因为它可以为各种由共享相同视觉编码器的多种MLLMs启用的各种下游视觉任务提供服务。我们的框架具有一项引人注目的特点,即排除了下游MLLMs训练转换颈,以及可能排除神经图像编解码器。这与大多数现有的涉及下游网络训练的机器编码方法有所不同,因此当网络是MLLMs时可能不切实际。对不同神经图像编解码器和各种基于MLLM的视觉任务进行的大量实验表明,我们的方法在较少复杂性的情况下实现了出色的速率-准确性性能,证明了其有效性。

更新时间: 2024-07-29 02:32:44

领域: cs.CV,cs.LG,cs.MM

下载: http://arxiv.org/abs/2407.19651v1

PersonaGym: Evaluating Persona Agents and LLMs

Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses to different user requirements, thereby broadening the scope of agent applications. However, evaluating persona agent performance is incredibly challenging due to the complexity of assessing persona adherence in free-form interactions across the various environments relevant to each persona agent. We introduce PersonaGym, the first dynamic evaluation framework for assessing persona agents, and PersonaScore, the first automated human-aligned metric grounded in decision theory for comprehensive large-scale evaluation of persona agents. Our evaluation of 6 open and closed-source LLMs, using a benchmark encompassing 200 personas and 10,000 questions, reveals significant opportunities for advancement in persona agent capabilities across state-of-the-art models. For example, Claude 3.5 Sonnet shows only a 2.97% relative improvement in PersonaScore over GPT-3.5 despite being a much more advanced model. Importantly, we find that increased model size and complexity do not necessarily imply enhanced persona agent capabilities, thereby highlighting the pressing need for algorithmic and architectural invention towards faithful and performant persona agents.

Updated: 2024-07-29 02:30:35

标题: PersonaGym:评估Persona代理和LLMs

摘要: Persona agents是LLM代理,根据分配的角色行动,已经在各种应用中展示出令人印象深刻的情境响应能力。这些Persona代理在教育、医疗保健和娱乐等多个领域提供了显著的增强,模型开发人员可以将代理响应与不同用户需求对齐,从而拓宽代理应用的范围。然而,由于评估Persona代理性能的复杂性,由于在各种环境中自由交互对每个Persona代理都是相关的,评估Persona依从性非常具有挑战性。我们介绍了PersonaGym,这是第一个用于评估Persona代理的动态评估框架,以及PersonaScore,这是第一个基于决策理论的自动人类对齐度量,用于全面大规模评估Persona代理。我们对6个开源和闭源LLM进行评估,使用包含200个角色和10,000个问题的基准,揭示了在最先进模型中Persona代理能力方面的重大机遇。例如,尽管Claude 3.5 Sonnet是一个更先进的模型,但其PersonaScore相对于GPT 3.5仅有2.97%的改进。重要的是,我们发现模型大小和复杂性的增加并不一定意味着增强的Persona代理能力,从而突出了对忠实和高性能Persona代理的算法和架构创新的迫切需求。

更新时间: 2024-07-29 02:30:35

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.18416v2

Foundations for Unfairness in Anomaly Detection -- Case Studies in Facial Imaging Data

Deep anomaly detection (AD) is perhaps the most controversial of data analytic tasks, as it identifies entities that are then specifically targeted for further investigation or exclusion. Also controversial is the application of AI to facial imaging data. This work explores the intersection of these two areas to understand two core questions: "who" these algorithms are being unfair to and, equally important, "why". Recent work has shown that deep AD can be unfair to different groups despite being unsupervised; one recent study showed that, for portraits of people, men of color are far more likely to be chosen as outliers. We study the two main categories of AD algorithms, autoencoder-based and single-class-based, both of which effectively try to compress all the instances, with those that cannot be easily compressed deemed outliers. We experimentally verify sources of unfairness such as the under-representation of a group (e.g., people of color are relatively rare), spurious group features (e.g., men are often photographed with hats), and group labeling noise (e.g., race is subjective). We conjectured that lack of compressibility is the main foundation and that the others cause it, but experimental results show otherwise, and we present a natural hierarchy among them.
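
The autoencoder branch studied above reduces, in miniature, to training a bottlenecked reconstruction and ranking instances by reconstruction error, so the least compressible instances surface as outliers. A toy sketch with stand-in data and illustrative dimensions:

import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(64, 8), nn.ReLU(), nn.Linear(8, 64))  # narrow bottleneck
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
x = torch.randn(512, 64)                   # stand-in for flattened face features

for _ in range(200):
    loss = ((ae(x) - x) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

scores = ((ae(x) - x) ** 2).mean(dim=1)    # per-instance reconstruction error
outliers = scores.topk(10).indices         # the least compressible instances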

Updated: 2024-07-29 02:04:29

标题: 异常检测中不公平性的基础——面部成像数据案例研究

摘要: 深度异常检测(AD)可能是数据分析任务中最具争议性的,因为它识别出特定目标进行进一步调查或排除。将人工智能应用于面部成像数据也是具有争议性的。本研究探讨了这两个领域的交叉点,以了解两个核心问题:“这些算法对谁不公平”以及同样重要的“为什么”。最近的研究表明,尽管深度AD是无监督的,但仍可能对不同群体不公平,最近的一项研究显示,在人物肖像中,有色人种更有可能被选择为异常值。我们研究了两种主要类别的AD算法:基于自动编码器的和基于单类的,它们有效地尝试将所有实例压缩,那些不能轻松压缩的被认定为异常值。我们通过实验证实了不公平性的来源,如某一群体的代表不足(例如有色人种相对较少),虚假的群体特征(例如男性经常戴帽子拍照),以及群体标签的噪声(例如种族是主观的)。我们推测,缺乏可压缩性是主要的基础,其他因素引起了这种情况,但实验结果表明情况并非如此,我们提出了它们之间的自然层次结构。

更新时间: 2024-07-29 02:04:29

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.19646v1

Realizing Unaligned Block-wise Pruning for DNN Acceleration on Mobile Devices

With the recent proliferation of on-device AI, there is an increasing need to run computationally intensive DNNs directly on mobile devices. However, the limited computing and memory resources of these devices necessitate effective pruning techniques. Block-wise pruning is promising because it trades a small accuracy drop for substantial speedup gains, but it requires block positions to be aligned with the block size, hindering the optimal position selection that would minimize the accuracy drop. Unaligned block pruning (UBP) addresses this by allowing blocks to be selected at arbitrary positions, yet its practical use is limited by a time-consuming optimal block selection algorithm and the lack of efficient inference kernels. In this paper, we propose a pseudo-optimal yet fast block selection algorithm called Block Expansion and Division (BED), which can be integrated into an iterative model training process. Additionally, we introduce an efficient inference kernel implementation for mobile devices, enabling a UBP-based model to achieve latency similar to that of a DNN model compressed by aligned block pruning. We demonstrate the superiority of our techniques on a real mobile phone with MobileNet and ResNet models.
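
Why unaligned selection can only help is easy to see in one dimension: searching blocks at arbitrary offsets is a superset of searching aligned offsets. The toy sketch below illustrates this; it is didactic only and is not the paper's BED algorithm or inference kernel:

import numpy as np

w = np.abs(np.random.randn(16))   # importance scores of one weight row
B = 4                             # block size

aligned = [w[i:i + B].sum() for i in range(0, len(w), B)]      # offsets 0, 4, 8, 12
unaligned = [w[i:i + B].sum() for i in range(len(w) - B + 1)]  # any offset allowed

assert max(unaligned) >= max(aligned)  # the aligned offsets are a subset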

Updated: 2024-07-29 01:59:06

标题: 在移动设备上实现DNN加速的非对齐块剪枝

摘要: 随着最近设备上AI的大量增长,越来越需要在移动设备上直接运行计算密集型DNNs。然而,这些设备的有限计算和内存资源要求有效的修剪技术。基于块的修剪由于其在加速增益方面的低准确度降低权衡而具有潜力,但需要块位置与块大小对齐,从而阻碍了最小化模型准确度降低的最佳位置选择。不对齐的块修剪(UBP)通过允许在任意位置选择块来解决这个问题,然而其实际使用受到耗时的最佳块选择算法和缺乏高效推断内核的限制。在本文中,我们提出了一个名为块扩展和分割(BED)的伪最优但快速的块选择算法,可以集成到一个迭代模型训练过程中。此外,我们还介绍了一个适用于移动设备的高效推断内核实现,使得基于UBP的模型能够实现与通过对齐块修剪压缩的DNN模型相似的延迟。我们在一部真实移动电话上展示了我们技术的优越性,使用MobileNet和ResNet模型。

更新时间: 2024-07-29 01:59:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.19644v1

Prometheus Chatbot: Knowledge Graph Collaborative Large Language Model for Computer Components Recommendation

Knowledge graphs (KGs) are essential in applications such as network alignment, question-answering, and recommender systems (RSs) since they offer structured relational data that facilitate the inference of indirect relationships. However, the development of KG-based RSs capable of processing user inputs in natural language faces significant challenges. Firstly, natural language processing units must effectively handle the ambiguity and variability in human language to interpret user intents accurately. Secondly, the system must precisely identify and link entities, like product names, to their corresponding nodes in KGs. To overcome these challenges, supported by Lenovo, we developed a novel chatbot called "Prometheus," which integrates a KG with a large language model (LLM), specifically designed for recommending computer components. This chatbot can accurately decode user requests and deliver personalized recommendations derived from KGs, ensuring precise comprehension and response to their computer setup needs.

Updated: 2024-07-29 01:57:10

标题: Prometheus Chatbot:基于知识图谱的大型语言模型,用于计算机组件推荐

摘要: 知识图谱(KGs)在网络对齐、问答和推荐系统(RSs)等应用中至关重要,因为它们提供了结构化的关系数据,有助于推断间接关系。然而,基于知识图谱的RSs在处理自然语言用户输入方面面临着重大挑战。首先,自然语言处理单元必须有效处理人类语言中的歧义和变异性,以准确解释用户意图。其次,系统必须准确识别和连接实体,如产品名称,到知识图谱中对应的节点。为了克服这些挑战,我们在联想的支持下开发了一个名为“普罗米修斯”的新型聊天机器人,它将知识图谱与大型语言模型(LLM)集成在一起,专门设计用于推荐计算机组件。这个聊天机器人可以准确解码用户请求,并从知识图谱中提取个性化推荐,确保对他们的计算机设置需求进行精确理解和响应。

更新时间: 2024-07-29 01:57:10

领域: cs.AI

下载: http://arxiv.org/abs/2407.19643v1

Segmented Private Data Aggregation in the Multi-message Shuffle Model

The shuffle model of differential privacy (DP) offers compelling privacy-utility trade-offs in decentralized settings (e.g., internet of things, mobile edge networks). Particularly, the multi-message shuffle model, where each user may contribute multiple messages, has shown that accuracy can approach that of the central model of DP. However, existing studies typically assume a uniform privacy protection level for all users, which may deter conservative users from participating and prevent liberal users from contributing more information, thereby reducing the overall data utility, such as the accuracy of aggregated statistics. In this work, we pioneer the study of segmented private data aggregation within the multi-message shuffle model of DP, introducing flexible privacy protection for users and enhanced utility for the aggregation server. Our framework not only protects users' data but also anonymizes their privacy level choices to prevent potential data leakage from these choices. To optimize the privacy-utility-communication trade-offs, we explore approximately optimal configurations for the number of blanket messages and conduct almost tight privacy amplification analyses within the shuffle model. Through extensive experiments, we demonstrate that our segmented multi-message shuffle framework achieves a reduction of about 50\% in estimation error compared to existing approaches, significantly enhancing both privacy and utility.

Updated: 2024-07-29 01:46:44

标题: 多消息混洗模型中的分段私有数据聚合

摘要: 差分隐私(DP)的洗牌模型在分散设置(例如物联网、移动边缘网络)中提供了引人注目的隐私-效用权衡。特别是,在多消息洗牌模型中,每个用户可以贡献多条消息,已经表明准确性可以接近DP的中心模型。然而,现有研究通常假设所有用户具有统一的隐私保护水平,这可能会阻止保守用户参与,并阻止自由用户提供更多信息,从而降低整体数据效用,例如聚合统计数据的准确性。在这项工作中,我们首次研究了DP中多消息洗牌模型中的分段私人数据聚合,为用户引入灵活的隐私保护并增强聚合服务器的效用。我们的框架不仅保护用户数据,还将他们的隐私级别选择匿名化,以防止这些选择导致潜在数据泄漏。为了优化隐私-效用-通信的权衡,我们探索了最佳的毯子消息数量配置,并在洗牌模型中进行了几乎紧密的隐私放大分析。通过广泛的实验,我们证明我们的分段多消息洗牌框架相比现有方法可以将估计误差降低约50%,显著增强了隐私和效用。

更新时间: 2024-07-29 01:46:44

领域: cs.CR

下载: http://arxiv.org/abs/2407.19639v1

OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. We introduce a Large Language Model (LLM)-based system designed to formulate and solve (mixed integer) linear programming problems from their natural language descriptions. Our system is capable of developing mathematical models, writing and debugging solver code, evaluating the generated solutions, and improving the efficiency and correctness of its model and code based on these evaluations. OptiMUS-0.3 utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS-0.3 outperforms existing state-of-the-art methods on easy datasets by more than 12% and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than 8%.
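
To ground what "formulate and solve from a natural language description" means, here is a hand-written example of the kind of solver code such a system targets, for a toy problem ("a bakery earns $30 per cake using 2 oven-hours and $20 per pie using 1 oven-hour, with 40 oven-hours available and at most 25 pies; maximize profit"). This is illustrative only and is not OptiMUS-0.3's actual output:

from pulp import LpProblem, LpVariable, LpMaximize, value

prob = LpProblem("bakery", LpMaximize)
cakes = LpVariable("cakes", lowBound=0)
pies = LpVariable("pies", lowBound=0, upBound=25)   # at most 25 pies
prob += 30 * cakes + 20 * pies                      # objective: total profit
prob += 2 * cakes + 1 * pies <= 40                  # oven-hours constraint
prob.solve()
print(value(cakes), value(pies), value(prob.objective))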

Updated: 2024-07-29 01:31:45

标题: OptiMUS-0.3:使用大型语言模型来建模和解决规模化优化问题

摘要: 优化问题在制造和分销到医疗保健等领域中普遍存在。然而,大多数这类问题仍然是通过经验启发式方法手工解决,而不是通过最先进的求解器进行最优解,因为需要专业知识来制定和解决这些问题,限制了优化工具和技术的广泛应用。我们引入了一个基于大型语言模型(LLM)的系统,旨在根据其自然语言描述制定和解决(混合整数)线性规划问题。我们的系统能够开发数学模型,编写和调试求解器代码,评估生成的解决方案,并根据这些评估改进其模型和代码的效率和正确性。OptiMUS-0.3采用模块化结构处理问题,使其能够处理具有长描述和复杂数据的问题,而无需长时间提示。实验表明,OptiMUS-0.3在易数据集上的表现优于现有的最新方法超过12%,在困难数据集上(包括本文发布的一个新数据集NLP4LP,其中包含长且复杂的问题)超过8%。

更新时间: 2024-07-29 01:31:45

领域: cs.AI

下载: http://arxiv.org/abs/2407.19633v1

Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation

The efficiency and scalability of graph convolution networks (GCNs) in training recommender systems (RecSys) have been persistent concerns, hindering their deployment in real-world applications. This paper presents a critical examination of the necessity of graph convolutions during the training phase and introduces an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equation (LightGODE). Our investigation reveals that the benefits of GCNs are more pronounced during testing rather than training. Motivated by this, LightGODE utilizes a novel post-training graph convolution method that bypasses the computation-intensive message passing of GCNs and employs a non-parametric continuous graph ordinary-differential-equation (ODE) to dynamically model node representations. This approach drastically reduces training time while achieving fine-grained post-training graph convolution to avoid the distortion of the original training embedding space, termed the embedding discrepancy issue. We validate our model across several real-world datasets of different scales, demonstrating that LightGODE not only outperforms GCN-based models in terms of efficiency and effectiveness but also significantly mitigates the embedding discrepancy commonly associated with deeper graph convolution layers. Our LightGODE challenges the prevailing paradigms in RecSys training and suggests re-evaluating the role of graph convolutions, potentially guiding future developments of efficient large-scale graph-based RecSys.

Updated: 2024-07-29 01:26:51

标题: 我们在训练过程中真的需要图卷积吗?轻量级后训练图-ODE用于高效推荐

摘要: 图卷积网络(GCNs)在培训推荐系统(RecSys)中的效率和可伸缩性一直是持续关注的问题,这阻碍了它们在现实世界应用中的部署。本文对培训阶段图卷积的必要性进行了关键审查,并提出了一种创新的替代方案:轻量后训练图常微分方程(LightGODE)。我们的调查显示,GCNs的好处在测试阶段比在培训阶段更为明显。受此启发,LightGODE利用一种新颖的后训练图卷积方法,绕过GCNs的计算密集型消息传递,并采用非参数连续图常微分方程(ODE)动态建模节点表示。这种方法显著减少了训练时间,同时实现了精细的后训练图卷积,以避免原始训练嵌入空间的扭曲,称为嵌入不一致问题。我们在几个不同规模的真实世界数据集上验证了我们的模型,证明LightGODE不仅在效率和有效性方面优于基于GCN的模型,而且显著减轻了与更深的图卷积层常见相关的嵌入不一致问题。我们的LightGODE挑战了RecSys培训中的流行范式,并建议重新评估图卷积的作用,可能引导未来高效大规模基于图的RecSys的发展。

更新时间: 2024-07-29 01:26:51

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2407.18910v2

"A Good Bot Always Knows Its Limitations": Assessing Autonomous System Decision-making Competencies through Factorized Machine Self-confidence

How can intelligent machines assess their competencies in completing tasks? This question has come into focus for autonomous systems that algorithmically reason and make decisions under uncertainty. It is argued here that machine self-confidence -- a form of meta-reasoning based on self-assessments of an agent's knowledge about the state of the world and itself, as well as its ability to reason about and execute tasks -- leads to many eminently computable and useful competency indicators for such agents. This paper presents a culmination of work on this concept in the form of a computational framework called Factorized Machine Self-confidence (FaMSeC), which provides an engineering-focused holistic description of factors driving an algorithmic decision-making process, including outcome assessment, solver quality, model quality, alignment quality, and past experience. In FaMSeC, self-confidence indicators are derived from hierarchical `problem-solving statistics' embedded within broad classes of probabilistic decision-making algorithms such as Markov decision processes. The problem-solving statistics are obtained by evaluating and grading probabilistic exceedance margins with respect to given competency standards, which are specified for each decision-making competency factor by the informee (e.g. a non-expert user or an expert system designer). This approach allows `algorithmic goodness of fit' evaluations to be easily incorporated into the design of many kinds of autonomous agents via human-interpretable competency self-assessment reports. Detailed descriptions and running application examples for a Markov decision process agent show how two FaMSeC factors (outcome assessment and solver quality) can be practically computed and reported for a range of possible tasking contexts through novel use of meta-utility functions, behavior simulations, and surrogate prediction models.

Updated: 2024-07-29 01:22:04

标题: “一只好的机器人总是了解自己的局限性”:通过分解机器自信度评估自主系统决策能力

摘要: 智能机器如何评估完成任务的能力? 这个问题已经成为自主系统的焦点,这些系统通过算法推理和在不确定性下做出决策。本文认为,机器自信——一种基于对代理人对世界状态以及自身知识的自我评估的元推理形式,以及对任务进行推理和执行的能力——导致了许多非常可计算和有用的能力指标,适用于这样的代理人。本文提出了这一概念的工作的集大成之作,以计算框架的形式呈现,称为因子化机器自信(FaMSeC),该框架提供了驱动算法决策过程的因素的工程化综合描述,包括结果评估、解算器质量、模型质量、对齐质量和过去经验。在FaMSeC中,自信指标是从嵌入在广泛类别的概率决策算法中的层次化“问题解决统计”中导出的,例如马尔可夫决策过程。通过评估和分级概率超限边际,相对于给定的能力标准,可以获得问题解决统计,这些标准由被告知者(例如非专家用户或专家系统设计者)为每个决策能力因素指定。这种方法允许通过可解释的能力自我评估报告,将“算法拟合度”评估轻松地纳入许多种类的自主代理设计中。通过马尔可夫决策过程代理的详细描述和运行应用示例,展示了如何通过新颖的元效用函数、行为模拟和替代预测模型,实际计算和报告了两个FaMSeC因素(结果评估和解算器质量),适用于一系列可能的任务背景。

更新时间: 2024-07-29 01:22:04

领域: cs.AI,cs.CY,cs.HC,cs.LG,cs.RO

下载: http://arxiv.org/abs/2407.19631v1

LLMs' Understanding of Natural Language Revealed

Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale. Despite their utility in a number of downstream NLP tasks, ample research has shown that LLMs are incapable of performing reasoning in tasks that require quantification over and the manipulation of symbolic variables (e.g., planning and problem solving); see for example [25][26]. In this document, however, we will focus on testing LLMs for their language understanding capabilities, their supposed forte. As we will show here, the language understanding capabilities of LLMs have been widely exaggerated. While LLMs have proven to generate human-like coherent language (since that's how they were designed), their language understanding capabilities have not been properly tested. In particular, we believe that the language understanding capabilities of LLMs should be tested by performing an operation that is the opposite of 'text generation' and specifically by giving the LLM snippets of text as input and then querying what the LLM "understood". As we show here, when doing so it will become apparent that LLMs do not truly understand language, beyond very superficial inferences that are essentially the byproduct of the memorization of massive amounts of ingested text.

Updated: 2024-07-29 01:21:11

标题: LLMs对自然语言的理解揭示了

摘要: 大型语言模型(LLMs)是对语言的庞大实验的结果,通过自下而上、数据驱动的逆向工程,在规模上对语言进行了重建。尽管它们在许多下游NLP任务中具有实用性,但大量研究表明,LLMs无法执行需要对符号变量进行量化和操作的推理任务(例如,规划和问题解决);例如参见[25][26]。然而,在本文中,我们将专注于测试LLMs的语言理解能力,它们所谓的长处。正如我们将在这里展示的那样,LLMs的语言理解能力被广泛夸大了。虽然LLMs已被证明能够生成类似人类的连贯语言(因为它们是这样设计的),但它们的语言理解能力尚未得到适当测试。特别是,我们认为LLMs的语言理解能力应该通过执行一种与“文本生成”相反的操作来进行测试,具体来说,给定LLMs文本片段作为输入,然后查询LLMs“理解”了什么。正如我们在这里展示的那样,这样做时将会显而易见,LLMs并没有真正理解语言,除了非常肤浅的推断,这些推断基本上是记忆大量摄入文本的副产品。

更新时间: 2024-07-29 01:21:11

领域: cs.AI

下载: http://arxiv.org/abs/2407.19630v1

How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading

Using questions in written text is an effective strategy to enhance readability. However, what makes an active reading question good, what the linguistic role of these questions is, and what is their impact on human reading remains understudied. We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles. By analyzing the dataset, we present a comprehensive understanding of the use, distribution, and linguistic characteristics of these questions. Then, we explore various approaches to generate such questions using language models. Our results highlight the importance of capturing inter-question relationships and the challenge of question position identification in generating these questions. Finally, we conduct a human study to understand the implication of such questions on reading comprehension. We find that the generated questions are of high quality and are almost as effective as human-written questions in terms of improving readers' memorization and comprehension.

Updated: 2024-07-29 01:19:12

标题: 如何吸引读者?生成引导性问题以促进主动阅读

摘要: 在书面文本中使用问题是增强可读性的有效策略。然而,什么使一个积极的阅读问题好,这些问题的语言作用是什么,以及它们对人类阅读的影响仍然缺乏研究。我们介绍了GuidingQ,这是一个包含来自教科书和科学文章的1万个文本内问题的数据集。通过分析数据集,我们提供了对这些问题的使用、分布和语言特征的全面理解。然后,我们探讨了使用语言模型生成这些问题的各种方法。我们的结果突显了捕捉问题之间关系的重要性,以及在生成这些问题时问题位置识别的挑战。最后,我们进行了一项人类研究,以了解这些问题对阅读理解的影响。我们发现生成的问题质量很高,并且在提高读者的记忆和理解方面几乎与人工编写的问题一样有效。

更新时间: 2024-07-29 01:19:12

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.14309v2

Privacy Amplification via Shuffling: Unified, Simplified, and Tightened

The shuffle model of differential privacy provides promising privacy-utility balances in decentralized, privacy-preserving data analysis. However, the current analyses of privacy amplification via shuffling lack both tightness and generality. To address this issue, we propose the \emph{variation-ratio reduction} as a comprehensive framework for privacy amplification in both single-message and multi-message shuffle protocols. It leverages two new parameterizations: the total variation bounds of local messages and the probability ratio bounds of blanket messages, to determine indistinguishability levels. Our theoretical results demonstrate that our framework provides tighter bounds, especially for local randomizers with extremal probability design, where our bounds are exactly tight. Additionally, variation-ratio reduction complements parallel composition in the shuffle model, yielding enhanced privacy accounting for popular sampling-based randomizers employed in statistical queries (e.g., range queries, marginal queries, and frequent itemset mining). Empirical findings demonstrate that our numerical amplification bounds surpass existing ones, conserving up to $30\%$ of the budget for single-message protocols, $75\%$ for multi-message ones, and a striking $75\%$-$95\%$ for parallel composition. Our bounds also result in a remarkably efficient $\tilde{O}(n)$ algorithm that numerically amplifies privacy in less than $10$ seconds for $n=10^8$ users.

Updated: 2024-07-29 01:12:04

标题: 通过混洗实现隐私放大:统一、简化和加强

摘要: 差分隐私的洗牌模型在去中心化、保护隐私的数据分析中提供了具有前景的隐私与效用平衡。然而,目前关于通过洗牌实现隐私放大的分析缺乏严密性和普适性。为解决这一问题,我们提出了“变差比减少”作为单消息和多消息洗牌协议中隐私放大的综合框架。该框架利用了两个新的参数化:本地消息的总变差界限和全局消息的概率比界限,以确定不可区分性水平。我们的理论结果表明,我们的框架提供了更紧密的界限,特别是对于极端概率设计的本地随机器,我们的界限是完全紧密的。此外,变差比减少与洗牌模型中的并行组合相辅相成,提供了对统计查询中常用的基于采样的随机器(例如范围查询、边际查询和频繁项集挖掘)的增强隐私计算。实证结果表明,我们的数值放大界限超过了现有的界限,对于单消息协议可节省高达30%的预算,对于多消息协议可节省75%,对于并行组合可节省惊人的75%-95%。我们的界限还导致一个非常高效的近似O(n)算法,在n=10^8用户时,可以在不到10秒内对隐私进行数值放大。

更新时间: 2024-07-29 01:12:04

领域: cs.CR

下载: http://arxiv.org/abs/2304.05007v5

Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation

The advent of large language models (LLMs) has significantly advanced the field of code translation, enabling automated translation between programming languages. However, these models often struggle with complex translation tasks due to inadequate contextual understanding. This paper introduces a novel approach that enhances code translation through Few-Shot Learning, augmented with retrieval-based techniques. By leveraging a repository of existing code translations, we dynamically retrieve the most relevant examples to guide the model in translating new code segments. Our method, based on Retrieval-Augmented Generation (RAG), substantially improves translation quality by providing contextual examples from which the model can learn in real time. We selected RAG over traditional fine-tuning methods due to its ability to utilize existing codebases or a locally stored corpus of code, which allows for dynamic adaptation to diverse translation tasks without extensive retraining. Extensive experiments on diverse datasets with open LLM models such as Starcoder, Llama3-70B Instruct, CodeLlama-34B Instruct, Granite-34B Code Instruct, and Mixtral-8x22B, as well as commercial LLM models like GPT-3.5 Turbo and GPT-4o, demonstrate our approach's superiority over traditional zero-shot methods, especially in translating between Fortran and CPP. We also explored varying numbers of shots, i.e., examples provided during inference (specifically 1, 2, and 3), and different embedding models for RAG, including Nomic-Embed, Starencoder, and CodeBERT, to assess the robustness and effectiveness of our approach.
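
In outline, the retrieval step amounts to embedding the source snippet, pulling the k most similar (source, target) pairs from the local corpus, and prepending them as in-context examples. A sketch under an assumed corpus layout and embedding interface (not the paper's implementation):

import numpy as np

def build_prompt(embed, corpus, src_code, k=3):
    """corpus: list of {"fortran": str, "cpp": str, "vec": np.ndarray} entries."""
    q = embed(src_code)
    sims = [float(q @ ex["vec"]) /
            (np.linalg.norm(q) * np.linalg.norm(ex["vec"])) for ex in corpus]
    shots = [corpus[i] for i in np.argsort(sims)[-k:]]   # k nearest examples
    parts = ["Translate Fortran to C++.\n"]
    for ex in shots:
        parts.append("Fortran:\n%s\nC++:\n%s\n" % (ex["fortran"], ex["cpp"]))
    parts.append("Fortran:\n%s\nC++:\n" % src_code)
    return "\n".join(parts)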

Updated: 2024-07-29 00:41:48

Subjects: cs.AI,cs.SE

Download: http://arxiv.org/abs/2407.19619v1

Experimenting on Markov Decision Processes with Local Treatments

As service systems grow increasingly complex and dynamic, many interventions become localized, available and effective only in specific states. This paper investigates experiments with local treatments on a widely used class of dynamic models, Markov Decision Processes (MDPs). In particular, we focus on utilizing the local structure to improve the inference efficiency of the average treatment effect. We begin by demonstrating the efficiency of classical inference methods, including model-based estimation and temporal difference learning under a fixed policy, as well as classical A/B testing with general treatments. We then introduce a variance reduction technique that exploits the local treatment structure by sharing information for states unaffected by the treatment policy. Our new estimator effectively overcomes the variance lower bound for general treatments while matching the more stringent lower bound that incorporates the local treatment structure. Furthermore, for a major part of the variance, our estimator optimally achieves a reduction that is linear in the number of test arms. Finally, we explore scenarios with perfect knowledge of the control arm and design estimators that further improve inference efficiency.
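
A toy illustration of the information-sharing idea, not the paper's estimator: because a local treatment leaves untreated states' reward distributions unchanged, reward samples from those states can be pooled across test arms, while treated states are estimated per arm. All names and data layouts below are ours.

    import numpy as np
    from collections import defaultdict

    def state_reward_estimates(trajs_by_arm, treated_states):
        # trajs_by_arm: {arm: [[(state, reward), ...], ...]} observed trajectories.
        pooled = defaultdict(list)                       # shared across arms
        per_arm = {arm: defaultdict(list) for arm in trajs_by_arm}
        for arm, trajs in trajs_by_arm.items():
            for traj in trajs:
                for state, reward in traj:
                    if state in treated_states:
                        per_arm[arm][state].append(reward)   # arm-specific
                    else:
                        pooled[state].append(reward)         # poolable
        estimates = {}
        for arm in trajs_by_arm:
            means = {s: float(np.mean(r)) for s, r in pooled.items()}
            means.update({s: float(np.mean(r)) for s, r in per_arm[arm].items()})
            estimates[arm] = means
        return estimates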

Updated: 2024-07-29 00:41:11

Subjects: stat.ME,cs.LG,econ.EM,stat.AP,stat.ML

Download: http://arxiv.org/abs/2407.19618v1

AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs

Plant stress phenotyping traditionally relies on expert assessments and specialized models, limiting scalability in agriculture. Recent advances in multimodal large language models (LLMs) offer potential solutions to this challenge. We present AgEval, a benchmark comprising 12 diverse plant stress phenotyping tasks, to evaluate these models' capabilities. Our study assesses the zero-shot and few-shot in-context learning performance of state-of-the-art models, including Claude, GPT, Gemini, and LLaVA. Results show significant performance improvements with few-shot learning, with F1 scores increasing from 46.24% to 73.37% in 8-shot identification for the best-performing model. Few-shot examples from other classes in the dataset have negligible or negative impacts, although having an example from the exact category helps to increase performance by 15.38%. We also quantify the consistency of model performance across different classes within each task, finding that the coefficient of variation (CV) ranges from 26.02% to 58.03% across models, implying that subject-matter expertise for 'difficult' classes is needed to achieve reliable performance. AgEval establishes baseline metrics for multimodal LLMs in agricultural applications, offering insights into their promise for enhancing plant stress phenotyping at scale. Benchmark and code can be accessed at: https://anonymous.4open.science/r/AgEval/
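
A minimal sketch of the consistency measure, assuming (as the wording above suggests) that the CV is taken over per-class F1 scores within a task; the function name is illustrative.

    import numpy as np
    from sklearn.metrics import f1_score

    def per_class_consistency(y_true, y_pred, labels):
        # Per-class F1 for one phenotyping task.
        f1s = f1_score(y_true, y_pred, labels=labels, average=None)
        # Coefficient of variation across classes, as a percentage: a high
        # CV means the model is strong on some classes and weak on others.
        cv = 100.0 * f1s.std() / f1s.mean()
        return f1s, cv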

Updated: 2024-07-29 00:39:51

Subjects: cs.LG,cs.CV

Download: http://arxiv.org/abs/2407.19617v1

TopicTag: Automatic Annotation of NMF Topic Models Using Chain of Thought and Prompt Tuning with LLMs

Topic modeling is a technique for organizing and extracting themes from large collections of unstructured text. Non-negative matrix factorization (NMF) is a common unsupervised approach that decomposes a term frequency-inverse document frequency (TF-IDF) matrix to uncover latent topics and segment the dataset accordingly. While useful for highlighting patterns and clustering documents, NMF does not provide explicit topic labels, requiring subject matter experts (SMEs) to assign labels manually. We present a methodology for automating topic labeling for documents clustered via NMF with automatic model determination (NMFk). By leveraging the output of NMFk and employing prompt engineering, we utilize large language models (LLMs) to generate accurate topic labels. Our case study on over 34,000 scientific abstracts on Knowledge Graphs demonstrates the effectiveness of our method in enhancing knowledge management and document organization.
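
A minimal sketch of the pipeline with a fixed number of topics (NMFk additionally selects the topic count automatically, which plain scikit-learn NMF does not); the prompt is an illustrative stand-in for the paper's chain-of-thought prompt.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    def topic_top_words(docs, n_topics=10, n_words=12):
        vec = TfidfVectorizer(stop_words="english", max_features=20000)
        X = vec.fit_transform(docs)
        model = NMF(n_components=n_topics, init="nndsvd", random_state=0)
        model.fit_transform(X)                 # document-topic weights
        vocab = vec.get_feature_names_out()
        # Highest-weighted vocabulary terms for each latent topic.
        return [[vocab[i] for i in comp.argsort()[::-1][:n_words]]
                for comp in model.components_]

    def label_prompt(top_words):
        # Chain-of-thought style prompt asking an LLM to name the topic.
        return ("These keywords describe one research topic: "
                + ", ".join(top_words)
                + ".\nThink step by step about what they share, then "
                  "answer with a short topic label.")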

Updated: 2024-07-29 00:18:17

Subjects: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.19616v1

AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes

Inspired by cognitive theories, we introduce AnyHome, a framework that translates any text into well-structured and textured indoor scenes at house scale. By prompting Large Language Models (LLMs) with designed templates, our approach converts provided textual narratives into amodal structured representations. These representations guarantee consistent and realistic spatial layouts by directing the synthesis of a geometry mesh within defined constraints. A Score Distillation Sampling process is then employed to refine the geometry, followed by an egocentric inpainting process that adds lifelike textures. AnyHome stands out for its editability, customizability, diversity, and realism. The structured scene representations allow for extensive editing at varying levels of granularity. Capable of interpreting texts ranging from simple labels to detailed narratives, AnyHome generates detailed geometries and textures that outperform existing methods on both quantitative and qualitative measures.
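
As a rough illustration of the first stage only, here is a hypothetical prompt template and parser for the structured scene representation; the JSON schema is a stand-in of ours, not AnyHome's actual format.

    import json

    LAYOUT_TEMPLATE = """You are designing a house described as:
    "{narrative}"
    Reply only with JSON of the form
    {{"rooms": [{{"name": "...", "size_m": [width, depth], "objects": ["..."]}}]}}"""

    def layout_prompt(narrative):
        # Fill the template; the LLM's reply is parsed into the structured
        # representation that constrains the later geometry synthesis.
        return LAYOUT_TEMPLATE.format(narrative=narrative)

    def parse_layout(llm_reply):
        return json.loads(llm_reply)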

Updated: 2024-07-29 00:09:46

Subjects: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2312.06644v3

By Xinhai (Sean) Zou.