Covariant spatio-temporal receptive fields for neuromorphic computing
Biological nervous systems are important sources of inspiration for computers that are faster, cheaper, and more energy-efficient. Neuromorphic disciplines view the brain as a coevolved system, simultaneously optimizing the hardware and the algorithms running on it. There are clear efficiency gains when bringing the computations into a physical substrate, but we presently lack theories to guide efficient implementations. Here, we present a principled computational model for neuromorphic systems in terms of spatio-temporal receptive fields, based on affine Gaussian kernels over space and leaky-integrator and leaky integrate-and-fire models over time. Our theory is provably covariant under spatial affine and temporal scaling transformations, and bears close similarities to visual processing in mammalian brains. We use these spatio-temporal receptive fields as a prior in an event-based vision task, and show that this improves the training of spiking networks, which is otherwise known to be problematic for event-based vision. This work combines efforts within scale-space theory and computational neuroscience to identify theoretically well-founded ways to process spatio-temporal signals in neuromorphic systems. Our contributions are immediately relevant for signal processing and event-based vision, and can be extended to other processing tasks over space and time, such as memory and control.
Updated: 2024-05-07 23:54:23
Categories: cs.NE,cs.CV,cs.LG
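The temporal component of such receptive fields can be illustrated with a short sketch: a first-order leaky integrator as a causal temporal low-pass filter, paired with an isotropic Gaussian spatial kernel (a special case of the affine kernels above). The names, the forward-Euler discretization, and the isotropic restriction are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def leaky_integrator(signal, tau, dt=1.0):
    """First-order leaky integrator: dx/dt = (u(t) - x) / tau.

    Discretized with forward Euler; acts as a causal temporal
    low-pass filter, the temporal half of a spatio-temporal
    receptive field."""
    x = 0.0
    out = np.empty(len(signal), dtype=float)
    for i, u in enumerate(signal):
        x += dt * (u - x) / tau
        out[i] = x
    return out

def spatial_gaussian(size, sigma):
    """Isotropic Gaussian spatial kernel (a special case of the
    affine Gaussian kernels described in the abstract)."""
    r = np.arange(size) - size // 2
    g = np.exp(-r**2 / (2 * sigma**2))
    return g / g.sum()
```

Cascading several such integrators with different time constants would give a family of temporal scales, mirroring the temporal scale-covariance the theory requires.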
Metaverse Survey & Tutorial: Exploring Key Requirements, Technologies, Standards, Applications, Challenges, and Perspectives
In this paper, we present a comprehensive survey of the metaverse, envisioned as a transformative dimension of next-generation Internet technologies. This study not only outlines the structural components of our survey but also makes a substantial scientific contribution by elucidating the foundational concepts underlying the emergence of the metaverse. We analyze its architecture by defining key characteristics and requirements, thereby illuminating the nascent reality set to revolutionize digital interactions. Our analysis emphasizes the importance of collaborative efforts in developing metaverse standards, thereby fostering a unified understanding among industry stakeholders, organizations, and regulatory bodies. We extend our scrutiny to critical technologies integral to the metaverse, including interactive experiences, communication technologies, ubiquitous computing, digital twins, artificial intelligence, and cybersecurity measures. For each technological domain, we rigorously assess current contributions, principal techniques, and representative use cases, providing a nuanced perspective on their potential impacts. Furthermore, we delve into the metaverse's diverse applications across education, healthcare, business, social interactions, industrial sectors, defense, and mission-critical operations, highlighting its extensive utility. Each application is thoroughly analyzed, demonstrating its value and addressing associated challenges. The survey concludes with an overview of persistent challenges and future directions, offering insights into essential considerations and strategies necessary to harness the full potential of the metaverse. Through this detailed investigation, our goal is to articulate the scientific contributions of this survey paper, transcending a mere structural overview to highlight the transformative implications of the metaverse.
Updated: 2024-05-07 23:49:02
Categories: cs.HC,cs.AI
Physics-based deep learning reveals rising heating demand heightens air pollution in Norwegian cities
Policymakers frequently analyze air quality and climate change in isolation, disregarding their interactions. This study explores the influence of specific climate factors on air quality by contrasting a regression model with K-Means Clustering, Hierarchical Clustering, and Random Forest techniques. We employ Physics-based Deep Learning (PBDL) and Long Short-Term Memory (LSTM) to examine the air pollution predictions. Our analysis utilizes ten years (2009-2018) of daily traffic, weather, and air pollution data from three major cities in Norway. Findings from feature selection reveal a correlation between rising heating degree days and heightened air pollution levels, suggesting increased heating activities in Norway are a contributing factor to worsening air quality. PBDL demonstrates superior accuracy in air pollution predictions compared to LSTM. This paper contributes to the growing literature on PBDL methods for more accurate air pollution predictions using environmental variables, aiding policymakers in formulating effective data-driven climate policies.
Updated: 2024-05-07 23:43:46
Categories: cs.CY,cs.AI,cs.LG,cs.NE,K.4.1; J.2; I.2
Causality Pursuit from Heterogeneous Environments via Neural Adversarial Invariance Learning
Statistics suffers from a fundamental problem, "the curse of endogeneity": the regression function, or more broadly the prediction risk minimizer with infinite data, may not be the target we wish to pursue. This is because when complex data are collected from multiple sources, the biases that deviate from the (causal) association of interest, inherited in individuals or sub-populations, are not expected to cancel. Traditional remedies are retrospective and restrictive, tailored to prior knowledge such as untestable cause-effect structures, resulting in methods that risk model misspecification and lack scalable applicability. This paper seeks to offer a purely data-driven and universally applicable method that only uses the heterogeneity of the biases in the data rather than following pre-specified commandments. This idea is formulated as a nonparametric invariance pursuit problem, whose goal is to unveil the invariant conditional expectation $m^\star(x)\equiv \mathbb{E}[Y^{(e)}|X_{S^\star}^{(e)}=x_{S^\star}]$ with an unknown important variable set $S^\star$ across heterogeneous environments $e\in \mathcal{E}$. Under the structural causal model framework, $m^\star$ can in general be interpreted as certain data-driven causality. The paper contributes a novel framework, called Focused Adversarial Invariance Regularization (FAIR), formulated as a single minimax optimization program that can solve the general invariance pursuit problem. As illustrated by the unified non-asymptotic analysis, our adversarial estimation framework can attain provable sample-efficient estimation akin to standard regression under a minimal identification condition for various tasks and models. As an application, the FAIR-NN estimator, realized by two neural network classes, is highlighted as the first approach to attain statistically efficient estimation in general nonparametric invariance learning.
Updated: 2024-05-07 23:37:40
Categories: math.ST,cs.LG,stat.ME,stat.ML,stat.TH,62G08
RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes
Reinforcement learning provides an appealing framework for robotic control due to its ability to learn expressive policies purely through real-world interaction. However, this requires addressing real-world constraints and avoiding catastrophic failures during training, which might severely impede both learning progress and the performance of the final policy. In many robotics settings, this amounts to avoiding certain "unsafe" states. The high-speed off-road driving task represents a particularly challenging instantiation of this problem: a high-return policy should drive as aggressively and as quickly as possible, which often requires getting close to the edge of the set of "safe" states, and therefore places a particular burden on the method to avoid frequent failures. To both learn highly performant policies and avoid excessive failures, we propose a reinforcement learning framework that combines risk-sensitive control with an adaptive action space curriculum. Furthermore, we show that our risk-sensitive objective automatically avoids out-of-distribution states when equipped with an estimator for epistemic uncertainty. We implement our algorithm on a small-scale rally car and show that it is capable of learning high-speed policies for a real-world off-road driving task. We show that our method greatly reduces the number of safety violations during the training process, and actually leads to higher-performance policies in both driving and non-driving simulation environments with similar challenges.
Updated: 2024-05-07 23:32:36
Categories: cs.RO,cs.AI,cs.LG
Federated Q-Learning: Linear Regret Speedup with Low Communication Cost
In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDP) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample complexity, in similar settings, it is unclear whether it is possible to design a model-free algorithm to achieve linear regret speedup with low communication cost. We propose two federated Q-Learning algorithms termed FedQ-Hoeffding and FedQ-Bernstein, respectively, and show that the corresponding total regrets achieve a linear speedup compared with their single-agent counterparts when the time horizon is sufficiently large, while the communication cost scales logarithmically in the total number of time steps $T$. These results rely on an event-triggered synchronization mechanism between the agents and the server, a novel step size selection when the server aggregates the local estimates of the state-action values to form the global estimates, and a set of new concentration inequalities to bound the sum of non-martingale differences. This is the first work showing that linear regret speedup and logarithmic communication cost can be achieved by model-free algorithms in federated reinforcement learning.
Updated: 2024-05-07 23:31:54
Categories: cs.LG,stat.ML
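The server-side aggregation step can be sketched as a visit-count-weighted average of the agents' local Q estimates; the actual FedQ-Hoeffding and FedQ-Bernstein algorithms add a carefully chosen step-size rule and concentration-based bonuses on top of this pattern, so the sketch below is only an assumed baseline with illustrative names.

```python
import numpy as np

def aggregate_q(local_qs, local_counts):
    """Server-side aggregation of per-agent Q estimates.

    A visit-count-weighted average over agents; `local_qs` is a
    list of (S, A) arrays, `local_counts` the matching arrays of
    state-action visit counts since the last synchronization.
    Where no agent visited (s, a), fall back to the plain mean."""
    qs = np.stack(local_qs)          # (M, S, A)
    counts = np.stack(local_counts)  # (M, S, A)
    total = counts.sum(axis=0)
    weighted = (qs * counts).sum(axis=0)
    return np.where(total > 0, weighted / np.maximum(total, 1), qs.mean(axis=0))
```

The event-triggered synchronization in the paper would invoke such an aggregation only when some agent's local visit counts cross a threshold, which is what keeps the communication cost logarithmic in $T$.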
Untangling Lariats: Subgradient Following of Variationally Penalized Objectives
We describe a novel subgradient following apparatus for calculating the optimum of convex problems with variational penalties. In this setting, we receive a sequence $y_1,\ldots,y_n$ and seek a smooth sequence $x_1,\ldots,x_n$. The smooth sequence attains the minimum Bregman divergence to an input sequence with additive variational penalties in the general form of $\sum_i g_i(x_{i+1}-x_i)$. We derive, as special cases of our apparatus, known algorithms for the fused lasso and isotonic regression. Our approach also facilitates new variational penalties such as non-smooth barrier functions. We next derive and analyze multivariate problems in which $\mathbf{x}_i,\mathbf{y}_i\in\mathbb{R}^d$ and variational penalties that depend on $\|\mathbf{x}_{i+1}-\mathbf{x}_i\|$. The norms we consider are $\ell_2$ and $\ell_\infty$ which promote group sparsity. Last but not least, we derive a lattice-based subgradient following for variational penalties characterized through the output of arbitrary convolutional filters. This paradigm yields efficient solvers for problems in which sparse high-order discrete derivatives such as acceleration and jerk are desirable.
Updated: 2024-05-07 23:08:24
Categories: cs.LG,math.OC
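One of the special cases mentioned, isotonic regression, has a classical standalone solver: the pool-adjacent-violators algorithm. A minimal sketch of that special case (not the paper's general subgradient-following apparatus):

```python
def isotonic_regression(y):
    """Pool-Adjacent-Violators: least-squares fit of a
    non-decreasing sequence x to y, one of the special cases the
    abstract recovers from its subgradient-following apparatus."""
    blocks = []  # each block holds [sum, count] of pooled values
    for v in y:
        blocks.append([float(v), 1])
        # merge backwards while adjacent block means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

Each pooled block is replaced by its mean, which is exactly the least-squares projection onto the monotone cone.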
Safety Implications of Explainable Artificial Intelligence in End-to-End Autonomous Driving
The end-to-end learning pipeline is gradually creating a paradigm shift in the ongoing development of highly autonomous vehicles, largely due to advances in deep learning, the availability of large-scale training datasets, and improvements in integrated sensor devices. However, a lack of interpretability in real-time decisions with contemporary learning methods impedes user trust and attenuates the widespread deployment and commercialization of such vehicles. Moreover, the issue is exacerbated when these cars are involved in or cause traffic accidents. Such a drawback raises serious safety concerns from societal and legal perspectives. Consequently, explainability in end-to-end autonomous driving is essential to build trust in vehicular automation. However, the safety and explainability aspects of end-to-end driving have generally been investigated disjointly by researchers in today's state of the art. This survey aims to bridge the gaps between these topics and seeks to answer the following research question: When and how can explanations improve safety of end-to-end autonomous driving? In this regard, we first revisit established safety and state-of-the-art explainability techniques in end-to-end driving. Furthermore, we present three critical case studies and show the pivotal role of explanations in enhancing self-driving safety. Finally, we describe insights from empirical studies and reveal potential value, limitations, and caveats of practical explainable AI methods with respect to their safety assurance in end-to-end autonomous driving.
Updated: 2024-05-07 22:55:45
Categories: cs.RO,cs.AI
Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights
The applications of artificial intelligence (AI) are rapidly evolving, and they are also commonly used in safety-critical domains, such as autonomous driving and medical diagnosis, where functional safety is paramount. In AI-driven systems, uncertainty estimation allows the user to avoid overconfidence predictions and achieve functional safety. Therefore, the robustness and reliability of model predictions can be improved. However, conventional uncertainty estimation methods, such as the deep ensemble method, impose high computation and, accordingly, hardware (latency and energy) overhead because they require the storage and processing of multiple models. Alternatively, Monte Carlo dropout (MC-dropout) methods, although having low memory overhead, necessitate numerous ($\sim 100$) forward passes, leading to high computational overhead and latency. Thus, these approaches are not suitable for battery-powered edge devices with limited computing and memory resources. In this paper, we propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices. In our approach, only normalization layers are ensembled $M$ times, with all ensemble members sharing common weights and biases, leading to a significant decrease in storage requirements and latency. Moreover, our approach requires only one forward pass in a hardware architecture that allows batch processing for inference and uncertainty estimation. Furthermore, it has approximately the same memory overhead compared to a single model. Therefore, latency and memory overhead are reduced by a factor of up to $\sim M\times$. Nevertheless, our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ in various benchmark datasets, tasks, and state-of-the-art architectures.
Updated: 2024-05-07 22:54:17
Categories: cs.LG,cs.AI
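The central idea, one shared weight matrix with $M$ independent normalization (scale/shift) parameter sets and the member spread as the uncertainty signal, can be sketched in NumPy. The shapes, names, and the layer-norm-style normalization are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyNormEnsemble:
    """Sketch of the shared-weight ensemble idea: all members use
    the same weight matrix W; only the per-member scale (gamma)
    and shift (beta) of the normalization layer differ, so memory
    overhead stays close to that of a single model."""

    def __init__(self, d_in, d_out, m=4):
        self.W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)  # shared
        self.gammas = 1.0 + 0.1 * rng.normal(size=(m, d_out))    # per member
        self.betas = 0.1 * rng.normal(size=(m, d_out))           # per member

    def forward(self, x):
        h = x @ self.W                                  # shared computation
        h = (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + 1e-5)
        preds = self.gammas[:, None, :] * h[None] + self.betas[:, None, :]
        return preds.mean(0), preds.std(0)              # prediction, uncertainty
```

Because the expensive matrix product is computed once and only the cheap affine normalization is replicated, a single (batched) forward pass yields both the prediction and its spread, which is the source of the claimed latency and memory savings.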
Learning Linear Utility Functions From Pairwise Comparison Queries
We study learnability of linear utility functions from pairwise comparison queries. In particular, we consider two learning objectives. The first objective is to predict out-of-sample responses to pairwise comparisons, whereas the second is to approximately recover the true parameters of the utility function. We show that in the passive learning setting, linear utilities are efficiently learnable with respect to the first objective, both when query responses are uncorrupted by noise, and under Tsybakov noise when the distributions are sufficiently "nice". In contrast, we show that utility parameters are not learnable for a large set of data distributions without strong modeling assumptions, even when query responses are noise-free. Next, we proceed to analyze the learning problem in an active learning setting. In this case, we show that even the second objective is efficiently learnable, and present algorithms for both the noise-free and noisy query response settings. Our results thus exhibit a qualitative learnability gap between passive and active learning from pairwise preference queries, demonstrating the value of the ability to select pairwise queries for utility learning.
Updated: 2024-05-07 22:49:14
Categories: cs.LG,cs.AI,cs.CY,stat.ML
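For the first objective in the passive setting, a standard baseline illustrates the setup: logistic regression on the difference vector of each comparison pair recovers the utility direction when responses are generated by a linear utility. This is an assumed illustrative baseline, not the paper's algorithms.

```python
import numpy as np

def fit_utility(pairs_a, pairs_b, prefs, lr=0.1, epochs=500):
    """Learn a linear utility u(x) = w . x from pairwise
    comparisons by logistic regression on difference vectors:
    P(a preferred over b) = sigmoid(w . (a - b))."""
    d = np.asarray(pairs_a) - np.asarray(pairs_b)  # (n, dim) differences
    y = np.asarray(prefs, dtype=float)             # 1 if a preferred, else 0
    w = np.zeros(d.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-d @ w))
        w += lr * d.T @ (y - p) / len(y)           # ascent on log-likelihood
    return w
```

Note that, consistent with the abstract's negative result, this recovers the direction of $w$ (enough to predict out-of-sample comparisons) but not its scale: any positive multiple of $w$ induces the same preferences.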
Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise
Increasing the size of overparameterized neural networks has been a key in achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern (or sometimes monotonically decreasing) as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a \textit{final ascent} in the originally observed double descent curve. Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. Surprisingly, we find that larger $\ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets/ViTs trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.
Updated: 2024-05-07 22:35:37
Categories: cs.LG,stat.ML
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures
Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Although such learning methods can be optimized to reduce resource utilization, the overall required resources remain a heavy burden on edge devices. Instead, Retrieval-Augmented Generation (RAG), a resource-efficient LLM learning method, can improve the quality of the LLM-generated content without updating model parameters. However, the RAG-based LLM may involve repetitive searches on the profile data in every user-LLM interaction. This search can lead to significant latency along with the accumulation of user data. Conventional efforts to decrease latency result in restricting the size of saved user data, thus reducing the scalability of RAG as user data continuously grows. It remains an open question: how to free RAG from the constraints of latency and scalability on edge devices? In this paper, we propose a novel framework to accelerate RAG via Computing-in-Memory (CiM) architectures. It accelerates matrix multiplications by performing in-situ computation inside the memory while avoiding the expensive data transfer between the computing unit and memory. Our framework, Robust CiM-backed RAG (RoCR), utilizing a novel contrastive learning-based training method and noise-aware training, can enable RAG to efficiently search profile data with CiM. To the best of our knowledge, this is the first work utilizing CiM to accelerate RAG.
Updated: 2024-05-07 22:31:50
Categories: cs.LG,cs.AI,cs.IR
The Existential Theory of the Reals with Summation Operators
To characterize the computational complexity of satisfiability problems for probabilistic and causal reasoning within the Pearl's Causal Hierarchy, arXiv:2305.09508 [cs.AI] introduces a new natural class, named succ-$\exists$R. This class can be viewed as a succinct variant of the well-studied class $\exists$R based on the Existential Theory of the Reals (ETR). Analogously to $\exists$R, succ-$\exists$R is an intermediate class between NEXP and EXPSPACE, the exponential versions of NP and PSPACE. The main contributions of this work are threefold. Firstly, we characterize the class succ-$\exists$R in terms of nondeterministic real RAM machines and develop structural complexity theoretic results for real RAMs, including translation and hierarchy theorems. Notably, we demonstrate the separation of $\exists$R and succ-$\exists$R. Secondly, we examine the complexity of model checking and satisfiability of fragments of existential second-order logic and probabilistic independence logic. We show succ-$\exists$R-completeness of several of these problems, for which the best-known complexity lower and upper bounds were previously NEXP-hardness and EXPSPACE, respectively. Thirdly, while succ-$\exists$R is characterized in terms of ordinary (non-succinct) ETR instances enriched by exponential sums and a mechanism to index exponentially many variables, in this paper, we prove that when only exponential sums are added, the corresponding class $\exists$R^{\Sigma} is contained in PSPACE. We conjecture that this inclusion is strict, as this class is equivalent to adding a VNP-oracle to a polynomial time nondeterministic real RAM. Conversely, the addition of exponential products to ETR yields PSPACE. Additionally, we study the satisfiability problem for probabilistic reasoning, with the additional requirement of a small model, and prove that this problem is complete for $\exists$R^{\Sigma}.
Updated: 2024-05-07 22:20:41
Categories: cs.CC,cs.LO
Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance
This study investigates the metacognitive capabilities of Large Language Models relative to human metacognition in the context of the International Coaching Federation (ICF) mimicking exam, a situational judgment test related to coaching competencies. Using a mixed method approach, we assessed the metacognitive performance, including sensitivity, accuracy in probabilistic predictions, and bias, of human participants and five advanced LLMs (GPT-4, Claude-3-Opus, Mistral Large, Llama 3, and Gemini 1.5 Pro). The results indicate that LLMs outperformed humans across all metacognitive metrics, particularly in terms of reduced overconfidence. However, both LLMs and humans showed less adaptability in ambiguous scenarios, adhering closely to predefined decision frameworks. The study suggests that Generative AI can effectively engage in human-like metacognitive processing without conscious awareness. Implications of the study are discussed in relation to development of AI simulators that scaffold cognitive and metacognitive aspects of mastering coaching competencies. More broadly, implications of these results are discussed in relation to development of metacognitive modules that lead towards more autonomous and intuitive AI systems.
Updated: 2024-05-07 22:15:12
Categories: cs.HC,cs.AI
Carbon Filter: Real-time Alert Triage Using Large Scale Clustering and Fast Search
"Alert fatigue" is one of the biggest challenges faced by the Security Operations Center (SOC) today, with analysts spending more than half of their time reviewing false alerts. Endpoint detection products raise alerts by pattern matching on event telemetry against behavioral rules that describe potentially malicious behavior, but can suffer from high false positives that distract from actual attacks. While alert triage techniques based on data provenance may show promise, these techniques can take over a minute to inspect a single alert, while EDR customers may face tens of millions of alerts per day; the current reality is that these approaches aren't nearly scalable enough for production environments. We present Carbon Filter, a statistical learning based system that dramatically reduces the number of alerts analysts need to manually review. Our approach is based on the observation that false alert triggers can be efficiently identified and separated from suspicious behaviors by examining the process initiation context (e.g., the command line) that launched the responsible process. Through the use of fast-search algorithms for training and inference, our approach scales to millions of alerts per day. Through batching queries to the model, we observe a theoretical maximum throughput of 20 million alerts per hour. Based on the analysis of tens of million alerts from customer deployments, our solution resulted in a 6-fold improvement in the Signal-to-Noise ratio without compromising on alert triage performance.
Updated: 2024-05-07 22:06:24
Subjects: cs.CR,cs.LG
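The core observation above, that false triggers can be separated by the command line that launched the responsible process, can be illustrated with a toy grouping step. The normalization rules and function names below are illustrative assumptions, not Carbon Filter's actual pipeline, which relies on large-scale clustering and fast search:

```python
from collections import defaultdict
import re

def normalize(cmdline: str) -> str:
    """Collapse variable fields (numbers, Windows paths) so that alerts
    launched by the 'same' command line share one key. These masking
    rules are illustrative only."""
    s = cmdline.lower()
    s = re.sub(r"[0-9]+", "N", s)              # mask numbers
    s = re.sub(r"[a-z]:\\[^ ]+", "PATH", s)    # mask Windows paths
    return s

def triage(alerts):
    """Group (alert_id, cmdline) pairs by normalized command line; an
    analyst then reviews one representative per cluster rather than
    every individual alert."""
    clusters = defaultdict(list)
    for alert_id, cmdline in alerts:
        clusters[normalize(cmdline)].append(alert_id)
    return clusters
```

In this sketch, two alerts differing only in an encoded-payload number fall into one cluster, shrinking the analyst's queue.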
Towards Human-AI Mutual Learning: A New Research Paradigm
This paper describes a new research paradigm for studying human-AI collaboration, named "human-AI mutual learning", defined as the process where humans and AI agents preserve, exchange, and improve knowledge during human-AI collaboration. We describe relevant methodologies, motivations, domain examples, benefits, challenges, and future research agenda under this paradigm.
Updated: 2024-05-07 21:59:57
Subjects: cs.HC,cs.AI
Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking
Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages. This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations, with a special focus on Turkish. We conduct an in-depth analysis to evaluate the impact of training strategies, model choices, and data availability on the performance of LLMs designed for underrepresented languages. Our approach includes two methodologies: (i) adapting existing LLMs originally pretrained in English to understand Turkish, and (ii) developing a model from the ground up using Turkish pretraining data, both supplemented with supervised fine-tuning on a novel Turkish instruction-tuning dataset aimed at enhancing reasoning capabilities. The relative performance of these methods is evaluated through the creation of a new leaderboard for Turkish LLMs, featuring benchmarks that assess different reasoning and knowledge skills. Furthermore, we conducted experiments on data and model scaling, both during pretraining and fine-tuning, simultaneously emphasizing the capacity for knowledge transfer across languages and addressing the challenges of catastrophic forgetting encountered during fine-tuning on a different language. Our goal is to offer a detailed guide for advancing the LLM framework in low-resource linguistic contexts, thereby making natural language processing (NLP) benefits more globally accessible.
Updated: 2024-05-07 21:58:45
Subjects: cs.CL,cs.AI,cs.LG
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
Recent advances in diffusion-based generative modeling have led to the development of text-to-video (T2V) models that can generate high-quality videos conditioned on a text prompt. Most of these T2V models often produce single-scene video clips that depict an entity performing a particular action (e.g., `a red panda climbing a tree'). However, it is pertinent to generate multi-scene videos since they are ubiquitous in the real-world (e.g., `a red panda climbing a tree' followed by `the red panda sleeps on the top of the tree'). To generate multi-scene videos from the pretrained T2V model, we introduce Time-Aligned Captions (TALC) framework. Specifically, we enhance the text-conditioning mechanism in the T2V architecture to recognize the temporal alignment between the video scenes and scene descriptions. For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., `a red panda climbing a tree') and second scene description (e.g., `the red panda sleeps on the top of the tree'), respectively. As a result, we show that the T2V model can generate multi-scene videos that adhere to the multi-scene text descriptions and be visually consistent (e.g., entity and background). Further, we finetune the pretrained T2V model with multi-scene video-text data using the TALC framework. We show that the TALC-finetuned model outperforms the baseline methods by 15.5 points in the overall score, which averages visual consistency and text adherence using human evaluation. The project website is https://talc-mst2v.github.io/.
Updated: 2024-05-07 21:52:39
Subjects: cs.CV,cs.AI,cs.LG
Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning
Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models force a tradeoff between accuracy and interpretability, limiting data-driven interpretations of human decision-making processes. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically under different contexts. Thus, we develop Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem, where each context poses a unique task and complex decision policies can be constructed piece-wise from many simple context-specific policies. CPR models each context-specific policy as a linear map, and generates new policy models $\textit{on-demand}$ as contexts are updated with new observations. We provide two flavors of the CPR framework: one focusing on exact local interpretability, and one retaining full global interpretability. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement, CPR closes the accuracy gap between interpretable and black-box methods, allowing high-resolution exploration and analysis of context-specific decision models.
Updated: 2024-05-07 21:40:02
Subjects: cs.LG,cs.AI,stat.ML
Responding to Generative AI Technologies with Research-through-Design: The Ryelands AI Lab as an Exploratory Study
Generative AI technologies demand new practical and critical competencies, which call on design to respond to and foster these. We present an exploratory study guided by Research-through-Design, in which we partnered with a primary school to develop a constructionist curriculum centered on students interacting with a generative AI technology. We provide a detailed account of the design of and outputs from the curriculum and learning materials, finding centrally that the reflexive and prolonged `hands-on' approach led to a co-development of students' practical and critical competencies. From the study, we contribute guidance for designing constructionist approaches to generative AI technology education; further arguing to do so with `critical responsivity.' We then discuss how HCI researchers may leverage constructionist strategies in designing interactions with generative AI technologies; and suggest that Research-through-Design can play an important role as a `rapid response methodology' capable of reacting to fast-evolving, disruptive technologies such as generative AI.
Updated: 2024-05-07 21:34:10
Subjects: cs.HC,cs.AI,cs.CY
A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series
Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit potential issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights to detect such issues by elucidating model attributions of their decision, many limitations still exist -- They are primarily instance-based and not scalable across datasets, and they provide one-directional information from the model to the human side, lacking a mechanism for users to address detected issues. To fulfill these gaps, we introduce HILAD, a novel framework designed to foster a dynamic and bidirectional collaboration between humans and AI for enhancing anomaly detection models in time series. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation with two time series datasets and user studies demonstrates the effectiveness of HILAD in fostering a deeper human understanding, immediate corrective actions, and the reliability enhancement of models.
Updated: 2024-05-07 21:25:15
Subjects: cs.HC,cs.LG
WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather
We propose a method to infer semantic segmentation maps from images captured under adverse weather conditions. We begin by examining existing models on images degraded by weather conditions such as rain, fog, or snow, and find that they exhibit a large performance drop as compared to those captured under clear weather. To control for changes in scene structures, we propose WeatherProof, the first semantic segmentation dataset with accurate clear and adverse weather image pairs that share an underlying scene. Through this dataset, we analyze the error modes in existing models and find that they are sensitive to the highly complex combination of different weather effects induced on the image during capture. To improve robustness, we propose a way to use language as guidance by identifying contributions of adverse weather conditions and injecting that as "side information". Models trained using our language guidance exhibit performance gains of up to 10.2% in mIoU on WeatherProof, up to 8.44% in mIoU on the widely used ACDC dataset compared to standard training techniques, and up to 6.21% in mIoU on the ACDC dataset as compared to previous SOTA methods.
Updated: 2024-05-07 21:25:06
Subjects: cs.CV,cs.LG
Interpretable Tensor Fusion
Conventional machine learning methods are predominantly designed to predict outcomes based on a single data type. However, practical applications may encompass data of diverse types, such as text, images, and audio. We introduce interpretable tensor fusion (InTense), a multimodal learning method for training neural networks to simultaneously learn multimodal data representations and their interpretable fusion. InTense can separately capture both linear combinations and multiplicative interactions of diverse data types, thereby disentangling higher-order interactions from the individual effects of each modality. InTense provides interpretability out of the box by assigning relevance scores to modalities and their associations. The approach is theoretically grounded and yields meaningful relevance scores on multiple synthetic and real-world datasets. Experiments on six real-world datasets show that InTense outperforms existing state-of-the-art multimodal interpretable approaches in terms of accuracy and interpretability.
Updated: 2024-05-07 21:05:50
Subjects: cs.LG
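The separation of linear combinations from multiplicative interactions described above can be sketched with scalar modality features: each modality gets its own coefficient and each pair gets an interaction coefficient, so individual effects and higher-order interactions stay inspectable. This is a scalar toy under assumed names, not InTense itself, which fuses learned neural representations:

```python
def intense_style_fusion(modalities, weights, interaction_weights):
    """Fuse scalar modality features as a weighted sum of linear terms
    plus pairwise products. The separate coefficients act as relevance
    scores for modalities and their associations."""
    names = sorted(modalities)
    # linear (individual) effects
    score = sum(weights[n] * modalities[n] for n in names)
    # multiplicative (pairwise interaction) effects
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score += interaction_weights[(a, b)] * modalities[a] * modalities[b]
    return score
```

Inspecting `weights` versus `interaction_weights` after training is what makes the individual and interaction contributions disentangled.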
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on ''A is B'', LLM fails to directly conclude ''B is A'' during inference, which is known as the ''reversal curse'' (Berglund et al., 2023). In this paper, we theoretically analyze the reversal curse via the training dynamics of (stochastic) gradient descent for two auto-regressive models: (1) a bilinear model that can be viewed as a simplification of a one-layer transformer; (2) one-layer transformers using the framework of Tian et al. (2023a). Our analysis reveals a core reason why the reversal curse happens: the (effective) weights of both auto-regressive models show asymmetry, i.e., the increase of weights from a token $A$ to token $B$ during training does not necessarily cause the increase of the weights from $B$ to $A$. Moreover, our analysis can be naturally applied to other logical reasoning tasks such as chain-of-thought (COT) (Wei et al., 2022b). We show the necessity of COT, i.e., a model trained on ''$A \to B$'' and ''$B \to C$'' fails to directly conclude ''$A \to C$'' without COT (also empirically observed by Allen-Zhu and Li (2023)), for one-layer transformers via training dynamics, which provides a new perspective different from previous work (Feng et al., 2024) that focuses on expressivity. Finally, we also conduct experiments to validate our theory on multi-layer transformers under different settings.
Updated: 2024-05-07 21:03:51
Subjects: cs.LG
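The weight asymmetry identified above can be seen in a few lines: in a bilinear next-token model with one-hot embeddings, a gradient step on "A is B" only updates the weights leaving token A, never the reverse direction. This is a toy caricature of the paper's bilinear setting, not its exact model or training dynamics:

```python
import math

def train_step(W, a, b, lr=0.5):
    """One SGD step on -log p(b | a) for a bilinear next-token model
    p(. | a) = softmax(W[a]) with one-hot embeddings. Only row a of W
    receives gradient: learning a -> b never touches W[b][a]."""
    m = max(W[a])
    exps = [math.exp(x - m) for x in W[a]]
    Z = sum(exps)
    for j in range(len(W[a])):
        grad = exps[j] / Z - (1.0 if j == b else 0.0)
        W[a][j] -= lr * grad

V = 3  # tokens: 0 = "A", 1 = "B", 2 = filler
W = [[0.0] * V for _ in range(V)]
for _ in range(100):
    train_step(W, 0, 1)  # repeatedly train on "A is B"
# W[0][1] grows large, while W[1][0] stays exactly zero
```

The untouched reverse weight is the mechanistic picture of the reversal curse in this simplified model.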
DMODE: Differential Monocular Object Distance Estimation Module without Class Specific Information
Utilizing a single camera for measuring object distances is a cost-effective alternative to stereo-vision and LiDAR. Although monocular distance estimation has been explored in the literature, most existing techniques rely on object class knowledge to achieve high performance. Without this contextual data, monocular distance estimation becomes more challenging, lacking reference points and object-specific cues. However, these cues can be misleading for objects with wide-range variation or adversarial situations, which is a challenging aspect of object-agnostic distance estimation. In this paper, we propose DMODE, a class-agnostic method for monocular distance estimation that does not require object class knowledge. DMODE estimates an object's distance by fusing its fluctuation in size over time with the camera's motion, making it adaptable to various object detectors and unknown objects, thus addressing these challenges. We evaluate our model on the KITTI MOTS dataset using ground-truth bounding box annotations and outputs from TrackRCNN and EagerMOT. The object's location is determined using the change in bounding box sizes and camera position without measuring the object's detection source or class attributes. Our approach demonstrates superior performance in multi-class object distance detection scenarios compared to conventional methods.
Updated: 2024-05-07 21:02:34
Subjects: cs.CV,cs.AI,cs.LG
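The geometric core of fusing apparent-size change with camera motion follows from the pinhole model. The sketch below assumes a noise-free, straight-line approach of known length; DMODE itself fuses noisy size fluctuations over time and arbitrary detector outputs, so this is only the underlying relation:

```python
def distance_from_size_change(s1, s2, dz):
    """Estimate the object's initial distance from the change in its
    apparent size as the camera advances dz toward it.
    Pinhole model: apparent size s = k / d, so with d2 = d1 - dz,
    s1 / s2 = d2 / d1  =>  d1 = dz * s2 / (s2 - s1).
    Note that k (true object size) cancels: no class knowledge needed."""
    if s2 <= s1:
        raise ValueError("object must appear larger after approaching it")
    return dz * s2 / (s2 - s1)
```

Because the object's true size cancels out of the ratio, the estimate is class-agnostic, which is the property the abstract emphasizes.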
Proximal Policy Optimization with Adaptive Exploration
Proximal Policy Optimization with Adaptive Exploration (axPPO) is introduced as a novel learning algorithm. This paper investigates the exploration-exploitation tradeoff within the context of reinforcement learning and aims to contribute new insights into reinforcement learning algorithm design. The proposed adaptive exploration framework dynamically adjusts the exploration magnitude during training based on the recent performance of the agent. Our proposed method outperforms standard PPO algorithms in learning efficiency, particularly when significant exploratory behavior is needed at the beginning of the learning process.
Updated: 2024-05-07 20:51:49
Subjects: cs.LG,cs.AI
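The idea of scaling exploration by recent agent performance can be sketched as an adaptive coefficient (e.g., for PPO's entropy bonus) that rises when recent returns are low. The schedule and names below are guesses at the general idea, not axPPO's actual formula:

```python
from collections import deque

class AdaptiveExploration:
    """Scale an exploration coefficient by recent performance:
    low recent returns => explore more. Illustrative schedule only."""
    def __init__(self, base_coef=0.01, window=10):
        self.base_coef = base_coef
        self.returns = deque(maxlen=window)

    def coef(self, episode_return, best_return):
        self.returns.append(episode_return)
        avg = sum(self.returns) / len(self.returns)
        # fraction of the best return achieved recently, clipped to [0, 1]
        progress = max(0.0, min(1.0, avg / best_return)) if best_return > 0 else 0.0
        # up to 2x the base exploration bonus when performing poorly
        return self.base_coef * (2.0 - progress)
```

Early in training (near-zero returns) the coefficient is doubled, matching the abstract's claim that extra exploratory behavior helps most at the beginning of learning.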
Rethinking Model Prototyping through the MedMNIST+ Dataset Collection
The integration of deep learning based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conduct a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures, for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code is available at https://github.com/sdoerrich97 .
Updated: 2024-05-07 20:49:46
Subjects: eess.IV,cs.CV,cs.LG
Reverse Training to Nurse the Reversal Curse
Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even when training with trillions of tokens this issue still appears due to Zipf's law - hence even if we train on the entire internet. This work proposes an alternative training scheme, called reverse training, whereby all words are used twice, doubling the amount of available tokens. The LLM is trained in both forward and reverse directions by reversing the training strings while preserving (i.e., not reversing) chosen substrings, such as entities. We show that data-matched reverse-trained models provide superior performance to standard models on standard tasks, and compute-matched reverse-trained models provide far superior performance on reversal tasks, helping resolve the reversal curse issue.
Updated: 2024-05-07 20:35:15
Subjects: cs.CL,cs.AI
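The entity-preserving reversal described above can be sketched directly: reverse the token sequence while keeping chosen substrings (entities) internally intact. A minimal token-level version with greedy entity matching; the paper's actual preprocessing and entity selection may differ:

```python
def reverse_preserving_entities(tokens, entities):
    """Reverse a token sequence for 'reverse training' while keeping
    each entity span in its original internal order."""
    # collapse each entity span into one unit, reverse the units, re-expand
    units, i = [], 0
    while i < len(tokens):
        for ent in entities:
            if tokens[i:i + len(ent)] == ent:
                units.append(ent)
                i += len(ent)
                break
        else:
            units.append([tokens[i]])
            i += 1
    return [tok for unit in reversed(units) for tok in unit]
```

Training on both the forward string and this reversed string is what doubles the usable tokens while keeping entity mentions readable in both directions.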
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery
In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capability, flexibility, and reliability remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern decision-making library that offers efficient and thoroughly tested reusable components. ACEGEN provides a robust, flexible, and efficient platform for molecular design. We validate its effectiveness by benchmarking it across various algorithms and conducting multiple drug discovery case studies. ACEGEN is accessible at https://github.com/acellera/acegen-open.
Updated: 2024-05-07 20:30:14
Subjects: cs.LG,cs.AI,q-bio.BM
Major-Minor Mean Field Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) remains difficult to scale to many agents. Recent MARL using Mean Field Control (MFC) provides a tractable and rigorous approach to otherwise difficult cooperative MARL. However, the strict MFC assumption of many independent, weakly-interacting agents is too inflexible in practice. We generalize MFC to instead simultaneously model many similar and few complex agents -- as Major-Minor Mean Field Control (M3FC). Theoretically, we give approximation results for finite agent control, and verify the sufficiency of stationary policies for optimality together with a dynamic programming principle. Algorithmically, we propose Major-Minor Mean Field MARL (M3FMARL) for finite agent systems instead of the limiting system. The algorithm is shown to approximate the policy gradient of the underlying M3FC MDP. Finally, we demonstrate its capabilities experimentally in various scenarios. We observe a strong performance in comparison to state-of-the-art policy gradient MARL methods.
Updated: 2024-05-07 20:14:12
Subjects: cs.LG,cs.MA,math.OC
SUTRA: Scalable Multilingual Language Model Architecture
In this paper, we introduce SUTRA, a multilingual Large Language Model architecture capable of understanding, reasoning, and generating text in over 50 languages. SUTRA's design uniquely decouples core conceptual understanding from language-specific processing, which facilitates scalable and efficient multilingual alignment and learning. Employing a Mixture of Experts framework both in language and concept processing, SUTRA demonstrates both computational efficiency and responsiveness. Through extensive evaluations, SUTRA is demonstrated to surpass existing models such as GPT-3.5 and Llama2 by 20-30% on leading Massive Multitask Language Understanding (MMLU) benchmarks for multilingual tasks. SUTRA models are also online LLMs that can use knowledge from the internet to provide hallucination-free, factual and up-to-date responses while retaining their multilingual capabilities. Furthermore, we explore the broader implications of its architecture for the future of multilingual AI, highlighting its potential to democratize access to AI technology globally and to improve the equity and utility of AI in regions with predominantly non-English languages. Our findings suggest that SUTRA not only fills pivotal gaps in multilingual model capabilities but also establishes a new benchmark for operational efficiency and scalability in AI applications.
Updated: 2024-05-07 20:11:44
Subjects: cs.CL,cs.AI
A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images
Recognition of individual components and keypoint detection supported by instance segmentation is crucial to analyze the behavior of agents on the scene. Such systems could be used for surveillance, self-driving cars, and also for medical research, where behavior analysis of laboratory animals is used to confirm the aftereffects of a given medicine. A method capable of solving the aforementioned tasks usually requires a large amount of high-quality hand-annotated data, which takes time and money to produce. In this paper, we propose a method that alleviates the need for manual labeling of laboratory rats. To do so, first, we generate initial annotations with a computer vision-based approach, then through extensive augmentation, we train a deep neural network on the generated data. The final system is capable of instance segmentation, keypoint detection, and body part segmentation even when the objects are heavily occluded.
Updated: 2024-05-07 20:11:07
Subjects: cs.CV,cs.AI
A Bayesian Gaussian Process-Based Latent Discriminative Generative Decoder (LDGD) Model for High-Dimensional Data
Extracting meaningful information from high-dimensional data poses a formidable modeling challenge, particularly when the data is obscured by noise or represented through different modalities. This research proposes a novel non-parametric modeling approach, leveraging the Gaussian process (GP), to characterize high-dimensional data by mapping it to a latent low-dimensional manifold. This model, named the latent discriminative generative decoder (LDGD), employs both the data and associated labels in the manifold discovery process. We derive a Bayesian solution to infer the latent variables, allowing LDGD to effectively capture inherent stochasticity in the data. We demonstrate applications of LDGD on both synthetic and benchmark datasets. Not only does LDGD infer the manifold accurately, but its accuracy in predicting data points' labels surpasses state-of-the-art approaches. In the development of LDGD, we have incorporated inducing points to reduce the computational complexity of Gaussian processes for large datasets, enabling batch training for enhanced efficient processing and scalability. Additionally, we show that LDGD can robustly infer the manifold and precisely predict labels for scenarios in which the data size is limited, demonstrating its capability to efficiently characterize high-dimensional data with limited samples. These collective attributes highlight the importance of developing non-parametric modeling approaches to analyze high-dimensional data.
Updated: 2024-05-07 20:06:07
Fields: cs.LG,I.5.1; G.3
Individual Fairness Through Reweighting and Tuning
Inherent bias within society can be amplified and perpetuated by artificial intelligence (AI) systems. To address this issue, a wide range of solutions have been proposed to identify and mitigate bias and enforce fairness for individuals and groups. Recently, the Graph Laplacian Regularizer (GLR), a regularization technique from the semi-supervised learning literature, has been used as a substitute for the common Lipschitz condition to enhance individual fairness. Notable prior work has shown that enforcing individual fairness through a GLR can improve the transfer learning accuracy of AI models under covariate shifts. However, the prior work defines a GLR on the source and target data combined, implicitly assuming that the target data are available at train time, which might not hold in practice. In this work, we investigated whether defining a GLR independently on the training and target data could maintain similar accuracy. Furthermore, we introduced the Normalized Fairness Gain (NFG) score, which measures individual fairness as the amount of fairness gained when a GLR is used versus when it is not. We evaluated the new and original methods under NFG, the Prediction Consistency (PC), and traditional classification metrics on the German Credit Approval dataset. The results showed that the two models achieved statistically similar mean performances over five-fold cross-validation. Furthermore, the proposed metric showed that PC scores can be misleading, as they can be high and statistically similar to those of fairness-enhanced models while NFG scores are small. This work therefore provides new insights into when a GLR effectively enhances individual fairness, and into the pitfalls of PC.
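As a concrete illustration (not the authors' code), the GLR penalty discussed above can be sketched as the quadratic form $f^\top L f$ on a similarity graph, which is small when similar individuals receive similar predictions. The RBF similarity and its bandwidth here are assumptions for the sketch.

```python
import numpy as np

def graph_laplacian_regularizer(predictions, features, sigma=1.0):
    """Sketch of a Graph Laplacian Regularizer: R(f) = f^T L f with
    L = D - W, where W holds pairwise RBF similarities between
    individuals. Similar individuals with divergent predictions are
    penalized, enforcing an individual-fairness notion."""
    diff = features[:, None, :] - features[None, :, :]
    W = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    f = np.asarray(predictions, dtype=float)
    return float(f @ L @ f)

# Two nearly identical individuals and one distant one.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
print(graph_laplacian_regularizer([1.0, 1.0, 0.0], X))  # small: only the dissimilar pair differs
print(graph_laplacian_regularizer([1.0, 0.0, 0.0], X))  # larger: a similar pair differs
```

In training, this penalty would be added to the task loss; the paper's question is whether the graph can be built on training and target data independently rather than combined.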
Updated: 2024-05-07 19:55:01
Fields: cs.LG,cs.AI,I.2.6
Data-driven Error Estimation: Upper Bounding Multiple Errors with No Technical Debt
We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and refer to this as the error estimation problem. For a single such tuple, data-driven confidence intervals can often be used to bound the error in our estimate. However, for a class of estimate-estimand-error tuples, nontrivial high probability upper bounds on the maximum error often require class complexity as input -- limiting the practicality of such methods and often resulting in loose bounds. Rather than deriving theoretical class complexity-based bounds, we propose a completely data-driven approach to estimate an upper bound on the maximum error. The simple and general nature of our solution to this fundamental challenge lends itself to several applications including: multiple CI construction, multiple hypothesis testing, estimating excess risk bounds (a fundamental measure of uncertainty in machine learning) for any training/fine-tuning algorithm, and enabling the development of a contextual bandit pipeline that can leverage any reward model estimation procedure as input (without additional mathematical analysis).
Updated: 2024-05-07 19:38:26
Fields: cs.LG,stat.ML
FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes
Mapping agencies are increasingly adopting Aerial Lidar Scanning (ALS) as a new tool to monitor territory and support public policies. Processing ALS data at scale requires efficient point classification methods that perform well over highly diverse territories. To evaluate them, researchers need large annotated Lidar datasets, however, current Lidar benchmark datasets have restricted scope and often cover a single urban area. To bridge this data gap, we present the FRench ALS Clouds from TArgeted Landscapes (FRACTAL) dataset: an ultra-large-scale aerial Lidar dataset made of 100,000 dense point clouds with high-quality labels for 7 semantic classes and spanning 250 km$^2$. FRACTAL is built upon France's nationwide open Lidar data. It achieves spatial and semantic diversity via a sampling scheme that explicitly concentrates rare classes and challenging landscapes from five French regions. It should support the development of 3D deep learning approaches for large-scale land monitoring. We describe the nature of the source data, the sampling workflow, the content of the resulting dataset, and provide an initial evaluation of segmentation performance using a performant 3D neural architecture.
Updated: 2024-05-07 19:37:22
Fields: cs.CV,cs.LG
ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography
Purpose: To develop and evaluate a transformer-based deep learning model for the synthesis of nephrographic phase images in CT urography (CTU) examinations from the unenhanced and urographic phases. Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset of 119 patients (mean $\pm$ SD age, 65 $\pm$ 12 years; 75/44 males/females) with three-phase CT urography studies was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom model, coined Residual transformer model for Nephrographic phase CT image synthesis (ResNCT), was developed and trained on paired inputs of the non-contrast and urographic image sets to produce nephrographic phase images, which were compared with the corresponding ground truth nephrographic phase images. The synthesized images were evaluated with multiple performance metrics, including peak signal to noise ratio (PSNR), structural similarity index (SSIM), normalized cross correlation coefficient (NCC), mean absolute error (MAE), and root mean squared error (RMSE). Results: The ResNCT model successfully generated synthetic nephrographic images from non-contrast and urographic image inputs. With respect to ground truth nephrographic phase images, the images synthesized by the model achieved high PSNR (27.8 $\pm$ 2.7 dB), SSIM (0.88 $\pm$ 0.05), and NCC (0.98 $\pm$ 0.02), and low MAE (0.02 $\pm$ 0.005) and RMSE (0.042 $\pm$ 0.016). Conclusion: The ResNCT model synthesized nephrographic phase CT images with high similarity to ground truth images. The ResNCT model provides a means of eliminating the acquisition of the nephrographic phase, with a resultant 33% reduction in radiation dose for CTU examinations.
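The PSNR metric reported above is standard and easy to reproduce; a minimal sketch (using synthetic images, not the study's data):

```python
import numpy as np

def psnr(reference, synthesized, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image
    and a synthesized one: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(synthesized, float)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(max_val**2 / mse)

rng = np.random.default_rng(0)
truth = rng.random((64, 64))                # stand-in "ground truth" image
noisy = np.clip(truth + 0.01 * rng.standard_normal(truth.shape), 0, 1)
print(f"PSNR: {psnr(truth, noisy):.1f} dB")
```

For CT images, `max_val` would be set to the dynamic range of the (windowed) intensity values rather than 1.0.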
Updated: 2024-05-07 19:20:32
Fields: eess.IV,cs.AI,physics.med-ph,J.3
SurfPro: Functional Protein Design Based on Continuous Surface
How can we design proteins with desired functions? We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein's function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. SurfPro comprises a hierarchical encoder that progressively models the geometric shape and biochemical features of a protein surface, and an autoregressive decoder to produce an amino acid sequence. We evaluate SurfPro on a standard inverse folding benchmark CATH 4.2 and two functional protein design tasks: protein binder design and enzyme design. Our SurfPro consistently surpasses previous state-of-the-art inverse folding methods, achieving a recovery rate of 57.78% on CATH 4.2 and higher success rates in terms of protein-protein binding and enzyme-substrate interaction scores.
Updated: 2024-05-07 19:09:46
Fields: q-bio.BM,cs.LG
Folded context condensation in Path Integral formalism for infinite context transformers
This short note is written for rapid communication of long-context training and to share an idea for how to train it with low memory usage. In the note, we generalize the attention algorithm and neural network of Generative Pre-Trained Transformers and reinterpret them in the path integral formalism. First, the role of the transformer is understood as the time evolution of the token states; second, it is suggested that all key-token states at the same time step as the query token can attend to the query-token states. As a result of the repeated time evolution, the token states of the past sequence meet the token states of the present sequence, so that attention between separated sequences becomes possible, maintaining effectively unbounded contextual information while using only the low memory required for a limited-size sequence. For the experiment, an input token window of size $12$ was used, and pre-training was run on one GPU with $24$ GB of memory. It was confirmed that a context longer than $150$ tokens is preserved. The sampling results of the training, the code, and other details will be included in a later revised version of this note.
Updated: 2024-05-07 19:05:26
Fields: hep-ph,cs.AI,cs.CL,cs.LG,cs.NE
Multi-Margin Loss: Proposal and Application in Recommender Systems
Recommender systems guide users through vast amounts of information by suggesting items based on their predicted preferences. Collaborative filtering-based deep learning techniques have regained popularity due to their straightforward nature, relying only on user-item interactions. Typically, these systems consist of three main components: an interaction module, a loss function, and a negative sampling strategy. Initially, researchers focused on enhancing performance by developing complex interaction modules. However, there has been a recent shift toward refining loss functions and negative sampling strategies. This shift has led to an increased interest in contrastive learning, which pulls similar pairs closer while pushing dissimilar ones apart. Contrastive learning involves key practices such as heavy data augmentation, large batch sizes, and hard-negative sampling, but these also bring challenges like high memory demands and under-utilization of some negative samples. The proposed Multi-Margin Loss (MML) addresses these challenges by introducing multiple margins and varying weights for negative samples. This allows MML to efficiently utilize not only the hardest negatives but also other non-trivial negatives, offering a simpler yet effective loss function that outperforms more complex methods, especially when resources are limited. Experiments on two well-known datasets demonstrated that MML achieved up to a 20% performance improvement compared to a baseline contrastive loss function when a smaller number of negative samples is used.
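The "multiple margins with varying weights" idea described above can be sketched as a sum of weighted hinge terms per negative; the particular margins, weights, and hinge form below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def multi_margin_loss(pos_score, neg_scores, margins=(0.2, 0.5, 0.8),
                      weights=(1.0, 0.5, 0.25)):
    """Hypothetical sketch of a multi-margin loss: each negative
    contributes a weighted hinge term at several margins, so hard
    negatives (violating even the large margins) are penalized most,
    while non-trivial negatives still contribute at small margins."""
    neg = np.asarray(neg_scores, dtype=float)
    loss = 0.0
    for m, w in zip(margins, weights):
        loss += w * np.maximum(0.0, m - (pos_score - neg)).sum()
    return float(loss)

# A hard negative (score close to the positive) dominates the loss;
# well-separated negatives contribute nothing.
print(multi_margin_loss(1.0, [0.9, 0.3]))
print(multi_margin_loss(1.0, [0.2, 0.1]))
```

In a recommender, `pos_score` would be the interaction score of an observed user-item pair and `neg_scores` those of sampled unobserved items.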
Updated: 2024-05-07 18:58:32
Fields: cs.LG,cs.IR
An LLM-Tool Compiler for Fused Parallel Function Calling
State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling, managing thousands of API calls. However, the tendency of compositional prompting to segment tasks into multiple steps, each requiring a round-trip to the GPT APIs, leads to increased system latency and costs. Although recent advancements in parallel function calling have improved tool execution per API call, they may necessitate more detailed in-context instructions and task breakdown at the prompt level, resulting in higher engineering and production costs. Inspired by the hardware design principles of multiply-add (MAD) operations, which fuse multiple arithmetic operations into a single task from the compiler's perspective, we propose LLM-Tool Compiler, which selectively fuses similar types of tool operations under a single function at runtime, presenting them as a unified task to the LLM. This selective fusion inherently enhances parallelization and efficiency. Benchmarked on a large-scale Copilot platform, LLM-Tool Compiler achieves up to four times more parallel calls than existing methods, reducing token costs and latency by up to 40% and 12%, respectively.
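The fusion idea described above — grouping similar tool operations into one task, by analogy with fused multiply-add — can be sketched at its simplest as batching pending calls by tool name. This is an illustrative toy, not the LLM-Tool Compiler's implementation.

```python
from collections import defaultdict

def fuse_tool_calls(calls):
    """Toy sketch of tool-call fusion: pending calls to the same tool
    are grouped into a single batched task, so the model orchestrates
    one fused operation instead of several sequential round-trips."""
    groups = defaultdict(list)
    for call in calls:                      # preserves per-tool argument order
        groups[call["tool"]].append(call["args"])
    return [{"tool": tool, "batched_args": args} for tool, args in groups.items()]

# Hypothetical pending calls from a Copilot session.
pending = [
    {"tool": "search",   "args": {"q": "weather Boston"}},
    {"tool": "search",   "args": {"q": "weather NYC"}},
    {"tool": "calendar", "args": {"date": "2024-05-07"}},
]
fused = fuse_tool_calls(pending)
print(len(fused))   # two fused tasks instead of three separate calls
```

The paper's system performs this selectively at runtime and presents the fused group to the LLM as a unified task; the gain comes from fewer API round-trips and better parallelization.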
Updated: 2024-05-07 18:55:50
Fields: cs.PL,cs.AI,cs.LG
Correlated Noise Provably Beats Independent Noise for Differentially Private Learning
Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choice of the correlation function, giving precise analytical bounds for linear regression and as the solution to a convex program for general convex functions. We show, using these bounds, how correlated noise provably improves upon vanilla DP-SGD as a function of problem parameters such as the effective dimension and condition number. Moreover, our analytical expression for the near-optimal correlation function circumvents the cubic complexity of the semi-definite program used to optimize the noise correlation matrix in previous work. We validate our theory with experiments on private deep learning. Our work matches or outperforms prior work while being efficient both in terms of compute and memory.
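Mechanically, the correlated-noise mechanism replaces the independent per-step Gaussian of DP-SGD with a linear mixture of seed noises; a minimal sketch (the toy correlation matrix below is an assumption — the paper derives near-optimal correlations analytically):

```python
import numpy as np

def correlated_noise_sequence(B, d, rng):
    """Sketch of correlated noise injection: instead of adding an
    independent Gaussian w_t at step t (B = identity recovers vanilla
    DP-SGD), inject z_t = sum_s B[t, s] w_s, where the lower-triangular
    matrix B encodes the chosen correlation function."""
    T = B.shape[0]
    W = rng.standard_normal((T, d))   # independent seed noise, one row per step
    return B @ W                      # correlated noise; row t is z_t

rng = np.random.default_rng(0)
T, d = 4, 3
# Toy anti-correlation: each step partially cancels earlier noise.
B = np.tril(np.full((T, T), -0.2)) + np.eye(T) * 1.2
Z = correlated_noise_sequence(B, d, rng)
print(Z.shape)  # one noise vector per optimization step
```

In training, `Z[t]` would be added to the clipped gradient at step `t`; the privacy analysis constrains the factorization of `B`, and the utility depends on the correlation structure.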
Updated: 2024-05-07 18:50:09
Fields: cs.LG,cs.AI,cs.CR,math.OC
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets
BACKGROUND: Lung cancer's high mortality rate can be mitigated by early detection, which is increasingly reliant on artificial intelligence (AI) for diagnostic imaging. However, the performance of AI models is contingent upon the datasets used for their training and validation. METHODS: This study developed and validated the DLCSD-mD and LUNA16-mD models utilizing the Duke Lung Cancer Screening Dataset (DLCSD), encompassing over 2,000 CT scans with more than 3,000 annotations. These models were rigorously evaluated against the internal DLCSD and external LUNA16 and NLST datasets, aiming to establish a benchmark for imaging-based performance. The assessment focused on creating a standardized evaluation framework to facilitate consistent comparison with widely utilized datasets, ensuring a comprehensive validation of the model's efficacy. Diagnostic accuracy was assessed using free-response receiver operating characteristic (FROC) and area under the curve (AUC) analyses. RESULTS: On the internal DLCSD set, the DLCSD-mD model achieved an AUC of 0.93 (95% CI:0.91-0.94), demonstrating high accuracy. Its performance was sustained on the external datasets, with AUCs of 0.97 (95% CI: 0.96-0.98) on LUNA16 and 0.75 (95% CI: 0.73-0.76) on NLST. Similarly, the LUNA16-mD model recorded an AUC of 0.96 (95% CI: 0.95-0.97) on its native dataset and showed transferable diagnostic performance with AUCs of 0.91 (95% CI: 0.89-0.93) on DLCSD and 0.71 (95% CI: 0.70-0.72) on NLST. CONCLUSION: The DLCSD-mD model exhibits reliable performance across different datasets, establishing the DLCSD as a robust benchmark for lung cancer detection and diagnosis. Through the provision of our models and code to the public domain, we aim to accelerate the development of AI-based diagnostic tools and encourage reproducibility and collaborative advancements within the medical machine-learning (ML) field.
Updated: 2024-05-07 18:36:40
Fields: cs.CV,cs.AI,cs.LG
RED-PSM: Regularization by Denoising of Factorized Low Rank Models for Dynamic Imaging
Dynamic imaging addresses the recovery of a time-varying 2D or 3D object at each time instant using its undersampled measurements. In particular, in the case of dynamic tomography, only a single projection at a single view angle may be available at a time, making the problem severely ill-posed. We propose an approach, RED-PSM, which combines for the first time two powerful techniques to address this challenging imaging problem. The first, are non-parametric factorized low rank models, also known as partially separable models (PSMs), which have been used to efficiently introduce a low-rank prior for the spatio-temporal object. The second is the recent Regularization by Denoising (RED), which provides a flexible framework to exploit the impressive performance of state-of-the-art image denoising algorithms, for various inverse problems. We propose a partially separable objective with RED and a computationally efficient and scalable optimization scheme with variable splitting and ADMM. Theoretical analysis proves the convergence of our objective to a value corresponding to a stationary point satisfying the first-order optimality conditions. Convergence is accelerated by a particular projection-domain-based initialization. We demonstrate the performance and computational improvements of our proposed RED-PSM with a learned image denoiser by comparing it to a recent deep-prior-based method known as TD-DIP. Although the main focus is on dynamic tomography, we also show performance advantages of RED-PSM in a cardiac dynamic MRI setting.
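The partially separable model (PSM) at the core of the method above is a low-rank factorization of the space-time object; a minimal sketch with synthetic data (the modes and time courses are illustrative):

```python
import numpy as np

def psm_reconstruction(U, V):
    """Sketch of a partially separable model: the dynamic object is
    factored as X[x, t] = sum_k U[x, k] * V[t, k], i.e. a rank-K
    spatio-temporal factorization -- the low-rank prior used by PSMs."""
    return U @ V.T

# A rank-2 dynamic "object": two spatial modes with distinct time courses.
rng = np.random.default_rng(0)
n_space, n_time, K = 100, 30, 2
U = rng.standard_normal((n_space, K))            # spatial basis functions
t = np.linspace(0, 1, n_time)
V = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
X = psm_reconstruction(U, V)
print(X.shape, np.linalg.matrix_rank(X))
```

RED-PSM fits such factors to undersampled projection data while regularizing the spatial factors with a learned denoiser via the RED framework.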
Updated: 2024-05-07 18:14:23
Fields: eess.IV,cs.CV,cs.LG
Integrating knowledge-guided symbolic regression and model-based design of experiments to automate process flow diagram development
New products must be formulated rapidly to succeed in the global formulated product market; however, key product indicators (KPIs) can be complex, poorly understood functions of the chemical composition and processing history. Consequently, scale-up must currently undergo expensive trial-and-error campaigns. To accelerate process flow diagram (PFD) optimisation and knowledge discovery, this work proposed a novel digital framework to automatically quantify process mechanisms by integrating symbolic regression (SR) within model-based design of experiments (MBDoE). Each iteration, SR proposed a Pareto front of interpretable mechanistic expressions, and then MBDoE designed a new experiment to discriminate between them while balancing PFD optimisation. To investigate the framework's performance, a new process model capable of simulating general formulated product synthesis was constructed to generate in-silico data for different case studies. The framework could effectively discover ground-truth process mechanisms within a few iterations, indicating its great potential for use within the general chemical industry for digital manufacturing and product innovation.
Updated: 2024-05-07 18:10:54
Fields: cs.LG
ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning
Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including the estimation of 3D pose, shape, contact, human-object interaction, emotion, and more. Each of these methods works in isolation instead of synergistically. Here we address this problem and build a language-driven human understanding system -- ChatHuman, which combines and integrates the skills of many different methods. To do so, we finetune a Large Language Model (LLM) to select and use a wide variety of existing tools in response to user inputs. In doing so, ChatHuman is able to combine information from multiple tools to solve problems more accurately than the individual tools themselves and to leverage tool output to improve its ability to reason about humans. The novel features of ChatHuman include leveraging academic publications to guide the application of 3D human-related tools, employing a retrieval-augmented generation model to generate in-context-learning examples for handling new tools, and discriminating and integrating tool results to enhance 3D human understanding. Our experiments show that ChatHuman outperforms existing models in both tool selection accuracy and performance across multiple 3D human-related tasks. ChatHuman is a step towards consolidating diverse methods for human analysis into a single, powerful, system for 3D human reasoning.
Updated: 2024-05-07 17:59:31
Fields: cs.CV,cs.LG
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. Nonetheless, state-of-the-art INT4 quantization techniques only accelerate low-batch, edge LLM inference, failing to deliver performance gains in large-batch, cloud-based LLM serving. We uncover a critical issue: existing INT4 quantization methods suffer from significant runtime overhead (20-90%) when dequantizing either weights or partial sums on GPUs. To address this challenge, we introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache. QoQ stands for quattuor-octo-quattuor, which represents 4-8-4 in Latin. QoQ is implemented by the QServe inference library that achieves measured speedup. The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores. Building upon this insight, in QoQ algorithm, we introduce progressive quantization that can allow low dequantization overhead in W4A8 GEMM. Additionally, we develop SmoothAttention to effectively mitigate the accuracy degradation incurred by 4-bit KV quantization. In the QServe system, we perform compute-aware weight reordering and take advantage of register-level parallelism to reduce dequantization latency. We also make fused attention memory-bound, harnessing the performance gain brought by KV4 quantization. As a result, QServe improves the maximum achievable serving throughput of Llama-3-8B by 1.2x on A100, 1.4x on L40S; and Qwen1.5-72B by 2.4x on A100, 3.5x on L40S, compared to TensorRT-LLM. Remarkably, QServe on L40S GPU can achieve even higher throughput than TensorRT-LLM on A100. Thus, QServe effectively reduces the dollar cost of LLM serving by 3x. Code is available at https://github.com/mit-han-lab/qserve.
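The kind of 4-bit weight quantization that W4A8 serving builds on can be sketched as per-group symmetric quantization; this is a generic illustration, not QoQ itself, whose progressive scheme quantizes through an INT8 intermediate level to keep dequantization cheap on GPU.

```python
import numpy as np

def quantize_int4(w, group_size=32):
    """Per-group symmetric 4-bit quantization sketch: each group of
    weights shares one scale, and values are rounded to the INT4
    range [-7, 7]."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)   # toy weight vector
q, s = quantize_int4(w)
w_hat = dequantize(q, s).reshape(-1)
print(f"max abs error: {np.abs(w - w_hat).max():.3f}")
```

The runtime overhead the paper targets comes precisely from this dequantize step: done naively on CUDA cores per GEMM, it costs 20-90% of runtime, which motivates the compute-aware weight reordering and register-level parallelism in QServe.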
Updated: 2024-05-07 17:59:30
Fields: cs.CL,cs.AI,cs.LG,cs.PF
Natural Language Counterfactuals through Representation Surgery
Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior. Such methods are employed, for example, to eliminate or alter the encoding of demographic information such as gender within the model's representations and, in so doing, create a counterfactual representation. However, because the intervention operates within the representation space, understanding precisely what aspects of the text it modifies poses a challenge. In this paper, we give a method to convert representation counterfactuals into string counterfactuals. We demonstrate that this approach enables us to analyze the linguistic alterations corresponding to a given representation space intervention and to interpret the features utilized to encode a specific concept. Moreover, the resulting counterfactuals can be used to mitigate bias in classification through data augmentation.
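A common representation-space intervention of the kind described above is linear concept erasure: projecting each hidden state onto the orthogonal complement of a learned concept direction. The sketch below uses a random direction as a stand-in for a learned one (e.g. a gender axis); the paper's contribution is then mapping such counterfactual representations back to strings.

```python
import numpy as np

def erase_concept(representations, concept_direction):
    """Project each representation onto the orthogonal complement of a
    concept direction, producing counterfactual representations in
    which that concept's linear component is removed."""
    v = concept_direction / np.linalg.norm(concept_direction)
    P = np.eye(len(v)) - np.outer(v, v)     # projection removing v
    return representations @ P.T

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))             # 5 token representations, dim 8
v = rng.standard_normal(8)                  # hypothetical concept axis
H_cf = erase_concept(H, v)
print(np.abs(H_cf @ v).max())               # ~0: concept component removed
```

Converting `H_cf` back into a string counterfactual is what lets one inspect which words and features actually encoded the concept.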
Updated: 2024-05-07 17:58:17
Fields: cs.CL,cs.CY,cs.LG
PoW Security-Latency under Random Delays and the Effect of Transaction Fees
The safety guarantees and the security-latency problem of Nakamoto consensus have been extensively studied over the last decade under a bounded-delay model. Recent studies have shown that the PoW protocol is secure under random delay models as well. In this paper, we analyze the security-latency problem, i.e., how secure a block is after it becomes k-deep in the blockchain, under general random delay distributions. We provide tight and explicit bounds which only require determining the distribution of the number of Poisson arrivals during the random delay. We further consider potential effects of the recent Bitcoin halving on the security-latency problem by extending our results.
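The key quantity named above — the distribution of the number of Poisson block arrivals during a random delay — has a closed form in simple cases. For an exponential delay with rate $\mu$ and arrival rate $\lambda$, the count is geometric: $P(N=k) = \frac{\mu}{\lambda+\mu}\left(\frac{\lambda}{\lambda+\mu}\right)^k$. The rates below are illustrative, and this is a worked example, not the paper's bound.

```python
import numpy as np

# Monte Carlo check of the closed form for the number of Poisson
# arrivals (rate lam) during an exponentially distributed delay (rate mu).
lam, mu = 1 / 600.0, 1 / 10.0      # ~one block per 600 s, ~10 s mean delay
rng = np.random.default_rng(0)
delays = rng.exponential(1 / mu, size=200_000)
counts = rng.poisson(lam * delays)  # arrivals during each sampled delay

p = mu / (lam + mu)                 # closed-form P(N = 0)
empirical_p0 = np.mean(counts == 0)
print(f"closed form P(N=0) = {p:.4f}, Monte Carlo = {empirical_p0:.4f}")
```

Plugging such a distribution into the paper's bounds yields the k-deep confirmation guarantees for the chosen delay model.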
Updated: 2024-05-07 17:57:31
Fields: cs.CR,cs.DC,cs.DM,cs.IT,math.IT
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts
Large language models (LLMs) have demonstrated a strong ability to generate code for productive activities. However, current benchmarks for code synthesis, such as HumanEval, MBPP, and DS-1000, are predominantly oriented towards introductory tasks on algorithms and data science, insufficiently satisfying the challenging requirements prevalent in real-world coding. To fill this gap, we propose NaturalCodeBench (NCB), a challenging code benchmark designed to mirror the complexity and variety of scenarios in real coding tasks. NCB comprises 402 high-quality problems in Python and Java, meticulously selected from natural user queries on online coding services and covering 6 different domains. Noting the extraordinary difficulty of creating test cases for real-world queries, we also introduce a semi-automated pipeline to enhance the efficiency of test case construction. Compared with manual solutions, it achieves more than a fourfold increase in efficiency. Our systematic experiments on 39 LLMs find that performance gaps on NCB between models with close HumanEval scores can still be significant, indicating a lack of focus on practical code synthesis scenarios or over-specified optimization on HumanEval. On the other hand, even the best-performing GPT-4 is still far from satisfactory on NCB. The evaluation toolkit and development set are available at https://github.com/THUDM/NaturalCodeBench.
Updated: 2024-05-07 17:52:51
Domains: cs.CL,cs.LG,cs.SE
xLSTM: Extended Long Short-Term Memory
In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories; in particular, they constituted the first Large Language Models (LLMs). However, the advent of Transformer technology, with parallelizable self-attention at its core, marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: how far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs while mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining (i) sLSTM, with a scalar memory, a scalar update, and new memory mixing, and (ii) mLSTM, which is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and the modified memory structures enable xLSTM to perform favorably compared to state-of-the-art Transformers and State Space Models, both in performance and in scaling.
Updated: 2024-05-07 17:50:21
Domains: cs.LG,cs.AI,stat.ML
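The exponential-gating idea with its stabilizer can be sketched in a few lines. The following is a minimal scalar-memory (sLSTM-style) recurrence in NumPy; the variable names, gate parameterization, and running-max stabilizer shown here are an illustrative reading of the abstract, not the paper's exact equations.

```python
import numpy as np

def slstm_cell_scan(z, i_pre, f_pre, o):
    """Scalar-memory recurrence with exponential input/forget gates and a
    running-max stabilizer, in the spirit of the sLSTM update. z holds cell
    inputs, i_pre/f_pre the gate pre-activations (to be exponentiated), and
    o the output gate in (0, 1). Names and details are illustrative."""
    c, n, m = 0.0, 0.0, -np.inf        # cell state, normalizer, stabilizer
    hs = []
    for zt, it, ft, ot in zip(z, i_pre, f_pre, o):
        m_new = max(ft + m, it)        # log-domain running max bounds exp()
        i_hat = np.exp(it - m_new)     # stabilized exp input gate
        f_hat = np.exp(ft + m - m_new) # stabilized exp forget gate
        c = f_hat * c + i_hat * zt
        n = f_hat * n + i_hat          # normalizer accumulates gate mass
        m = m_new
        hs.append(ot * c / n)          # normalized hidden state
    return np.array(hs)
```

Even with gate pre-activations in the hundreds, where a naive `exp()` would overflow, the stabilized scan stays finite.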
Switchable Decision: Dynamic Neural Generation Networks
Auto-regressive generation models achieve competitive performance across many different NLP tasks such as summarization, question answering, and classification. However, they are also known for slow inference, which makes them challenging to deploy in real-time applications. We propose a switchable decision mechanism that accelerates inference by dynamically assigning computation resources to each data instance. By automatically deciding where to skip computation and using constrained optimization to balance quality against computation cost, our dynamic neural generation networks enforce an efficient inference path and determine the optimal trade-off. Experiments across question answering, summarization, and classification benchmarks show that our method reduces computation cost during inference while maintaining the same accuracy. Extensive experiments and ablation studies demonstrate that our method is general, effective, and beneficial for many NLP tasks.
Updated: 2024-05-07 17:44:54
Domains: cs.CL,cs.AI,cs.LG
New allometric models for the USA create a step-change in forest carbon estimation, modeling, and mapping
The United States national forest inventory (NFI) serves as the foundation for forest aboveground biomass (AGB) and carbon accounting across the nation. These data enable design-based estimates of forest carbon stocks and stock-changes at state and regional levels, but also serve as inputs to model-based approaches for characterizing forest carbon stocks and stock-changes at finer resolutions. Although NFI tree and plot-level data are often treated as truth in these models, they are in fact estimates based on regional species-group models known collectively as the Component Ratio Method (CRM). In late 2023 the Forest Inventory and Analysis (FIA) program introduced a new National Scale Volume and Biomass Estimators (NSVB) system to replace CRM nationwide and offer more precise and accurate representations of forest AGB and carbon. Given the prevalence of model-based AGB studies relying on FIA, there is concern about the transferability of methods from CRM to NSVB models, as well as the comparability of existing CRM AGB products (e.g. maps) to new and forthcoming NSVB AGB products. To begin addressing these concerns we compared previously published CRM AGB maps to new maps produced using identical methods with NSVB AGB reference data. Our results suggest that models relying on passive satellite imagery (e.g. Landsat) provide acceptable estimates of point-in-time NSVB AGB and carbon stocks, but fail to accurately quantify growth in mature closed-canopy forests. We highlight that existing estimates, models, and maps based on FIA reference data are no longer compatible with NSVB, and recommend new methods as well as updated models and maps for accommodating this step-change. Our collective ability to adopt NSVB in our modeling and mapping workflows will help us provide the most accurate spatial forest carbon data possible in order to better inform local management and decision making.
Updated: 2024-05-07 17:38:39
Domains: stat.AP,cs.CV,cs.LG
Amodal Optical Flow
Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify the performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking. We make the dataset, code, and trained models publicly available at http://amodal-flow.cs.uni-freiburg.de.
Updated: 2024-05-07 17:36:29
Domains: cs.CV,cs.AI,cs.RO
On the accuracy of interpolation based on single-layer artificial neural networks with a focus on defeating the Runge phenomenon
In the present paper, we consider one-hidden-layer ANNs with a feedforward architecture, also referred to as shallow or two-layer networks, so that the structure is determined by the number and types of neurons. The determination of the parameters that define the function, called training, is done by solving the approximation problem, i.e., by imposing interpolation conditions at a set of specific nodes. We consider the case where the parameters are trained using a procedure referred to as the Extreme Learning Machine (ELM), which leads to a linear interpolation problem. Under these hypotheses, the existence of an ANN interpolating function is guaranteed. The focus is then on the accuracy of the interpolation outside the given sampling interpolation nodes, when these are equispaced, Chebychev, or randomly selected. The study is motivated by the well-known bell-shaped Runge example, which makes it clear that the construction of a global interpolating polynomial is accurate only if it is trained on suitably chosen nodes, for example the Chebychev ones. In order to evaluate the behavior as the number of interpolation nodes grows, we raise the number of neurons in our network and compare it with the interpolating polynomial. We test using Runge's function and other well-known examples with different regularities. As expected, the accuracy of the approximation with a global polynomial increases only if the Chebychev nodes are considered. In contrast, the error of the ANN interpolating function always decays, and in most cases we observe that the convergence follows what is observed in the polynomial case on Chebychev nodes, regardless of the set of nodes used for training.
Updated: 2024-05-07 17:30:50
Domains: math.NA,cs.AI,cs.NA,65D05
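The ELM-style training the abstract describes reduces to a linear solve: hidden weights are drawn at random and frozen, and the output weights are obtained by imposing interpolation at the nodes. A minimal sketch on Runge's function follows; the network width, weight scale, and tanh activation are our illustrative choices, not the paper's setup.

```python
import numpy as np

def elm_interpolant(x_nodes, y_nodes, seed=0):
    """One-hidden-layer tanh network trained ELM-style: the hidden weights
    and biases are random and fixed, and only the linear output layer is
    found, here by solving the square system that imposes interpolation
    at the nodes. Width, weight scale, and activation are illustrative."""
    rng = np.random.default_rng(seed)
    n = len(x_nodes)
    W = rng.normal(scale=3.0, size=n)              # fixed random input weights
    b = rng.normal(scale=3.0, size=n)              # fixed random biases
    H = np.tanh(np.outer(x_nodes, W) + b)          # n x n hidden-output matrix
    beta = np.linalg.solve(H, y_nodes)             # linear output weights
    return lambda x: np.tanh(np.outer(np.atleast_1d(x), W) + b) @ beta

runge = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)      # the bell-shaped example
nodes = np.linspace(-1.0, 1.0, 15)                 # equispaced training nodes
f = elm_interpolant(nodes, runge(nodes))
err_at_nodes = np.max(np.abs(f(nodes) - runge(nodes)))  # tiny by construction
```

By construction the interpolant matches the data at the nodes; the paper's question is how the error behaves away from them.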
Fast Decentralized Gradient Tracking for Federated Minimax Optimization with Local Updates
Federated learning (FL) for minimax optimization has emerged as a powerful paradigm for training models across distributed nodes/clients while preserving data privacy and model robustness on data heterogeneity. In this work, we delve into the decentralized implementation of federated minimax optimization by proposing \texttt{K-GT-Minimax}, a novel decentralized minimax optimization algorithm that combines local updates and gradient tracking techniques. Our analysis showcases the algorithm's communication efficiency and convergence rate for nonconvex-strongly-concave (NC-SC) minimax optimization, demonstrating a superior convergence rate compared to existing methods. \texttt{K-GT-Minimax}'s ability to handle data heterogeneity and ensure robustness underscores its significance in advancing federated learning research and applications.
Updated: 2024-05-07 17:25:56
Domains: cs.LG,cs.DC,stat.ML
Iterative Reasoning Preference Optimization
Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. We train using a modified DPO loss (Rafailov et al., 2023) with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy on GSM8K, MATH, and ARC-Challenge for Llama-2-70B-Chat, outperforming other Llama-2-based models not relying on additionally sourced datasets. For example, we see a large improvement from 55.6% to 81.6% on GSM8K and an accuracy of 88.7% with majority voting out of 32 samples.
Updated: 2024-05-07 17:25:08
Domains: cs.CL,cs.AI
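The modified objective can be sketched as a pairwise DPO-style term plus a negative log-likelihood term on the winning chain-of-thought. A scalar sketch of this loss follows; the hyperparameter names `beta` and `alpha` and the length normalization are illustrative assumptions, not values from the paper.

```python
import math

def dpo_nll_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                 beta=0.1, alpha=1.0, len_w=1):
    """Pairwise preference loss in the spirit of DPO, plus an extra
    length-normalized NLL term on the winning sequence. logp_* are summed
    sequence log-probs under the trained policy, ref_logp_* under the
    frozen reference. beta, alpha, and the normalization are illustrative."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    dpo_term = math.log1p(math.exp(-margin))   # -log(sigmoid(margin))
    nll_term = -logp_w / len_w                 # keeps the winner high-likelihood
    return dpo_term + alpha * nll_term
```

The loss shrinks as the winning reasoning chain becomes more likely than the losing one relative to the reference, while the NLL term prevents the winner's absolute likelihood from collapsing.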
Quantum $X$-Secure $B$-Byzantine $T$-Colluding Private Information Retrieval
We consider the problems arising from the presence of Byzantine servers in a quantum private information retrieval (QPIR) setting. This is the first work to precisely define what the capabilities of Byzantine servers could be in a QPIR context. We show that quantum Byzantine servers have more capabilities than their classical counterparts due to the possibilities created by quantum encoding procedures. We focus on quantum Byzantine servers that can apply any reversible operation on their individual qudits. In this case, Byzantine servers can generate any error, i.e., this covers \emph{all} possible single qudit operations that can be applied by Byzantine servers on their qudits. We design a scheme based on cross-subspace alignment (CSA) and we show that this scheme achieves superdense coding gain in some cases.
Updated: 2024-05-07 17:19:49
Domains: cs.IT,cs.CR,cs.NI,eess.SP,math.IT,quant-ph
Accelerating Convergence in Bayesian Few-Shot Classification
Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It also exhibits the parameterization invariance property concerning the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. Additionally, we investigate the impact of hyperparameters and components. Code is publicly available at https://github.com/keanson/MD-BSFC.
Updated: 2024-05-07 17:12:13
Domains: cs.LG,stat.ML
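Mirror descent itself is easy to illustrate: with an entropy mirror map, the update becomes a multiplicative (exponentiated-gradient) step followed by re-projection onto the simplex. The following is a generic sketch of this optimizer on a toy objective, not the paper's GP-specific variational update.

```python
import numpy as np

def mirror_descent_simplex(grad, x0, lr=0.5, steps=200):
    """Mirror descent with the entropy mirror map (exponentiated gradient):
    the steepest-descent step is taken in the simplex's own geometry rather
    than the Euclidean one, which is the source of the acceleration."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-lr * grad(x))   # multiplicative update
        x = x / x.sum()                 # re-project onto the simplex
    return x

# Minimize the linear objective <c, x> over the simplex; the optimum puts
# all mass on the smallest coordinate of c.
c = np.array([3.0, 1.0, 2.0])
x_star = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```

Note the parameterization invariance the abstract mentions: the update depends on the gradient through the mirror map, not on a particular Euclidean coordinate system.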
Terrapin Attack: Breaking SSH Channel Integrity By Sequence Number Manipulation
The SSH protocol provides secure access to network services, particularly remote terminal login and file transfer within organizational networks and to over 15 million servers on the open internet. SSH uses an authenticated key exchange to establish a secure channel between a client and a server, which protects the confidentiality and integrity of messages sent in either direction. The secure channel prevents message manipulation, replay, insertion, deletion, and reordering. At the network level, SSH uses the Binary Packet Protocol over TCP. In this paper, we show that the SSH Binary Packet Protocol is no longer a secure channel: SSH channel integrity (INT-PST, aINT-PTXT, and INT-sfCTF) is broken for three widely used encryption modes. This allows prefix truncation attacks where encrypted packets at the beginning of the SSH channel can be deleted without the client or server noticing it. We demonstrate several real-world applications of this attack. We show that we can fully break SSH extension negotiation (RFC 8308), such that an attacker can downgrade the public key algorithms for user authentication or turn off a new countermeasure against keystroke timing attacks introduced in OpenSSH 9.5. Further, we identify an implementation flaw in AsyncSSH that, together with prefix truncation, allows an attacker to redirect the victim's login into a shell controlled by the attacker. We also performed an internet-wide scan and found that 71.6% of SSH servers support a vulnerable encryption mode, while 63.2% even list it as their preferred choice. We identify two root causes that enable these attacks: First, the SSH handshake supports optional messages that are not authenticated. Second, SSH does not reset message sequence numbers when activating encryption keys. Based on this analysis, we propose effective and backward-compatible changes to SSH that mitigate our attacks.
Updated: 2024-05-07 17:09:31
Domains: cs.CR
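The root cause identified in the abstract, unauthenticated optional handshake messages combined with sequence numbers that are not reset when keys are activated, can be illustrated with a toy model. This sketch is not the SSH wire format; it only shows how MACs over implicit counters keep verifying after a prefix is deleted, once an attacker has advanced the receiver's counter.

```python
import hashlib
import hmac

KEY = b"toy-session-key"

def tag(seq, payload):
    # The MAC covers an implicit 32-bit sequence number plus the payload,
    # loosely modeling the SSH Binary Packet Protocol (toy, not wire format).
    return hmac.new(KEY, seq.to_bytes(4, "big") + payload, hashlib.sha256).digest()

def receive(packets, start_seq=0):
    """Verify (payload, tag) pairs against the receiver's implicit counter."""
    seq, accepted = start_seq, []
    for payload, t in packets:
        if not hmac.compare_digest(t, tag(seq, payload)):
            return None  # integrity failure
        accepted.append(payload)
        seq += 1
    return accepted

# Sender secures packets 0..2. An attacker who advanced the receiver's
# counter by one during the partly unauthenticated handshake can delete
# the first secured packet: the remaining tags still verify, because the
# sequence numbers were never reset when the keys were activated.
stream = [(p, tag(i, p)) for i, p in enumerate([b"ext-info", b"login", b"data"])]
truncated = stream[1:]   # prefix truncation: silently drop packet 0
```

With an unmanipulated counter the truncation is caught; with the shifted counter it is not, which is exactly why the paper proposes resetting sequence numbers on key activation.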
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions
When a teacher provides examples for a student to study, these examples must be informative, enabling a student to progress from their current state toward a target concept or skill. Good teachers must therefore simultaneously infer what students already know and adapt their teaching to students' changing state of knowledge. There is increasing interest in using computational models, particularly large language models, as pedagogical tools. As students, language models in particular have shown a remarkable ability to adapt to new tasks given small numbers of examples. But how effectively can these models adapt as teachers to students of different types? To study this question, we introduce a suite of models and evaluation methods we call AdapT. AdapT has two components: (1) a collection of simulated Bayesian student models that can be used for evaluation of automated teaching methods; (2) a platform for evaluation with human students, to characterize the real-world effectiveness of these methods. We additionally introduce (3) AToM, a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of future beliefs. In evaluations of simulated students across three learning domains (fraction arithmetic, English morphology, function learning), AToM systematically outperforms LLM-based and standard Bayesian teaching models. In human experiments, both AToM and LLMs outperform non-adaptive random example selection. Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
Updated: 2024-05-07 17:05:27
Domains: cs.CL,cs.AI,cs.LG
Representation Learning of Daily Movement Data Using Text Encoders
Time-series representation learning is a key area of research for remote healthcare monitoring applications. In this work, we focus on a dataset of recordings of in-home activity from people living with Dementia. We design a representation learning method based on converting activity to text strings that can be encoded using a language model fine-tuned to transform data from the same participants within a $30$-day window to similar embeddings in the vector space. This allows for clustering and vector searching over participants and days, and the identification of activity deviations to aid with personalised delivery of care.
Updated: 2024-05-07 17:04:21
Domains: cs.LG
TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters
The training, testing, and deployment of autonomous vehicles require realistic and efficient simulators. Moreover, because of the high variability between the problems presented by different autonomous systems, these simulators need to be easy to use and easy to modify. To address these needs we introduce TorchDriveSim and its benchmark extension TorchDriveEnv. TorchDriveEnv is a lightweight reinforcement learning benchmark programmed entirely in Python, which can be modified to test a number of different factors in learned vehicle behavior, including the effect of varying kinematic models, agent types, and traffic control patterns. Most importantly, unlike many replay-based simulation approaches, TorchDriveEnv is fully integrated with a state-of-the-art behavioral simulation API. This allows users to train and evaluate driving models alongside data-driven Non-Playable Characters (NPCs) whose initializations and driving behavior are reactive, realistic, and diverse. We illustrate the efficiency and simplicity of TorchDriveEnv by evaluating common reinforcement learning baselines in both training and validation environments. Our experiments show that TorchDriveEnv is easy to use, but difficult to solve.
Updated: 2024-05-07 17:02:02
Domains: cs.AI,cs.LG,cs.MA,cs.RO
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision, which can only solve problems in $\mathsf{AC}^0$, a proper subset of $ \mathsf{TC}^0$. However, with $T$ steps of CoT, constant-depth transformers using constant-bit precision and $O(\log n)$ embedding size can solve any problem solvable by boolean circuits of size $T$. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
Updated: 2024-05-07 17:00:27
Domains: cs.LG,cs.CC,stat.ML
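Iterated squaring, one of the tasks mentioned above, gives a concrete feel for "inherently serial": each step needs the previous step's output, and a chain of thought simply records those intermediate values. A toy sketch, illustrative rather than the paper's formal construction:

```python
def iterated_squaring_steps(x, t, p):
    """Compute x^(2^t) mod p by t sequential squarings, recording every
    intermediate value as one 'chain-of-thought' step: step k cannot start
    before step k-1 finishes. A toy illustration of serial computation,
    not the paper's formal construction."""
    steps = [x % p]
    for _ in range(t):
        steps.append(steps[-1] * steps[-1] % p)
    return steps

chain = iterated_squaring_steps(3, 5, 1000003)  # 5 serial squarings
```

A constant-depth circuit must somehow collapse this chain, whereas a model emitting `t` CoT steps can just walk it, which is the intuition behind the expressiveness gap the paper formalizes.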
Scalable network reconstruction in subquadratic time
Network reconstruction consists in determining the unobserved pairwise couplings between $N$ nodes given only observational data on the resulting behavior that is conditioned on those couplings -- typically a time-series or independent samples from a graphical model. A major obstacle to the scalability of algorithms proposed for this problem is a seemingly unavoidable quadratic complexity of $\Omega(N^2)$, corresponding to the requirement of each possible pairwise coupling being contemplated at least once, despite the fact that most networks of interest are sparse, with a number of non-zero couplings that is only $O(N)$. Here we present a general algorithm applicable to a broad range of reconstruction problems that significantly outperforms this quadratic baseline. Our algorithm relies on a stochastic second neighbor search (Dong et al., 2011) that produces the best edge candidates with high probability, thus bypassing an exhaustive quadratic search. If we rely on the conjecture that the second-neighbor search finishes in log-linear time (Baron & Darling, 2020; 2022), we demonstrate theoretically that our algorithm finishes in subquadratic time, with a data-dependent complexity loosely upper bounded by $O(N^{3/2}\log N)$, but with a more typical log-linear complexity of $O(N\log^2N)$. In practice, we show that our algorithm achieves a performance that is many orders of magnitude faster than the quadratic baseline -- in a manner consistent with our theoretical analysis -- allows for easy parallelization, and thus enables the reconstruction of networks with hundreds of thousands and even millions of nodes and edges.
Updated: 2024-05-07 16:57:08
Domains: cs.DS,cs.LG,physics.data-an,stat.CO,stat.ML
Network reconstruction via the minimum description length principle
A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting, and produces an inferred network with a statistically justifiable number of edges. The status quo in this context is based on $L_{1}$ regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity with weight "shrinkage". This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster to employ, as it requires a single fit to the complete data. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of edges to be known in advance. We also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving in the order of $10^{4}$ to $10^{5}$ species, and demonstrate how the inferred model can be used to predict the outcome of interventions in the system.
Updated: 2024-05-07 16:54:52
Domains: stat.ML,cs.LG,cs.SI,physics.data-an,q-bio.PE
Adapting WavLM for Speech Emotion Recognition
Recently, the usage of speech self-supervised models (SSL) for downstream tasks has been drawing a lot of attention. While large pre-trained models commonly outperform smaller models trained from scratch, questions regarding the optimal fine-tuning strategies remain prevalent. In this paper, we explore the fine-tuning strategies of the WavLM Large model for the speech emotion recognition task on the MSP Podcast Corpus. More specifically, we perform a series of experiments focusing on using gender and semantic information from utterances. We then sum up our findings and describe the final model we used for submission to Speech Emotion Recognition Challenge 2024.
Updated: 2024-05-07 16:53:42
Domains: cs.LG,cs.SD,eess.AS
OptPDE: Discovering Novel Integrable Systems via AI-Human Collaboration
Integrable partial differential equation (PDE) systems are of great interest in natural science, but are exceedingly rare and difficult to discover. To solve this, we introduce OptPDE, a first-of-its-kind machine learning approach that Optimizes PDEs' coefficients to maximize their number of conserved quantities, $n_{\rm CQ}$, and thus discover new integrable systems. We discover four families of integrable PDEs, one of which was previously known, and three of which have at least one conserved quantity but are new to the literature to the best of our knowledge. We investigate more deeply the properties of one of these novel PDE families, $u_t = (u_x+a^2u_{xxx})^3$. Our paper offers a promising schema of AI-human collaboration for integrable system discovery: machine learning generates interpretable hypotheses for possible integrable systems, which human scientists can verify and analyze, to truly close the discovery loop.
Updated: 2024-05-07 16:53:29
Domains: cs.LG,physics.comp-ph
Concentration Tail-Bound Analysis of Coevolutionary and Bandit Learning Algorithms
Runtime analysis, as a branch of the theory of AI, studies how the number of iterations algorithms take before finding a solution (its runtime) depends on the design of the algorithm and the problem structure. Drift analysis is a state-of-the-art tool for estimating the runtime of randomised algorithms, such as evolutionary and bandit algorithms. Drift refers roughly to the expected progress towards the optimum per iteration. This paper considers the problem of deriving concentration tail-bounds on the runtime/regret of algorithms. It provides a novel drift theorem that gives precise exponential tail-bounds given positive, weak, zero and even negative drift. Previously, such exponential tail bounds were missing in the case of weak, zero, or negative drift. Our drift theorem can be used to prove a strong concentration of the runtime/regret of algorithms in AI. For example, we prove that the regret of the \rwab bandit algorithm is highly concentrated, while previous analyses only considered the expected regret. This means that the algorithm obtains the optimum within a given time frame with high probability, i.e. a form of algorithm reliability. Moreover, our theorem implies that the time needed by the co-evolutionary algorithm RLS-PD to obtain a Nash equilibrium in a bilinear max-min benchmark problem is highly concentrated. However, we also prove that the algorithm forgets the Nash equilibrium, and the time until this occurs is highly concentrated. This highlights a weakness in the RLS-PD which should be addressed by future work.
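For background — these are classical results, not the paper's new theorem — additive drift converts a per-step progress bound into an expected-runtime bound, and with bounded step sizes a Hajek-style argument sharpens this into an exponential tail; schematically (constants vary across statements in the literature):

```latex
% Additive drift: if E[X_t - X_{t+1} | X_t = x] >= \delta for all x > 0,
% then the hitting time T = min{ t : X_t <= 0 } satisfies
\mathbb{E}[T \mid X_0] \;\le\; \frac{X_0}{\delta}.
% With step sizes bounded by c, a concentration version takes the shape
\Pr\!\left[T \ge (1+\varepsilon)\,\frac{X_0}{\delta}\right]
  \;\le\; \exp\!\left(-\frac{\varepsilon\,\delta\,X_0}{2c^2}\right).
```

The contribution above extends this picture to regimes (weak, zero, negative drift) where bounds of this exponential shape were previously unavailable.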
Updated: 2024-05-07 16:45:15
Domains: cs.NE,cs.AI
CascadedGaze: Efficiency in Global Context Extraction for Image Restoration
Image restoration tasks traditionally rely on convolutional neural networks. However, given the local nature of the convolutional operator, they struggle to capture global information. The promise of attention mechanisms in Transformers is to circumvent this problem, but it comes at the cost of intensive computational overhead. Many recent studies in image restoration have focused on solving the challenge of balancing performance and computational cost via Transformer variants. In this paper, we present CascadedGaze Network (CGNet), an encoder-decoder architecture that employs Global Context Extractor (GCE), a novel and efficient way to capture global information for image restoration. The GCE module leverages small kernels across convolutional layers to learn global dependencies, without requiring self-attention. Extensive experimental results show that our computationally efficient approach performs competitively to a range of state-of-the-art methods on synthetic image denoising and single image deblurring tasks, and pushes the performance boundary further on the real image denoising task.
Updated: 2024-05-07 16:32:18
Domains: eess.IV,cs.CV,cs.LG
Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios
Multimodal emotion recognition (MER) in practical scenarios is significantly challenged by the presence of missing or incomplete data across different modalities. To overcome these challenges, researchers have aimed to simulate incomplete conditions during the training phase to enhance the system's overall robustness. Traditional methods have often involved discarding data or substituting data segments with zero vectors to approximate these incompletenesses. However, such approaches neither accurately represent real-world conditions nor adequately address the issue of noisy data availability. For instance, a blurry image cannot simply be replaced with zero vectors while still retaining its information. To tackle this issue and develop a more precise MER system, we introduce a novel noise-robust MER model that effectively learns robust multimodal joint representations from noisy data. This approach includes two pivotal components: firstly, a noise scheduler that adjusts the type and level of noise in the data to emulate various realistic incomplete situations. Secondly, a Variational AutoEncoder (VAE)-based module is employed to reconstruct these robust multimodal joint representations from the noisy inputs. Notably, the introduction of the noise scheduler enables the exploration of an entirely new type of incomplete data condition, which is impossible with existing methods. Extensive experimental evaluations on the benchmark datasets IEMOCAP and CMU-MOSEI demonstrate the effectiveness of the noise scheduler and the excellent performance of our proposed model.
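A minimal sketch of what such a noise scheduler might look like (the schedule shape and noise types here are hypothetical, not the paper's implementation):

```python
import numpy as np

def apply_noise(features, step, total_steps, rng, kind="gaussian"):
    """Hypothetical noise scheduler: the corruption level ramps up with
    the training step, emulating increasingly incomplete modalities."""
    level = step / total_steps
    if kind == "gaussian":                       # blur-like corruption
        return features + rng.normal(scale=level, size=features.shape)
    if kind == "dropout":                        # missing-segment corruption
        keep = rng.random(features.shape) >= level
        return features * keep
    raise ValueError(kind)

rng = np.random.default_rng(0)
x = np.ones((4, 8))                              # toy modality features
mild = apply_noise(x, step=1, total_steps=10, rng=rng)
harsh = apply_noise(x, step=9, total_steps=10, rng=rng)
```

Unlike zero-substitution, the corrupted features still carry (degraded) information, which is what the VAE-based module is then trained to reconstruct from.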
Updated: 2024-05-07 16:30:05
Domains: cs.CV,cs.AI,cs.LG
Large-Scale MPC: Scaling Private Iris Code Uniqueness Checks to Millions of Users
In this work we tackle privacy concerns in biometric verification systems that typically require server-side processing of sensitive data (e.g., fingerprints and Iris Codes). Concretely, we design a solution that allows us to query whether a given Iris Code is similar to one contained in a given database, while all queries and datasets are being protected using secure multiparty computation (MPC). Addressing the substantial performance demands of operational systems like World ID and aid distributions by the Red Cross, we propose new protocols to improve performance by more than three orders of magnitude compared to the recent state-of-the-art system Janus (S&P 24). Our final protocol can achieve a throughput of over a million Iris Code comparisons per second on a single CPU core, while protecting the privacy of both the query and database Iris Codes. We additionally investigate GPU acceleration for some building blocks of our protocol, which results in further speedups of over 38x compared to the respective multi-threaded CPU implementation.
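The plaintext operation such MPC protocols protect is, in standard iris-recognition practice, a masked fractional Hamming distance; a sketch of that comparison (toy code length, synthetic codes):

```python
import numpy as np

def iris_distance(code_a, code_b, mask_a, mask_b):
    """Fractional Hamming distance between two binary iris codes,
    counting only bit positions that both masks mark as valid."""
    valid = mask_a & mask_b
    disagree = (code_a ^ code_b) & valid
    return disagree.sum() / valid.sum()

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 2048, dtype=np.uint8)
noisy = a.copy()
noisy[rng.random(2048) < 0.05] ^= 1              # same eye, 5% bit noise
b = rng.integers(0, 2, 2048, dtype=np.uint8)     # an unrelated eye
mask = np.ones(2048, dtype=np.uint8)

same = iris_distance(a, noisy, mask, mask)       # small distance
diff = iris_distance(a, b, mask, mask)           # near 0.5
```

The throughput numbers above refer to performing millions of such comparisons per second while codes, masks, and the database all remain secret-shared.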
Updated: 2024-05-07 16:29:11
Domains: cs.CR
Seeing Is Not Always Believing: Invisible Collision Attack and Defence on Pre-Trained Models
Large-scale pre-trained models (PTMs) such as BERT and GPT have achieved great success in diverse fields. The typical paradigm is to pre-train a big deep learning model on large-scale data sets, and then fine-tune the model on small task-specific data sets for downstream tasks. Although PTMs have rapidly progressed with wide real-world applications, they also pose significant risks of potential attacks. Existing backdoor attacks or data poisoning methods often rest on the assumption that the attacker invades the victims' computers or accesses the target data, which is challenging in real-world scenarios. In this paper, we propose a novel framework for an invisible attack on PTMs with enhanced MD5 collision. The key idea is to generate two equal-size models with the same MD5 checksum by leveraging the MD5 chosen-prefix collision. Afterwards, the two "same" models will be deployed on public websites to induce victims to download the poisoned model. Unlike conventional attacks on deep learning models, this new attack is flexible, covert, and model-independent. Additionally, we propose a simple defensive strategy for recognizing the MD5 chosen-prefix collision and provide a theoretical justification for its feasibility. We extensively validate the effectiveness and stealthiness of our proposed attack and defensive method on different models and data sets.
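A sketch of the verification flow the attack undermines: with only an MD5 checksum, a chosen-prefix-colliding poisoned model would be indistinguishable from the clean one. Producing an actual collision requires dedicated tooling (e.g. HashClash); the snippet below only shows ordinary hashing, with made-up byte strings standing in for serialized models, and why pairing MD5 with a collision-resistant hash such as SHA-256 is a simple defence:

```python
import hashlib

def fingerprint(blob: bytes):
    """Return (md5, sha256) hex digests for a serialized model blob."""
    return hashlib.md5(blob).hexdigest(), hashlib.sha256(blob).hexdigest()

clean = b"model-weights: [0.12, -0.34, 0.56]"      # hypothetical clean model
poisoned = b"model-weights: [0.12, -0.99, 0.56]"   # hypothetical poisoned model

md5_clean, sha_clean = fingerprint(clean)
md5_bad, sha_bad = fingerprint(poisoned)
```

For these ordinary blobs both digests differ; a chosen-prefix collision makes the two MD5 digests equal while the SHA-256 digests still differ, which is why checking a second, collision-resistant hash defeats the attack.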
Updated: 2024-05-07 16:27:05
Domains: cs.CR,cs.AI,I.2.0
Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption
As machine learning (ML) permeates fields like healthcare, facial recognition, and blockchain, the need to protect sensitive data intensifies. Fully Homomorphic Encryption (FHE) allows inference on encrypted data, preserving the privacy of both data and the ML model. However, it slows down non-secure inference by up to five orders of magnitude, the root cause being the replacement of non-polynomial operators (ReLU and MaxPooling) with high-degree Polynomial Approximated Functions (PAFs). We propose SmartPAF, a framework to replace non-polynomial operators with low-degree PAFs and then recover the accuracy of the PAF-approximated model through four techniques: (1) Coefficient Tuning (CT) - adjust PAF coefficients based on the input distributions before training, (2) Progressive Approximation (PA) - progressively replace one non-polynomial operator at a time followed by a fine-tuning, (3) Alternate Training (AT) - alternate the training between PAFs and other linear operators in a decoupled manner, and (4) Dynamic Scale (DS) / Static Scale (SS) - dynamically scale the PAF input value within (-1, 1) in training, and fix the scale as the running max value in FHE deployment. The synergistic effect of CT, PA, AT, and DS/SS enables SmartPAF to enhance the accuracy of the various models approximated by PAFs of various low degrees under multiple datasets. For ResNet-18 under ImageNet-1k, the Pareto frontier spotted by SmartPAF in the latency-accuracy tradeoff space achieves 1.42x - 13.64x accuracy improvement and 6.79x - 14.9x speedup over prior works. Further, SmartPAF enables a 14-degree PAF (f1^2 g_1^2) to achieve 7.81x speedup compared to the 27-degree PAF obtained by minimax approximation with the same 69.4% post-replacement accuracy. Our code is available at https://github.com/EfficientFHE/SmartPAF.
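As a generic illustration of the core substitution (not SmartPAF's trained PAFs), a low-degree least-squares polynomial can stand in for ReLU once inputs are scaled into [-1, 1], mirroring the DS/SS scaling idea:

```python
import numpy as np

# FHE schemes evaluate only additions and multiplications, so a
# non-polynomial operator like ReLU must be replaced by a polynomial.
# Here: a degree-6 least-squares fit on the scaled input range [-1, 1].
x = np.linspace(-1.0, 1.0, 2001)
relu = np.maximum(x, 0.0)

coeffs = np.polynomial.polynomial.polyfit(x, relu, deg=6)
approx = np.polynomial.polynomial.polyval(x, coeffs)
max_err = np.abs(approx - relu).max()   # worst-case error on the range
```

The residual error of such a drop-in replacement is exactly what the CT/PA/AT fine-tuning techniques above are designed to absorb at the model level.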
Updated: 2024-05-07 16:25:53
Domains: cs.CR
A Significantly Better Class of Activation Functions Than ReLU Like Activation Functions
This paper introduces a significantly better class of activation functions than the almost universally used ReLU-like and sigmoidal classes of activation functions. Two new activation functions, referred to as the Cone and Parabolic-Cone, that differ drastically from popular activation functions and significantly outperform them on the CIFAR-10 and Imagenette benchmarks, are proposed. The cone activation functions are positive only on a finite interval, zero at the interval's end-points, and strictly negative everywhere else. Thus the set of inputs that produce a positive output for a neuron with a cone activation function is a hyperstrip and not a half-space, as is the usual case. Since a hyperstrip is the region between two parallel hyperplanes, it allows neurons to more finely divide the input feature space into positive and negative classes than with infinitely wide half-spaces. In particular, the XOR function can be learned by a single neuron with cone-like activation functions. Both the Cone and Parabolic-Cone activation functions are shown to achieve higher accuracies with significantly fewer neurons on benchmarks. The results presented in this paper indicate that many nonlinear real-world datasets may be separated with fewer hyperstrips than half-spaces. The Cone and Parabolic-Cone activation functions have larger derivatives than ReLU and are shown to significantly speed up training.
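One pair of functions matching this description (the paper's exact parameterization may differ) is positive on (0, 2), zero at the end-points, and negative elsewhere; the claimed XOR property can then be checked directly:

```python
import numpy as np

# Candidate activations matching the description: positive only on the
# interval (0, 2), zero at its end-points, strictly negative elsewhere.
def cone(z):
    return 1.0 - np.abs(z - 1.0)

def parabolic_cone(z):
    return z * (2.0 - z)

# A single neuron with weights (1, 1) and a cone activation computes XOR:
# the pre-activation x1 + x2 equals 1 exactly on the two XOR-positive
# inputs, and hits the cone's zeros (0 or 2) otherwise.
xor_out = [cone(x1 + x2) > 0 for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

No half-space activation can do this with one neuron, since XOR's positive inputs are not linearly separable; a hyperstrip separates them directly.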
Updated: 2024-05-07 16:24:03
Domains: cs.AI,cs.CV,cs.LG,cs.NE,68T07
How Fragile is Relation Extraction under Entity Replacements?
Relation extraction (RE) aims to extract the relations between entity names from the textual context. In principle, the textual context determines the ground-truth relation, and RE models should be able to correctly identify the relations it reflects. However, existing work has found that RE models memorize entity name patterns to make RE predictions while ignoring the textual context. This motivates us to raise the question: "are RE models robust to entity replacements?" In this work, we apply random and type-constrained entity replacements to the RE instances in TACRED and evaluate state-of-the-art RE models under these replacements. We observe 30%-50% F1 score drops on state-of-the-art RE models under entity replacements. These results suggest that more effort is needed to develop effective RE models that are robust to entity replacements. We release the source code at https://github.com/wangywUST/RobustRE.
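A hypothetical sketch of type-constrained entity replacement (the name lists and span format below are invented for illustration): each entity span is swapped for a random name of the same type, leaving the textual context untouched, so a context-driven model's prediction should not change.

```python
import random

NAMES_BY_TYPE = {
    "PERSON": ["Alice Chen", "Marcus Webb"],
    "ORG": ["Acme Corp", "Globex"],
}

def replace_entities(tokens, entities, rng):
    """entities: list of (start, end, type) token spans, non-overlapping.
    Process right-to-left so earlier span indices stay valid."""
    tokens = list(tokens)
    for start, end, etype in sorted(entities, reverse=True):
        tokens[start:end] = [rng.choice(NAMES_BY_TYPE[etype])]
    return tokens

rng = random.Random(0)
sent = ["John", "Smith", "works", "for", "IBM", "."]
ents = [(0, 2, "PERSON"), (4, 5, "ORG")]
out = replace_entities(sent, ents, rng)   # context "works for" is untouched
```

The 30%-50% F1 drops above indicate that current models' predictions do change under exactly this kind of context-preserving perturbation.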
Updated: 2024-05-07 16:22:07
Domains: cs.CL,cs.AI
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of individuals of non-European descent, underscoring a critical gap in genetic research. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data. We evaluate the performance of Group-LASSO INTERaction-NET (glinternet) and pretrained lasso in disease prediction, focusing on diverse ancestries in the UK Biobank. Models were trained on data from White British and other ancestries and validated across a cohort of over 96,000 individuals for 8 diseases. Out of 96 models trained, we report 16 with statistically significant incremental predictive performance in terms of ROC-AUC scores (p-value < 0.05), found for diabetes, arthritis, gall stones, cystitis, asthma, and osteoarthritis. For the interaction and pretrained models that outperformed the baseline, the PRS score was the primary driver behind prediction. Our findings indicate that both interaction terms and pre-training can enhance prediction accuracy, but only for a limited set of diseases and with moderate improvements in accuracy.
Updated: 2024-05-07 16:21:28
Domains: cs.LG,q-bio.QM,stat.AP,stat.CO
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Traditional knowledge graph embedding (KGE) methods typically require preserving the entire knowledge graph (KG) with significant training costs when new knowledge emerges. To address this issue, the continual knowledge graph embedding (CKGE) task has been proposed to train the KGE model by learning emerging knowledge efficiently while simultaneously preserving decent old knowledge. However, the explicit graph structure in KGs, which is critical for the above goal, has been heavily ignored by existing CKGE methods. On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs. On the other hand, old triples are preserved with equal priority, failing to alleviate catastrophic forgetting effectively. In this paper, we propose a competitive method for CKGE based on incremental distillation (IncDE), which considers the full use of the explicit graph structure in KGs. First, to optimize the learning order, we introduce a hierarchical strategy, ranking new triples for layer-by-layer learning. By employing the inter- and intra-hierarchical orders together, new triples are grouped into layers based on the graph structure features. Secondly, to preserve the old knowledge effectively, we devise a novel incremental distillation mechanism, which facilitates the seamless transfer of entity representations from the previous layer to the next one, promoting old knowledge preservation. Finally, we adopt a two-stage training paradigm to avoid the over-corruption of old knowledge influenced by under-trained new knowledge. Experimental results demonstrate the superiority of IncDE over state-of-the-art baselines. Notably, the incremental distillation mechanism contributes to improvements of 0.2%-6.5% in the mean reciprocal rank (MRR) score.
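The layer-by-layer ordering idea can be sketched by grouping new triples by their graph distance from the old KG (a simplification; IncDE's actual inter- and intra-hierarchical ranking is more elaborate):

```python
from collections import deque

def layer_new_triples(old_entities, new_triples):
    """Group new triples into layers by BFS distance from the old KG,
    so emerging knowledge can be learned layer by layer instead of in
    a random order that ignores the graph structure."""
    dist = {e: 0 for e in old_entities}
    adj = {}
    for h, _, t in new_triples:
        adj.setdefault(h, []).append(t)
        adj.setdefault(t, []).append(h)
    queue = deque(old_entities)
    while queue:                        # BFS outward from the old entities
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers = {}
    for h, r, t in new_triples:
        d = min(dist.get(h, float("inf")), dist.get(t, float("inf")))
        layers.setdefault(d, []).append((h, r, t))
    return layers

layers = layer_new_triples(
    old_entities={"A"},
    new_triples=[("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "D")],
)
```

Triples touching known entities land in the first layer; each subsequent layer builds on representations learned (and distilled) in the previous one.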
Updated: 2024-05-07 16:16:00
Domains: cs.AI
Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
Language Models pretrained on large textual data have been shown to encode different types of knowledge simultaneously. Traditionally, only the features from the last layer are used when adapting to new tasks or data. We put forward that, when using or finetuning deep pretrained models, intermediate layer features that may be relevant to the downstream task are buried too deep to be used efficiently in terms of needed samples or steps. To test this, we propose a new layer fusion method: Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers. We compare DWAtt to a basic concatenation-based layer fusion method (Concat), and compare both to a deeper model baseline -- all kept within a similar parameter budget. Our findings show that DWAtt and Concat are more step- and sample-efficient than the baseline, especially in the few-shot setting. DWAtt outperforms Concat on larger data sizes. On CoNLL-03 NER, layer fusion shows 3.68-9.73% F1 gain at different few-shot sizes. The layer fusion models presented significantly outperform the baseline in various training scenarios with different data sizes, architectures, and training constraints.
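A minimal numpy sketch of the depth-wise attention idea — each token attends over its own features across layers, with the query taken from the last layer (simplified; not the paper's exact parameterization):

```python
import numpy as np

def dwatt(layer_feats, w_q, w_k):
    """Depth-wise attention sketch. layer_feats: (L, T, D) features of
    L layers, T tokens, dim D. For each token, a softmax over the depth
    axis mixes that token's features from all layers, letting buried
    intermediate-layer signals re-surface."""
    L, T, D = layer_feats.shape
    q = layer_feats[-1] @ w_q                    # (T, D): query from last layer
    k = layer_feats @ w_k                        # (L, T, D): key per layer
    scores = np.einsum("td,ltd->lt", q, k) / np.sqrt(D)
    weights = np.exp(scores - scores.max(0))     # softmax over depth
    weights /= weights.sum(0)
    return np.einsum("lt,ltd->td", weights, layer_feats)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 5, 8))               # 4 layers, 5 tokens, dim 8
fused = dwatt(feats, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```

Compared with concatenating all layers, the attention weights are input-dependent, which is one plausible reason for the few-shot gains reported above.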
Updated: 2024-05-07 16:11:04
Domains: cs.CL,cs.LG
Leveraging Intelligent Recommender system as a first step resilience measure -- A data-driven supply chain disruption response framework
Interest in the value of digital technologies for increasing supply chain resilience (SCRes) is growing in light of Industry 4.0 and the global pandemic. Recommender systems (RS) have been neglected as a supply chain (SC) resilience measure, even though RS is a capable tool for enhancing SC resilience from a reactive standpoint. To address this gap, this research proposes a novel data-driven supply chain disruption response framework based on intelligent recommender system techniques and validates the conceptual model through a practical use case. Results show that our framework can be implemented as an effective SC disruption mitigation measure in the very first response phase and helps SC participants achieve better reaction performance after an SC disruption.
Updated: 2024-05-07 16:09:06
Domains: cs.CE,cs.AI
POV Learning: Individual Alignment of Multimodal Models using Human Perception
Aligning machine learning systems with human expectations is mostly attempted by training with manually vetted human behavioral samples, typically explicit feedback. This is done on a population level since the context that is capturing the subjective Point-Of-View (POV) of a concrete person in a specific situational context is not retained in the data. However, we argue that alignment on an individual level can boost the subjective predictive performance for the individual user interacting with the system considerably. Since perception differs for each person, the same situation is observed differently. Consequently, the basis for decision making and the subsequent reasoning processes and observable reactions differ. We hypothesize that individual perception patterns can be used for improving the alignment on an individual level. We test this, by integrating perception information into machine learning systems and measuring their predictive performance with respect to individual subjective assessments. For our empirical study, we collect a novel data set of multimodal stimuli and corresponding eye tracking sequences for the novel task of Perception-Guided Crossmodal Entailment and tackle it with our Perception-Guided Multimodal Transformer. Our findings suggest that exploiting individual perception signals for the machine learning of subjective human assessments provides a valuable cue for individual alignment. It does not only improve the overall predictive performance from the point-of-view of the individual user but might also contribute to steering AI systems towards every person's individual expectations and values.
Updated: 2024-05-07 16:07:29
Domains: cs.AI
AugmenTory: A Fast and Flexible Polygon Augmentation Library
Data augmentation is a key technique for addressing the challenge of limited datasets and has become a major component of image-processing training procedures. Techniques such as geometric transformations and color space adjustments have been thoroughly tested for their ability to artificially expand training datasets and generate semi-realistic data for training purposes. Polygons play a crucial role in instance segmentation and have seen a surge in use across advanced models, such as YOLOv8. Despite their growing popularity, the lack of specialized libraries hampers the polygon-augmentation process. This paper introduces a novel solution to this challenge, embodied in the newly developed AugmenTory library. Notably, AugmenTory offers reduced computational demands in both time and space compared to existing methods. Additionally, the library includes a postprocessing thresholding feature. The AugmenTory package is publicly available on GitHub, where interested users can access the source code: https://github.com/Smartory/AugmenTory
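The core requirement of polygon augmentation is that label geometry follows the image transform; a minimal sketch for rotation (pure geometry for illustration, not AugmenTory's API):

```python
import math

def rotate_polygon(points, angle_deg, cx, cy):
    """Rotate polygon vertices about (cx, cy). Applying the same
    transform to the image keeps the instance-segmentation labels
    aligned with the augmented pixels."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append((cx + dx * cos_a - dy * sin_a,
                    cy + dx * sin_a + dy * cos_a))
    return out

square = [(10, 10), (20, 10), (20, 20), (10, 20)]
rot = rotate_polygon(square, 90, 15, 15)   # vertices cycle one position
```

A dedicated library adds the parts this sketch omits: clipping polygons at image borders, handling holes and multi-part instances, and the postprocessing thresholding mentioned above.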
Updated: 2024-05-07 16:07:05
Domains: cs.CV,cs.AI
Label-Agnostic Forgetting: A Supervision-Free Unlearning in Deep Models
Machine unlearning aims to remove information derived from forgotten data while preserving that of the remaining dataset in a well-trained model. With the increasing emphasis on data privacy, several approaches to machine unlearning have emerged. However, these methods typically rely on complete supervision throughout the unlearning process. Unfortunately, obtaining such supervision, whether for the forgetting or remaining data, can be impractical due to the substantial cost associated with annotating real-world datasets. This challenge prompts us to propose a supervision-free unlearning approach that operates without the need for labels during the unlearning process. Specifically, we introduce a variational approach to approximate the distribution of representations for the remaining data. Leveraging this approximation, we adapt the original model to eliminate information from the forgotten data at the representation level. To further address the issue of lacking supervision information, which hinders alignment with ground truth, we introduce a contrastive loss to facilitate the matching of representations between the remaining data and those of the original model, thus preserving predictive performance. Experimental results across various unlearning tasks demonstrate the effectiveness of our proposed method, Label-Agnostic Forgetting (LAF) without using any labels, which achieves comparable performance to state-of-the-art methods that rely on full supervision information. Furthermore, our approach excels in semi-supervised scenarios, leveraging limited supervision information to outperform fully supervised baselines. This work not only showcases the viability of supervision-free unlearning in deep models but also opens up a new possibility for future research in unlearning at the representation level.
Updated: 2024-05-07 16:06:50
Domains: cs.LG
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Efficient use of GPU memory is essential for high throughput LLM inference. Prior systems reserved memory for the KV-cache ahead-of-time, resulting in wasted capacity due to internal fragmentation. Inspired by OS-based virtual memory systems, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation, enabling high-throughput LLM serving with larger batch sizes. However, to be able to allocate physical memory dynamically, PagedAttention changes the layout of KV-cache from contiguous virtual memory to non-contiguous virtual memory. This change requires attention kernels to be rewritten to support paging, and serving framework to implement a memory manager. Thus, the PagedAttention model leads to software complexity, portability issues, redundancy and inefficiency. In this paper, we propose vAttention for dynamic KV-cache memory management. In contrast to PagedAttention, vAttention retains KV-cache in contiguous virtual memory and leverages low-level system support for demand paging, that already exists, to enable on-demand physical memory allocation. Thus, vAttention unburdens the attention kernel developer from having to explicitly support paging and avoids re-implementation of memory management in the serving framework. We show that vAttention enables seamless dynamic memory management for unchanged implementations of various attention kernels. vAttention also generates tokens up to 1.97x faster than vLLM, while processing input prompts up to 3.92x and 1.45x faster than the PagedAttention variants of FlashAttention and FlashInfer.
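The demand-paging idea can be caricatured in a few lines: reserve a contiguous logical range up front, but commit physical pages only on first touch (a pure-Python toy standing in for vAttention's CUDA/OS virtual-memory mechanism):

```python
class OnDemandKVCache:
    """Toy model of demand-paged KV-cache allocation: the address space
    (max_tokens slots) is reserved up front and stays logically
    contiguous -- so attention kernels need no paging logic -- while
    physical pages are committed only when a token first lands in them."""

    def __init__(self, max_tokens, page_size):
        self.max_tokens = max_tokens
        self.page_size = page_size
        self.pages = {}                            # page index -> storage

    def append(self, pos, kv):
        assert pos < self.max_tokens               # within reserved range
        page = pos // self.page_size
        if page not in self.pages:                 # commit on first touch
            self.pages[page] = [None] * self.page_size
        self.pages[page][pos % self.page_size] = kv

cache = OnDemandKVCache(max_tokens=4096, page_size=256)
for pos in range(300):                             # a 300-token sequence
    cache.append(pos, kv=(pos, pos))
committed = len(cache.pages)                       # 2 pages, not 16
```

The contrast with PagedAttention is that here the cache's logical layout never changes; only the commitment of backing memory is deferred, which is exactly what OS demand paging provides for free.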
Updated: 2024-05-07 16:00:32
Subjects: cs.LG,cs.OS
Transport meets Variational Inference: Controlled Monte Carlo Diffusions
Connecting optimal transport and variational inference, we present a principled and systematic framework for sampling and generative modelling centred around divergences on path space. Our work culminates in the development of the \emph{Controlled Monte Carlo Diffusion} sampler (CMCD) for Bayesian computation, a score-based annealing technique that crucially adapts both forward and backward dynamics in a diffusion model. On the way, we clarify the relationship between the EM-algorithm and iterative proportional fitting (IPF) for Schr{\"o}dinger bridges, deriving as well a regularised objective that bypasses the iterative bottleneck of standard IPF-updates. Finally, we show that CMCD has a strong foundation in the Jarzynski and Crooks identities from statistical physics, and that it convincingly outperforms competing approaches across a wide array of experiments.
Updated: 2024-05-07 16:00:21
Subjects: stat.ML,cs.LG
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. The model checkpoints are available at "https://github.com/deepseek-ai/DeepSeek-V2".
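The KV-cache compression idea behind MLA can be sketched in a few lines. The dimensions and projections below are illustrative placeholders, not DeepSeek-V2's actual architecture (which also handles rotary embeddings and per-head structure): only a small latent vector is cached per token, and keys/values are re-expanded at attention time.

```python
import numpy as np

# Hedged sketch of the idea behind Multi-head Latent Attention (MLA):
# cache one low-dimensional latent per token instead of full keys and
# values, and reconstruct K and V from it on the fly. All sizes are
# made up for illustration.
rng = np.random.default_rng(0)
d_model, d_latent, n_tokens = 1024, 64, 16

W_down = rng.standard_normal((d_model, d_latent)) * 0.02  # compress
W_up_k = rng.standard_normal((d_latent, d_model)) * 0.02  # expand to K
W_up_v = rng.standard_normal((d_latent, d_model)) * 0.02  # expand to V

x = rng.standard_normal((n_tokens, d_model))
latent_cache = x @ W_down  # only this is stored per token
K = latent_cache @ W_up_k  # reconstructed at attention time
V = latent_cache @ W_up_v

# Cache footprint relative to storing K and V directly:
compression = (2 * d_model) / d_latent
```

With these toy sizes the cache shrinks 32x, which conveys why latent compression can yield the large KV-cache reductions the abstract reports.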
Updated: 2024-05-07 15:56:43
Subjects: cs.CL,cs.AI
A Critical Survey on Fairness Benefits of Explainable AI
In this critical survey, we analyze typical claims on the relationship between explainable AI (XAI) and fairness to disentangle the multidimensional relationship between these two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 scientific articles on the alleged fairness benefits of XAI. We present crucial caveats with respect to these claims and provide an entry point for future discussions around the potentials and limitations of XAI for specific fairness desiderata. Importantly, we notice that claims are often (i) vague and simplistic, (ii) lacking normative grounding, or (iii) poorly aligned with the actual capabilities of XAI. We suggest conceiving of XAI not as an ethical panacea but as one of many tools to approach the multidimensional, sociotechnical challenge of algorithmic fairness. Moreover, when making a claim about XAI and fairness, we emphasize the need to be more specific about what kind of XAI method is used, which fairness desideratum it refers to, how exactly it enables fairness, and who is the stakeholder that benefits from XAI.
Updated: 2024-05-07 15:50:27
Subjects: cs.AI
Fully Automated Selfish Mining Analysis in Efficient Proof Systems Blockchains
We study selfish mining attacks in longest-chain blockchains like Bitcoin, but where the proof of work is replaced with efficient proof systems -- like proofs of stake or proofs of space -- and consider the problem of computing an optimal selfish mining attack which maximizes expected relative revenue of the adversary, thus minimizing the chain quality. To this end, we propose a novel selfish mining attack that aims to maximize this objective and formally model the attack as a Markov decision process (MDP). We then present a formal analysis procedure which computes an $\epsilon$-tight lower bound on the optimal expected relative revenue in the MDP and a strategy that achieves this $\epsilon$-tight lower bound, where $\epsilon>0$ may be any specified precision. Our analysis is fully automated and provides formal guarantees on its correctness. We evaluate our selfish mining attack and observe that it achieves superior expected relative revenue compared to two considered baselines. Concurrent work [Sarenche FC'24] performs an automated analysis of selfish mining in predictable longest-chain blockchains based on efficient proof systems. Predictable means the randomness for the challenges is fixed for many blocks (as used, e.g., in Ouroboros), while we consider unpredictable (Bitcoin-like) chains where the challenge is derived from the previous block.
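As a reminder of the machinery involved, the analysis operates on an MDP. A minimal value-iteration sketch on a made-up 3-state MDP looks like this; the paper's actual selfish-mining MDP, its relative-revenue objective, and its $\epsilon$-tight bounding procedure are substantially more involved.

```python
import numpy as np

# Generic value iteration on a toy MDP (states, actions, transitions and
# rewards below are invented for illustration only).
n_states, gamma = 3, 0.9
# P[a][s][s'] = transition probability, R[a][s] = expected reward
P = np.array([[[0.8, 0.2, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],
              [[0.1, 0.9, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]]])
R = np.array([[1.0, 0.0, 0.5],
              [0.0, 2.0, 0.1]])

V = np.zeros(n_states)
for _ in range(500):              # iterate the Bellman optimality operator
    Q = R + gamma * (P @ V)       # Q[a][s]
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-10:
        V = V_new
        break
    V = V_new
policy = Q.argmax(axis=0)         # greedy action per state
```

Value iteration yields both a value estimate and a strategy achieving it, mirroring (in miniature) the paper's pairing of a lower bound with a strategy that attains it.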
Updated: 2024-05-07 15:44:39
Subjects: cs.CR
NeuroIDBench: An Open-Source Benchmark Framework for the Standardization of Methodology in Brainwave-based Authentication Research
Biometric systems based on brain activity have been proposed as an alternative to passwords or to complement current authentication techniques. By leveraging the unique brainwave patterns of individuals, these systems offer the possibility of creating authentication solutions that are resistant to theft, hands-free, accessible, and potentially even revocable. However, despite the growing stream of research in this area, faster advance is hindered by reproducibility problems. Issues such as the lack of standard reporting schemes for performance results and system configuration, or the absence of common evaluation benchmarks, make comparability and proper assessment of different biometric solutions challenging. Further, barriers are erected to future work when, as so often, source code is not published open access. To bridge this gap, we introduce NeuroIDBench, a flexible open source tool to benchmark brainwave-based authentication models. It incorporates nine diverse datasets, implements a comprehensive set of pre-processing parameters and machine learning algorithms, enables testing under two common adversary models (known vs unknown attacker), and allows researchers to generate full performance reports and visualizations. We use NeuroIDBench to investigate the shallow classifiers and deep learning-based approaches proposed in the literature, and to test robustness across multiple sessions. We observe a 37.6% reduction in Equal Error Rate (EER) for unknown attacker scenarios (typically not tested in the literature), and we highlight the importance of session variability to brainwave authentication. All in all, our results demonstrate the viability and relevance of NeuroIDBench in streamlining fair comparisons of algorithms, thereby furthering the advancement of brainwave-based authentication through robust methodological practices.
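The benchmark's headline metric, Equal Error Rate (EER), can be computed as follows; the scores below are synthetic stand-ins for real brainwave-authentication similarity scores.

```python
import numpy as np

# EER: the operating point where the false accept rate (FAR) equals the
# false reject rate (FRR). Synthetic genuine/impostor score distributions
# are used here in place of EEG-derived scores.
rng = np.random.default_rng(1)
genuine = rng.normal(0.7, 0.1, 1000)   # same-user comparison scores
impostor = rng.normal(0.3, 0.1, 1000)  # different-user comparison scores

thresholds = np.linspace(0.0, 1.0, 1001)
far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects

i = np.abs(far - frr).argmin()  # threshold where FAR and FRR cross
eer = (far[i] + frr[i]) / 2.0
```

A lower EER means better separation between genuine users and impostors, which is why the reported 37.6% EER reduction is meaningful.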
Updated: 2024-05-07 15:41:37
Subjects: cs.CR
Unbundle-Rewrite-Rebundle: Runtime Detection and Rewriting of Privacy-Harming Code in JavaScript Bundles
This work presents Unbundle-Rewrite-Rebundle (URR), a system for detecting privacy-harming portions of bundled JavaScript code, and rewriting that code at runtime to remove the privacy harming behavior without breaking the surrounding code or overall application. URR is a novel solution to the problem of JavaScript bundles, where websites pre-compile multiple code units into a single file, making it impossible for content filters and ad-blockers to differentiate between desired and unwanted resources. Where traditional content filtering tools rely on URLs, URR analyzes the code at the AST level, and replaces harmful AST sub-trees with privacy-and-functionality maintaining alternatives. We present an open-sourced implementation of URR as a Firefox extension, and evaluate it against JavaScript bundles generated by the most popular bundling system (Webpack) deployed on the Tranco 10k. We measure the performance, measured by precision (1.00), recall (0.95), and speed (0.43s per-script) when detecting and rewriting three representative privacy harming libraries often included in JavaScript bundles, and find URR to be an effective approach to a large-and-growing blind spot unaddressed by current privacy tools.
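The mechanism of replacing a harmful AST sub-tree while keeping the surrounding program intact can be illustrated with Python's own `ast` module as a toy analogue of what URR does on JavaScript ASTs. Here `track_user` is a hypothetical tracking call, and the constant stub is a simplification: URR's real replacements are chosen to preserve functionality, not merely to neutralize the call.

```python
import ast

# Toy AST rewriting: find calls to a (hypothetical) tracking function and
# replace the whole call sub-tree with a harmless constant, leaving every
# other node of the program untouched.
class StripTracker(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "track_user":
            return ast.Constant(value=None)  # simplified stand-in node
        return node

src = "result = compute(track_user(uid), 42)"
tree = StripTracker().visit(ast.parse(src))
ast.fix_missing_locations(tree)
rewritten = ast.unparse(tree)
```

This is the key contrast with URL-based filters: the rewrite happens at the granularity of sub-trees inside one bundled file, not whole resources.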
Updated: 2024-05-07 15:38:20
Subjects: cs.CR
Super-Exponential Regret for UCT, AlphaGo and Variants
We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment, and that a `polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to also show $\exp_2(\exp_2(D - O(\log D)))$ regret.
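For context, the selection rule whose regret these bounds concern is the standard UCT rule (Kocsis and Szepesvári, 2006); up to the choice of exploration constant, a node with $n$ visits picks the child $j$ maximizing an upper confidence bound:

```latex
% Standard UCT child-selection rule: \bar{X}_j is the empirical mean
% return of child j, n_j its visit count, n the parent's visit count,
% and c an exploration constant (the `polynomial' variant alters the
% exploration term's dependence on n and n_j).
j^{\ast} = \operatorname*{arg\,max}_{j} \; \bar{X}_j + c \sqrt{\frac{\ln n}{n_j}}
```

The lower bounds show that on the $D$-chain this rule can take doubly-exponentially long (in $D$) to correct an initially misleading value estimate.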
Updated: 2024-05-07 15:35:30
Subjects: cs.LG,cs.AI
Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation
Uncertainty estimation (UE), as an effective means of quantifying predictive uncertainty, is crucial for safe and reliable decision-making, especially in high-risk scenarios. Existing UE schemes usually assume that there are completely-labeled samples to support fully-supervised learning. In practice, however, many UE tasks often have no sufficiently-labeled data to use, such as the Multiple Instance Learning (MIL) with only weak instance annotations. To bridge this gap, this paper, for the first time, addresses the weakly-supervised issue of Multi-Instance UE (MIUE) and proposes a new baseline scheme, Multi-Instance Residual Evidential Learning (MIREL). Particularly, at the fine-grained instance UE with only weak supervision, we derive a multi-instance residual operator through the Fundamental Theorem of Symmetric Functions. On this operator derivation, we further propose MIREL to jointly model the high-order predictive distribution at bag and instance levels for MIUE. Extensive experiments empirically demonstrate that our MIREL not only could often make existing MIL networks perform better in MIUE, but also could surpass representative UE methods by large margins, especially in instance-level UE tasks.
Updated: 2024-05-07 15:31:58
Subjects: cs.LG
Vision Mamba: A Comprehensive Survey and Taxonomy
State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems. This model has witnessed numerous applications in several fields, including control theory, signal processing, economics and machine learning. In the field of deep learning, state space models are used to process sequence data, such as time series analysis, natural language processing (NLP) and video understanding. By mapping sequence data to state space, long-term dependencies in the data can be better captured. In particular, modern SSMs have shown strong representational capabilities in NLP, especially in long sequence modeling, while maintaining linear time complexity. Notably, based on the latest state-space models, Mamba merges time-varying parameters into SSMs and formulates a hardware-aware algorithm for efficient training and inference. Given its impressive efficiency and strong long-range dependency modeling capability, Mamba is expected to become a new AI architecture that may outperform Transformer. Recently, a number of works have attempted to study the potential of Mamba in various fields, such as general vision, multi-modal, medical image analysis and remote sensing image analysis, by extending Mamba from the natural language domain to the visual domain. To fully understand Mamba in the visual domain, we conduct a comprehensive survey and present a taxonomy study. This survey focuses on Mamba's application to a variety of visual tasks and data types, and discusses its predecessors, recent advances and far-reaching impact on a wide range of domains. Since Mamba is now on an upward trend, please actively notify us if you have new findings; new progress on Mamba will be included in this survey in a timely manner and updated on the Mamba project at https://github.com/lx6c78/Vision-Mamba-A-Comprehensive-Survey-and-Taxonomy.
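The SSM primitive the survey builds on is the discrete linear recurrence $x_k = A x_{k-1} + B u_k$, $y_k = C x_k$. A minimal sketch follows, with fixed parameters and a plain loop; Mamba's contributions (input-dependent, time-varying parameters and a hardware-aware parallel scan) are deliberately not reproduced here.

```python
import numpy as np

# Minimal discrete state-space model: the hidden state x carries
# information forward, which is how SSMs capture long-range dependencies
# in linear time. Parameters here are random placeholders.
rng = np.random.default_rng(2)
d_state, seq_len = 4, 10
A = np.eye(d_state) * 0.9             # stable state-transition matrix
B = rng.standard_normal((d_state, 1))
C = rng.standard_normal((1, d_state))

u = rng.standard_normal(seq_len)      # input sequence
x = np.zeros((d_state, 1))
ys = []
for k in range(seq_len):
    x = A @ x + B * u[k]              # state update
    ys.append((C @ x).item())         # readout
ys = np.array(ys)
```

Each output depends on the entire input prefix through the state, yet cost per step is constant in sequence length, giving the linear time complexity mentioned above.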
Updated: 2024-05-07 15:30:14
Subjects: cs.CV,cs.AI,cs.CL,cs.LG
(In)Security of Mobile Apps in Developing Countries: A Systematic Literature Review
In developing countries, several key sectors, including education, finance, agriculture, and healthcare, mainly deliver their services via mobile app technology on handheld devices. As a result, mobile app security has emerged as a paramount issue in developing countries. In this paper, we investigate the state of research on mobile app security, focusing on developing countries. More specifically, we performed a systematic literature review exploring the research directions taken by existing works, the different security concerns addressed, and the techniques used by researchers to highlight or address app security issues. Our main findings are: (1) the literature includes only a few studies on mobile app security in the context of developing countries; (2) among the different security concerns that researchers study, vulnerability detection appears to be the leading research topic; (3) FinTech apps are revealed as the main target in the relevant literature. Overall, our work highlights that there is ample room for developing further specialized techniques addressing mobile app security in the context of developing countries.
Updated: 2024-05-07 15:26:53
Subjects: cs.CR
Deep Unlearning: Fast and Efficient Training-free Approach to Class Forgetting
Machine unlearning is a prominent and challenging field, driven by regulatory demands for user data deletion and heightened privacy awareness. Existing approaches involve retraining model or multiple finetuning steps for each deletion request, often constrained by computational limits and restricted data access. In this work, we introduce a novel class unlearning algorithm designed to strategically eliminate specific classes from the learned model. Our algorithm first estimates the Retain and the Forget Spaces using Singular Value Decomposition on the layerwise activations for a small subset of samples from the retain and unlearn classes, respectively. We then compute the shared information between these spaces and remove it from the forget space to isolate class-discriminatory feature space. Finally, we obtain the unlearned model by updating the weights to suppress the class discriminatory features from the activation spaces. We demonstrate our algorithm's efficacy on ImageNet using a Vision Transformer with only $\sim 1.5\%$ drop in retain accuracy compared to the original model while maintaining under $1\%$ accuracy on the unlearned class samples. Further, our algorithm consistently performs well when subject to Membership Inference Attacks showing $7.8\%$ improvement on average across a variety of image classification datasets and network architectures, as compared to other baselines while being $\sim 6 \times$ more computationally efficient. Our code is available at https://github.com/sangamesh-kodge/class_forgetting.
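The core linear-algebra step described above can be sketched as follows. Random matrices stand in for real layer activations, and the projection step is a simplified reading of the method, not the authors' exact weight update.

```python
import numpy as np

# Hedged sketch: estimate "retain" and "forget" spaces from activations
# via SVD, strip the shared component out of the forget space, and
# project a weight matrix to suppress the remaining class-discriminatory
# directions. Shapes and data are illustrative only.
rng = np.random.default_rng(3)
acts_retain = rng.standard_normal((64, 100))  # features x retained samples
acts_forget = rng.standard_normal((64, 100))  # features x unlearned samples

k = 10  # principal directions kept per space
U_r, _, _ = np.linalg.svd(acts_retain, full_matrices=False)
U_f, _, _ = np.linalg.svd(acts_forget, full_matrices=False)
retain_basis = U_r[:, :k]
forget_basis = U_f[:, :k]

# Remove the component of the forget space shared with the retain space,
# isolating class-discriminatory directions.
shared = retain_basis @ (retain_basis.T @ forget_basis)
discriminatory = forget_basis - shared

# Project weights to suppress those directions (simplified update).
W = rng.standard_normal((64, 64))
P = discriminatory @ np.linalg.pinv(discriminatory)  # projector onto span
W_unlearned = W @ (np.eye(64) - P)
```

Because everything is computed from a small sample of activations, no retraining or finetuning pass is needed, which is the source of the claimed efficiency.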
Updated: 2024-05-07 15:26:02
Subjects: cs.LG,cs.AI,cs.CV,stat.ML
Predicting Transonic Flowfields in Non-Homogeneous Unstructured Grids Using Autoencoder Graph Convolutional Networks
This paper focuses on addressing challenges posed by non-homogeneous unstructured grids, commonly used in Computational Fluid Dynamics (CFD). Their prevalence in CFD scenarios has motivated the exploration of innovative approaches for generating reduced-order models. The core of our approach centers on geometric deep learning, specifically the utilization of graph convolutional network (GCN). The novel Autoencoder GCN architecture enhances prediction accuracy by propagating information to distant nodes and emphasizing influential points. This architecture, with GCN layers and encoding/decoding modules, reduces dimensionality based on pressure-gradient values. The autoencoder structure improves the network capability to identify key features, contributing to a more robust and accurate predictive model. To validate the proposed methodology, we analyzed two different test cases: wing-only model and wing--body configuration. Precise reconstruction of steady-state distributed quantities within a two-dimensional parametric space underscores the reliability and versatility of the implemented approach.
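The GCN building block such architectures stack is the standard normalized propagation rule $H' = \sigma(\hat{D}^{-1/2}(A+I)\hat{D}^{-1/2} H W)$. A minimal sketch on a toy 4-node graph follows; features and weights are random placeholders, not CFD data, and the paper's autoencoder structure around these layers is not reproduced.

```python
import numpy as np

# One graph-convolution step: information flows along edges of the
# (unstructured) mesh graph, with self-loops and symmetric degree
# normalization keeping the propagation stable.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # toy adjacency
A_hat = A + np.eye(4)                        # add self-loops
deg = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalization

rng = np.random.default_rng(4)
H = rng.standard_normal((4, 8))              # per-node features
W = rng.standard_normal((8, 16))             # layer weights
H_next = np.maximum(A_norm @ H @ W, 0.0)     # propagate + ReLU
```

Stacking such layers is what lets information reach distant nodes on a non-homogeneous mesh, the property the abstract highlights.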
Updated: 2024-05-07 15:18:21
Subjects: cs.CE,cs.LG
Efficient Online Set-valued Classification with Bandit Feedback
Conformal prediction is a distribution-free method that wraps a given machine learning model and returns a set of plausible labels that contain the true label with a prescribed coverage rate. In practice, the empirical coverage achieved highly relies on fully observed label information from data both in the training phase for model fitting and the calibration phase for quantile estimation. This dependency poses a challenge in the context of online learning with bandit feedback, where a learner only has access to the correctness of actions (i.e., pulled an arm) but not the full information of the true label. In particular, when the pulled arm is incorrect, the learner only knows that the pulled one is not the true class label, but does not know which label is true. Additionally, bandit feedback further results in a smaller labeled dataset for calibration, limited to instances with correct actions, thereby affecting the accuracy of quantile estimation. To address these limitations, we propose Bandit Class-specific Conformal Prediction (BCCP), offering coverage guarantees on a class-specific granularity. Using an unbiased estimation of an estimand involving the true label, BCCP trains the model and makes set-valued inferences through stochastic gradient descent. Our approach overcomes the challenges of sparsely labeled data in each iteration and generalizes the reliability and applicability of conformal prediction to online decision-making environments.
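For reference, the fully supervised split conformal procedure that BCCP adapts looks like this. Probabilities are synthetic; BCCP's class-specific, bandit-feedback version differs as described above.

```python
import numpy as np

# Standard split conformal classification: calibrate a score threshold on
# held-out labeled data, then return all labels whose score clears it.
# This is the fully supervised baseline; bandit feedback breaks the
# assumption that true labels are observed during calibration.
rng = np.random.default_rng(5)
n_cal, n_classes, alpha = 200, 5, 0.1

probs = rng.dirichlet(np.ones(n_classes), size=n_cal)  # model outputs
labels = rng.integers(0, n_classes, size=n_cal)        # true labels

# Nonconformity score: 1 - probability assigned to the true label.
scores = 1.0 - probs[np.arange(n_cal), labels]
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(scores, q_level)

# Prediction set for a new example: labels with 1 - p <= qhat.
new_probs = rng.dirichlet(np.ones(n_classes))
pred_set = [c for c in range(n_classes) if 1.0 - new_probs[c] <= qhat]
```

The guarantee is that the set contains the true label with probability at least $1-\alpha$; both the score computation and the quantile step above require true labels, which is precisely what bandit feedback withholds.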
Updated: 2024-05-07 15:14:51
Subjects: stat.ML,cs.LG
Pragmatist Intelligence: Where the Principle of Usefulness Can Take ANNs
Artificial neural networks (ANNs) perform extraordinarily on numerous tasks including classification or prediction, e.g., speech processing and image classification. These new functions are based on a computational model that is enabled to select freely all necessary internal model parameters as long as it eventually delivers the functionality it is supposed to exhibit. Here, we review the connection between the model parameter selection in machine learning (ML) algorithms running on ANNs and the epistemological theory of neopragmatism focusing on the theory's utility and anti-representationalist aspects. To understand the consequences of the model parameter selection of an ANN, we suggest using neopragmatist theories whose implications are well studied. Incidentally, neopragmatism's notion of optimization is also based on utility considerations. This means that applying this approach elegantly reveals the inherent connections between optimization in ML, using a numerical method during the learning phase, and optimization in the ethical theory of consequentialism, where it occurs as a maxim of action. We suggest that these connections originate from the way relevance is calculated in ML systems. This could ultimately reveal a tendency for specific actions in ML systems.
Updated: 2024-05-07 15:11:42
Subjects: cs.AI,cs.LG
A Survey on Neural Question Generation: Methods, Applications, and Prospects
In this survey, we present a detailed examination of the advancements in Neural Question Generation (NQG), a field leveraging neural network techniques to generate relevant questions from diverse inputs like knowledge bases, texts, and images. The survey begins with an overview of NQG's background, encompassing the task's problem formulation, prevalent benchmark datasets, established evaluation metrics, and notable applications. It then methodically classifies NQG approaches into three predominant categories: structured NQG, which utilizes organized data sources, unstructured NQG, focusing on more loosely structured inputs like texts or visual content, and hybrid NQG, drawing on diverse input modalities. This classification is followed by an in-depth analysis of the distinct neural network models tailored for each category, discussing their inherent strengths and potential limitations. The survey culminates with a forward-looking perspective on the trajectory of NQG, identifying emergent research trends and prospective developmental paths. Accompanying this survey is a curated collection of related research papers, datasets and codes, systematically organized on Github, providing an extensive reference for those delving into NQG.
Updated: 2024-05-07 15:08:56
Subjects: cs.CL,cs.AI
Materials Discovery with Extreme Properties via Reinforcement Learning-Guided Combinatorial Chemistry
The goal of most materials discovery is to discover materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop reinforcement learning-guided combinatorial chemistry, which is a rule-based molecular designer driven by trained policy for selecting subsequent molecular fragments to get a target molecule. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown molecules with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better compounds than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven extreme target properties, our model discovered 1,315 of all target-hitting molecules and 7,629 of five target-hitting molecules out of 100,000 trials, whereas the probability distribution-learning models failed. Moreover, it has been confirmed that every molecule generated under the binding rules of molecular fragments is 100% chemically valid. To illustrate the performance in actual problems, we also demonstrate that our models work well on two practical applications: discovering protein docking molecules and HIV inhibitors.
Updated: 2024-05-07 15:07:34
Subjects: q-bio.BM,cs.LG
Towards Stability of Parameter-free Optimization
Hyperparameter tuning, particularly the selection of an appropriate learning rate in adaptive gradient training methods, remains a challenge. To tackle this challenge, in this paper, we propose a novel parameter-free optimizer, AdamG (Adam with the golden step size), designed to automatically adapt to diverse optimization problems without manual tuning. The core technique underlying AdamG is our golden step size derived for the AdaGrad-Norm algorithm, which is expected to help AdaGrad-Norm preserve the tuning-free convergence and approximate the optimal step size in expectation w.r.t. various optimization scenarios. To better evaluate tuning-free performance, we propose a novel evaluation criterion, stability, to comprehensively assess the efficacy of parameter-free optimizers in addition to classical performance criteria. Empirical results demonstrate that compared with other parameter-free baselines, AdamG achieves superior performance, which is consistently on par with Adam using a manually tuned learning rate across various optimization tasks.
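The AdaGrad-Norm update underlying AdamG uses a single scalar accumulator of squared gradient norms to scale every step, so no per-problem learning rate needs tuning. A minimal sketch on a toy quadratic follows; the paper's golden step size refines this step-size choice and is not reproduced here.

```python
import numpy as np

# AdaGrad-Norm on f(x) = ||x||^2: the step shrinks automatically as
# squared gradient norms accumulate in the scalar b2, with no manually
# tuned learning rate. Objective and constants are illustrative.
def grad(x):
    return 2.0 * x                 # gradient of f(x) = ||x||^2

x = np.array([5.0, -3.0])
b2 = 1e-8                          # accumulated squared gradient norms
for _ in range(2000):
    g = grad(x)
    b2 += np.dot(g, g)
    x = x - g / np.sqrt(b2)        # AdaGrad-Norm step
final_loss = float(np.dot(x, x))
```

The same loop applied to a problem with very different gradient scales self-adjusts through `b2`, which is the tuning-free property the paper's stability criterion is designed to probe.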
Updated: 2024-05-07 14:58:12
领域: cs.LG
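A minimal sketch of the AdaGrad-Norm update that AdamG builds on: a single scalar accumulator of squared gradient norms scales every step. The paper's "golden step size" is a specific derived value of the scale constant; here it is left as a plain illustrative constant `eta`.

```python
import math

def adagrad_norm(grad, x0, eta=1.0, steps=500, eps=1e-12):
    """AdaGrad-Norm: one scalar accumulator of squared gradient norms
    scales every step. AdamG's 'golden step size' is a specific derived
    choice of eta; here eta is just an illustrative constant."""
    x = list(x0)
    v = 0.0
    for _ in range(steps):
        g = grad(x)
        v += sum(gi * gi for gi in g)
        lr = eta / (math.sqrt(v) + eps)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Toy usage: minimise f(x) = x1^2 + x2^2 without hand-tuning a learning rate.
x_star = adagrad_norm(lambda x: [2.0 * xi for xi in x], [3.0, -2.0])
```

Because the effective learning rate shrinks automatically as gradients accumulate, no manual schedule is needed — the property AdamG's analysis preserves.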
Leveraging LSTM and GAN for Modern Malware Detection
The boom in malware poses a danger to cyberspace comparable to the effect of climate change on ecosystems. Despite significant investment in cybersecurity technologies and staff training, the global community remains locked in a perpetual war with cybersecurity threats. The many and ever-changing forms of malware continuously push the boundaries of the detection and mitigation approaches that cybersecurity practitioners employ. Older techniques such as signature-based detection and behavioral analysis are slow to adapt to the rapid evolution of malware. Consequently, this paper proposes using deep learning models, LSTM networks, and GANs to improve malware detection accuracy and speed. Leveraging raw bytestream-based data and deep learning architectures, this approach provides better accuracy and performance than traditional methods. Integrating the LSTM and GAN models enables synthetic data generation, expanding the training datasets and, as a result, improving detection accuracy. The paper uses the VirusShare dataset, which contains more than one million unique malware samples, as the training and evaluation set for the presented models. Through thorough data preparation, including tokenization and augmentation, as well as model training, the LSTM and GAN models outperform straightforward classifiers. The research achieves 98% accuracy, showing that deep learning plays a decisive role in proactive cybersecurity defense. In addition, the paper studies ensemble learning and model fusion methods as ways to reduce bias and manage model complexity.
Updated: 2024-05-07 14:57:24
Subjects: cs.CR,cs.AI
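As a minimal illustration of the LSTM half of the pipeline, the sketch below runs a single hand-weighted LSTM unit over a raw byte sequence. Real detectors learn the weights in a deep-learning framework and feed the hidden state to a classifier head; the scalar weights here are hypothetical, and the GAN-based augmentation is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyLSTMCell:
    """Single-unit LSTM run over a raw byte stream. Real detectors use a
    full deep-learning framework with learned weights; the scalar weights
    here are hypothetical and fixed."""
    def __init__(self, wi, wf, wo, wc):
        self.wi, self.wf, self.wo, self.wc = wi, wf, wo, wc

    def run(self, byte_seq):
        h = c = 0.0
        for byte in byte_seq:
            x = byte / 255.0                   # normalise the raw byte
            i = sigmoid(self.wi * x + h)       # input gate
            f = sigmoid(self.wf * x + h)       # forget gate
            o = sigmoid(self.wo * x + h)       # output gate
            g = math.tanh(self.wc * x + h)     # candidate cell update
            c = f * c + i * g                  # long-term memory
            h = o * math.tanh(c)               # hidden state
        return h  # would feed a benign-vs-malicious classifier head

cell = TinyLSTMCell(wi=0.5, wf=0.4, wo=0.6, wc=0.8)
h_mz = cell.run([0x4D, 0x5A, 0x90, 0x00])  # e.g. a PE-header-like prefix
```

The gated recurrence is what lets the model keep or discard byte-level context across long executables.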
Explainable machine learning for predicting shellfish toxicity in the Adriatic Sea using long-term monitoring data of HABs
In this study, explainable machine learning techniques are applied to predict the toxicity of mussels in the Gulf of Trieste (Adriatic Sea) caused by harmful algal blooms. By analysing a newly created 28-year dataset containing records of toxic phytoplankton in mussel farming areas and toxin concentrations in mussels (Mytilus galloprovincialis), we train and evaluate the performance of ML models to accurately predict diarrhetic shellfish poisoning (DSP) events. The random forest model provided the best prediction of positive toxicity results based on the F1 score. Explainability methods such as permutation importance and SHAP identified key species (Dinophysis fortii and D. caudata) and environmental factors (salinity, river discharge and precipitation) as the best predictors of DSP outbreaks. These findings are important for improving early warning systems and supporting sustainable aquaculture practices.
Updated: 2024-05-07 14:55:42
Subjects: cs.LG,cs.AI
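Permutation importance, one of the explainability methods named above, can be sketched in a few lines: shuffle one feature column and measure how much the model's error grows. The data and model below are hypothetical, mimicking a dominant predictor such as Dinophysis abundance.

```python
import random

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when one feature column is shuffled; any
    fitted model exposing a row -> prediction callable will do."""
    rng = random.Random(seed)
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)
    base = mse(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            permuted = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(X, col)]
            drops.append(mse(permuted) - base)
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: only the first feature drives the target.
X = [[float(i), float(i % 3)] for i in range(30)]
y = [2.0 * row[0] for row in X]
imp = permutation_importance(lambda r: 2.0 * r[0], X, y)
```

A feature the model ignores yields zero importance, which is exactly how the study separates key species and environmental drivers from the rest.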
Inferring Discussion Topics about Exploitation of Vulnerabilities from Underground Hacking Forums
The increasing sophistication of cyber threats necessitates proactive measures to identify vulnerabilities and potential exploits. Underground hacking forums serve as breeding grounds for the exchange of hacking techniques and discussions related to exploitation. In this research, we propose an innovative approach using topic modeling to analyze and uncover key themes in vulnerabilities discussed within these forums. The objective of our study is to develop a machine learning-based model that can automatically detect and classify vulnerability-related discussions in underground hacking forums. By monitoring and analyzing the content of these forums, we aim to identify emerging vulnerabilities, exploit techniques, and potential threat actors. To achieve this, we collect a large-scale dataset consisting of posts and threads from multiple underground forums. We preprocess and clean the data to ensure accuracy and reliability. Leveraging topic modeling techniques, specifically Latent Dirichlet Allocation (LDA), we uncover latent topics and their associated keywords within the dataset. This enables us to identify recurring themes and prevalent discussions related to vulnerabilities, exploits, and potential targets.
Updated: 2024-05-07 14:54:32
Subjects: cs.CR,cs.AI,cs.LG
Community Detection for Heterogeneous Multiple Social Networks
The community plays a crucial role in understanding user behavior and network characteristics in social networks. Some users use multiple social networks at once for a variety of objectives. These users, called overlapping users, bridge different social networks. Detecting communities across multiple social networks is vital for interaction mining, information diffusion, and behavior migration analysis among networks. This paper presents a community detection method based on nonnegative matrix tri-factorization for multiple heterogeneous social networks, which formulates a common consensus matrix to represent the globally fused community. Specifically, the proposed method involves creating adjacency matrices based on network structure and content similarity, followed by alignment matrices which distinguish overlapping users in different social networks. With the generated alignment matrices, the method can enhance the fusion degree of the global community by detecting overlapping user communities across networks. The effectiveness of the proposed method is evaluated with new metrics on Twitter, Instagram, and Tumblr datasets. The experimental results demonstrate its superior performance in terms of community quality and community fusion.
Updated: 2024-05-07 14:52:34
Subjects: cs.SI,cs.AI,cs.CY
Zero Grads: Learning Local Surrogate Losses for Non-Differentiable Graphics
Gradient-based optimization is now ubiquitous across graphics, but unfortunately cannot be applied to problems with undefined or zero gradients. To circumvent this issue, the loss function can be manually replaced by a ``surrogate'' that has similar minima but is differentiable. Our proposed framework, ZeroGrads, automates this process by learning a neural approximation of the objective function, which in turn can be used to differentiate through arbitrary black-box graphics pipelines. We train the surrogate on an actively smoothed version of the objective and encourage locality, focusing the surrogate's capacity on what matters at the current training episode. The fitting is performed online, alongside the parameter optimization, and self-supervised, without pre-computed data or pre-trained models. As sampling the objective is expensive (it requires a full rendering or simulator run), we devise an efficient sampling scheme that allows for tractable run-times and competitive performance at little overhead. We demonstrate optimizing diverse non-convex, non-differentiable black-box problems in graphics, such as visibility in rendering, discrete parameter spaces in procedural modelling or optimal control in physics-driven animation. In contrast to other derivative-free algorithms, our approach scales well to higher dimensions, which we demonstrate on problems with up to 35k interlinked variables.
Updated: 2024-05-07 14:50:00
Subjects: cs.CV,cs.GR,cs.LG
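ZeroGrads fits a neural surrogate to a smoothed objective; the sketch below shows only the simpler underlying idea — descending a Gaussian-smoothed version of a zero-gradient objective via a Monte-Carlo estimator — not the learned surrogate itself. The objective and all constants are toy stand-ins.

```python
import random

def objective(x):
    """Piecewise-constant toy 'rendering loss': its gradient is zero
    almost everywhere, so plain backpropagation is useless."""
    return abs(round(x) - 3)

def smoothed_grad(f, x, sigma=1.0, n=200, rng=None):
    """Monte-Carlo gradient of the Gaussian-smoothed objective,
    with antithetic samples to reduce variance."""
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        total += (f(x + sigma * u) - f(x - sigma * u)) * u / (2.0 * sigma)
    return total / n

def optimise(f, x0, steps=150, lr=0.2, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x -= lr * smoothed_grad(f, x, rng=rng)
    return x

x_opt = optimise(objective, 0.0)
```

Each smoothed-gradient estimate costs several objective evaluations — which is why the paper invests in an efficient sampling scheme when each evaluation is a full render.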
Global Scale Self-Supervised Channel Charting with Sensor Fusion
The sensing and positioning capabilities foreseen in 6G have great potential for technology advancements in various domains, such as future smart cities and industrial use cases. Channel charting has emerged as a promising technology in recent years for radio frequency-based sensing and localization. However, the accuracy of these techniques is yet far behind the numbers envisioned in 6G. To reduce this gap, in this paper, we propose a novel channel charting technique capitalizing on the time of arrival measurements from surrounding Transmission Reception Points (TRPs) along with their locations and leveraging sensor fusion in channel charting by incorporating laser scanner data during the training phase of our algorithm. The proposed algorithm remains self-supervised during training and test phases, requiring no geometrical models or user position ground truth. Simulation results validate the achievement of a sub-meter level localization accuracy using our algorithm 90% of the time, outperforming the state-of-the-art channel charting techniques and the traditional triangulation-based approaches.
Updated: 2024-05-07 14:33:45
Subjects: cs.IT,cs.AI,math.IT
SmmPack: Obfuscation for SMM Modules with TPM Sealed Key
System Management Mode (SMM) is the highest-privileged operating mode of x86 and x86-64 processors. Through SMM exploitation, attackers can tamper with the Unified Extensible Firmware Interface (UEFI) firmware, disabling the security mechanisms implemented by the operating system and hypervisor. Vulnerabilities enabling SMM code execution are often reported as Common Vulnerabilities and Exposures (CVEs); however, no security mechanisms currently exist to prevent attackers from analyzing those vulnerabilities. To increase the cost of vulnerability analysis of SMM modules, we introduced SmmPack. The core concept of SmmPack involves encrypting an SMM module with the key securely stored in a Trusted Platform Module (TPM). We assessed the effectiveness of SmmPack in preventing attackers from obtaining and analyzing SMM modules using various acquisition methods. Our results show that SmmPack significantly increases the cost by narrowing down the means of module acquisition. Furthermore, we demonstrated that SmmPack operates without compromising the performance of the original SMM modules. We also clarified the management and adoption methods of SmmPack, as well as the procedure for applying BIOS updates, and demonstrated that the implementation of SmmPack is realistic.
Updated: 2024-05-07 14:33:05
Subjects: cs.CR
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Modern Language Models (LMs) are capable of following long and complex instructions that enable a large and diverse set of user requests. While Information Retrieval (IR) models use these LMs as the backbone of their architectures, virtually none of them allow users to provide detailed instructions alongside queries, thus limiting their ability to satisfy complex information needs. In this work, we study the use of instructions in IR systems. First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions. FollowIR repurposes detailed instructions -- also known as narratives -- developed for professional assessors to evaluate retrieval systems. In particular, we build our benchmark from three collections curated for shared tasks at the Text REtrieval Conference (TREC). These collections contain hundreds to thousands of labeled documents per query, making them suitable for our exploration. Through this process, we can measure how well IR models follow instructions, through a new pairwise evaluation framework. Our results indicate that existing retrieval models fail to correctly use instructions, using them for basic keywords and struggling to understand long-form information. However, we show that it is possible for IR models to learn to follow complex instructions: our new FollowIR-7B model has significant improvements after fine-tuning on our training set.
Updated: 2024-05-07 14:25:15
Subjects: cs.IR,cs.CL,cs.LG
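The pairwise evaluation idea can be sketched as follows: score documents once with the query alone and once with the instruction added, and check whether the relevant documents move up in the ranking. This is a simplified analogue of FollowIR's framework, not the paper's exact metric; the scores are hypothetical.

```python
def pairwise_instruction_gain(scores_query_only, scores_with_instruction,
                              relevant_ids):
    """Fraction of relevant documents whose rank improves when the
    instruction is added (simplified analogue of FollowIR's pairwise
    evaluation; the paper's actual metric differs in detail)."""
    def ranks(scores):
        order = sorted(scores, key=scores.get, reverse=True)
        return {doc: pos for pos, doc in enumerate(order)}
    before = ranks(scores_query_only)
    after = ranks(scores_with_instruction)
    improved = sum(1 for d in relevant_ids if after[d] < before[d])
    return improved / len(relevant_ids)

# Hypothetical retrieval scores for three documents.
no_instr = {"d1": 0.9, "d2": 0.8, "d3": 0.1}
with_instr = {"d1": 0.5, "d2": 0.9, "d3": 0.7}
gain = pairwise_instruction_gain(no_instr, with_instr, ["d2", "d3"])
```

A model that ignores the instruction leaves the ranking unchanged and scores zero under this comparison.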
Revisiting character-level adversarial attacks
Adversarial attacks in Natural Language Processing apply perturbations at the character or token level. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention as they cannot easily adopt popular gradient-based methods, and are thought to be easy to defend. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 points relative to the previous art. Our implementation is available at https://github.com/LIONS-EPFL/Charmer.
Updated: 2024-05-07 14:23:22
Subjects: cs.LG,cs.AI,cs.CL,stat.ML
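A query-based character-level attack can be sketched against a toy victim model: greedily try single-character substitutions until the label flips, keeping the adversarial example maximally similar to the input. The classifier and example sentence below are invented; Charmer itself uses a more efficient search against real models.

```python
import string

def toy_classifier(text):
    """Hypothetical victim model: predicts 1 (positive) iff 'good' occurs."""
    return 1 if "good" in text.lower() else 0

def char_attack(text, target_label):
    """Greedy query-based search over single-character substitutions,
    keeping the adversarial example maximally similar to the input."""
    if toy_classifier(text) == target_label:
        return text
    for i in range(len(text)):
        for ch in string.ascii_lowercase:
            candidate = text[:i] + ch + text[i + 1:]
            if toy_classifier(candidate) == target_label:
                return candidate
    return None  # no single-character edit flips the label

adv = char_attack("a good movie", target_label=0)
```

Because only one character changes, the semantics of the sentence are preserved for a human reader — the core appeal of character-level attacks.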
Novel View Synthesis with Neural Radiance Fields for Industrial Robot Applications
Neural Radiance Fields (NeRFs) have become a rapidly growing research field with the potential to revolutionize typical photogrammetric workflows, such as those used for 3D scene reconstruction. As input, NeRFs require multi-view images with corresponding camera poses as well as the interior orientation. In the typical NeRF workflow, the camera poses and the interior orientation are estimated in advance with Structure from Motion (SfM). But the quality of the resulting novel views, which depends on different parameters such as the number and distribution of available images, as well as the accuracy of the related camera poses and interior orientation, is difficult to predict. In addition, SfM is a time-consuming pre-processing step, and its quality strongly depends on the image content. Furthermore, the undefined scaling factor of SfM hinders subsequent steps in which metric information is required. In this paper, we evaluate the potential of NeRFs for industrial robot applications. We propose an alternative to SfM pre-processing: we capture the input images with a calibrated camera that is attached to the end effector of an industrial robot and determine accurate camera poses with metric scale based on the robot kinematics. We then investigate the quality of the novel views by comparing them to ground truth, and by computing an internal quality measure based on ensemble methods. For evaluation purposes, we acquire multiple datasets that pose challenges for reconstruction typical of industrial applications, like reflective objects, poor texture, and fine structures. We show that the robot-based pose determination reaches similar accuracy as SfM in non-demanding cases, while having clear advantages in more challenging scenarios. Finally, we present first results of applying the ensemble method to estimate the quality of the synthetic novel view in the absence of a ground truth.
Updated: 2024-05-07 14:22:32
Subjects: cs.CV,cs.AI,cs.RO
Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition
Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps. To derive an optimal data perturbation mechanism under mDP, a widely used method is linear programming (LP), which, however, might suffer from a polynomial explosion of decision variables, rendering it impractical in large-scale mDP. In this paper, our objective is to develop a new computation framework to enhance the scalability of the LP-based mDP. Considering the connections established by the mDP constraints among the secret records, we partition the original secret dataset into various subsets. Building upon the partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which is composed of two stages: (1) a master program to manage the perturbation calculation across subsets and (2) a set of subproblems, each managing the perturbation derivation within a subset. Our experimental results on multiple datasets, including geo-location data in the road network/grid maps, text data, and synthetic data, underscore our proposed mechanism's superior scalability and efficiency.
Updated: 2024-05-07 14:19:09
Subjects: cs.AI,cs.CR
The Curse of Diversity in Ensemble-Based Exploration
We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency of the individual ensemble members to learn from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL) in both discrete and continuous control domains. Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches.
Updated: 2024-05-07 14:14:50
Subjects: cs.LG
Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction
Predicting Remaining Useful Life (RUL) plays a crucial role in the prognostics and health management of industrial systems that involve a variety of interrelated sensors. Given a constant stream of time series sensory data from such systems, deep learning models have risen to prominence at identifying complex, nonlinear temporal dependencies in these data. In addition to the temporal dependencies of individual sensors, spatial dependencies emerge as important correlations among these sensors, which can be naturally modelled by a temporal graph that describes time-varying spatial relationships. However, the majority of existing studies have relied on capturing discrete snapshots of this temporal graph, a coarse-grained approach that leads to loss of temporal information. Moreover, given the variety of heterogeneous sensors, it becomes vital that such inherent heterogeneity is leveraged for RUL prediction in temporal sensor graphs. To capture the nuances of the temporal and spatial relationships and heterogeneous characteristics in an interconnected graph of sensors, we introduce a novel model named Temporal and Heterogeneous Graph Neural Networks (THGNN). Specifically, THGNN aggregates historical data from neighboring nodes to accurately capture the temporal dynamics and spatial correlations within the stream of sensor data in a fine-grained manner. Moreover, the model leverages Feature-wise Linear Modulation (FiLM) to address the diversity of sensor types, significantly improving the model's capacity to learn the heterogeneity in the data sources. Finally, we have validated the effectiveness of our approach through comprehensive experiments. Our empirical findings demonstrate significant advancements on the N-CMAPSS dataset, achieving improvements of up to 19.2% and 31.6% in terms of two different evaluation metrics over state-of-the-art methods.
Updated: 2024-05-07 14:08:57
Subjects: cs.AI
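The FiLM mechanism the model leverages is simple to state: each feature is scaled and shifted by parameters conditioned on extra context — here, the sensor type. The per-type parameters below are hypothetical stand-ins; in THGNN they would come from a learned conditioning network.

```python
def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each feature with
    parameters conditioned on extra context (here, the sensor type)."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]

# Hypothetical per-sensor-type modulation parameters.
SENSOR_PARAMS = {
    "temperature": ([1.5, 0.5], [0.0, 0.25]),
    "pressure":    ([0.8, 1.2], [0.2, 0.0]),
}

def modulate(features, sensor_type):
    gamma, beta = SENSOR_PARAMS[sensor_type]
    return film(features, gamma, beta)
```

Because the modulation is feature-wise and cheap, heterogeneous sensor streams can share one backbone while still being processed differently.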
Rethinking How to Evaluate Language Model Jailbreak
Large language models (LLMs) have become increasingly integrated with various applications. To ensure that LLMs do not generate unsafe responses, they are aligned with safeguards that specify what content is restricted. However, such alignment can be bypassed to produce prohibited content using a technique commonly referred to as jailbreak. Different systems have been proposed to perform the jailbreak automatically. These systems rely on evaluation methods to determine whether a jailbreak attempt is successful. However, our analysis reveals that current jailbreak evaluation methods have two limitations. (1) Their objectives lack clarity and do not align with the goal of identifying unsafe responses. (2) They oversimplify the jailbreak result as a binary outcome, successful or not. In this paper, we propose three metrics, safeguard violation, informativeness, and relative truthfulness, to evaluate language model jailbreak. Additionally, we demonstrate how these metrics correlate with the goal of different malicious actors. To compute these metrics, we introduce a multifaceted approach that extends the natural language generation evaluation method after preprocessing the response. We evaluate our metrics on a benchmark dataset produced from three malicious intent datasets and three jailbreak systems. The benchmark dataset is labeled by three annotators. We compare our multifaceted approach with three existing jailbreak evaluation methods. Experiments demonstrate that our multifaceted evaluation outperforms existing methods, with F1 scores improving on average by 17% compared to existing baselines. Our findings motivate the need to move away from the binary view of the jailbreak problem and incorporate a more comprehensive evaluation to ensure the safety of the language model.
Updated: 2024-05-07 14:06:23
Subjects: cs.CL,cs.AI,cs.CR,cs.LG
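Two of the three proposed axes can be illustrated with a toy scorer. The paper extends trained natural-language-generation evaluators; the keyword matching, blocklist, and refusal markers below are placeholders, and the third metric (relative truthfulness) is omitted because it needs a learned evaluator.

```python
BLOCKLIST = {"bomb", "malware"}          # stand-in safeguard policy
REFUSAL_MARKERS = {"cannot", "sorry"}    # stand-in refusal cues

def evaluate_response(response):
    """Score one response on two of the paper's three axes. The paper
    uses trained NLG evaluators; bare keyword matching here is only a
    placeholder, and relative truthfulness is omitted."""
    words = [w.strip(".,!?") for w in response.lower().split()]
    violation = any(w in BLOCKLIST for w in words)
    informative = len(words) > 5 and not (REFUSAL_MARKERS & set(words))
    return {"safeguard_violation": violation, "informativeness": informative}
```

Even this toy version shows why a binary success label is too coarse: a response can violate the safeguard yet be uninformative, or vice versa.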
Riemannian Laplace Approximation with the Fisher Metric
Laplace's method approximates a target density with a Gaussian distribution at its mode. It is computationally efficient and asymptotically exact for Bayesian inference due to the Bernstein-von Mises theorem, but for complex targets and finite-data posteriors it is often too crude an approximation. A recent generalization of the Laplace Approximation transforms the Gaussian approximation according to a chosen Riemannian geometry providing a richer approximation family, while still retaining computational efficiency. However, as shown here, its properties depend heavily on the chosen metric, indeed the metric adopted in previous work results in approximations that are overly narrow as well as being biased even at the limit of infinite data. We correct this shortcoming by developing the approximation family further, deriving two alternative variants that are exact at the limit of infinite data, extending the theoretical analysis of the method, and demonstrating practical improvements in a range of experiments.
Updated: 2024-05-07 14:05:49
Subjects: cs.LG,stat.ME,stat.ML
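The classical (Euclidean) construction that the paper's Riemannian variants generalise fits in a few lines in 1D: Newton-iterate to the mode, then read the Gaussian variance off the curvature of the log-density. The Gamma example is illustrative.

```python
def laplace_approx(logp_grad, logp_hess, x0, iters=50):
    """Classical (Euclidean) 1-D Laplace approximation: Newton iteration
    to the mode, then a Gaussian whose variance comes from the curvature.
    The paper's Riemannian variants generalise this construction."""
    x = x0
    for _ in range(iters):
        x = x - logp_grad(x) / logp_hess(x)
    return x, -1.0 / logp_hess(x)

# Unnormalised Gamma(4, 1) log-density: log p(x) = 3*log(x) - x,
# whose mode is x = 3 with curvature -3/x^2.
mode, var = laplace_approx(lambda x: 3.0 / x - 1.0,
                           lambda x: -3.0 / x ** 2, x0=2.0)
```

For a skewed target like this Gamma, the resulting Gaussian is exactly the kind of crude fit the abstract mentions, which motivates transforming it under a chosen metric.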
PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks
Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs). However, conventional PINNs, relying on multilayer perceptrons (MLP), neglect the crucial temporal dependencies inherent in practical physics systems and thus fail to propagate the initial condition constraints globally and accurately capture the true solutions under various scenarios. In this paper, we introduce a novel Transformer-based framework, termed PINNsFormer, designed to address this limitation. PINNsFormer can accurately approximate PDE solutions by utilizing multi-head attention mechanisms to capture temporal dependencies. PINNsFormer transforms point-wise inputs into pseudo sequences and replaces point-wise PINNs loss with a sequential loss. Additionally, it incorporates a novel activation function, Wavelet, which anticipates Fourier decomposition through deep neural networks. Empirical results demonstrate that PINNsFormer achieves superior generalization ability and accuracy across various scenarios, including PINNs failure modes and high-dimensional PDEs. Moreover, PINNsFormer offers flexibility in integrating existing learning schemes for PINNs, further enhancing its performance.
Updated: 2024-05-07 14:04:16
Subjects: cs.CE,cs.LG
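The Wavelet activation can be sketched as a learnable mix of sine and cosine, giving the network Fourier-like components; this form is as reported for PINNsFormer, with the weights fixed to 1.0 here rather than trained.

```python
import math

def wavelet_activation(x, w1=1.0, w2=1.0):
    """Wavelet-style activation reported for PINNsFormer: a learnable mix
    of sine and cosine, giving the network Fourier-like components.
    (w1 and w2 are trained in the paper; fixed to 1.0 here.)"""
    return w1 * math.sin(x) + w2 * math.cos(x)
```

Unlike tanh or ReLU, this activation is periodic, which is what lets a deep network anticipate a Fourier-like decomposition of the PDE solution.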
One-Class Classification as GLRT for Jamming Detection in Private 5G Networks
5G mobile networks are vulnerable to jamming attacks that may jeopardize valuable applications such as industry automation. In this paper, we propose to analyze radio signals with a dedicated device to detect jamming attacks. We pursue a learning approach, with the detector being a CNN implementing a GLRT. To this end, the CNN is trained as a two-class classifier using two datasets: one of real legitimate signals and another generated artificially so that the resulting classifier implements the GLRT. The artificial dataset is generated mimicking different types of jamming signals. We evaluate the performance of this detector using experimental data obtained from a private 5G network and several jamming signals, showing the technique's effectiveness in detecting the attacks.
Updated: 2024-05-07 14:02:34
Categories: eess.SP,cs.LG,cs.NI
A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI
Since late 2022, generative AI has taken the world by storm, with widespread use of tools including ChatGPT, Gemini, and Claude. Generative AI and large language model (LLM) applications are transforming how individuals find and access data and knowledge. However, the intricate relationship between open data and generative AI, and the vast potential it holds for driving innovation in this field remain underexplored areas. This white paper seeks to unpack the relationship between open data and generative AI and explore possible components of a new Fourth Wave of Open Data: Is open data becoming AI ready? Is open data moving towards a data commons approach? Is generative AI making open data more conversational? Will generative AI improve open data quality and provenance? Towards this end, we provide a new Spectrum of Scenarios framework. This framework outlines a range of scenarios in which open data and generative AI could intersect and what is required from a data quality and provenance perspective to make open data ready for those specific scenarios. These scenarios include: pretraining, adaptation, inference and insight generation, data augmentation, and open-ended exploration. Through this process, we found that in order for data holders to embrace generative AI to improve open data access and develop greater insights from open data, they first must make progress around five key areas: enhance transparency and documentation, uphold quality and integrity, promote interoperability and standards, improve accessibility and usability, and address ethical considerations.
Updated: 2024-05-07 14:01:33
Categories: cs.AI
WALLETRADAR: Towards Automating the Detection of Vulnerabilities in Browser-based Cryptocurrency Wallets
Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion accompanies security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive security analysis but also a pressing need for specialized tools that can aid developers in reducing vulnerabilities during the development process. To fill the void, we present a comprehensive security analysis of browser-based wallets in this paper, along with the development of an automated tool designed for this purpose. We first compile a taxonomy of security vulnerabilities resident in cryptocurrency wallets by harvesting historical security reports. Based on this, we design WALLETRADAR, an automated detection framework that can accurately identify security issues based on static and dynamic analysis. Evaluation of 96 popular browser-based wallets shows WALLETRADAR's effectiveness, by successfully automating the detection process in 90% of these wallets with high precision. This evaluation has led to the discovery of 116 security vulnerabilities corresponding to 70 wallets. As of this writing, we have received confirmations of 10 vulnerabilities from 8 wallet developers, along with over $2,000 in bug bounties. Further, we observed that 12 wallet developers have silently fixed 16 vulnerabilities after our disclosure. WALLETRADAR can effectively automate the identification of security risks in cryptocurrency wallets, thereby enhancing software development quality and safety in the blockchain ecosystem.
Updated: 2024-05-07 14:01:27
Categories: cs.CR
Analytical Approximation of the ELBO Gradient in the Context of the Clutter Problem
We propose an analytical solution for approximating the gradient of the Evidence Lower Bound (ELBO) in variational inference problems where the statistical model is a Bayesian network consisting of observations drawn from a mixture of a Gaussian distribution embedded in unrelated clutter, known as the clutter problem. The method employs the reparameterization trick to move the gradient operator inside the expectation and relies on the assumption that, because the likelihood factorizes over the observed data, the variational distribution is generally more compactly supported than the Gaussian distribution in the likelihood factors. This allows efficient local approximation of the individual likelihood factors, which leads to an analytical solution for the integral defining the gradient expectation. We integrate the proposed gradient approximation as the expectation step in an EM (Expectation Maximization) algorithm for maximizing ELBO and test against classical deterministic approaches in Bayesian inference, such as the Laplace approximation, Expectation Propagation and Mean-Field Variational Inference. The proposed method demonstrates good accuracy and rate of convergence together with linear computational complexity.
Updated: 2024-05-07 14:00:29
Categories: cs.LG,stat.ML,G.3
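The reparameterization trick the abstract relies on — moving the gradient operator inside the expectation — can be illustrated with a generic Monte-Carlo sketch (not the paper's analytical solution, which avoids sampling altogether):

```python
import random

def reparam_grad_mu(f_prime, mu, sigma, n=20000, seed=0):
    # Monte-Carlo estimate of d/dmu E_{z ~ N(mu, sigma^2)}[f(z)] via the
    # reparameterization z = mu + sigma * eps with eps ~ N(0, 1), which moves
    # the gradient inside the expectation: grad = E[f'(mu + sigma * eps)].
    rng = random.Random(seed)
    return sum(f_prime(mu + sigma * rng.gauss(0.0, 1.0)) for _ in range(n)) / n

# Sanity check: for f(z) = z^2 the exact gradient is 2 * mu.
grad = reparam_grad_mu(lambda z: 2.0 * z, mu=1.5, sigma=0.3)
```

The paper's contribution is precisely to replace such sampling with a closed-form approximation of the resulting integral.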
Enhancing Boundary Segmentation for Topological Accuracy with Skeleton-based Methods
Topological consistency plays a crucial role in the task of boundary segmentation for reticular images, such as cell membrane segmentation in neuron electron microscopic images, grain boundary segmentation in material microscopic images and road segmentation in aerial images. In these fields, topological changes in segmentation results have a serious impact on the downstream tasks, which can even exceed the misalignment of the boundary itself. To enhance the topology accuracy in segmentation results, we propose the Skea-Topo Aware loss, which is a novel loss function that takes into account the shape of each object and topological significance of the pixels. It consists of two components. First, a skeleton-aware weighted loss improves the segmentation accuracy by better modeling the object geometry with skeletons. Second, a boundary rectified term effectively identifies and emphasizes topological critical pixels in the prediction errors using both foreground and background skeletons in the ground truth and predictions. Experiments prove that our method improves topological consistency by up to 7 points in VI compared to 13 state-of-the-art methods, based on objective and subjective assessments across three different boundary segmentation datasets. The code is available at https://github.com/clovermini/Skea_topo.
Updated: 2024-05-07 13:55:57
Categories: cs.CV,cs.AI
Detecting 5G Narrowband Jammers with CNN, k-nearest Neighbors, and Support Vector Machines
5G cellular networks are particularly vulnerable against narrowband jammers that target specific control sub-channels in the radio signal. One mitigation approach is to detect such jamming attacks with an online observation system, based on machine learning. We propose to detect jamming at the physical layer with a pre-trained machine learning model that performs binary classification. Based on data from an experimental 5G network, we study the performance of different classification models. A convolutional neural network will be compared to support vector machines and k-nearest neighbors, where the last two methods are combined with principal component analysis. The obtained results show substantial differences in terms of classification accuracy and computation time.
Updated: 2024-05-07 13:54:12
Categories: eess.SP,cs.LG,cs.NI
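A hedged sketch of the PCA-plus-k-nearest-neighbour branch of the comparison, in plain NumPy rather than whatever library the authors used; the binary labels (0 = clean signal, 1 = jammed) and dimensions are illustrative:

```python
import numpy as np

def pca_fit(X, k):
    # Principal components via SVD of the centred data; returns the mean and
    # a projection matrix onto the top-k components.
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T

def knn_predict(Ztr, ytr, Zte, k=3):
    # Plain k-nearest-neighbour majority vote on binary labels in the
    # PCA-projected feature space.
    d2 = ((Zte[:, None, :] - Ztr[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (ytr[nearest].mean(axis=1) > 0.5).astype(int)
```

The CNN baseline works on raw physical-layer samples instead, which is where the accuracy/compute trade-off reported above comes from.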
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code models family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reach state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile all-around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.
Updated: 2024-05-07 13:50:40
Categories: cs.AI,cs.CL,cs.SE
Beyond human subjectivity and error: a novel AI grading system
The grading of open-ended questions is a high-effort, high-impact task in education. Automating this task promises a significant reduction in workload for education professionals, as well as more consistent grading outcomes for students, by circumventing human subjectivity and error. While recent breakthroughs in AI technology might facilitate such automation, this has not been demonstrated at scale. In this paper, we introduce a novel automatic short answer grading (ASAG) system. The system is based on a fine-tuned open-source transformer model which we trained on a large set of exam data from university courses across a wide range of disciplines. We evaluated the trained model's performance against held-out test data in a first experiment and found high accuracy levels across a broad spectrum of unseen questions, even in unseen courses. We further compared the performance of our model with that of certified human domain experts in a second experiment: we first assembled another test dataset from real historical exams - the historic grades contained in that data were awarded to students in a regulated, legally binding examination process; we therefore considered them as ground truth for our experiment. We then asked certified human domain experts and our model to grade the historic student answers again without disclosing the historic grades. Finally, we compared the hence obtained grades with the historic grades (our ground truth). We found that for the courses examined, the model deviated less from the official historic grades than the human re-graders - the model's median absolute error was 44% smaller than the human re-graders', implying that the model is more consistent than humans in grading. These results suggest that leveraging AI enhanced grading can reduce human subjectivity, improve consistency and thus ultimately increase fairness.
Updated: 2024-05-07 13:49:59
Categories: cs.AI
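The consistency comparison above boils down to a median-absolute-error computation, sketched below; the grade values in the usage are of course stand-ins, not the study's data:

```python
import statistics

def median_abs_error(predicted, ground_truth):
    # Median absolute deviation of re-awarded grades from the official
    # historic grades (the ground truth of the second experiment).
    return statistics.median(abs(p - g) for p, g in zip(predicted, ground_truth))

def relative_reduction(model_mae, human_mae):
    # A value of 0.44 would correspond to the "44% smaller" error reported.
    return 1.0 - model_mae / human_mae
```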
Molecular Identification via Molecular Fingerprint extraction from Atomic Force Microscopy images
Non-Contact Atomic Force Microscopy with CO-functionalized metal tips (referred to as HR-AFM) provides access to the internal structure of individual molecules adsorbed on a surface with totally unprecedented resolution. Previous works have shown that deep learning (DL) models can retrieve the chemical and structural information encoded in a 3D stack of constant-height HR-AFM images, leading to molecular identification. In this work, we overcome their limitations by using a well-established description of the molecular structure in terms of topological fingerprints, the 1024-bit Extended Connectivity Chemical Fingerprints of radius 2 (ECFP4), that were developed for substructure and similarity searching. ECFPs provide local structural information of the molecule, each bit correlating with a particular substructure within the molecule. Our DL model is able to extract this optimized structural descriptor from the 3D HR-AFM stacks and use it, through virtual screening, to identify molecules from their predicted ECFP4 with a retrieval accuracy on theoretical images of 95.4%. Furthermore, this approach, unlike previous DL models, assigns a confidence score, the Tanimoto similarity, to each of the candidate molecules, thus providing information on the reliability of the identification. By construction, the number of times a certain substructure is present in the molecule is lost during the hashing process, necessary to make them useful for machine learning applications. We show that it is possible to complement the fingerprint-based virtual screening with global information provided by another DL model that predicts from the same HR-AFM stacks the chemical formula, boosting the identification accuracy up to 97.6%. Finally, we perform a limited test with experimental images, obtaining promising results towards the application of this pipeline under real conditions.
Updated: 2024-05-07 13:47:35
Categories: cond-mat.mtrl-sci,cs.LG
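The fingerprint-based screening step can be sketched with set-based stand-ins for the 1024-bit ECFP4 vectors (a real pipeline would obtain the fingerprints from a cheminformatics toolkit such as RDKit); the library entries here are hypothetical:

```python
def tanimoto(fp_a, fp_b):
    # Tanimoto similarity between two binary fingerprints, represented here
    # as sets of on-bit indices (e.g. the on bits of a 1024-bit ECFP4 vector).
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

def virtual_screen(predicted_fp, library):
    # Rank a {name: fingerprint} library against the fingerprint predicted
    # from the HR-AFM stack; the top hit is the identification, and its
    # Tanimoto score serves as the confidence measure.
    return sorted(library.items(),
                  key=lambda kv: tanimoto(predicted_fp, kv[1]),
                  reverse=True)
```

The chemical-formula predictor then re-ranks or filters these candidates, which is what lifts the accuracy from 95.4% to 97.6%.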
PatentGPT: A Large Language Model for Intellectual Property
In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to this field's strong need for specialized knowledge, privacy protection, and the processing of extremely long texts. In this technical report, we present for the first time a low-cost, standardized procedure for training IP-oriented LLMs, meeting the unique requirements of the IP domain. Using this standard process, we have trained the PatentGPT series models based on open-source pretrained models. By evaluating them on the open-source IP-oriented benchmark MOZIP, our domain-specific LLMs outperform GPT-4, indicating the effectiveness of the proposed training procedure and the expertise of the PatentGPT models in the IP domain. Remarkably, our model surpassed GPT-4 on the 2019 China Patent Agent Qualification Examination, scoring 65 and matching human expert levels. Additionally, the PatentGPT model, which utilizes the SMoE architecture, achieves performance comparable to that of GPT-4 in the IP domain and demonstrates a better cost-performance ratio on long-text tasks, potentially serving as an alternative to GPT-4 within the IP domain.
Updated: 2024-05-07 13:44:23
Categories: cs.CL,cs.AI,I.2.7
Material Property Prediction using Graphs based on Generically Complete Isometry Invariants
The structure-property hypothesis says that the properties of all materials are determined by an underlying crystal structure. The main obstacle was the ambiguity of conventional crystal representations based on incomplete or discontinuous descriptors that allow false negatives or false positives. This ambiguity was resolved by the ultra-fast Pointwise Distance Distribution (PDD), which distinguished all periodic structures in the world's largest collection of real materials (the Cambridge Structural Database). The state-of-the-art results in property predictions were previously achieved by graph neural networks based on various graph representations of periodic crystals, including the Crystal Graph with vertices at all atoms in a crystal unit cell. This work adapts the Pointwise Distance Distribution for a simpler graph whose vertex set is not larger than the asymmetric unit of a crystal structure. The new Distribution Graph reduces mean absolute error by 0.6%-12% while having 44%-88% of the number of vertices of the crystal graph when applied to the Materials Project and Jarvis-DFT datasets using CGCNN and ALIGNN. Methods for hyper-parameter selection for the graph are backed by the theoretical results of the Pointwise Distance Distribution and are then experimentally justified.
Updated: 2024-05-07 13:41:58
Categories: physics.comp-ph,cs.LG
Cross-IQA: Unsupervised Learning for Image Quality Assessment
Automatic perception of image quality is a challenging problem that impacts billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method termed Cross-IQA based on the vision transformer (ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct a pretext task of synthesized-image reconstruction to extract image quality information in an unsupervised manner using ViT blocks. The pretrained encoder of Cross-IQA is used to fine-tune a linear regression model for score prediction. Experimental results show that Cross-IQA can achieve state-of-the-art performance in assessing the low-frequency degradation information (e.g., color change, blurring, etc.) of images compared with classical full-reference IQA and NR-IQA methods on the same datasets.
Updated: 2024-05-07 13:35:51
Categories: cs.CV,cs.AI,eess.IV
Improving Offline Reinforcement Learning with Inaccurate Simulators
Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment. However, the performance of offline RL highly depends on the quality of the datasets, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, the data directly collected from the inaccurate simulator cannot be directly used in offline RL due to the well-known exploration-exploitation dilemma and the dynamic gap between inaccurate simulation and the real environment. To address these issues, we propose a novel approach to combine the offline dataset and the inaccurate simulation data in a better manner. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate simulator starting from the distribution provided by the generator and reweight the simulated data using the discriminator. Our experimental results in the D4RL benchmark and a real-world manipulation task confirm that our method can benefit more from both inaccurate simulator and limited offline datasets to achieve better performance than the state-of-the-art methods.
Updated: 2024-05-07 13:29:41
Categories: cs.RO,cs.AI,cs.LG
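The discriminator-based reweighting admits a one-line sketch via the standard GAN density-ratio identity; note this is the generic form, and the paper's exact weighting scheme may differ in detail:

```python
def density_ratio_weight(d_score, eps=1e-6):
    # Importance weight for a simulated sample given a discriminator score
    # D(s) in (0, 1): w(s) = D(s) / (1 - D(s)). Simulated states the
    # discriminator judges likely under the offline state distribution are
    # up-weighted; implausible simulator states are down-weighted.
    d = min(max(d_score, eps), 1.0 - eps)
    return d / (1.0 - d)
```

With a well-trained discriminator, a score of 0.5 (indistinguishable from offline data) yields weight 1, so the simulated batch blends seamlessly with the offline dataset.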
World Models for Autonomous Driving: An Initial Survey
In the rapidly evolving landscape of autonomous driving, the capability to accurately predict future events and assess their implications is paramount for both safety and efficiency, critically aiding the decision-making process. World models have emerged as a transformative approach, enabling autonomous driving systems to synthesize and interpret vast amounts of sensor data, thereby predicting potential future scenarios and compensating for information gaps. This paper provides an initial review of the current state and prospective advancements of world models in autonomous driving, spanning their theoretical underpinnings, practical applications, and the ongoing research efforts aimed at overcoming existing limitations. Highlighting the significant role of world models in advancing autonomous driving technologies, this survey aspires to serve as a foundational reference for the research community, facilitating swift access to and comprehension of this burgeoning field, and inspiring continued innovation and exploration.
Updated: 2024-05-07 13:28:48
Categories: cs.LG,cs.AI,cs.RO
A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields
Aphid infestations are one of the primary causes of extensive damage to wheat and sorghum fields, and aphids are among the most common vectors for plant viruses, resulting in significant agricultural yield losses. To address this problem, farmers often employ the inefficient use of harmful chemical pesticides that have negative health and environmental impacts. As a result, a large amount of pesticide is wasted on areas without significant pest infestation. This brings to attention the urgent need for an intelligent autonomous system that can locate and spray sufficiently large infestations selectively within the complex crop canopies. We have developed a large multi-scale dataset for aphid cluster detection and segmentation, collected from actual sorghum fields and meticulously annotated to include clusters of aphids. Our dataset comprises a total of 54,742 image patches, showcasing a variety of viewpoints, diverse lighting conditions, and multiple scales, highlighting its effectiveness for real-world applications. In this study, we trained and evaluated four real-time semantic segmentation models and three object detection models specifically for aphid cluster segmentation and detection. Considering the balance between accuracy and efficiency, Fast-SCNN delivered the most effective segmentation results, achieving 80.46% mean precision, 81.21% mean recall, and 91.66 frames per second (FPS). For object detection, RT-DETR exhibited the best overall performance with a 61.63% mean average precision (mAP), 92.6% mean recall, and 72.55 FPS on an NVIDIA V100 GPU. Our experiments further indicate that aphid cluster segmentation is more suitable for assessing aphid infestations than using detection models.
Updated: 2024-05-07 13:27:58
Categories: cs.CV,cs.AI
TransformerFAM: Feedback attention is working memory
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that TransformerFAM significantly improves Transformer performance on long-context tasks across various model sizes (1B, 8B, and 24B). These results showcase the potential to empower Large Language Models (LLMs) to process sequences of unlimited length.
Updated: 2024-05-07 13:23:46
Categories: cs.LG,cs.AI,cs.CL
Explainable Multi-Label Classification of MBTI Types
In this study, we aim to identify the most effective machine learning model for accurately classifying Myers-Briggs Type Indicator (MBTI) types from Reddit posts and a Kaggle data set. We apply multi-label classification using the Binary Relevance method. We use Explainable Artificial Intelligence (XAI) approach to highlight the transparency and understandability of the process and result. To achieve this, we experiment with glass-box learning models, i.e. models designed for simplicity, transparency, and interpretability. We selected k-Nearest Neighbour, Multinomial Naive Bayes, and Logistic Regression for the glass-box models. We show that Multinomial Naive Bayes and k-Nearest Neighbour perform better if classes with Observer (S) traits are excluded, whereas Logistic Regression obtains its best results when all classes have > 550 entries.
Updated: 2024-05-07 13:21:55
Categories: cs.LG,I.2.6
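The Binary Relevance method here amounts to decomposing each four-letter MBTI type into four independent binary labels, one per axis, each of which then gets its own glass-box classifier. A sketch (the axis ordering is an assumed convention, not taken from the paper):

```python
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]  # assumed axis order

def mbti_to_labels(mbti):
    # Binary Relevance view of an MBTI type: one binary label per axis,
    # 1 if the first letter of the pair is present, else 0.
    return [1 if mbti[i] == first else 0 for i, (first, _second) in enumerate(AXES)]

def labels_to_mbti(labels):
    # Recompose the four independent predictions into a type string.
    return "".join(first if bit else second
                   for bit, (first, second) in zip(labels, AXES))
```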
A Novel Approach to Chest X-ray Lung Segmentation Using U-net and Modified Convolutional Block Attention Module
Lung segmentation in chest X-ray images is of paramount importance as it plays a crucial role in the diagnosis and treatment of various lung diseases. This paper presents a novel approach for lung segmentation in chest X-ray images by integrating U-net with attention mechanisms. The proposed method enhances the U-net architecture by incorporating a Convolutional Block Attention Module (CBAM), which unifies three distinct attention mechanisms: channel attention, spatial attention, and pixel attention. The channel attention mechanism enables the model to concentrate on the most informative features across various channels. The spatial attention mechanism enhances the model's precision in localization by focusing on significant spatial locations. Lastly, the pixel attention mechanism empowers the model to focus on individual pixels, further refining the model's focus and thereby improving the accuracy of segmentation. The adoption of the proposed CBAM in conjunction with the U-net architecture marks a significant advancement in the field of medical imaging, with potential implications for improving diagnostic precision and patient outcomes. The efficacy of this method is validated against contemporary state-of-the-art techniques, showcasing its superiority in segmentation performance.
Updated: 2024-05-07 13:21:19
Subjects: eess.IV,cs.CV,cs.LG
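CBAM's channel and spatial attention can be sketched in a few lines of NumPy. This is a simplified stand-in: the pooled spatial maps are averaged instead of passed through CBAM's 7x7 convolution, and the paper's added pixel-attention branch is omitted:

```python
import numpy as np

def _sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, w1, w2):
    """Squeeze a (C, H, W) feature map to per-channel gates via average and
    max pooling followed by a shared two-layer MLP, as in CBAM."""
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    gate = _sigmoid(mlp(avg) + mlp(mx))            # (C,), values in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Pool across channels and gate each spatial location.
    Stand-in: the two pooled maps are averaged instead of convolved."""
    avg = x.mean(axis=0)                           # (H, W)
    mx = x.max(axis=0)                             # (H, W)
    gate = _sigmoid((avg + mx) / 2.0)
    return x * gate[None, :, :]
```

In the actual module the two blocks are applied sequentially to refine U-net feature maps before segmentation.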
Behaviour Planning: A Toolkit for Diverse Planning
Diverse planning is the problem of generating plans with distinct characteristics. This is valuable for many real-world scenarios, including applications related to plan recognition and business process automation. In this work, we introduce \emph{Behaviour Planning}, a diverse planning toolkit that can characterise and generate diverse plans based on modular diversity models. We present a qualitative framework for describing diversity models, a planning approach for generating plans aligned with any given diversity model, and provide a practical implementation of an SMT-based behaviour planner. We showcase how the qualitative approach offered by Behaviour Planning allows it to overcome various challenges faced by previous approaches. Finally, the experimental evaluation shows the effectiveness of Behaviour Planning in generating diverse plans compared to state-of-the-art approaches.
Updated: 2024-05-07 13:18:22
Subjects: cs.AI
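The toolkit's diversity models are SMT-based; as a much simpler illustration of what "characterising diverse plans" can mean, one can score plan pairs with an action-set Jaccard distance and greedily select a maximally diverse subset (this measure is an assumption for illustration, not the paper's model):

```python
def jaccard_distance(plan_a, plan_b):
    """Action-set distance between two plans (1.0 = no shared actions)."""
    a, b = set(plan_a), set(plan_b)
    return 1.0 - len(a & b) / len(a | b)

def select_diverse(plans, k):
    """Greedily pick k plans maximizing the minimum pairwise distance
    to the already-chosen set (farthest-point heuristic)."""
    chosen = [plans[0]]
    while len(chosen) < k:
        best = max((p for p in plans if p not in chosen),
                   key=lambda p: min(jaccard_distance(p, c) for c in chosen))
        chosen.append(best)
    return chosen
```

A modular diversity model in the paper's sense would replace `jaccard_distance` with any user-supplied notion of plan distinctness.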
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting the Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains remarkable quality PBR texture maps within 15 min., given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: https://kim-youwang.github.io/paint-it
Updated: 2024-05-07 13:15:47
Subjects: cs.CV,cs.AI,cs.GR
Open Implementation and Study of BEST-RQ for Speech Processing
Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) is an SSL method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such as wav2vec 2.0. Despite BEST-RQ's great performance, details are lacking in the original paper, such as the amount of GPU/TPU hours used in pre-training, and there is no official easy-to-use open-source implementation. Furthermore, BEST-RQ has not been evaluated on other downstream tasks aside from ASR and speech translation. In this work, we describe a re-implementation of a random-projection quantizer and perform a preliminary study comparing it to wav2vec 2.0 on four downstream tasks. We discuss the details and differences of our implementation. We show that a random-projection quantizer can achieve similar downstream performance as wav2vec 2.0 while decreasing training time by over a factor of two.
Updated: 2024-05-07 13:11:37
Subjects: cs.CL,cs.LG
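The core of BEST-RQ is a frozen random-projection quantizer: feature frames are projected by a fixed random matrix and matched to the nearest entry of a fixed random codebook to produce discrete pre-training targets. A NumPy sketch (the l2-normalisation convention follows the original paper as we understand it):

```python
import numpy as np

def make_quantizer(dim, proj_dim, codebook_size, seed=0):
    """Frozen random projection matrix and random codebook; neither is trained."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(proj_dim, dim))
    codebook = rng.normal(size=(codebook_size, proj_dim))
    codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
    return proj, codebook

def quantize(frames, proj, codebook):
    """Map each (T, dim) feature frame to its nearest codebook index."""
    z = frames @ proj.T                                        # (T, proj_dim)
    z /= np.linalg.norm(z, axis=1, keepdims=True)              # l2-normalise
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    return d.argmin(axis=1)                                    # discrete targets
```

Because nothing here is learned, the quantizer adds essentially no training cost, which is what makes BEST-RQ simpler than learned-codebook methods.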
Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework
Structured finance, which involves restructuring diverse assets into securities like MBS, ABS, and CDOs, enhances capital market efficiency but presents significant due diligence challenges. This study explores the integration of artificial intelligence (AI) with traditional asset review processes to improve efficiency and accuracy in structured finance. Using both open-sourced and close-sourced large language models (LLMs), we demonstrate that AI can automate the verification of information between loan applications and bank statements effectively. While close-sourced models such as GPT-4 show superior performance, open-sourced models like LLAMA3 offer a cost-effective alternative. Dual-agent systems further increase accuracy, though this comes with higher operational costs. This research highlights AI's potential to minimize manual errors and streamline due diligence, suggesting a broader application of AI in financial document analysis and risk management.
Updated: 2024-05-07 13:09:49
Subjects: cs.AI
Mitigating Clickbait: An Approach to Spoiler Generation Using Multitask Learning
This study introduces 'clickbait spoiling', a novel technique designed to detect, categorize, and generate spoilers as succinct text responses, countering the curiosity induced by clickbait content. By leveraging a multi-task learning framework, our model's generalization capabilities are significantly enhanced, effectively addressing the pervasive issue of clickbait. The crux of our research lies in generating appropriate spoilers, be it a phrase, an extended passage, or multiple spoilers, depending on the spoiler type required. Our methodology integrates two crucial techniques: a refined spoiler categorization method and a modified version of the Question Answering (QA) mechanism, incorporated within a multi-task learning paradigm for optimized spoiler extraction from context. Notably, we have included fine-tuning methods for models capable of handling longer sequences to accommodate the generation of extended spoilers. This research highlights the potential of sophisticated text processing techniques in tackling the omnipresent issue of clickbait, promising an enhanced user experience in the digital realm.
Updated: 2024-05-07 13:09:25
Subjects: cs.CL,cs.AI
Approximate Bayesian Class-Conditional Models under Continuous Representation Shift
For models consisting of a classifier in some representation space, learning online from a non-stationary data stream often necessitates changes in the representation. So, the question arises of what is the best way to adapt the classifier to shifts in representation. Current methods adapt the classifier to representation shift only slowly, introducing noise into learning while the classifier is misaligned with the representation. We propose DeepCCG, an empirical Bayesian approach to solve this problem. DeepCCG works by updating the posterior of a class conditional Gaussian classifier such that the classifier adapts in one step to representation shift. The use of a class conditional Gaussian classifier also enables DeepCCG to use a log conditional marginal likelihood loss to update the representation. To perform the update to the classifier and representation, DeepCCG maintains a fixed number of examples in memory, and so a key part of DeepCCG is selecting what examples to store, choosing the subset that minimises the KL divergence between the true posterior and the posterior induced by the subset. We explore the behaviour of DeepCCG in online continual learning (CL), demonstrating that it performs well against a spectrum of online CL methods and that it reduces the change in performance due to representation shift.
Updated: 2024-05-07 13:05:39
Subjects: cs.LG,stat.ML
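The one-step adaptation in DeepCCG comes from the classifier being a class-conditional Gaussian: after the representation changes, the class parameters can be recomputed directly from the memory buffer. A minimal sketch with equal priors and a shared isotropic covariance, in which case Bayes' rule reduces to a nearest-class-mean rule (a simplification of the paper's posterior update):

```python
import numpy as np

def class_means(encode, mem_x, mem_y, classes):
    """One-step classifier update: re-embed the memory buffer with the
    current encoder and recompute the per-class Gaussian means."""
    z = encode(mem_x)
    return np.stack([z[mem_y == c].mean(axis=0) for c in classes])

def nearest_mean_predict(encode, x, means, classes):
    """Equal-covariance, equal-prior Gaussians => nearest-mean decision rule."""
    z = encode(x)
    d = ((z[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]
```

The point of the example: when the encoder shifts, one call to `class_means` realigns the classifier, with no gradient steps on the classifier itself.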
Accelerating Material Property Prediction using Generically Complete Isometry Invariants
Periodic material or crystal property prediction using machine learning has grown popular in recent years as it provides a computationally efficient replacement for classical simulation methods. A crucial first step for any of these algorithms is the representation used for a periodic crystal. While similar objects like molecules and proteins have a finite number of atoms and their representation can be built based upon a finite point cloud interpretation, periodic crystals are unbounded in size, making their representation more challenging. In the present work, we adapt the Pointwise Distance Distribution (PDD), a continuous and generically complete isometry invariant for periodic point sets, as a representation for our learning algorithm. The PDD distinguished all (more than 660 thousand) periodic crystals in the Cambridge Structural Database as purely periodic sets of points without atomic types. We develop a transformer model with a modified self-attention mechanism that combines PDD with compositional information via a spatial encoding method. This model is tested on the crystals of the Materials Project and Jarvis-DFT databases and shown to produce accuracy on par with state-of-the-art methods while being several times faster in both training and prediction time.
Updated: 2024-05-07 13:05:07
Subjects: cs.LG,cs.CG,physics.comp-ph
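A Pointwise Distance Distribution row lists, for each motif point, its sorted distances to the k nearest points of the infinite periodic set. A minimal NumPy sketch (a fixed ±2 shell of lattice translations is assumed to be large enough for small k; the full PDD also groups and weights identical rows, which is omitted here):

```python
import numpy as np
from itertools import product

def pointwise_distance_distribution(motif, cell, k):
    """PDD rows for a periodic point set given a motif (n, dim) and a
    lattice basis `cell` (dim, dim): sorted k-nearest-neighbour distances
    from each motif point to the periodically repeated set."""
    dim = cell.shape[0]
    shifts = np.array(list(product(range(-2, 3), repeat=dim)))   # ±2 shell
    images = (motif[None, :, :] + (shifts @ cell)[:, None, :]).reshape(-1, dim)
    rows = []
    for p in motif:
        d = np.sort(np.linalg.norm(images - p, axis=1))
        rows.append(d[1:k + 1])   # drop the zero self-distance
    return np.array(rows)
```

For the unit square lattice with a single motif point, the four nearest neighbours all sit at distance 1, which gives a quick sanity check.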
M3oE: Multi-Domain Multi-Task Mixture-of-Experts Recommendation Framework
Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, the practical recommendation usually faces multiple domains and tasks simultaneously, which cannot be well-addressed by current methods. To this end, we introduce M3oE, an adaptive multi-domain multi-task mixture-of-experts recommendation framework. M3oE integrates multi-domain information, maps knowledge across domains and tasks, and optimizes multiple objectives. We leverage three mixture-of-experts modules to learn common, domain-aspect, and task-aspect user preferences respectively to address the complex dependencies among multiple domains and tasks in a disentangled manner. Additionally, we design a two-level fusion mechanism for precise control over feature extraction and fusion across diverse domains and tasks. The framework's adaptability is further enhanced by applying AutoML technique, which allows dynamic structure optimization. To the best of the authors' knowledge, our M3oE is the first effort to solve multi-domain multi-task recommendation self-adaptively. Extensive experiments on two benchmark datasets against diverse baselines demonstrate M3oE's superior performance. The implementation code is available to ensure reproducibility.
Updated: 2024-05-07 12:54:56
Subjects: cs.IR,cs.AI
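A single mixture-of-experts block, the building unit that M3oE instantiates three times (common, domain-aspect, and task-aspect), can be sketched as a softmax gate weighting expert outputs:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, gate_w, experts):
    """Mixture-of-experts: a gate scores each expert per input, and the
    output is the gate-weighted sum of the expert outputs."""
    g = softmax(x @ gate_w)                              # (B, n_experts)
    outs = np.stack([ex(x) for ex in experts], axis=1)   # (B, n_experts, D)
    return (g[:, :, None] * outs).sum(axis=1)            # (B, D)
```

In M3oE three such blocks feed a fusion mechanism; here the experts are stand-in callables.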
On the Foundations of Earth and Climate Foundation Models
Foundation models have enormous potential in advancing Earth and climate sciences; however, current approaches may not be optimal, as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an environmental- and human-centric manner. We further shed light on the way forward to achieve the ideal model and to evaluate Earth foundation models. What comes after foundation models? Energy-efficient adaptation, adversarial defenses, and interpretability are among the emerging directions.
Updated: 2024-05-07 12:54:54
Subjects: cs.AI,eess.SP
Uncertainty Quantification Metrics for Deep Regression
When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for evaluating such an uncertainty. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error, Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using synthetic regression datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.
Updated: 2024-05-07 12:46:45
Subjects: cs.LG,cs.RO
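Of the metrics studied, AUSE is easy to state concretely: repeatedly remove the most-uncertain predictions, track the remaining mean error, and measure the gap between that curve and the oracle curve obtained by ranking on the true errors. A NumPy sketch (unnormalised variant; conventions for normalisation vary):

```python
import numpy as np

def sparsification_errors(errors, ranking_scores):
    """Mean remaining error after removing the highest-scored points first."""
    order = np.argsort(-ranking_scores)   # most uncertain (or most wrong) first
    e = errors[order]
    return np.array([e[i:].mean() for i in range(len(e))])

def ause(errors, uncertainty):
    """Area Under the Sparsification Error curve: average gap between
    removing by predicted uncertainty and removing by true error (oracle)."""
    model = sparsification_errors(errors, uncertainty)
    oracle = sparsification_errors(errors, errors)
    return float(np.mean(model - oracle))
```

An uncertainty estimate that ranks the errors perfectly yields an AUSE of zero; any misranking makes it positive.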
Zero-Shot Stitching in Reinforcement Learning using Relative Representations
Visual Reinforcement Learning is a popular and powerful framework that takes full advantage of the Deep Learning breakthrough. However, it is also known that variations in the input (e.g., different colors of the panorama due to the season of the year) or in the task (e.g., changing the speed limit a car must respect) could require completely retraining the agents. In this work, we leverage recent developments in unifying latent representations to demonstrate that it is possible to combine the components of an agent, rather than retrain it from scratch. We build upon the recent relative representations framework and adapt it for Visual RL. This allows us to create completely new agents capable of handling environment-task combinations never seen during training. Our work paves the road toward a more accessible and flexible use of reinforcement learning.
Updated: 2024-05-07 12:45:07
Subjects: cs.LG,cs.AI,cs.CV,68T07,I.2.6
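Relative representations re-express each embedding as its similarities to a shared set of anchor samples, which is what makes independently trained components stitchable. A sketch using cosine similarity; the key property is invariance to orthogonal transformations of the latent space:

```python
import numpy as np

def relative_representation(z, anchors):
    """Re-express embeddings (N, D) as cosine similarities to a shared
    anchor set (A, D), giving a space-agnostic (N, A) representation."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return z @ a.T
```

Because cosine similarity is unchanged when both embeddings and anchors are rotated by the same orthogonal map, encoders that differ by such a map produce identical relative representations, so their downstream heads can be swapped.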
Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics
Machine learning has the potential to revolutionize passive acoustic monitoring (PAM) for ecological assessments. However, high annotation and compute costs limit the field's efficacy. Generalizable pretrained networks can overcome these costs, but high-quality pretraining requires vast annotated libraries, limiting its current applicability primarily to bird taxa. Here, we identify the optimum pretraining strategy for a data-deficient domain using coral reef bioacoustics. We assemble ReefSet, a large annotated library of reef sounds, though modest compared to bird libraries at 2% of the sample count. Through testing few-shot transfer learning performance, we observe that pretraining on bird audio provides notably superior generalizability compared to pretraining on ReefSet or unrelated audio alone. However, our key findings show that cross-domain mixing which leverages bird, reef and unrelated audio during pretraining maximizes reef generalizability. SurfPerch, our pretrained network, provides a strong foundation for automated analysis of marine PAM data with minimal annotation and compute costs.
Updated: 2024-05-07 12:42:32
Subjects: cs.SD,cs.AI,cs.LG,eess.AS
Class-Balanced and Reinforced Active Learning on Graphs
Graph neural networks (GNNs) have demonstrated significant success in various applications, such as node classification, link prediction, and graph classification. Active learning for GNNs aims to query the valuable samples from the unlabeled data for annotation to maximize the GNNs' performance at a lower cost. However, most existing algorithms for reinforced active learning in GNNs may lead to a highly imbalanced class distribution, especially in highly skewed class scenarios. GNNs trained with class-imbalanced labeled data are susceptible to bias toward majority classes, and the lower performance of minority classes may lead to a decline in overall performance. To tackle this issue, we propose a novel class-balanced and reinforced active learning framework for GNNs, namely, GCBR. It learns an optimal policy to acquire class-balanced and informative nodes for annotation, maximizing the performance of GNNs trained with selected labeled nodes. GCBR designs class-balance-aware states, as well as a reward function that achieves trade-off between model performance and class balance. The reinforcement learning algorithm Advantage Actor-Critic (A2C) is employed to learn an optimal policy stably and efficiently. We further upgrade GCBR to GCBR++ by introducing a punishment mechanism to obtain a more class-balanced labeled set. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed approaches, achieving superior performance over state-of-the-art baselines.
Updated: 2024-05-07 12:42:02
Subjects: cs.LG
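The abstract does not specify the reward that trades off model performance against class balance; one plausible form, shown purely as an assumption, combines accuracy with the normalised entropy of the labelled-class distribution:

```python
import math

def class_balance_reward(accuracy, label_counts, lam=0.5):
    """Hypothetical reward for a class-balanced active-learning agent:
    accuracy plus lam times the normalised entropy of the labelled-class
    distribution (entropy is maximal when classes are balanced)."""
    total = sum(label_counts)
    probs = [c / total for c in label_counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(label_counts))
    balance = entropy / max_entropy if max_entropy > 0 else 1.0
    return accuracy + lam * balance
```

At equal accuracy, a balanced labelled set scores strictly higher than a skewed one, which is the trade-off GCBR's reward is designed to achieve.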
BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models
In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance gets refined along the reverse diffusion trajectory. A measurement consistency criterion enforces the fidelity of the generated speech with the reverberant measurement, while an unconditional diffusion model implements a strong prior for clean speech generation. Without any knowledge of the room impulse response nor any coupled reverberant-anechoic data, we can successfully perform dereverberation in various acoustic scenarios. Our method significantly outperforms previous blind unsupervised baselines, and we demonstrate its increased robustness to unseen acoustic conditions in comparison to blind supervised methods. Audio samples and code are available online.
Updated: 2024-05-07 12:41:31
Subjects: eess.AS,cs.LG,cs.SD
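The reverberation operator is parameterised by a per-subband exponential decay. A sketch of such decay envelopes, using the common convention that the magnitude falls by 60 dB after the subband's T60 (the paper's exact parameterisation may differ):

```python
import numpy as np

def subband_decay_filter(t60_per_band, sr, length):
    """Exponential decay envelopes a_b(t) = exp(-6.908 * t / T60_b), one per
    frequency subband; -6.908 = ln(10^-3), i.e. -60 dB at t = T60_b."""
    t = np.arange(length) / sr                       # time axis in seconds
    t60 = np.asarray(t60_per_band, dtype=float)[:, None]
    return np.exp(-6.908 * t[None, :] / t60)         # (n_bands, length)
```

In BUDDy the decay parameters are estimated jointly with the clean speech along the reverse diffusion trajectory; here they are simply given.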
Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning
Recent works have proven that intricate cooperative behaviors can emerge in agents trained using meta reinforcement learning on open-ended task distributions using self-play. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training and over an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies, where several agents meta-learn independent recurrent policies on an open-ended distribution of tasks. To this end we introduce a novel environment with an open-ended procedurally generated task space which dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training, the agents learn collective exploration strategies which allow them to solve novel tasks never encountered during training. We further find that the agents' learned collective exploration strategies extend to an open-ended task setting, allowing them to solve task trees of twice the depth compared to the ones seen during training. Our open-source code as well as videos of the agents can be found on our companion website.
Updated: 2024-05-07 12:37:55
Subjects: cs.MA,cs.AI
The Detection of a Possible Exoplanet Orbiting KIC 1718360 Using Machine Learning
This paper presents the detection of a periodic dimming event in the lightcurve of the G1.5IV-V type star KIC 1718360. This is based on visible-light observations conducted by both the TESS and Kepler space telescopes. Analysis of the data points toward a possible orbiting body with a radius of approximately 1.048 Earth Radii with a period of 2.938 days, as well as a semi-major axis of 0.04 AU. The initial observation was made in Kepler Quarter 16 data using the One-Class SVM machine learning method. Subsequent observations by the TESS space telescope corroborate these findings. While still requiring further data to validate, these results may contribute to a growing body of data of Earthlike planets with short-period orbits.
Updated: 2024-05-07 12:34:18
Subjects: astro-ph.EP,astro-ph.IM,cs.LG
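The paper's detection uses a One-Class SVM; a far simpler way to see how a periodic dimming signal is recovered is phase folding, where the trial period that concentrates the dip into one phase bin maximises the folded depth (an illustrative stand-in, not the paper's method):

```python
import numpy as np

def fold_depth(time, flux, period, n_bins=20):
    """Fold the light curve at a trial period and return the depth of the
    faintest phase bin relative to the median flux."""
    phase = (time % period) / period
    bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    binned = np.array([flux[bins == b].mean() if np.any(bins == b)
                       else np.median(flux) for b in range(n_bins)])
    return float(np.median(flux) - binned.min())

def best_period(time, flux, trial_periods):
    """Pick the trial period that maximises the folded dip depth."""
    depths = [fold_depth(time, flux, p) for p in trial_periods]
    return trial_periods[int(np.argmax(depths))]
```

A synthetic light curve with a 1% dip every 2.938 days (the reported period) folds most cleanly at the true period.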
Verified Neural Compressed Sensing
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on neural network verification has focused on partial specifications that, even when satisfied, are not sufficient to ensure that a neural network never makes errors. We focus on applying neural network verification to computational tasks with a precise notion of correctness, where a verifiably correct neural network provably solves the task at hand with no caveats. In particular, we develop an approach to train and verify the first provably correct neural networks for compressed sensing, i.e., recovering sparse vectors from a number of measurements smaller than the dimension of the vector. We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements. Furthermore, we show that the complexity of the network (number of neurons/layers) can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
Updated: 2024-05-07 12:20:12
Subjects: cs.LG,cs.AI
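For context, the classical (non-neural) route to the same task is a greedy sparse solver. A minimal Orthogonal Matching Pursuit, shown as a baseline sketch rather than the paper's verified networks:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily build a k-sparse estimate of x
    from measurements y = A x, re-fitting by least squares at each step."""
    n = A.shape[1]
    support, residual = [], y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat
```

The paper's contribution is to train networks solving this recovery problem and to verify them automatically, something OMP-style analyses only guarantee under restrictive conditions on A.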
Proceedings of the 2nd International Workshop on Adaptive Cyber Defense
The 2nd International Workshop on Adaptive Cyber Defense was held at the Florida Institute of Technology, Florida. This workshop was organized to share research that explores unique applications of Artificial Intelligence (AI) and Machine Learning (ML) as foundational capabilities for the pursuit of adaptive cyber defense. The cyber domain cannot currently be reliably and effectively defended without extensive reliance on human experts. Skilled cyber defenders are in short supply and often cannot respond fast enough to cyber threats. Building on recent advances in AI and ML, the cyber defense research community has been motivated to develop new dynamic and sustainable defenses through the adoption of AI and ML techniques in cyber settings. Bridging critical gaps between AI and cyber researchers and practitioners can accelerate efforts to create semi-autonomous cyber defenses that can learn to recognize and respond to cyber attacks, or discover and mitigate weaknesses in cooperation with other cyber operation systems and human experts. Furthermore, these defenses are expected to be adaptive and able to evolve over time to thwart changes in attacker behavior, changes in system health and readiness, and natural shifts in user behavior. The workshop comprised invited keynote talks, technical presentations, and a panel discussion about how AI/ML can enable autonomous mitigation of current and future cyber attacks. Workshop submissions were peer reviewed by a panel of domain experts, with a proceedings consisting of six technical articles exploring challenging problems of critical importance to national and global security. Participation in this workshop offered new opportunities to stimulate research and innovation in the emerging domain of adaptive and autonomous cyber defense.
Updated: 2024-05-07 12:18:55
Subjects: cs.CR,cs.AI
VAEneu: A New Avenue for VAE Application on Probabilistic Forecasting
This paper presents VAEneu, an innovative autoregressive method for multistep-ahead univariate probabilistic time series forecasting. We employ the conditional VAE framework and optimize the lower bound of the predictive distribution likelihood function by adopting the Continuous Ranked Probability Score (CRPS), a strictly proper scoring rule, as the loss function. This novel pipeline results in sharp and well-calibrated predictive distributions. Through a comprehensive empirical study, VAEneu is rigorously benchmarked against 12 baseline models across 12 datasets. The results unequivocally demonstrate VAEneu's remarkable forecasting performance. VAEneu provides a valuable tool for quantifying future uncertainties, and our extensive empirical study lays the foundation for future comparative studies in univariate multistep-ahead probabilistic forecasting.
Updated: 2024-05-07 12:13:11
领域: cs.LG,cs.AI
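The CRPS loss adopted above has a convenient sample-based estimator, which is what makes it usable as a training objective for a generative forecaster. The following minimal sketch (illustrative only, not the authors' implementation) scores an ensemble of predictive samples against a scalar observation using the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|:

```python
import numpy as np

def crps_ensemble(samples, y):
    """Sample-based CRPS estimate for a scalar observation y.

    Uses CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with X, X' independent
    draws from the predictive distribution F.
    """
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

# A sharp, well-calibrated forecast scores lower than a diffuse one.
rng = np.random.default_rng(0)
y_true = 1.0
sharp = rng.normal(loc=1.0, scale=0.1, size=500)
diffuse = rng.normal(loc=1.0, scale=2.0, size=500)
crps_sharp = crps_ensemble(sharp, y_true)
crps_diffuse = crps_ensemble(diffuse, y_true)
```

Being strictly proper, CRPS rewards both calibration and sharpness, which is why minimizing it tends to produce concentrated yet honest predictive distributions.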
A General Model for Detecting Learner Engagement: Implementation and Evaluation
Considering learner engagement benefits both learners and instructors: instructors can help learners increase their attention, involvement, motivation, and interest, and can in turn improve their own instructional performance by evaluating the cumulative results of all learners and upgrading their training programs. This paper proposes a general, lightweight model for selecting and processing features to detect learners' engagement levels while preserving the sequential temporal relationship over time. During training and testing, we analyzed the videos from the publicly available DAiSEE dataset to capture the dynamic essence of learner engagement. We have also proposed an adaptation policy to find new labels that utilize the affective states of this dataset related to education, thereby improving the models' judgment. The suggested model achieves an accuracy of 68.57% in a specific implementation and outperforms the state-of-the-art models studied for detecting learners' engagement levels.
Updated: 2024-05-07 12:11:15
Categories: cs.CV,cs.HC,cs.LG
Federated Learning for Cooperative Inference Systems: The Case of Early Exit Networks
As Internet of Things (IoT) technology advances, end devices like sensors and smartphones are progressively equipped with AI models tailored to their local memory and computational constraints. Local inference reduces communication costs and latency; however, these smaller models typically underperform compared to more sophisticated models deployed on edge servers or in the cloud. Cooperative Inference Systems (CISs) address this performance trade-off by enabling smaller devices to offload part of their inference tasks to more capable devices. These systems often deploy hierarchical models that share numerous parameters, exemplified by Deep Neural Networks (DNNs) that utilize strategies like early exits or ordered dropout. In such instances, Federated Learning (FL) may be employed to jointly train the models within a CIS. Yet, traditional training methods have overlooked the operational dynamics of CISs during inference, particularly the potential high heterogeneity in serving rates across clients. To address this gap, we propose a novel FL approach designed explicitly for use in CISs that accounts for these variations in serving rates. Our framework not only offers rigorous theoretical guarantees, but also surpasses state-of-the-art (SOTA) training algorithms for CISs, especially in scenarios where inference request rates or data availability are uneven among clients.
Updated: 2024-05-07 12:07:06
Categories: cs.LG,cs.AI,cs.DC
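The early-exit strategy mentioned above can be pictured as a backbone with an intermediate classification head: a weak device runs the early layers and keeps the prediction only if the early head is confident enough, otherwise the remaining layers (on a more capable device, in a CIS) finish the job. A hypothetical toy sketch, with random weights standing in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 2-block backbone with an early-exit head after block 1.
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
head_early, head_final = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))

def infer(x, conf_threshold=0.7):
    """Return (probs, exited_early). The device keeps the sample if the
    early head is confident; otherwise the hidden features are 'offloaded'
    through the rest of the network (here: simply more layers)."""
    h1 = np.tanh(W1 @ x)
    p_early = softmax(head_early.T @ h1)
    if p_early.max() >= conf_threshold:
        return p_early, True
    h2 = np.tanh(W2 @ h1)
    return softmax(head_final.T @ h2), False

probs, early = infer(rng.normal(size=8))
```

The serving-rate heterogeneity the paper targets corresponds to how often `infer` takes the early branch on each client, which a training objective can then weight accordingly.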
Exploring Correlations of Self-supervised Tasks for Graphs
Graph self-supervised learning has sparked a research surge in training informative representations without accessing any labeled data. However, our understanding of graph self-supervised learning remains limited, and the inherent relationships between various self-supervised tasks are still unexplored. Our paper aims to provide a fresh understanding of graph self-supervised learning based on task correlations. Specifically, we evaluate the performance of the representations trained by one specific task on other tasks and define correlation values to quantify task correlations. Through this process, we unveil the task correlations between various self-supervised tasks and can measure their expressive capabilities, which are closely related to downstream performance. By analyzing the correlation values between tasks across various datasets, we reveal the complexity of task correlations and the limitations of existing multi-task learning methods. To obtain more capable representations, we propose Graph Task Correlation Modeling (GraphTCM) to illustrate the task correlations and utilize it to enhance graph self-supervised training. The experimental results indicate that our method significantly outperforms existing methods across various downstream tasks.
Updated: 2024-05-07 12:02:23
Categories: cs.LG,cs.AI
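The paper's precise definition of the correlation values is not given in the abstract; one plausible reading, sketched below with made-up numbers, is a cross-task performance table normalized per downstream task, so that entry (i, j) measures how well task i's representation performs on task j relative to the best representation for task j:

```python
import numpy as np

# Hypothetical performance table: entry (i, j) is the downstream score of a
# representation trained with self-supervised task i, evaluated on task j.
perf = np.array([
    [0.90, 0.62, 0.55],
    [0.60, 0.88, 0.70],
    [0.50, 0.72, 0.85],
])

# One simple choice of correlation value (an assumption, not the paper's
# exact formula): normalize each column by the best score on that task.
corr = perf / perf.max(axis=0, keepdims=True)
```

Rows of such a matrix summarize the expressive reach of each pretext task, which is the quantity the paper relates to downstream performance.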
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of categories in segmentation datasets. Consequently, the vocabulary capacity of pre-trained VLMs is severely reduced after fine-tuning. However, without fine-tuning, VLMs trained under weak image-text supervision tend to make suboptimal mask predictions. To alleviate these issues, we introduce a novel recurrent framework that progressively filters out irrelevant texts and enhances mask quality without training efforts. The recurrent unit is a two-stage segmenter built upon a frozen VLM. Thus, our model retains the VLM's broad vocabulary space and equips it with segmentation ability. Experiments show that our method outperforms not only the training-free counterparts, but also those fine-tuned with millions of data samples, and sets the new state-of-the-art records for both zero-shot semantic and referring segmentation. Concretely, we improve the current record by 28.8, 16.0, and 6.9 mIoU on Pascal VOC, COCO Object, and Pascal Context.
Updated: 2024-05-07 12:00:34
Categories: cs.CV,cs.CL,cs.LG,cs.MM
Exploring the Potential of Robot-Collected Data for Training Gesture Classification Systems
Sensors and Artificial Intelligence (AI) have revolutionized the analysis of human movement, but the scarcity of specific samples presents a significant challenge in training intelligent systems, particularly in the context of diagnosing neurodegenerative diseases. This study investigates the feasibility of utilizing robot-collected data to train classification systems traditionally trained with human-collected data. As a proof of concept, we recorded a database of numeric characters using an ABB robotic arm and an Apple Watch. We compare the classification performance of the trained systems using both human-recorded and robot-recorded data. Our primary objective is to determine the potential for accurate identification of human numeric characters wearing a smartwatch using robotic movement as training data. The findings of this study offer valuable insights into the feasibility of using robot-collected data for training classification systems. This research holds broad implications across various domains that require reliable identification, particularly in scenarios where access to human-specific data is limited.
Updated: 2024-05-07 11:58:34
Categories: cs.RO,cs.AI
LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning
Operating effectively in complex environments while complying with specified constraints is crucial for the safe and successful deployment of robots that interact with and operate around people. In this work, we focus on generating long-horizon trajectories that adhere to novel static and temporally-extended constraints/instructions at test time. We propose a data-driven diffusion-based framework, LTLDoG, that modifies the inference steps of the reverse process given an instruction specified using finite linear temporal logic ($\text{LTL}_f$). LTLDoG leverages a satisfaction value function on $\text{LTL}_f$ and guides the sampling steps using its gradient field. This value function can also be trained to generalize to new instructions not observed during training, enabling flexible test-time adaptability. Experiments in robot navigation and manipulation illustrate that the method is able to generate trajectories that satisfy formulae that specify obstacle avoidance and visitation sequences.
Updated: 2024-05-07 11:54:22
Categories: cs.RO,cs.LG
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models
We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well as understanding some professional photography techniques, on par with Sora -- the most powerful reported text-to-video generator. Finally, we perform initial experiments on other controllable video generation, including canny-to-video generation, video prediction and subject-driven generation, which demonstrate promising results.
Updated: 2024-05-07 11:52:49
Categories: cs.CV,cs.LG
Unveiling the optimization process of Physics Informed Neural Networks: How accurate and competitive can PINNs be?
This study investigates the potential accuracy boundaries of physics-informed neural networks, contrasting their approach with previous similar works and traditional numerical methods. We find that selecting improved optimization algorithms significantly enhances the accuracy of the results. Simple modifications to the loss function may also improve precision, offering an additional avenue for enhancement. Despite optimization algorithms having a greater impact on convergence than adjustments to the loss function, practical considerations often favor tweaking the latter due to ease of implementation. On a global scale, the integration of an enhanced optimizer and a marginally adjusted loss function enables a reduction in the loss function by several orders of magnitude across diverse physical problems. Consequently, our results obtained using compact networks (typically comprising 2 or 3 layers of 20-30 neurons) achieve accuracies comparable to finite difference schemes employing thousands of grid points. This study encourages the continued advancement of PINNs and associated optimization techniques for broader applications across various fields.
Updated: 2024-05-07 11:50:25
Categories: physics.comp-ph,cs.AI,cs.LG
Toward Deep Drum Source Separation
In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drums performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.
Updated: 2024-05-07 11:50:04
Categories: eess.AS,cs.LG,cs.SD
Iterative Experience Refinement of Software-Developing Agents
Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents' adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance. Moreover, experience elimination facilitates achieving better performance using just 11.54% of a high-quality subset.
Updated: 2024-05-07 11:33:49
Categories: cs.CL,cs.AI,cs.MA,cs.SE
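The heuristic experience elimination described above, prioritizing high-quality and frequently-used experiences, can be sketched as a simple ranking-and-truncation step over the experience pool (a hypothetical illustration; the paper's exact scoring is not specified in the abstract):

```python
# Hypothetical sketch of heuristic experience elimination: rank experiences
# by quality, then by usage frequency, and keep only the top fraction so
# the experience space stays bounded.
def eliminate(experiences, keep_ratio=0.25):
    """experiences: list of dicts with 'quality' in [0, 1] and 'uses' counts."""
    ranked = sorted(
        experiences,
        key=lambda e: (e["quality"], e["uses"]),
        reverse=True,
    )
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]

pool = [
    {"id": 0, "quality": 0.9, "uses": 12},
    {"id": 1, "quality": 0.4, "uses": 30},
    {"id": 2, "quality": 0.9, "uses": 3},
    {"id": 3, "quality": 0.2, "uses": 1},
]
kept = eliminate(pool, keep_ratio=0.5)
```

Truncating to a small high-quality subset is consistent with the paper's observation that good performance is reachable with only 11.54% of such a subset.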
Introducing a microstructure-embedded autoencoder approach for reconstructing high-resolution solution field data from a reduced parametric space
In this study, we develop a novel multi-fidelity deep learning approach that transforms low-fidelity solution maps into high-fidelity ones by incorporating parametric space information into a standard autoencoder architecture. This integration of parametric space information significantly reduces the need for training data to effectively predict high-fidelity solutions from low-fidelity ones. As a test case, we examine a two-dimensional steady-state heat transfer analysis within a highly heterogeneous materials microstructure. The heat conductivity coefficients for two different materials are condensed from a 101 x 101 grid to smaller grids. We then solve the boundary value problem on the coarsest grid using a pre-trained physics-informed neural operator network known as Finite Operator Learning (FOL). The resulting low-fidelity solution is subsequently upscaled back to a 101 x 101 grid using a newly designed enhanced autoencoder. The novelty of the enhanced autoencoder lies in concatenating heat conductivity maps of different resolutions into the decoder segment at distinct steps. Hence the developed algorithm is named the microstructure-embedded autoencoder (MEA). We compare the MEA outcomes with those from finite element methods, the standard U-Net, and various other upscaling techniques, including interpolation functions and feedforward neural networks (FFNN). Our analysis shows that MEA outperforms these methods in terms of computational efficiency and error on test cases. As a result, the MEA serves as a potential supplement to neural operator networks, effectively upscaling low-fidelity solutions to high fidelity while preserving critical details often lost in traditional upscaling methods, particularly at sharp interfaces like those seen with interpolation.
Updated: 2024-05-07 11:28:02
Categories: cs.CE,cs.LG
NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions
Today's classical planners are powerful, but modeling input tasks in formats such as PDDL is tedious and error-prone. In contrast, planning with Large Language Models (LLMs) allows for almost any input text, but offers no guarantees on plan quality or even soundness. In an attempt to merge the best of these two approaches, some work has begun to use LLMs to automate parts of the PDDL creation process. However, these methods still require various degrees of expert input. We present NL2Plan, the first domain-agnostic offline LLM-driven planning system. NL2Plan uses an LLM to incrementally extract the necessary information from a short text prompt before creating a complete PDDL description of both the domain and the problem, which is finally solved by a classical planner. We evaluate NL2Plan on four planning domains and find that it solves 10 out of 15 tasks - a clear improvement over a plain chain-of-thought reasoning LLM approach, which only solves 2 tasks. Moreover, in two out of the five failure cases, instead of returning an invalid plan, NL2Plan reports that it failed to solve the task. In addition to using NL2Plan in end-to-end mode, users can inspect and correct all of its intermediate results, such as the PDDL representation, increasing explainability and making it an assistive tool for PDDL creation.
Updated: 2024-05-07 11:27:13
Categories: cs.AI
Green Tsetlin Redefining Efficiency in Tsetlin Machine Frameworks
Green Tsetlin (GT) is a Tsetlin Machine (TM) framework developed to solve real-world problems using TMs. Several frameworks already exist that provide access to TM implementations. However, these either lack features or have a research-first focus. GT is an easy-to-use framework that aims to lower the complexity and provide a production-ready TM implementation that is great for experienced practitioners and beginners. To this end, GT establishes a clear separation between training and inference. A C++ backend with a Python interface provides competitive training and inference performance, with the option of running in pure Python. It also integrates support for critical components such as exporting trained models, hyper-parameter search, and cross-validation out-of-the-box.
Updated: 2024-05-07 11:24:56
Categories: cs.AI,cs.LG
The Role of Pseudo-labels in Self-training Linear Classifiers on High-dimensional Gaussian Mixture Data
Self-training (ST) is a simple yet effective semi-supervised learning method. However, why and how ST improves generalization performance by using potentially erroneous pseudo-labels is still not well understood. To deepen the understanding of ST, we derive and analyze a sharp characterization of the behavior of iterative ST when training a linear classifier by minimizing the ridge-regularized convex loss on binary Gaussian mixtures, in the asymptotic limit where input dimension and data size diverge proportionally. The results show that ST improves generalization in different ways depending on the number of iterations. When the number of iterations is small, ST improves generalization performance by fitting the model to relatively reliable pseudo-labels and updating the model parameters by a large amount at each iteration. This suggests that ST works intuitively. On the other hand, with many iterations, ST can gradually improve the direction of the classification plane by updating the model parameters incrementally, using soft labels and small regularization. It is argued that this is because the small update of ST can extract information from the data in an almost noiseless way. However, in the presence of label imbalance, the generalization performance of ST underperforms supervised learning with true labels. To overcome this, two heuristics are proposed to enable ST to achieve nearly compatible performance with supervised learning even with significant label imbalance.
Updated: 2024-05-07 11:22:49
Categories: stat.ML,cond-mat.dis-nn,cond-mat.stat-mech,cs.LG,math.ST,stat.TH
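The iterative self-training loop analyzed above is easy to reproduce on a toy binary Gaussian mixture: fit a ridge-regularized linear classifier on a few labeled points, pseudo-label the unlabeled pool with the current classifier, and refit. A minimal numerical sketch (hard pseudo-labels and a fixed ridge penalty, illustrative rather than the paper's asymptotic setting):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary Gaussian mixture: labels +/-1, class means +/-mu.
d, n_lab, n_unlab = 20, 10, 500
mu = np.ones(d) / np.sqrt(d)

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu[None, :] + rng.normal(size=(n, d))
    return X, y

Xl, yl = sample(n_lab)
Xu, yu = sample(n_unlab)

def ridge_fit(X, y, lam=1.0):
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def accuracy(w, X, y):
    return np.mean(np.sign(X @ w) == y)

# Iterative self-training: refit on hard pseudo-labels of the unlabeled pool.
w = ridge_fit(Xl, yl)
acc0 = accuracy(w, Xu, yu)
for _ in range(5):
    pseudo = np.sign(Xu @ w)
    w = ridge_fit(np.vstack([Xl, Xu]), np.concatenate([yl, pseudo]))
acc_st = accuracy(w, Xu, yu)
```

The paper's long-iteration regime corresponds to replacing the hard `np.sign` pseudo-labels with soft labels and making only small parameter updates per round.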
AccidentBlip2: Accident Detection With Multi-View MotionBlip2
Intelligent vehicles have demonstrated excellent capabilities in many transportation scenarios, but the inference capabilities of camera-based neural networks limit the accuracy of accident detection in complex transportation systems. This paper presents AccidentBlip2, a pure vision-based accident detection framework built on the multi-modal large model Blip2. Our method first processes the multi-view images through ViT-14g and sends the multi-view features into the cross-attention layer of Q-Former. Unlike Blip2's Q-Former, our Motion Q-Former extends the self-attention layer with a temporal-attention layer. During inference, the queries generated from previous frames are input into Motion Q-Former to aggregate temporal information. Queries are updated with an auto-regressive strategy and are sent to an MLP to detect whether there is an accident in the surrounding environment. AccidentBlip2 can be extended to a multi-vehicle cooperative system by deploying Motion Q-Former on each vehicle and simultaneously fusing the generated queries into the MLP for auto-regressive inference. Our approach outperforms existing video large language models in detection accuracy in both single-vehicle and multi-vehicle systems.
Updated: 2024-05-07 11:21:57
Categories: cs.AI
NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator
Attention mechanisms are becoming increasingly popular, being used in neural network models in multiple domains such as natural language processing (NLP) and vision applications, especially at the edge. However, attention layers are difficult to map onto existing neuro accelerators since they have a much higher density of non-linear operations, which lead to inefficient utilization of today's vector units. This work introduces NOVA, a NoC-based Vector Unit that can perform non-linear operations within the NoC of the accelerators, and can be overlaid onto existing neuro accelerators to map attention layers at the edge. Our results show that the NOVA architecture is up to 37.8x more power-efficient than state-of-the-art hardware approximators when running existing attention-based neural networks.
Updated: 2024-05-07 11:20:10
Categories: cs.AR,cs.AI,cs.LG,B.2.4
Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts
AI technologies have become widely adopted in wireless communications. As an emerging type of AI technology, generative artificial intelligence (GAI) has attracted considerable attention in communication security. Owing to its powerful learning ability, GAI models have demonstrated superiority over conventional AI methods. However, GAI still has several limitations, including high computational complexity and limited adaptability. Mixture of Experts (MoE), which uses multiple expert models for prediction through a gating mechanism, offers a possible solution. We first review GAI models' applications in physical layer communication security, discuss their limitations, and explore how MoE can help GAI overcome them. Furthermore, we propose an MoE-enabled GAI framework for network optimization problems in communication security. To demonstrate the framework's effectiveness, we provide a case study in a cooperative friendly jamming scenario. The experimental results show that the MoE-enabled framework effectively assists the GAI algorithm, addresses its limitations, and enhances communication security.
Updated: 2024-05-07 11:13:17
Categories: cs.CR
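A Mixture of Experts layer of the kind discussed above routes each input through a learned gate that weighs (or sparsely selects) the predictions of several expert models. A minimal sketch with random weights and top-k routing (illustrative only, not the paper's framework):

```python
import numpy as np

rng = np.random.default_rng(7)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Minimal MoE sketch: a gate weighs the predictions of several experts.
n_experts, d_in, d_out = 3, 4, 2
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
W_gate = rng.normal(size=(d_in, n_experts))

def moe_forward(x, top_k=2):
    gates = softmax(x @ W_gate)
    # Sparse routing: keep only the top-k experts and renormalize the gate.
    top = np.argsort(gates)[-top_k:]
    mask = np.zeros_like(gates)
    mask[top] = gates[top]
    mask /= mask.sum()
    return sum(mask[i] * (x @ experts[i]) for i in top)

y_out = moe_forward(rng.normal(size=d_in))
```

Top-k routing is what lets MoE address GAI's computational-complexity limitation: only a subset of experts runs per input, while specialization across experts improves adaptability.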
On Using Admissible Bounds for Learning Forward Search Heuristics
In recent years, there has been growing interest in utilizing modern machine learning techniques to learn heuristic functions for forward search algorithms. Despite this, there has been little theoretical understanding of what they should learn, how to train them, and why we do so. This lack of understanding has resulted in the adoption of diverse training targets (suboptimal vs optimal costs vs admissible heuristics) and loss functions (e.g., square vs absolute errors) in the literature. In this work, we focus on how to effectively utilize the information provided by admissible heuristics in heuristic learning. We argue that learning from poly-time admissible heuristics by minimizing mean square errors (MSE) is not the correct approach, since its result is merely a noisy, inadmissible copy of an efficiently computable heuristic. Instead, we propose to model the learned heuristic as a truncated gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution. This results in a different loss function from the MSE commonly employed in the literature, which implicitly models the learned heuristic as a gaussian distribution. We conduct experiments where both MSE and our novel loss function are applied to learning a heuristic from optimal plan costs. Results show that our proposed method converges faster during training and yields better heuristics.
Updated: 2024-05-07 11:11:47
Categories: cs.AI,cs.LG,I.2.8
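The proposed loss can be made concrete as the negative log-likelihood of a Gaussian truncated at the admissible heuristic value: the bound enters the likelihood's normalizer rather than acting as a regression target. A small sketch of one plausible instantiation (the paper's exact parameterization may differ):

```python
import math

def truncated_gaussian_nll(mu, sigma, lower, target):
    """Negative log-likelihood of `target` under N(mu, sigma^2) truncated to
    [lower, +inf), where `lower` is the admissible heuristic value
    (a provable lower bound on the optimal cost)."""
    z = (target - mu) / sigma
    # log pdf of the untruncated Gaussian
    log_pdf = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    # normalizer: P(X >= lower) for X ~ N(mu, sigma^2)
    alpha = (lower - mu) / sigma
    tail = 0.5 * math.erfc(alpha / math.sqrt(2.0))
    return -(log_pdf - math.log(tail))

# With the same target, tightening the admissible lower bound changes the
# loss, unlike a plain MSE on the target alone.
loss_loose = truncated_gaussian_nll(mu=10.0, sigma=2.0, lower=0.0, target=12.0)
loss_tight = truncated_gaussian_nll(mu=10.0, sigma=2.0, lower=9.0, target=12.0)
```

Note the contrast with MSE: minimizing MSE corresponds to an untruncated Gaussian likelihood, which ignores the admissible bound entirely.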
Learning Governing Equations of Unobserved States in Dynamical Systems
Data-driven modelling and scientific machine learning have been responsible for significant advances in determining suitable models to describe data. Within dynamical systems, neural ordinary differential equations (ODEs), where the system equations are set to be governed by a neural network, have become a popular tool for this challenge in recent years. However, less emphasis has been placed on systems that are only partially-observed. In this work, we employ a hybrid neural ODE structure, where the system equations are governed by a combination of a neural network and domain-specific knowledge, together with symbolic regression (SR), to learn governing equations of partially-observed dynamical systems. We test this approach on two case studies: A 3-dimensional model of the Lotka-Volterra system and a 5-dimensional model of the Lorenz system. We demonstrate that the method is capable of successfully learning the true underlying governing equations of unobserved states within these systems, with robustness to measurement noise.
Updated: 2024-05-07 11:02:06
Subjects: cs.LG
Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models
Previous research on scanpath prediction has mainly focused on group models, disregarding the fact that the scanpaths and attentional behaviors of individuals are diverse. The disregard of these differences is especially detrimental to social human-robot interaction, whereby robots commonly emulate human gaze based on heuristics or predefined patterns. However, human gaze patterns are heterogeneous and varying behaviors can significantly affect the outcomes of such human-robot interactions. To fill this gap, we developed a deep learning-based social cue integration model for saliency prediction to predict scanpaths in videos instead. Our model learned scanpaths by recursively integrating fixation history and social cues through a gating mechanism and sequential attention. We evaluated our approach on gaze datasets of dynamic social scenes, observed under the free-viewing condition. The introduction of fixation history into our models makes it possible to train a single unified model rather than the resource-intensive approach of training individual models for each set of scanpaths. We observed that the late neural integration approach surpasses early fusion when training models on a large dataset, in comparison to a smaller dataset with a similar distribution. Results also indicate that a single unified model, trained on all the observers' scanpaths, performs on par with or better than individually trained models. We hypothesize that this outcome is a result of the group saliency representations instilling universal attention in the model, while the supervisory signal and fixation history guide it to learn personalized attentional behaviors, providing the unified model a benefit over individual models due to its implicit representation of universal attention.
Updated: 2024-05-07 10:58:27
Subjects: cs.CV,cs.AI
Scalable physical source-to-field inference with hypernetworks
We present a generative model that amortises computation for the field around e.g. gravitational or magnetic sources. Exact numerical calculation has either computational complexity $\mathcal{O}(M\times{}N)$ in the number of sources and field evaluation points, or requires a fixed evaluation grid to exploit fast Fourier transforms. Using an architecture where a hypernetwork produces an implicit representation of the field around a source collection, our model instead scales as $\mathcal{O}(M + N)$, achieves accuracy of $\sim\!4\%-6\%$, and allows evaluation at arbitrary locations for arbitrary numbers of sources, greatly increasing the speed of e.g. physics simulations. We also examine a model relating to the physical properties of the output field and develop two-dimensional examples to demonstrate its application. The code for these models and experiments is available at https://github.com/cmt-dtu-energy/hypermagnetics.
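The complexity claim can be made concrete with a toy sketch: the source collection is encoded once (O(M)), a hypernetwork maps that encoding to the weights of a small implicit field network, and each query point is then evaluated independently (O(N)). The fixed linear maps below stand in for trained networks; all coefficients are hypothetical:

```python
import math

def encode_sources(sources):
    """O(M): sum-pool per-source features (x, y, strength) into one embedding."""
    sx = sum(s[0] for s in sources)
    sy = sum(s[1] for s in sources)
    sq = sum(s[2] for s in sources)
    return (sx, sy, sq)

def hypernetwork(embedding):
    """Map the pooled embedding to the weights of a tiny implicit field net.
    A real hypernetwork is a trained MLP; a fixed linear map stands in here."""
    ex, ey, eq = embedding
    return (0.1 * eq, 0.05 * ex, 0.05 * ey, 0.01 * eq)

def field(weights, px, py):
    """O(1) per query point: evaluate the implicit representation at (px, py)."""
    w0, w1, w2, w3 = weights
    return w0 + w1 * px + w2 * py + w3 * math.tanh(px * py)

sources = [(0.0, 1.0, 2.0), (1.0, -1.0, 0.5)]                # M = 2 sources
weights = hypernetwork(encode_sources(sources))              # computed once
values = [field(weights, x, 0.0) for x in (0.0, 0.5, 1.0)]   # N = 3 queries
```

Direct evaluation would instead sum a kernel over all M sources at each of the N points, which is the O(M×N) cost the abstract contrasts against.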
Updated: 2024-05-07 10:54:20
Subjects: cs.LG,cs.CE,physics.comp-ph
Effective and Robust Adversarial Training against Data and Label Corruptions
Corruptions due to data perturbations and label noise are prevalent in the datasets from unreliable sources, which poses significant threats to model training. Despite existing efforts in developing robust models, current learning methods commonly overlook the possible co-existence of both corruptions, limiting the effectiveness and practicability of the model. In this paper, we develop an Effective and Robust Adversarial Training (ERAT) framework to simultaneously handle two types of corruption (i.e., data and label) without prior knowledge of their specifics. We propose a hybrid adversarial training surrounding multiple potential adversarial perturbations, alongside a semi-supervised learning based on class-rebalancing sample selection to enhance the resilience of the model for dual corruption. On the one hand, in the proposed adversarial training, the perturbation generation module learns multiple surrogate malicious data perturbations by taking a DNN model as the victim, while the model is trained to maintain semantic consistency between the original data and the hybrid perturbed data. It is expected to enable the model to cope with unpredictable perturbations in real-world data corruption. On the other hand, a class-rebalancing data selection strategy is designed to fairly differentiate clean labels from noisy labels. Semi-supervised learning is performed accordingly by discarding noisy labels. Extensive experiments demonstrate the superiority of the proposed ERAT framework.
Updated: 2024-05-07 10:53:20
Subjects: cs.LG,cs.CV,68T30,I.4.0
Explainable Classification Techniques for Quantum Dot Device Measurements
In the physical sciences, there is an increased need for robust feature representations of image data: image acquisition, in the generalized sense of two-dimensional data, is now widespread across a large number of fields, including quantum information science, which we consider here. While traditional image features are widely utilized in such cases, their use is rapidly being supplanted by Neural Network-based techniques that often sacrifice explainability in exchange for high accuracy. To ameliorate this trade-off, we propose a synthetic data-based technique that results in explainable features. We show, using Explainable Boosting Machines (EBMs), that this method offers superior explainability without sacrificing accuracy. Specifically, we show that there is a meaningful benefit to this technique in the context of quantum dot tuning, where human intervention is necessary at the current stage of development.
Updated: 2024-05-07 10:50:37
Subjects: cs.CV,cond-mat.mes-hall,cs.LG
Detecting AI-Generated Sentences in Realistic Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights
This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight (1) detecting AI-generated sentences in hybrid texts is overall a challenging task because (1.1) human writers' selecting and even editing AI-generated sentences based on personal preferences adds difficulty in identifying the authorship of segments; (1.2) the frequent change of authorship between neighboring sentences within the hybrid text creates difficulties for segment detectors in identifying authorship-consistent segments; (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment aids in deciding whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
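The two-step pipeline can be sketched as plain control flow: a boundary detector decides where authorship changes between neighbouring sentences, and a classifier labels each resulting segment. Both `boundary_fn` and `classify_fn` are hypothetical stand-ins for the learned components:

```python
def segment_then_classify(sentences, boundary_fn, classify_fn):
    """Step (i): split at predicted authorship boundaries;
    step (ii): classify the authorship of each segment.
    Assumes a non-empty sentence list."""
    segments, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if boundary_fn(prev, cur):  # predicted change of authorship
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return [(seg, classify_fn(seg)) for seg in segments]
```

Finding (2) of the abstract maps onto the choice of strategy: with very short segments, the boundary detector degenerates to flagging nearly every sentence pair, i.e. direct sentence-by-sentence classification.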
Updated: 2024-05-07 10:40:06
Subjects: cs.CL,cs.AI
Detecting music deepfakes is easy but actually hard
In the face of a new era of generative models, the detection of artificially generated content has become a matter of utmost importance. The ability to create credible minute-long music deepfakes in a few seconds on user-friendly platforms poses a real threat of fraud on streaming services and unfair competition to human artists. This paper demonstrates the possibility (and surprising ease) of training classifiers on datasets comprising real audio and fake reconstructions, achieving a convincing accuracy of 99.8%. To our knowledge, this marks the first publication of a music deepfake detector, a tool that will help in the regulation of music forgery. Nevertheless, informed by decades of literature on forgery detection in other fields, we stress that a good test score is not the end of the story. We step back from the straightforward ML framework and expose many facets that could be problematic with such a deployed detector: calibration, robustness to audio manipulation, generalisation to unseen models, interpretability and possibility for recourse. This second part acts as a position for future research steps in the field and a caveat to a flourishing market of fake content checkers.
Updated: 2024-05-07 10:39:19
Subjects: cs.SD,cs.LG,eess.AS
Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models
The rapid advancement in text-to-video (T2V) generative models has enabled the synthesis of high-fidelity video content guided by textual descriptions. Despite this significant progress, these models are often susceptible to hallucination, generating contents that contradict the input text, which poses a challenge to their reliability and practical deployment. To address this critical issue, we introduce the SoraDetector, a novel unified framework designed to detect hallucinations across diverse large T2V models, including the cutting-edge Sora model. Our framework is built upon a comprehensive analysis of hallucination phenomena, categorizing them based on their manifestation in the video content. Leveraging the state-of-the-art keyframe extraction techniques and multimodal large language models, SoraDetector first evaluates the consistency between extracted video content summary and textual prompts, then constructs static and dynamic knowledge graphs (KGs) from frames to detect hallucination both in single frames and across frames. Sora Detector provides a robust and quantifiable measure of consistency, static and dynamic hallucination. In addition, we have developed the Sora Detector Agent to automate the hallucination detection process and generate a complete video quality report for each input video. Lastly, we present a novel meta-evaluation benchmark, T2VHaluBench, meticulously crafted to facilitate the evaluation of advancements in T2V hallucination detection. Through extensive experiments on videos generated by Sora and other large T2V models, we demonstrate the efficacy of our approach in accurately detecting hallucinations. The code and dataset can be accessed via GitHub.
Updated: 2024-05-07 10:39:14
Subjects: cs.LG
Benchmarks and leaderboards for sound demixing tasks
Music demixing is the task of separating a given single audio signal into component tracks, such as drums, bass, and vocals, from the rest of the accompaniment. Separation of sources is useful for a range of areas, including entertainment and hearing aids. In this paper, we introduce two new benchmarks for the sound source separation tasks and compare popular models for sound demixing, as well as their ensembles, on these benchmarks. For the models' assessment, we provide a leaderboard at https://mvsep.com/quality_checker/, giving a comparison for a range of models. The new benchmark datasets are available for download. We also develop a novel approach for audio separation, based on the ensembling of the different models that are best suited to each particular stem. The proposed solution was evaluated in the context of the Music Demixing Challenge 2023 and achieved top results in different tracks of the challenge. The code and the approach are open-sourced on GitHub.
Updated: 2024-05-07 10:35:10
Subjects: cs.SD,cs.LG,eess.AS
Mitigating Nonlinear Algorithmic Bias in Binary Classification
This paper proposes the use of causal modeling to detect and mitigate algorithmic bias that is nonlinear in the protected attribute. We provide a general overview of our approach. We use the German Credit data set, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on age bias and the problem of binary classification. We show that the probability of getting correctly classified as "low risk" is lowest among young people, and that it increases nonlinearly with age. To incorporate the nonlinearity into the causal model, we introduce a higher-order polynomial term. Based on the fitted causal model, the de-biased probability estimates are computed, showing improved fairness with little impact on overall classification accuracy. Causal modeling is intuitive and, hence, its use can enhance explicability and promote trust among the different stakeholders of AI.
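A minimal sketch of the nonlinear-age idea: the classifier's score includes a quadratic age term, and the causal de-biasing step evaluates the age pathway at a fixed reference value. The coefficients and the reference-age intervention are illustrative assumptions, not the fitted German Credit model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_low_risk(age, feature_score, coef):
    """Probability of being classified 'low risk', with a nonlinear
    (quadratic) age effect. coef = (intercept, b_feat, b_age, b_age2)."""
    b0, b1, b2, b3 = coef
    return sigmoid(b0 + b1 * feature_score + b2 * age + b3 * age ** 2)

def debias(age, feature_score, coef, reference_age=40):
    """Remove the age pathway by evaluating it at a fixed reference age,
    so the de-biased estimate no longer varies with the protected attribute."""
    return p_low_risk(reference_age, feature_score, coef)
```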
Updated: 2024-05-07 10:22:23
Subjects: cs.LG,cs.CY,stat.AP
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to evade malicious intent detection effectively. We empirically validate the effectiveness of the IntentObfuscator method across several models, including ChatGPT-3.5, ChatGPT-4, Qwen and Baichuan, achieving an average jailbreak success rate of 69.21%. Notably, our tests on ChatGPT-3.5, which claims 100 million weekly active users, achieved a remarkable success rate of 83.65%. We also extend our validation to diverse types of sensitive content like graphic violence, racism, sexism, political sensitivity, cybersecurity threats, and criminal skills, further proving the substantial impact of our findings on enhancing 'Red Team' strategies against LLM content security frameworks.
Updated: 2024-05-07 10:20:07
Subjects: cs.CR,cs.AI
FedStale: leveraging stale client updates in federated learning
Federated learning algorithms, such as FedAvg, are negatively affected by data heterogeneity and partial client participation. To mitigate the latter problem, global variance reduction methods, like FedVARP, leverage stale model updates for non-participating clients. These methods are effective under homogeneous client participation. Yet, this paper shows that, when some clients participate much less than others, aggregating updates with different levels of staleness can detrimentally affect the training process. Motivated by this observation, we introduce FedStale, a novel algorithm that updates the global model in each round through a convex combination of "fresh" updates from participating clients and "stale" updates from non-participating ones. By adjusting the weight in the convex combination, FedStale interpolates between FedAvg, which only uses fresh updates, and FedVARP, which treats fresh and stale updates equally. Our analysis of FedStale convergence yields the following novel findings: i) it integrates and extends previous FedAvg and FedVARP analyses to heterogeneous client participation; ii) it underscores how the least participating client influences convergence error; iii) it provides practical guidelines to best exploit stale updates, showing that their usefulness diminishes as data heterogeneity decreases and participation heterogeneity increases. Extensive experiments featuring diverse levels of client data and participation heterogeneity not only confirm these findings but also show that FedStale outperforms both FedAvg and FedVARP in many settings.
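One plausible scalar-parameter reading of the aggregation rule, with `beta` the convex-combination weight: beta = 0 recovers FedAvg on fresh updates, while weighting fresh and stale updates equally corresponds to FedVARP-style aggregation. The model and updates are scalars here for brevity, and the names are ours, not the paper's:

```python
def fedstale_round(global_model, fresh_updates, stale_memory, beta, lr=1.0):
    """One round: combine the mean fresh update from participating clients
    with the mean stale update remembered for non-participating clients."""
    fresh = sum(fresh_updates.values()) / max(len(fresh_updates), 1)
    stale = sum(stale_memory.values()) / max(len(stale_memory), 1)
    combined = (1.0 - beta) * fresh + beta * stale
    return global_model + lr * combined
```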
Updated: 2024-05-07 10:11:42
Subjects: cs.LG,cs.AI
GraphGPT: Graph Instruction Tuning for Large Language Models
Graph Neural Networks (GNNs) have evolved to understand graph structures through recursive exchanges and aggregations among nodes. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation. Traditional methods often depend on fine-tuning with task-specific labels, limiting their effectiveness when labeled data is scarce. Our research tackles this by advancing graph model generalization in zero-shot learning environments. Inspired by the success of large language models (LLMs), we aim to create a graph-oriented LLM capable of exceptional generalization across various datasets and tasks without relying on downstream graph data. We introduce the GraphGPT framework, which integrates LLMs with graph structural knowledge through graph instruction tuning. This framework includes a text-graph grounding component to link textual and graph structures and a dual-stage instruction tuning approach with a lightweight graph-text alignment projector. These innovations allow LLMs to comprehend complex graph structures and enhance adaptability across diverse datasets and tasks. Our framework demonstrates superior generalization in both supervised and zero-shot graph learning tasks, surpassing existing benchmarks. The open-sourced model implementation of our GraphGPT is available at https://github.com/HKUDS/GraphGPT.
Updated: 2024-05-07 10:10:14
Subjects: cs.CL,cs.AI
Three variations of Heads or Tails Game for Bitcoin
We present three very simple variants of the classic Heads or Tails game using chips, each of which contributes to our understanding of the Bitcoin protocol. The first variant addresses the issue of temporary Bitcoin forks, which occur when two miners discover blocks simultaneously. We determine the threshold at which an honest but temporarily "Byzantine" miner persists in mining on their fork to save their orphaned blocks. The second variant of the Heads or Tails game is biased in favor of the player and helps to explain why the difficulty adjustment formula is vulnerable to attacks on Nakamoto's consensus. We derive directly and in a simple way, without relying on a Markov decision solver as was the case until now, the threshold beyond which a miner without connectivity finds it advantageous to adopt a deviant mining strategy on Bitcoin. The third variant of the Heads or Tails game is unbiased and demonstrates that this issue in the difficulty adjustment formula can be fully rectified. Our results are in agreement with the existing literature, which we clarify both qualitatively and quantitatively using very simple models and scripts that are easy to implement.
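For orientation, the random walks behind these coin-toss variants are governed by the classic gambler's-ruin quantity from Nakamoto's whitepaper: the probability that a miner with hashrate share q ever erases a deficit of z blocks. This is standard background, not the paper's specific thresholds:

```python
def catch_up_probability(q, z):
    """Probability that a miner finding each block with probability q
    (honest rest: p = 1 - q) ever catches up from z blocks behind."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # with majority hashrate, catching up is almost sure
    return (q / p) ** z
```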
Updated: 2024-05-07 10:08:15
Subjects: cs.CR,math.PR,68M01, 60G40, 91A60
Overcoming challenges of translating deep-learning models for glioblastoma: the ZGBM consortium
Objective: To report imaging protocol and scheduling variance in routine care of glioblastoma patients in order to demonstrate challenges of integrating deep-learning models in glioblastoma care pathways. Additionally, to understand the most common imaging studies and image contrasts to inform the development of potentially robust deep-learning models. Methods: MR imaging data were analysed from a random sample of five patients from the prospective cohort across five participating sites of the ZGBM consortium. Reported clinical and treatment data alongside DICOM header information were analysed to understand treatment pathway imaging schedules. Results: All sites perform all structural imaging at every stage in the pathway except for the presurgical study, where in some sites only contrast-enhanced T1-weighted imaging is performed. Diffusion MRI is the most common non-structural imaging type, performed at every site. Conclusion: The imaging protocol and scheduling varies across the UK, making it challenging to develop machine-learning models that could perform robustly at other centres. Structural imaging is performed most consistently across all centres. Advances in knowledge: Successful translation of deep-learning models will likely be based on structural post-treatment imaging unless there is significant effort made to standardise non-structural or peri-operative imaging protocols and schedules.
Updated: 2024-05-07 10:04:08
Subjects: eess.IV,cs.LG
Opportunities for machine learning in scientific discovery
Technological advancements have substantially increased computational power and data availability, enabling the application of powerful machine-learning (ML) techniques across various fields. However, our ability to leverage ML methods for scientific discovery, i.e. to obtain fundamental and formalized knowledge about natural processes, is still in its infancy. In this review, we explore how the scientific community can increasingly leverage ML techniques to achieve scientific discoveries. We observe that the applicability and opportunity of ML depend strongly on the nature of the problem domain, and whether we have full (e.g., turbulence), partial (e.g., computational biochemistry), or no (e.g., neuroscience) a priori knowledge about the governing equations and physical properties of the system. Although challenges remain, principled use of ML is opening up new avenues for fundamental scientific discoveries. Throughout these diverse fields, there is a theme that ML is enabling researchers to embrace complexity in observational data that was previously intractable to classic analysis and numerical investigations.
Updated: 2024-05-07 09:58:02
Subjects: cs.LG,cs.AI
Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the generation of music, images, and other forms of artistic expression across various industries. However, research on general multi-modal music generation models remains scarce. To fill this gap, we propose a multi-modal music generation framework, Mozart's Touch. It can generate music aligned with cross-modal inputs such as images, videos and text. Mozart's Touch is composed of three main components: a Multi-modal Captioning Module, a Large Language Model (LLM) Understanding & Bridging Module, and a Music Generation Module. Unlike traditional approaches, Mozart's Touch requires no training or fine-tuning of pre-trained models, offering efficiency and transparency through clear, interpretable prompts. We also introduce the "LLM-Bridge" method to resolve the heterogeneous representation problem between descriptive texts of different modalities. We conduct a series of objective and subjective evaluations of the proposed model, and the results indicate that our model surpasses the performance of current state-of-the-art models. Our code and examples are available at: https://github.com/WangTooNaive/MozartsTouch
Updated: 2024-05-07 09:55:39
Subjects: cs.SD,cs.AI,eess.AS
The Impact of Background Removal on Performance of Neural Networks for Fashion Image Classification and Segmentation
Fashion understanding is a hot topic in computer vision, with many applications having great business value in the market. It remains a difficult challenge for computer vision due to the immense diversity of garments and the variety of scenes and backgrounds. In this work, we try removing the background from fashion images to boost data quality and increase model performance. Since fashion images show clearly visible persons in fully visible garments, we can utilize Salient Object Detection to achieve satisfactory background removal on fashion data. A fashion image with the background removed is referred to as a "rembg" image, in contrast to the original one in the fashion dataset. We conducted extensive comparative experiments with these two types of images on multiple aspects of model training, including model architectures, model initialization, compatibility with other training tricks and data augmentations, and target task types. Our experiments show that background removal can work effectively for fashion data in simple and shallow networks that are not susceptible to overfitting. It can improve model accuracy by up to 5% in classification on the FashionStyle14 dataset when training models from scratch. However, background removal does not perform well in deep neural networks due to incompatibility with other regularization techniques like batch normalization, pre-trained initialization, and data augmentations that introduce randomness. The loss of background pixels invalidates many existing training tricks in model training, adding the risk of overfitting for deep models.
Updated: 2024-05-07 09:52:27
Categories: cs.CV,cs.AI,cs.LG
How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability
Transformer-based language models are treated as black boxes because of their large number of parameters and complex internal interactions, which is a serious safety concern. Mechanistic Interpretability (MI) intends to reverse-engineer neural network behaviors in terms of human-understandable components. In this work, we focus on understanding how GPT-2 Small performs the task of predicting three-letter acronyms. Previous work in the MI field has so far focused on tasks that predict a single token. To the best of our knowledge, this is the first work that tries to mechanistically understand a behavior involving the prediction of multiple consecutive tokens. We discover that the prediction is performed by a circuit composed of 8 attention heads (~5% of the total heads) which we classified into three groups according to their role. We also demonstrate that these heads concentrate the acronym prediction functionality. In addition, we mechanistically interpret the most relevant heads of the circuit and find that they use positional information which is propagated via the causal mask mechanism. We expect this work to lay the foundation for understanding more complex behaviors involving multiple-token predictions.
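As a toy illustration of the ablation-style attribution that underlies this kind of circuit discovery (a schematic sketch, not the paper's GPT-2 experiments; the `head_contributions` stand-in and head names are invented):

```python
# Illustrative sketch: rank attention heads by the performance drop caused
# by ablating them, the basic operation behind circuit discovery. The toy
# "model" decomposes task performance additively over heads, which real
# transformers do not -- this only shows the selection logic.
def task_performance(active_heads, head_contributions):
    """Toy performance metric: sum of contributions of non-ablated heads."""
    return sum(c for h, c in head_contributions.items() if h in active_heads)

def rank_heads_by_ablation(head_contributions):
    """Score each head by how much performance drops when it is ablated."""
    all_heads = set(head_contributions)
    baseline = task_performance(all_heads, head_contributions)
    drops = {h: baseline - task_performance(all_heads - {h}, head_contributions)
             for h in all_heads}
    return sorted(drops, key=drops.get, reverse=True)

# 12 hypothetical heads; only 3 actually matter for the task.
contributions = {f"L{l}H{h}": 0.0 for l in range(3) for h in range(4)}
contributions["L0H1"] = 0.5
contributions["L1H2"] = 0.3
contributions["L2H0"] = 0.2

circuit = rank_heads_by_ablation(contributions)[:3]
print(sorted(circuit))  # the three heads carrying the behavior
```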
Updated: 2024-05-07 09:50:57
Categories: cs.LG
CAKE: Sharing Slices of Confidential Data on Blockchain
Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Updated: 2024-05-07 09:44:04
Categories: cs.CR
Gas Source Localization Using Physics-Guided Neural Networks
This work presents a novel method for estimating the location of a gas source based on spatially distributed concentration measurements taken, e.g., by a mobile robot or flying platform that follows a predefined trajectory to collect samples. The proposed approach uses a Physics-Guided Neural Network to approximate the gas dispersion with the source location as an additional network input. After an initial offline training phase, the neural network can be used to efficiently solve the inverse problem of localizing the gas source from measurements. The proposed approach avoids the rather costly numerical simulations of gas physics that are otherwise needed for solving inverse problems. Our experiments show that the method localizes the source well, even when dealing with measurements affected by noise.
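A minimal sketch of the resulting inverse step, with a hand-written stand-in for the learned forward model (the `concentration` profile, sensor positions, and candidate grid below are illustrative assumptions, not the paper's setup):

```python
import math

# Once a forward model concentration(source, x) is available (in the paper,
# a physics-guided network; here, a toy isotropic-decay profile), the source
# is localized by minimizing the misfit to the measurements.
def concentration(source, x):
    d2 = (source[0] - x[0]) ** 2 + (source[1] - x[1]) ** 2
    return math.exp(-d2)  # toy dispersion profile, NOT real gas physics

def localize(measurements, candidates):
    """Pick the candidate source minimizing the squared misfit."""
    def misfit(s):
        return sum((concentration(s, x) - m) ** 2 for x, m in measurements)
    return min(candidates, key=misfit)

true_source = (2.0, 3.0)
sensors = [(0.0, 0.0), (1.0, 2.0), (3.0, 4.0), (2.0, 1.0)]
measurements = [(x, concentration(true_source, x)) for x in sensors]
grid = [(i * 0.5, j * 0.5) for i in range(11) for j in range(11)]
print(localize(measurements, grid))  # -> (2.0, 3.0)
```

With noisy measurements the same search returns the candidate with the smallest (now nonzero) misfit, which is the robustness property the abstract reports.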
Updated: 2024-05-07 09:41:39
Categories: cs.LG
On regularized polynomial functional regression
This article offers a comprehensive treatment of polynomial functional regression, culminating in the establishment of a novel finite sample bound. This bound encompasses various aspects, including general smoothness conditions, capacity conditions, and regularization techniques. In doing so, it extends and generalizes several findings from the context of linear functional regression as well. We also provide numerical evidence that using higher-order polynomial terms can lead to improved performance.
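As a hedged illustration (the notation below is ours, not necessarily the article's), a degree-d polynomial functional regression estimator with Tikhonov-type regularization can be written as:

```latex
% Degree-$d$ polynomial functional regression with Tikhonov regularization
% (illustrative notation; the article's smoothness/capacity conditions are
% stated in a more general framework):
\hat{f}_\lambda
  = \operatorname*{arg\,min}_{f = (f_1,\dots,f_d)}
    \frac{1}{n} \sum_{i=1}^{n}
      \Big( y_i - \sum_{k=1}^{d} \big\langle f_k,\, x_i^{\otimes k} \big\rangle \Big)^{2}
    + \lambda \sum_{k=1}^{d} \| f_k \|^{2}
```

The linear case d = 1 recovers standard regularized linear functional regression, which is why the article's bound generalizes results from that setting.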
Updated: 2024-05-07 09:38:40
Categories: math.NA,cs.LG,cs.NA,math.ST,stat.TH,65K10, 62G20, 62J05
CoverLib: Classifiers-equipped Experience Library by Iterative Problem Distribution Coverage Maximization for Domain-tuned Motion Planning
Library-based methods are known to be very effective for fast motion planning by adapting an experience retrieved from a precomputed library. This article presents CoverLib, a principled approach for constructing and utilizing such a library. CoverLib iteratively adds an experience-classifier-pair to the library, where each classifier corresponds to an adaptable region of the experience within the problem space. This iterative process is an active procedure, as it selects the next experience based on its ability to effectively cover the uncovered region. During the query phase, these classifiers are utilized to select an experience that is expected to be adaptable for a given problem. Experimental results demonstrate that CoverLib effectively mitigates the trade-off between plannability and speed observed in global (e.g. sampling-based) and local (e.g. optimization-based) methods. As a result, it achieves both fast planning and high success rates over the problem domain. Moreover, due to its adaptation-algorithm-agnostic nature, CoverLib seamlessly integrates with various adaptation methods, including nonlinear programming-based and sampling-based algorithms.
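The iterative coverage-maximization idea can be sketched as a greedy loop. In this simplification, which is assumed for illustration, a "classifier" is reduced to a plain set of problems the experience can be adapted to, and coverage gain is computed exactly rather than estimated:

```python
# Schematic sketch of CoverLib's active library construction: at each
# iteration, add the candidate experience that covers the most problems
# still uncovered by the library. Candidate names and coverage sets are
# invented for illustration.
def build_library(candidates, problems, k):
    """Greedily add the experience covering the most still-uncovered problems."""
    library, uncovered = [], set(problems)
    for _ in range(k):
        best = max(candidates, key=lambda e: len(candidates[e] & uncovered))
        library.append(best)
        uncovered -= candidates[best]
    return library, uncovered

candidates = {
    "exp_a": {1, 2, 3, 4},
    "exp_b": {3, 4, 5},
    "exp_c": {5, 6},
}
lib, missed = build_library(candidates, range(1, 7), k=2)
print(lib, missed)  # -> ['exp_a', 'exp_c'] set()
```

Note that the greedy choice skips `exp_b` even though it covers more problems than `exp_c` in isolation, because most of its coverage is already provided by `exp_a` — the same "cover the uncovered region" logic the abstract describes.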
Updated: 2024-05-07 09:36:54
Categories: cs.RO,cs.AI,cs.LG
Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering
Large language models (LLMs) have significantly evolved, moving from simple output generation to complex reasoning and from stand-alone usage to being embedded into broader frameworks. In this paper, we introduce \emph{Fleet of Agents (FoA)}, a novel framework utilizing LLMs as agents to navigate through dynamic tree searches, employing a genetic-type particle filtering approach. FoA spawns a multitude of agents, each exploring autonomously, followed by a selection phase where resampling based on a heuristic value function optimizes the balance between exploration and exploitation. This mechanism enables dynamic branching, adapting the exploration strategy based on discovered solutions. We experimentally validate FoA using two benchmark tasks, "Game of 24" and "Mini-Crosswords". FoA outperforms the previously proposed Tree-of-Thoughts method in terms of efficacy and efficiency: it significantly decreases computational costs (by calling the value function less frequently) while preserving comparable or even superior accuracy.
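The genetic-type resampling step can be sketched as follows (the agent states, value function, and weights below are toy stand-ins, not FoA's actual heuristic):

```python
import random

# Sketch of the selection phase: agents whose partial solutions score
# higher under a heuristic value function are duplicated in the next
# generation, low scorers tend to die out.
def resample(agents, value, rng):
    weights = [value(a) for a in agents]
    return rng.choices(agents, weights=weights, k=len(agents))

rng = random.Random(0)
agents = ["state_a", "state_b", "state_c", "state_d"]
value = {"state_a": 0.1, "state_b": 5.0, "state_c": 0.1, "state_d": 5.0}.get
new_gen = resample(agents, value, rng)
# high-value states are expected to dominate the next generation
print(new_gen)
```

Because the value function is only called once per agent per round, this kind of resampling is cheaper than evaluating a full search tree, which is the efficiency gain the abstract reports over Tree-of-Thoughts.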
Updated: 2024-05-07 09:36:23
Categories: cs.CL,cs.AI,cs.LG,cs.NE
Multiparameter regularization and aggregation in the context of polynomial functional regression
Most of the recent results in polynomial functional regression have been focused on an in-depth exploration of single-parameter regularization schemes. In contrast, in this study we go beyond that framework by introducing an algorithm for multiple parameter regularization and presenting a theoretically grounded method for dealing with the associated parameters. This method facilitates the aggregation of models with varying regularization parameters. The efficacy of the proposed approach is assessed through evaluations on both synthetic and some real-world medical data, revealing promising results.
Updated: 2024-05-07 09:26:20
Categories: stat.ML,cs.LG,cs.NA,math.NA,math.ST,stat.TH,65K10, 62G20
Fantômas: Understanding Face Anonymization Reversibility
Face images are a rich source of information that can be used to identify individuals and infer private information about them. To mitigate this privacy risk, anonymizations apply transformations to clear images to obfuscate sensitive information, all while retaining some utility. Although published with impressive claims, they are sometimes not evaluated with convincing methodology. Reversing anonymized images to resemble their real input -- and even be identified by face recognition approaches -- represents the strongest indicator of flawed anonymization. Some recent results indeed indicate that this is possible for some approaches. It is, however, not well understood which approaches are reversible, and why. In this paper, we provide an exhaustive investigation into the phenomenon of face anonymization reversibility. Among other things, we find that 11 out of 15 tested face anonymizations are at least partially reversible and highlight how both reconstruction and inversion are the underlying processes that make reversal possible.
Updated: 2024-05-07 09:20:13
Categories: cs.CR,cs.LG
GPT-Enabled Cybersecurity Training: A Tailored Approach for Effective Awareness
This study explores the limitations of traditional Cybersecurity Awareness and Training (CSAT) programs and proposes an innovative solution using Generative Pre-Trained Transformers (GPT) to address these shortcomings. Traditional approaches lack personalization and adaptability to individual learning styles. To overcome these challenges, the study integrates GPT models to deliver highly tailored and dynamic cybersecurity learning experiences. Leveraging natural language processing capabilities, the proposed approach personalizes training modules based on individual trainee profiles, helping to ensure engagement and effectiveness. We conducted an experiment using a GPT model to provide a real-time, adaptive CSAT experience by generating customized training content. The findings demonstrate a significant improvement over traditional programs, addressing issues of engagement, dynamicity, and relevance. GPT-powered CSAT programs offer a scalable and effective solution to enhance cybersecurity awareness, providing personalized training content that better prepares individuals to mitigate cybersecurity risks in their specific roles within the organization.
Updated: 2024-05-07 09:08:00
Categories: cs.CR,cs.AI
Enriched BERT Embeddings for Scholarly Publication Classification
With the rapid expansion of academic literature and the proliferation of preprints, researchers face growing challenges in manually organizing and labeling large volumes of articles. The NSLP 2024 FoRC Shared Task I addresses this challenge, organized as a competition. The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article. This paper presents our results. Initially, we enrich the dataset (containing English scholarly articles sourced from ORKG and arXiv), then leverage different pre-trained language models (PLMs), specifically BERT, and explore their efficacy in transfer learning for this downstream task. Our experiments encompass feature-based and fine-tuned transfer learning approaches using diverse PLMs, optimized for scientific tasks, including SciBERT, SciNCL, and SPECTER2. We conduct hyperparameter tuning and investigate the impact of data augmentation from bibliographic databases such as OpenAlex, Semantic Scholar, and Crossref. Our results demonstrate that fine-tuning pre-trained models substantially enhances classification performance, with SPECTER2 emerging as the most accurate model. Moreover, enriching the dataset with additional metadata improves classification outcomes significantly, especially when integrating information from S2AG, OpenAlex and Crossref. Our best-performing approach achieves a weighted F1-score of 0.7415. Overall, our study contributes to the advancement of reliable automated systems for scholarly publication categorization, offering a potential solution to the laborious manual curation process, thereby facilitating researchers in efficiently locating relevant resources.
Updated: 2024-05-07 09:05:20
Categories: cs.AI,cs.CL
In-context Learning for Automated Driving Scenarios
One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively. This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way. We developed a framework where instructions and dynamic environment descriptions are input into the LLM. The LLM then utilizes this information to assist in generating rewards, thereby steering the behavior of RL agents towards patterns that more closely resemble human driving. The experimental results demonstrate that this approach not only makes RL agents more anthropomorphic but also reaches better performance. Additionally, various strategies for reward-proxy and reward-shaping are investigated, revealing the significant impact of prompt design on shaping an AD vehicle's behavior. These findings offer a promising direction for the development of more advanced and human-like automated driving systems. Our experimental data and source code can be found here.
Updated: 2024-05-07 09:04:52
Categories: cs.AI
Link Me Baby One More Time: Social Music Discovery on Spotify
We explore the social and contextual factors that influence the outcome of person-to-person music recommendations and discovery. Specifically, we use data from Spotify to investigate how a link sent from one user to another results in the receiver engaging with the music of the shared artist. We consider several factors that may influence this process, such as the strength of the sender-receiver relationship, the user's role in the Spotify social network, their music social cohesion, and how similar the new artist is to the receiver's taste. We find that the receiver of a link is more likely to engage with a new artist when (1) they have similar music taste to the sender and the shared track is a good fit for their taste, (2) they have a stronger and more intimate tie with the sender, and (3) the shared artist is popular amongst the receiver's connections. Finally, we use these findings to build a Random Forest classifier to predict whether a shared music track will result in the receiver's engagement with the shared artist. This model elucidates which type of social and contextual features are most predictive, although peak performance is achieved when a diverse set of features are included. These findings provide new insights into the multifaceted mechanisms underpinning the interplay between music discovery and social processes.
Updated: 2024-05-07 09:02:21
Categories: cs.SI,cs.IR,cs.LG,physics.soc-ph
Geometry and Dynamics of LayerNorm
A technical note aiming to offer deeper intuition for the LayerNorm function common in deep neural networks. LayerNorm is defined relative to a distinguished 'neural' basis, but it does more than just normalize the corresponding vector elements. Rather, it implements a composition -- of linear projection, nonlinear scaling, and then affine transformation -- on input activation vectors. We develop both a new mathematical expression and geometric intuition, to make the net effect more transparent. We emphasize that, when LayerNorm acts on an N-dimensional vector space, all outcomes of LayerNorm lie within the intersection of an (N-1)-dimensional hyperplane and the interior of an N-dimensional hyperellipsoid. This intersection is the interior of an (N-1)-dimensional hyperellipsoid, and typical inputs are mapped near its surface. We find the direction and length of the principal axes of this (N-1)-dimensional hyperellipsoid via the eigen-decomposition of a simply constructed matrix.
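The claimed geometry is easy to verify numerically. The sketch below implements the core of LayerNorm (without the learned affine map) and checks the note's decomposition: a linear projection onto the zero-sum hyperplane followed by a nonlinear rescaling to length sqrt(N):

```python
import math

# LayerNorm before the learned affine map: project x onto the hyperplane
# of zero-sum vectors, then rescale to radius sqrt(N).
def layernorm_core(x):
    n = len(x)
    mu = sum(x) / n
    centered = [v - mu for v in x]          # projection onto the sum=0 hyperplane
    std = math.sqrt(sum(v * v for v in centered) / n)
    return [v / std for v in centered]      # nonlinear scaling to radius sqrt(n)

x = [2.0, -1.0, 0.5, 4.0, -3.5]
y = layernorm_core(x)
# Output lies on the (N-1)-dim hyperplane and has fixed norm sqrt(N):
assert abs(sum(y)) < 1e-9
assert abs(sum(v * v for v in y) - len(x)) < 1e-9  # ||y||^2 == N
```

The per-element scale and shift parameters that full LayerNorm applies afterwards are what map this sphere-like set into the hyperellipsoid the note analyzes.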
Updated: 2024-05-07 09:01:02
Categories: cs.LG
Large Language Models (LLMs) as Agents for Augmented Democracy
We explore the capabilities of an augmented democracy system built on off-the-shelf LLMs fine-tuned on data summarizing individual preferences across 67 policy proposals collected during the 2022 Brazilian presidential elections. We use a train-test cross-validation setup to estimate the accuracy with which the LLMs predict both: a subject's individual political choices and the aggregate preferences of the full sample of participants. At the individual level, the accuracy of the out of sample predictions lie in the range 69%-76% and are significantly better at predicting the preferences of liberal and college educated participants. At the population level, we aggregate preferences using an adaptation of the Borda score and compare the ranking of policy proposals obtained from a probabilistic sample of participants and from data augmented using LLMs. We find that the augmented data predicts the preferences of the full population of participants better than probabilistic samples alone when these represent less than 30% to 40% of the total population. These results indicate that LLMs are potentially useful for the construction of systems of augmented democracy.
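For illustration, a plain Borda count over rankings looks as follows (the paper uses an adaptation of the Borda score; the proposal names and votes below are invented):

```python
# Plain Borda count: each voter's ranking awards k-1 points to the top
# proposal, k-2 to the next, and so on; proposals are then sorted by
# total score.
def borda(rankings):
    scores = {}
    for ranking in rankings:
        k = len(ranking)
        for pos, proposal in enumerate(ranking):
            scores[proposal] = scores.get(proposal, 0) + (k - 1 - pos)
    return sorted(scores, key=scores.get, reverse=True)

votes = [
    ["healthcare", "education", "security"],
    ["education", "healthcare", "security"],
    ["healthcare", "security", "education"],
]
print(borda(votes))  # -> ['healthcare', 'education', 'security']
```

In the paper's setting, the rankings from a small probabilistic sample of real participants are augmented with LLM-predicted rankings before this kind of aggregation is applied.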
Updated: 2024-05-07 08:57:18
Categories: cs.CY,cs.AI,cs.CL
LordNet: An Efficient Neural Network for Learning to Solve Parametric Partial Differential Equations without Simulated Data
Neural operators, as a powerful approximation to the non-linear operators between infinite-dimensional function spaces, have proved to be promising in accelerating the solution of partial differential equations (PDE). However, they require a large amount of simulated data, which can be costly to collect. This can be avoided by learning physics from the physics-constrained loss, which we refer to as the mean squared residual (MSR) loss constructed from the discretized PDE. We investigate the physical information in the MSR loss, which we call long-range entanglements, and identify the challenge that the neural network requires the capacity to model the long-range entanglements in the spatial domain of the PDE, whose patterns vary in different PDEs. To tackle the challenge, we propose LordNet, a tunable and efficient neural network for modeling various entanglements. Inspired by traditional solvers, LordNet models the long-range entanglements with a series of matrix multiplications, which can be seen as the low-rank approximation to the general fully-connected layers and extracts the dominant pattern with reduced computational cost. The experiments on solving Poisson's equation and (2D and 3D) Navier-Stokes equation demonstrate that the long-range entanglements from the MSR loss can be well modeled by LordNet, yielding better accuracy and generalization ability than other neural networks. The results show that LordNet can be $40\times$ faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in accuracy and efficiency with the smallest parameter size.
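The MSR idea can be sketched on a 1-D Poisson problem, where the loss is the mean squared residual of the discretized PDE and no simulated solution data enters the loss at all (the grid size and test function below are illustrative, not from the paper):

```python
# MSR (mean squared residual) loss for a discretized 1-D Poisson problem
# u'' = f on a uniform grid. The network output would be plugged in as u;
# here we plug in the exact solution to show the loss vanishes on it.
def msr_loss(u, f, h):
    """Mean squared residual of the centered finite-difference Laplacian."""
    residuals = [
        (u[i - 1] - 2 * u[i] + u[i + 1]) / h**2 - f[i]
        for i in range(1, len(u) - 1)
    ]
    return sum(r * r for r in residuals) / len(residuals)

n, h = 11, 0.1
xs = [i * h for i in range(n)]
u_exact = [x * (1 - x) / 2 for x in xs]   # solves u'' = -1 with u(0)=u(1)=0
f = [-1.0] * n
print(msr_loss(u_exact, f, h))            # ~0: the exact solution has zero residual
```

Minimizing this quantity over network parameters trains the operator without any precomputed solver outputs, which is the data-free property the abstract emphasizes.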
Updated: 2024-05-07 08:54:59
Categories: cs.LG
Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning
The latest developments in Natural Language Processing (NLP) have demonstrated remarkable progress in a code-text retrieval problem. As the Transformer-based models used in this task continue to increase in size, the computational costs and time required for end-to-end fine-tuning become substantial. This poses a significant challenge for adapting and utilizing these models when computational resources are limited. Motivated by these concerns, we propose a fine-tuning framework that leverages Parameter-Efficient Fine-Tuning (PEFT) techniques. Moreover, we adopt contrastive learning objectives to improve the quality of bimodal representations learned by transformer models. Additionally, for PEFT methods we provide extensive benchmarking, the lack of which has been highlighted as a crucial problem in the literature. Based on the thorough experimentation with the CodeT5+ model conducted on two datasets, we demonstrate that the proposed fine-tuning framework has the potential to improve code-text retrieval performance by tuning only 0.4% parameters at most.
Updated: 2024-05-07 08:50:25
Categories: cs.LG,cs.SE
Comparative Study of Recurrent Neural Networks for Virtual Analog Audio Effects Modeling
Analog electronic circuits are at the core of an important category of musical devices. The nonlinear features of their electronic components give analog musical devices a distinctive timbre and sound quality, making them highly desirable. Artificial neural networks have rapidly gained popularity for the emulation of analog audio effects circuits, particularly recurrent networks. While neural approaches have been successful in accurately modeling distortion circuits, they require architectural improvements that account for parameter conditioning and low-latency response. In this article, we explore the application of recent machine learning advancements for virtual analog modeling. We compare State Space models and Linear Recurrent Units against the more common Long Short Term Memory networks. These have shown promising ability in sequence-to-sequence modeling tasks, with a notable improvement in signal history encoding. Our comparative study uses these black-box neural modeling techniques with a variety of audio effects. We evaluate the performance and limitations using multiple metrics aiming to assess the models' ability to accurately replicate energy envelopes, frequency contents, and transients in the audio signal. To incorporate control parameters we employ the Feature-wise Linear Modulation (FiLM) method. Long Short Term Memory networks exhibit better accuracy in emulating distortions and equalizers, while the State Space model, followed by Long Short Term Memory networks when integrated in an encoder-decoder structure, outperforms others in emulating saturation and compression. When considering long time-variant characteristics, the State Space model demonstrates the greatest accuracy. The Long Short Term Memory and, in particular, Linear Recurrent Unit networks present more tendency to introduce audio artifacts.
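Feature-wise Linear Modulation can be sketched in a few lines: a conditioner maps the control parameters (e.g. a drive or tone knob) to a per-channel scale gamma and shift beta applied to the hidden features. The conditioner weights below are invented for illustration, not the article's architecture:

```python
# FiLM: condition hidden features on control parameters via a learned
# per-channel affine transform.
def film(features, gamma, beta):
    return [g * x + b for x, g, b in zip(features, gamma, beta)]

def conditioner(knob):
    """Map a control parameter to (gamma, beta); weights are made up."""
    gamma = [1.0 + knob, 1.0 - 0.5 * knob, 1.0]
    beta = [0.1 * knob, 0.0, -0.1 * knob]
    return gamma, beta

hidden = [0.2, -0.4, 0.8]
gamma, beta = conditioner(knob=0.5)
print([round(v, 6) for v in film(hidden, gamma, beta)])  # -> [0.35, -0.3, 0.75]
```

Because gamma and beta are recomputed whenever the knob moves, the same recurrent core can emulate the whole parameter range of the device.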
Updated: 2024-05-07 08:47:40
Categories: cs.SD,cs.AI
Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning
Federated Learning (FL) enables multiple devices to collaboratively train a shared model while ensuring data privacy. The selection of participating devices in each training round critically affects both the model performance and training efficiency, especially given the vast heterogeneity in training capabilities and data distribution across devices. To address these challenges, we introduce a novel device selection solution called FedRank, which is an end-to-end, ranking-based approach that is pre-trained by imitation learning against state-of-the-art analytical approaches. It not only considers data and system heterogeneity at runtime but also adaptively and efficiently chooses the most suitable clients for model training. Specifically, FedRank views client selection in FL as a ranking problem and employs a pairwise training strategy for the smart selection process. Additionally, an imitation learning-based approach is designed to counteract the cold-start issues often seen in state-of-the-art learning-based approaches. Experimental results reveal that FedRank boosts model accuracy by 5.2% to 56.9%, accelerates training convergence by up to 2.01x, and reduces energy consumption by up to 40.1%.
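The pairwise training strategy can be illustrated with a logistic pairwise loss over client scores (the scores and the labeled pair below are toy values, not FedRank's learned features or expert labels):

```python
import math

# Pairwise ranking loss: for each pair (i, j) where client i is known
# (e.g. from the expert policy being imitated) to be the better choice,
# push score(i) above score(j).
def pairwise_loss(scores, preferred_pairs):
    return sum(
        math.log(1.0 + math.exp(scores[j] - scores[i]))
        for i, j in preferred_pairs
    ) / len(preferred_pairs)

scores_good = {"c1": 2.0, "c2": 0.0}   # ranks c1 above c2
scores_bad = {"c1": 0.0, "c2": 2.0}    # ranks them the wrong way round
pairs = [("c1", "c2")]                 # expert says c1 should be chosen first
assert pairwise_loss(scores_good, pairs) < pairwise_loss(scores_bad, pairs)
```

Training the scorer against such expert-labeled pairs is what lets the learned selector start from a sensible ranking instead of cold-starting from scratch.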
Updated: 2024-05-07 08:44:29
Categories: cs.LG,cs.DC
Policy Learning with a Language Bottleneck
Modern AI systems such as self-driving cars and game-playing agents achieve superhuman performance, but often lack human-like features such as generalization, interpretability and human inter-operability. Inspired by the rich interactions between language and decision-making in humans, we introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules that capture the strategies underlying their most rewarding behaviors. PLLB alternates between a rule generation step guided by language models, and an update step where agents learn new policies guided by rules. In a two-player communication game, a maze solving task, and two image reconstruction tasks, we show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users, enabling more effective human-AI coordination.
Updated: 2024-05-07 08:40:21
Categories: cs.LG,cs.AI,cs.CL
A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning
Split Learning (SL) is a distributed learning framework renowned for its privacy-preserving features and minimal computational requirements. Previous research consistently highlights the potential privacy breaches in SL systems by server adversaries reconstructing training data. However, these studies often rely on strong assumptions or compromise system utility to enhance attack performance. This paper introduces a new semi-honest Data Reconstruction Attack on SL, named Feature-Oriented Reconstruction Attack (FORA). In contrast to prior works, FORA relies on limited prior knowledge, specifically that the server utilizes auxiliary samples from the public domain without knowing any client's private information. This allows FORA to conduct the attack stealthily and achieve robust performance. The key vulnerability exploited by FORA is the revelation of the model representation preference in the smashed data output by the victim client. FORA constructs a substitute client through feature-level transfer learning, aiming to closely mimic the victim client's representation preference. Leveraging this substitute client, the server trains the attack model to effectively reconstruct private data. Extensive experiments showcase FORA's superior performance compared to state-of-the-art methods. Furthermore, the paper systematically evaluates the proposed method's applicability across diverse settings and advanced defense strategies.
Updated: 2024-05-07 08:38:35
Fields: cs.CR
Reasoning with fuzzy and uncertain evidence using epistemic random fuzzy sets: general framework and practical models
We introduce a general theory of epistemic random fuzzy sets for reasoning with fuzzy or crisp evidence. This framework generalizes both the Dempster-Shafer theory of belief functions, and possibility theory. Independent epistemic random fuzzy sets are combined by the generalized product-intersection rule, which extends both Dempster's rule for combining belief functions, and the product conjunctive combination of possibility distributions. We introduce Gaussian random fuzzy numbers and their multi-dimensional extensions, Gaussian random fuzzy vectors, as practical models for quantifying uncertainty about scalar or vector quantities. Closed-form expressions for the combination, projection and vacuous extension of Gaussian random fuzzy numbers and vectors are derived.
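The closed-form combination rules for Gaussian random fuzzy numbers are derived in the paper itself; as background, the product conjunctive combination of possibility distributions that the generalized rule extends can be sketched numerically. The grid, modes, and spreads below are arbitrary illustrative choices:

```python
import numpy as np

def gaussian_possibility(x, mode, spread):
    """Gaussian-shaped possibility distribution with maximum 1 at the mode."""
    return np.exp(-((x - mode) ** 2) / (2 * spread ** 2))

def product_combine(pi1, pi2):
    """Product conjunctive combination of two possibility distributions,
    renormalized so the maximum is 1 (a standard rule in possibility
    theory, which the generalized product-intersection rule extends)."""
    prod = pi1 * pi2
    return prod / prod.max()

x = np.linspace(-5, 5, 1001)
pi1 = gaussian_possibility(x, mode=-1.0, spread=1.0)
pi2 = gaussian_possibility(x, mode=2.0, spread=1.0)
combined = product_combine(pi1, pi2)
print(x[np.argmax(combined)])   # mode of the combination, midway at 0.5
```

With equal spreads the combined mode lands at the midpoint of the two input modes, mirroring the precision-weighted behavior of the Gaussian closed forms.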
Updated: 2024-05-07 08:38:27
Fields: cs.AI,stat.ME
Acceleration Algorithms in GNNs: A Survey
Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the research community. In this paper, we present a systematic review of acceleration algorithms in GNNs, which can be categorized into three main topics based on their purpose: training acceleration, inference acceleration, and execution acceleration. Specifically, we summarize and categorize the existing approaches for each main topic, and provide detailed characterizations of the approaches within each category. Additionally, we review several libraries related to acceleration algorithms in GNNs and discuss our Scalable Graph Learning (SGL) library. Finally, we propose promising directions for future research. A complete summary is presented in our GitHub repository: https://github.com/PKU-DAIR/SGL/blob/main/Awsome-GNN-Acceleration.md.
Updated: 2024-05-07 08:34:33
Fields: cs.LG,cs.AI
Adaptive Least Mean pth Power Graph Neural Networks
In the presence of impulsive noise and missing observations, accurate online prediction of time-varying graph signals poses a crucial challenge in numerous application domains. We propose the Adaptive Least Mean $p^{th}$ Power Graph Neural Networks (LMP-GNN), a universal framework combining adaptive filters and graph neural networks for online graph signal estimation. LMP-GNN retains the advantage of adaptive filtering in handling noise and missing observations as well as its online update capability. The graph neural network incorporated within LMP-GNN can train and update filter parameters online, instead of relying on the predefined filter parameters of previous methods, yielding more accurate prediction results. The adaptive update scheme of LMP-GNN follows the solution of an $l_p$-norm optimization, rooted in the minimum dispersion criterion, and yields robust estimation results for time-varying graph signals under impulsive noise. A special case of LMP-GNN, named Sign-GNN, is also provided and analyzed. Experimental results on two real-world datasets of a temperature graph and a traffic graph under four different noise distributions prove the effectiveness and robustness of our proposed LMP-GNN.
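The $l_p$-based update underlying LMP-type adaptive filters has the classical form $w \leftarrow w + \mu\, p\, |e|^{p-1}\,\mathrm{sign}(e)$, which is robust to impulsive noise because the step size is bounded. The sketch below applies it to a toy signal with missing observations and heavy-tailed noise; the step size, $p$, and the signal are illustrative, and the GNN-learned parameters of LMP-GNN are not modeled:

```python
import numpy as np

def lmp_step(w, x_obs, mask, mu=0.1, p=1.2):
    """One adaptive least-mean p-th power update.  e is the observation
    error on sampled entries; the update follows the l_p dispersion
    gradient mu * p * |e|^(p-1) * sign(e).  p = 1 recovers a sign update
    (the Sign-GNN special case uses only the error's sign)."""
    e = mask * (x_obs - w)
    return w + mu * p * np.abs(e) ** (p - 1) * np.sign(e)

rng = np.random.default_rng(0)
truth = np.array([1.0, -2.0, 0.5, 3.0])        # hypothetical graph signal
w = np.zeros(4)
for _ in range(2000):
    mask = (rng.random(4) < 0.7).astype(float)  # 30% missing observations
    noise = rng.standard_t(df=2, size=4) * 0.1  # heavy-tailed (impulsive)
    w = lmp_step(w, truth + noise, mask)
print(np.round(w, 1))                           # hovers near the truth
```

Because $|e|^{p-1}\mathrm{sign}(e)$ grows sublinearly, a single impulsive outlier moves the estimate by a bounded amount, unlike the squared-error (LMS) update.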
Updated: 2024-05-07 08:28:51
Fields: cs.LG,eess.SP
The Malware as a Service ecosystem
The goal of this chapter is to illuminate the operational frameworks, key actors, and significant cybersecurity implications of the Malware as a Service (MaaS) ecosystem. Highlighting the transformation of malware proliferation into a service-oriented model, the chapter discusses how MaaS democratises access to sophisticated cyberattack capabilities, enabling even those with minimal technical knowledge to execute catastrophic cyberattacks. The discussion extends to the roles within the MaaS ecosystem, including malware developers, affiliates, initial access brokers, and the essential infrastructure providers that support these nefarious activities. The study emphasises the profound challenges MaaS poses to traditional cybersecurity defences, rendered ineffective against the constantly evolving and highly adaptable threats generated by MaaS platforms. With the increase in malware sophistication, there is a parallel call for a paradigm shift in defensive strategies, advocating for dynamic analysis, behavioural detection, and the integration of AI and machine learning techniques. By exploring the intricacies of the MaaS ecosystem, including the economic motivations driving its growth and the blurred lines between legitimate service models and cyber crime, the chapter presents a comprehensive overview intended to foster a deeper understanding among researchers and cybersecurity professionals. The ultimate goal is to aid in developing more effective strategies for combating the spread of commoditised malware threats and safeguarding against the increasing accessibility and scalability of cyberattacks facilitated by the MaaS model.
Updated: 2024-05-07 08:25:12
Fields: cs.CR
A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model
The recent booming development of Generative Artificial Intelligence (GenAI) has facilitated emerging forms of model commercialization, such as licensing or trading Deep Neural Network (DNN) models. However, DNN model trading may raise concerns about unauthorized replication or misuse of the model, violating the benefits of model ownership. Model identity auditing is a challenging issue in protecting the intellectual property of DNN models, and verifying the integrity and ownership of models to guarantee trust in transactions remains one of the critical obstacles. In this paper, we focus on this issue and propose a novel Accumulator-enabled Auditing for Distributed Identity of DNN Model (A2-DIDM) that utilizes blockchain and zero-knowledge techniques to protect data and function privacy while ensuring lightweight on-chain ownership verification. The proposed model presents a scheme of identity records by configuring model weight checkpoints with corresponding zero-knowledge proofs, incorporating predicates that capture incremental state changes across weight checkpoints. Our scheme ensures both the computational integrity of the DNN training process and programmability, so that the uniqueness of the weight checkpoint sequence in a DNN model is preserved, ensuring the correctness of model identity auditing. In addition, A2-DIDM also addresses privacy protection in distributed identity via a proposed accumulator method. We systematically analyze the security and robustness of our proposed model and further evaluate the effectiveness and usability of auditing DNN model identities.
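A2-DIDM's actual construction uses zero-knowledge proofs and cryptographic accumulators; as a much simpler stand-in for one of its ingredients, committing to an ordered weight-checkpoint sequence, a plain hash chain already makes the sequence order-sensitive and tamper-evident:

```python
import hashlib
import json

def checkpoint_digest(weights):
    """Commit to one weight checkpoint (here: a JSON-serializable dict)."""
    blob = json.dumps(weights, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def chain_checkpoints(checkpoints):
    """Fold checkpoint digests into a single accumulator-style value:
    acc_i = H(acc_{i-1} || H(ckpt_i)).  Order matters, so the uniqueness
    of the checkpoint sequence of a training run is preserved."""
    acc = hashlib.sha256(b"genesis").hexdigest()
    for ckpt in checkpoints:
        acc = hashlib.sha256((acc + checkpoint_digest(ckpt)).encode()).hexdigest()
    return acc

train_run = [{"w": [0.0, 0.0]}, {"w": [0.3, -0.1]}, {"w": [0.5, -0.2]}]
print(chain_checkpoints(train_run))
```

Unlike this sketch, the paper's accumulator additionally supports membership proofs without revealing the checkpoints, and its zero-knowledge predicates attest that each incremental state change came from a valid training step.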
Updated: 2024-05-07 08:24:50
Fields: cs.CR,cs.AI
Continual Learning in the Presence of Repetition
Continual learning (CL) provides a framework for training models in ever-evolving environments. Although re-occurrence of previously seen objects or tasks is common in real-world problems, the concept of repetition in the data stream is not often considered in standard benchmarks for CL. Unlike with the rehearsal mechanism in buffer-based strategies, where sample repetition is controlled by the strategy, repetition in the data stream naturally stems from the environment. This report provides a summary of the CLVision challenge at CVPR 2023, which focused on the topic of repetition in class-incremental learning. The report initially outlines the challenge objective and then describes three solutions proposed by finalist teams that aim to effectively exploit the repetition in the stream to learn continually. The experimental results from the challenge highlight the effectiveness of ensemble-based solutions that employ multiple versions of similar modules, each trained on different but overlapping subsets of classes. This report underscores the transformative potential of taking a different perspective in CL by employing repetition in the data stream to foster innovative strategy design.
Updated: 2024-05-07 08:15:48
Fields: cs.LG,cs.AI
ESP: Extro-Spective Prediction for Long-term Behavior Reasoning in Emergency Scenarios
Emergent-scene safety is a key milestone for fully autonomous driving, and reliable on-time prediction is essential to maintaining safety in emergency scenarios. However, these emergency scenarios are long-tailed and hard to collect, which prevents the system from producing reliable predictions. In this paper, we build a new dataset aimed at long-term prediction of emergency events from inconspicuous state variations in the history, which we name the Extro-Spective Prediction (ESP) problem. Based on the proposed dataset, a flexible feature encoder for ESP is introduced to various prediction methods as a seamless plug-in, and its consistent performance improvement underscores its efficacy. Furthermore, a new metric named clamped temporal error (CTE) is proposed to give a more comprehensive evaluation of prediction performance, especially for time-sensitive emergency events occurring within subseconds. Interestingly, since our ESP features can be naturally described in human-readable language, integrating them into ChatGPT also shows huge potential. The ESP-dataset and all benchmarks are released at https://dingrui-wang.github.io/ESP-Dataset/.
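The abstract does not define CTE precisely; one plausible reading, clipping per-event timing errors before averaging so a single badly-missed event cannot dominate the score, can be sketched as follows (the clamp value and data are hypothetical):

```python
import numpy as np

def clamped_temporal_error(t_pred, t_true, clamp=1.0):
    """One plausible reading of a clamped temporal error: absolute timing
    errors are clipped at `clamp` seconds before averaging, so the metric
    stays informative for subsecond-sensitive events."""
    err = np.minimum(np.abs(np.asarray(t_pred) - np.asarray(t_true)), clamp)
    return float(err.mean())

t_true = [0.20, 0.45, 0.90]   # hypothetical emergency onsets (seconds)
t_pred = [0.25, 0.40, 3.00]   # the last prediction is far off
print(clamped_temporal_error(t_pred, t_true, clamp=1.0))
```

Without the clamp, the 2.1 s miss on the last event would triple the mean error; with it, the two accurate subsecond predictions remain visible in the score.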
Updated: 2024-05-07 08:15:37
Fields: cs.CV,cs.LG
PINQI: An End-to-End Physics-Informed Approach to Learned Quantitative MRI Reconstruction
Quantitative Magnetic Resonance Imaging (qMRI) enables the reproducible measurement of biophysical parameters in tissue. The challenge lies in solving a nonlinear, ill-posed inverse problem to obtain the desired tissue parameter maps from acquired raw data. While various learned and non-learned approaches have been proposed, the existing learned methods fail to fully exploit the prior knowledge about the underlying MR physics, i.e. the signal model and the acquisition model. In this paper, we propose PINQI, a novel qMRI reconstruction method that integrates the knowledge about the signal, acquisition model, and learned regularization into a single end-to-end trainable neural network. Our approach is based on unrolled alternating optimization, utilizing differentiable optimization blocks to solve inner linear and non-linear optimization tasks, as well as convolutional layers for regularization of the intermediate qualitative images and parameter maps. This design enables PINQI to leverage the advantages of both the signal model and learned regularization. We evaluate the performance of our proposed network by comparing it with recently published approaches in the context of highly undersampled $T_1$-mapping, using both a simulated brain dataset, as well as real scanner data acquired from a physical phantom and in-vivo data from healthy volunteers. The results demonstrate the superiority of our proposed solution over existing methods and highlight the effectiveness of our method in real-world scenarios.
Updated: 2024-05-07 08:12:53
Fields: eess.IV,cs.CV,cs.LG,physics.med-ph
Binarized Simplicial Convolutional Neural Networks
Graph Neural Networks have a limitation of solely processing features on graph nodes, neglecting data on high-dimensional structures such as edges and triangles. Simplicial Convolutional Neural Networks (SCNN) represent higher-order structures using simplicial complexes to break this limitation albeit still lacking time efficiency. In this paper, we propose a novel neural network architecture on simplicial complexes named Binarized Simplicial Convolutional Neural Networks (Bi-SCNN) based on the combination of simplicial convolution with a binary-sign forward propagation strategy. The usage of the Hodge Laplacian on a binary-sign forward propagation enables Bi-SCNN to efficiently and effectively represent simplicial features that have higher-order structures than traditional graph node representations. Compared to the previous Simplicial Convolutional Neural Networks, the reduced model complexity of Bi-SCNN shortens the execution time without sacrificing the prediction performance and is less prone to the over-smoothing effect. Experimenting with real-world citation and ocean-drifter data confirmed that our proposed Bi-SCNN is efficient and accurate.
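A minimal sketch of a binary-sign simplicial convolution: build the edge-space Hodge Laplacian $L_1 = B_1^\top B_1 + B_2 B_2^\top$ from incidence matrices, filter the edge signal with its powers, and binarize the output with sign(). The complex (one filled triangle, for which $L_1 = 3I$) and the filter weights are illustrative, not the paper's trained model:

```python
import numpy as np

# Incidence matrices of a single filled triangle (3 nodes, 3 edges, 1 face).
B1 = np.array([[-1,  0, -1],
               [ 1, -1,  0],
               [ 0,  1,  1]])         # node-to-edge incidence
B2 = np.array([[1], [1], [-1]])       # edge-to-triangle incidence
L1 = B1.T @ B1 + B2 @ B2.T            # Hodge Laplacian on edges (= 3*I here)

def bi_scnn_layer(x, weights):
    """Binary-sign simplicial convolution: filter the edge signal with
    powers of the Hodge Laplacian, then binarize with sign()."""
    out = sum(w * np.linalg.matrix_power(L1, k) @ x
              for k, w in enumerate(weights))
    return np.sign(out)

x = np.array([0.5, -1.2, 0.3])        # hypothetical edge flow
print(bi_scnn_layer(x, weights=[1.0, 0.2]))
```

The sign() binarization is what shrinks the model's arithmetic cost relative to a full-precision SCNN, at the price of quantizing intermediate simplicial features.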
Updated: 2024-05-07 08:05:20
Fields: cs.LG,eess.SP
Parameter uncertainties for imperfect surrogate models in the low-noise regime
Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, the loss ignores misspecification, where models are imperfect. Parameter uncertainties from Bayesian regression are thus significantly underestimated and vanish in the large data limit. This is particularly problematic when building models of low-noise, or near-deterministic, calculations, as the main source of uncertainty is neglected. We analyze the generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show posterior distributions must cover every training point to avoid a divergent generalization error and design an ansatz that respects this constraint, which for linear models incurs minimal overhead. This is demonstrated on model problems before application to thousand dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors where existing schemes fail, allowing this important source of uncertainty to be incorporated in computational workflows.
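The covering constraint can be illustrated on a deliberately misspecified linear fit of near-deterministic data: the standard posterior variance shrinks like $1/N$, while a covering ansatz keeps the per-point predictive uncertainty at least as large as the training residual. The exact ansatz in the paper differs; this sketch only shows the qualitative contrast:

```python
import numpy as np

# Fit y ~ w*x with a misspecified model (the true data is quadratic,
# noise-free, i.e. the near-deterministic regime).
x = np.linspace(0.0, 1.0, 200)
y = x ** 2
X = x[:, None]

w = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares fit
resid = y - X @ w                          # systematic misfit, not noise

# Standard Bayesian-style parameter variance shrinks as 1/N and vanishes
# in the large-data limit, even though the model is wrong everywhere.
naive_var = resid.var() / len(x)
# A covering ansatz instead keeps per-point uncertainty >= |residual|,
# so no training point falls outside the posterior predictive support.
aware_std = np.abs(resid)
print(naive_var, aware_std.max())
```

The contrast is the paper's starting point: because the misfit is systematic rather than random, averaging over more data cannot shrink it, and an honest uncertainty must bound it pointwise.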
Updated: 2024-05-07 08:03:42
Fields: stat.ML,cs.LG,physics.data-an
Families of sequences with good family complexity and cross-correlation measure
In this paper we study pseudorandomness of a family of sequences in terms of two measures, the family complexity ($f$-complexity) and the cross-correlation measure of order $\ell$. We consider sequences not only on binary alphabet but also on $k$-symbols ($k$-ary) alphabet. We first generalize some known methods on construction of the family of binary pseudorandom sequences. We prove a bound on the $f$-complexity of a large family of binary sequences of Legendre-symbols of certain irreducible polynomials. We show that this family as well as its dual family have both a large family complexity and a small cross-correlation measure up to a rather large order. Next, we present another family of binary sequences having high $f$-complexity and low cross-correlation measure. Then we extend the results to the family of sequences on $k$-symbols alphabet.
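The binary Legendre-symbol construction that the paper generalizes (to irreducible polynomials and $k$-ary alphabets) starts from the classic $\pm 1$ sequence $s_n = \left(\frac{n}{p}\right)$, computable via Euler's criterion:

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion:
    a^((p-1)/2) mod p is 1 for quadratic residues and p-1 otherwise."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def legendre_sequence(p, length):
    """Classic +-1 pseudorandom sequence s_n = (n/p); the paper builds
    families from Legendre symbols of certain irreducible polynomials."""
    return [legendre(n, p) for n in range(1, length + 1)]

print(legendre_sequence(11, 10))
```

For a full period $n = 1, \dots, p-1$ the symbols are perfectly balanced, one of the basic pseudorandomness properties that the family complexity and cross-correlation measures refine.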
Updated: 2024-05-07 08:03:03
Fields: cs.IT,cs.CR,math.IT,math.NT,11K45, 94A55, 94A60
Boolean Variation and Boolean Logic BackPropagation
The notion of variation is introduced for the Boolean set and based on which Boolean logic backpropagation principle is developed. Using this concept, deep models can be built with weights and activations being Boolean numbers and operated with Boolean logic instead of real arithmetic. In particular, Boolean deep models can be trained directly in the Boolean domain without latent weights. No gradient but logic is synthesized and backpropagated through layers.
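The paper develops its own notion of Boolean variation; the closely related classical Boolean derivative, $f(x) \oplus f(x \oplus e_i)$, already captures the idea of a gradient-free, logic-valued sensitivity signal and can be sketched as follows (the majority "neuron" is a toy example, not the paper's architecture):

```python
def boolean_variation(f, x, i):
    """Discrete 'derivative' of a Boolean function along input i:
    1 iff flipping bit i changes the output, i.e. f(x) XOR f(x ^ e_i)."""
    flipped = x[:i] + (1 - x[i],) + x[i + 1:]
    return f(x) ^ f(flipped)

# Toy Boolean neuron: majority vote of three binary inputs.
maj = lambda x: int(x[0] + x[1] + x[2] >= 2)

x = (1, 0, 1)
print([boolean_variation(maj, x, i) for i in range(3)])
```

A logic signal like this, rather than a real-valued gradient, is what gets synthesized and propagated backward through the layers in the paper's training scheme.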
Updated: 2024-05-07 08:02:37
Fields: cs.LG,cs.DM,cs.LO,math.OC
Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes
The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the content of the videos we watch? The answer is obviously uncertain; therefore, this paper aims to evaluate the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a web-based platform where they could access a series of 40 videos (20 real and 20 fake) to determine their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos were manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.
Updated: 2024-05-07 07:57:15
Fields: cs.CV,cs.AI,cs.CY,cs.LG,cs.MM
Going Proactive and Explanatory Against Malware Concept Drift
Deep learning-based malware classifiers face significant challenges due to concept drift. The rapid evolution of malware, especially with new families, can depress classification accuracy to near-random levels. Previous research has primarily focused on detecting drift samples, relying on expert-led analysis and labeling for model retraining. However, these methods often lack a comprehensive understanding of malware concepts and provide limited guidance for effective drift adaptation, leading to unstable detection performance and high human labeling costs. To address these limitations, we introduce DREAM, a novel system designed to surpass the capabilities of existing drift detectors and to establish an explanatory drift adaptation process. DREAM enhances drift detection through model sensitivity and data autonomy. The detector, trained in a semi-supervised approach, proactively captures malware behavior concepts through classifier feedback. During testing, it utilizes samples generated by the detector itself, eliminating reliance on extensive training data. For drift adaptation, DREAM enlarges human intervention, enabling revisions of malware labels and concept explanations embedded within the detector's latent space. To ensure a comprehensive response to concept drift, it facilitates a coordinated update process for both the classifier and the detector. Our evaluation shows that DREAM can effectively improve the drift detection accuracy and reduce the expert analysis effort in adaptation across different malware datasets and classifiers.
Updated: 2024-05-07 07:55:45
Fields: cs.CR,cs.AI
DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects
Accurate classification of fine-grained images remains a challenge in backbones based on convolutional operations or self-attention mechanisms. This study proposes novel dual-current neural networks (DCNN), which combine the advantages of convolutional operations and self-attention mechanisms to improve the accuracy of fine-grained image classification. The main novel design features for constructing a weakly supervised learning backbone model DCNN include (a) extracting heterogeneous data, (b) keeping the feature map resolution unchanged, (c) expanding the receptive field, and (d) fusing global representations and local features. Experimental results demonstrated that using DCNN as the backbone network for classifying certain fine-grained benchmark datasets achieved performance advantage improvements of 13.5--19.5% and 2.2--12.9%, respectively, compared to other advanced convolution or attention-based fine-grained backbones.
Updated: 2024-05-07 07:51:28
Fields: cs.CV,cs.AI
Language Models as Black-Box Optimizers for Vision-Language Models
Vision-language models (VLMs) pre-trained on web-scale datasets have demonstrated remarkable capabilities on downstream tasks when fine-tuned with minimal data. However, many VLMs rely on proprietary data and are not open-source, which restricts the use of white-box approaches for fine-tuning. As such, we aim to develop a black-box approach to optimize VLMs through natural language prompts, thereby avoiding the need to access model parameters, feature embeddings, or even output logits. We propose employing chat-based LLMs to search for the best text prompt for VLMs. Specifically, we adopt an automatic hill-climbing procedure that converges to an effective prompt by evaluating the performance of current prompts and asking LLMs to refine them based on textual feedback, all within a conversational process without human-in-the-loop. In a challenging 1-shot image classification setup, our simple approach surpasses the white-box continuous prompting method (CoOp) by an average of 1.5% across 11 datasets including ImageNet. Our approach also outperforms both human-engineered and LLM-generated prompts. We highlight the advantage of conversational feedback that incorporates both positive and negative prompts, suggesting that LLMs can utilize the implicit gradient direction in textual feedback for a more efficient search. In addition, we find that the text prompts generated through our strategy are not only more interpretable but also transfer well across different VLM architectures in a black-box manner. Lastly, we apply our framework to optimize the state-of-the-art black-box VLM (DALL-E 3) for text-to-image generation, prompt inversion, and personalization.
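The automatic hill-climbing procedure can be sketched with stubs in place of the VLM scorer and the chat LLM. `evaluate_prompt` and `refine` below are toy stand-ins only: the real system scores prompts by 1-shot classification accuracy and asks a chat LLM to rewrite them from textual feedback on past prompts:

```python
import random

def evaluate_prompt(prompt):
    """Stand-in for scoring a prompt with the VLM on a held-out set.
    Toy objective: reward the tokens 'photo' and 'detailed', mildly
    penalize length (entirely hypothetical)."""
    return ("photo" in prompt) + ("detailed" in prompt) - 0.01 * len(prompt)

def refine(prompt):
    """Stand-in for asking a chat LLM to rewrite the prompt given
    feedback; here it proposes simple hypothetical edits."""
    edits = [prompt + " photo", prompt + " detailed", prompt.replace("a ", "")]
    return random.choice(edits)

random.seed(0)
best = "a picture of the class"
best_score = evaluate_prompt(best)
for _ in range(50):           # hill climbing: keep only improving edits
    cand = refine(best)
    score = evaluate_prompt(cand)
    if score > best_score:
        best, best_score = cand, score
print(best)
```

The black-box property is visible in the structure: only prompt strings and scalar scores cross the interface, never parameters, embeddings, or logits.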
Updated: 2024-05-07 07:47:38
Fields: cs.CL,cs.CV,cs.LG,cs.MM
Counterfactual and Semifactual Explanations in Abstract Argumentation: Formal Foundations, Complexity and Computation
Explainable Artificial Intelligence and Formal Argumentation have received significant attention in recent years. Argumentation-based systems often lack explainability while supporting decision-making processes. Counterfactual and semifactual explanations are interpretability techniques that provide insights into the outcome of a model by generating alternative hypothetical instances. While there has been important work on counterfactual and semifactual explanations for Machine Learning models, less attention has been devoted to these kinds of problems in argumentation. In this paper, we explore counterfactual and semifactual reasoning in abstract Argumentation Framework. We investigate the computational complexity of counterfactual- and semifactual-based reasoning problems, showing that they are generally harder than classical argumentation problems such as credulous and skeptical acceptance. Finally, we show that counterfactual and semifactual queries can be encoded in weak-constrained Argumentation Framework, and provide a computational strategy through ASP solvers.
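Counterfactual queries in abstract argumentation ask how acceptance would change under hypothetical modifications of the framework. A minimal ingredient is computing a semantics to compare across frameworks; the grounded extension (standard argumentation theory, not specific to this paper) via the characteristic-function fixpoint:

```python
def grounded_extension(args, attacks):
    """Grounded extension of an abstract argumentation framework via the
    characteristic-function fixpoint: repeatedly add every argument all
    of whose attackers are attacked by the current extension."""
    ext = set()
    while True:
        defended = {a for a in args
                    if all(any((d, b) in attacks for d in ext)
                           for b in args if (b, a) in attacks)}
        if defended == ext:
            return ext
        ext = defended

args = {"a", "b", "c"}
attacks = {("a", "b"), ("b", "c")}   # a attacks b, b attacks c
print(sorted(grounded_extension(args, attacks)))
```

A counterfactual query then compares extensions of the original framework against a hypothetically modified one, e.g. with the attack ("a", "b") removed; the paper shows such reasoning is generally harder than credulous or skeptical acceptance.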
Updated: 2024-05-07 07:27:27
领域: cs.AI
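As background for the complexity claims above, credulous acceptance in a Dung-style abstract Argumentation Framework can be checked by brute force over candidate admissible sets. This sketch implements the standard admissible semantics only, not the paper's counterfactual or semifactual extensions.

```python
from itertools import combinations

def credulously_accepted(args, attacks, target):
    """Brute-force test: is `target` in some admissible set of the AF?

    An admissible set S is conflict-free (no attacks inside S) and defends
    each of its members against every attacker.
    """
    att = set(attacks)

    def conflict_free(S):
        return not any((a, b) in att for a in S for b in S)

    def defends(S, a):
        return all(any((c, b) in att for c in S)
                   for b in args if (b, a) in att)

    for r in range(1, len(args) + 1):
        for S in combinations(args, r):
            if target in S and conflict_free(S) and all(defends(S, a) for a in S):
                return True
    return False

# a and b attack each other; b attacks c.  c is credulously accepted
# because {a, c} is admissible (a defends c against b).
args = ["a", "b", "c"]
attacks = [("a", "b"), ("b", "a"), ("b", "c")]
```

Enumerating all subsets is exponential, which is consistent with these acceptance problems being computationally hard in general.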
Polynomial XL: A Variant of the XL Algorithm Using Macaulay Matrices over Polynomial Rings
Solving a system of $m$ multivariate quadratic equations in $n$ variables over finite fields (the MQ problem) is one of the important problems in the theory of computer science. The XL algorithm (XL for short) is a major approach for solving the MQ problem with linearization over a coefficient field. Furthermore, the hybrid approach with XL (h-XL) is a variant of XL guessing some variables beforehand. In this paper, we present a variant of h-XL, which we call the \textit{polynomial XL (PXL)}. In PXL, the whole $n$ variables are divided into $k$ variables to be fixed and the remaining $n-k$ variables as ``main variables'', and we generate a Macaulay matrix with respect to the $n-k$ main variables over a polynomial ring of the $k$ (sub-)variables. By eliminating some columns of the Macaulay matrix over the polynomial ring before guessing $k$ variables, the amount of operations required for each guessed value can be reduced compared with h-XL. Our complexity analysis of PXL (under some practical assumptions and heuristics) gives a new theoretical bound, and it indicates that PXL could be more efficient than other algorithms in theory on the random system with $n=m$, which is the case of general multivariate signatures. For example, on systems over the finite field with ${2^8}$ elements with $n=m=80$, the numbers of operations deduced from the theoretical bounds of the hybrid approaches with XL and Wiedemann XL, Crossbred, and PXL with optimal $k$ are estimated as $2^{252}$, $2^{234}$, $2^{237}$, and $2^{220}$, respectively.
Updated: 2024-05-07 07:24:37
标题: 多项式XL:使用多项式环上的马考莱矩阵的XL算法的变种
摘要: 在有限域上求解含$n$个变量的$m$个多元二次方程组(MQ问题)是计算机科学理论中的重要问题之一。XL算法(简称XL)是通过在系数域上线性化来求解MQ问题的一种主要方法。此外,带XL的混合方法(h-XL)是XL的一个变体,事先猜测部分变量。在本文中,我们提出了h-XL的一个变体,称为\textit{多项式XL(PXL)}。在PXL中,全部$n$个变量被分为$k$个待固定的变量和其余$n-k$个作为"主变量"的变量,并在$k$个(子)变量的多项式环上生成关于$n-k$个主变量的Macaulay矩阵。通过在猜测$k$个变量之前在多项式环上消去Macaulay矩阵的某些列,与h-XL相比,可以减少每个猜测值所需的运算量。我们对PXL的复杂度分析(在一些实际假设和启发式条件下)给出了一个新的理论界,并表明在$n=m$的随机系统(即一般多元签名的情形)上,PXL在理论上可能比其他算法更高效。例如,在含${2^8}$个元素的有限域上、$n=m=80$的系统中,由理论界推得的XL混合方法、Wiedemann XL混合方法、Crossbred以及取最优$k$的PXL的运算次数分别估计为$2^{252}$、$2^{234}$、$2^{237}$和$2^{220}$。
更新时间: 2024-05-07 07:24:37
领域: cs.SC,cs.CR,math.AC
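The guess-then-solve structure shared by h-XL and PXL can be illustrated at toy scale. In the sketch below, plain enumeration over GF(2) stands in for the Macaulay-matrix linear algebra of the real algorithms, and the example system is invented for demonstration.

```python
from itertools import product

# A toy MQ system over GF(2): each equation maps bits x = (x0,x1,x2,x3) to 0/1.
def eqs(x):
    x0, x1, x2, x3 = x
    return (
        x0 * x1 ^ x2 ^ 1,        # x0*x1 + x2 + 1 = 0
        x1 * x2 ^ x0 ^ x3,       # x1*x2 + x0 + x3 = 0
        x0 * x3 ^ x1 ^ 1,        # x0*x3 + x1 + 1 = 0
        x2 * x3 ^ x0,            # x2*x3 + x0     = 0
    )

def hybrid_solve(n, k):
    """Guess the first k variables, then search the remaining n-k.

    In h-XL/PXL the inner step is linear algebra on a Macaulay matrix
    (precomputed over a polynomial ring in PXL); here it is plain
    enumeration, just to show the guess/solve split.
    """
    for guess in product((0, 1), repeat=k):
        for rest in product((0, 1), repeat=n - k):
            x = guess + rest
            if not any(eqs(x)):
                return x
    return None

solution = hybrid_solve(n=4, k=2)
```

PXL's saving comes from doing part of the elimination once, over the polynomial ring in the $k$ guessed variables, instead of repeating it for every guess.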
WISER: Weak supervISion and supErvised Representation learning to improve drug response prediction in cancer
Cancer, a leading cause of death globally, occurs due to genomic changes and manifests heterogeneously across patients. To advance research on personalized treatment strategies, the effectiveness of various drugs on cells derived from cancers (`cell lines') is experimentally determined in laboratory settings. Nevertheless, variations in the distribution of genomic data and drug responses between cell lines and humans arise due to biological and environmental differences. Moreover, while genomic profiles of many cancer patients are readily available, the scarcity of corresponding drug response data limits the ability to train machine learning models that can predict drug response in patients effectively. Recent cancer drug response prediction methods have largely followed the paradigm of unsupervised domain-invariant representation learning followed by a downstream drug response classification step. Introducing supervision in both stages is challenging due to heterogeneous patient response to drugs and limited drug response data. This paper addresses these challenges through a novel representation learning method in the first phase and weak supervision in the second. Experimental results on real patient data demonstrate the efficacy of our method (WISER) over state-of-the-art alternatives on predicting personalized drug response.
Updated: 2024-05-07 07:21:20
标题: WISER:弱监督和监督表示学习以改善癌症药物反应预测
摘要: 癌症是全球主要死因之一,由基因组变化引起,并在患者之间表现出异质性。为推进个性化治疗策略的研究,实验室中通过实验确定各种药物对源自癌症的细胞("细胞系")的有效性。然而,由于生物和环境差异,细胞系与人类之间在基因组数据和药物反应的分布上存在差异。此外,虽然许多癌症患者的基因组图谱容易获得,但相应药物反应数据的稀缺限制了训练能够有效预测患者药物反应的机器学习模型的能力。最近的癌症药物反应预测方法大多遵循"无监督域不变表示学习+下游药物反应分类"的范式。由于患者对药物反应的异质性和药物反应数据有限,在这两个阶段同时引入监督具有挑战性。本文通过第一阶段的一种新颖表示学习方法和第二阶段的弱监督来应对这些挑战。在真实患者数据上的实验结果证明了我们的方法(WISER)在预测个性化药物反应方面优于最先进的替代方案。
更新时间: 2024-05-07 07:21:20
领域: cs.LG,cs.AI,q-bio.QM
A simple theory for training response of deep neural networks
Deep neural networks give us a powerful method to model the training dataset's relationship between input and output. We can regard it as a complex adaptive system consisting of many artificial neurons that work as an adaptive memory as a whole. The network's behavior is training dynamics with a feedback loop from the evaluation of the loss function. We already know the training response can be constant or show power-law-like aging in some ideal situations. However, there are still gaps between those findings and other complex phenomena, like network fragility. To fill the gap, we introduce a very simple network and analyze it. We show the training response consists of some different factors based on training stages, activation functions, or training methods. In addition, we show feature space reduction as an effect of stochastic training dynamics, which can result in network fragility. Finally, we discuss some complex phenomena of deep networks.
Updated: 2024-05-07 07:20:15
标题: 一个关于深度神经网络训练响应的简单理论
摘要: 深度神经网络为我们提供了一种强大的方法来建模训练数据集中输入和输出之间的关系。我们可以将其视为由许多人工神经元组成的复杂自适应系统,整体上作为自适应记忆。网络的行为是训练动态,其中包括来自损失函数评估的反馈回路。我们已经知道,在一些理想情况下,训练响应可以是恒定的或显示类似幂律的老化。然而,我们仍然存在这些发现和其他复杂现象之间的差距,比如网络脆弱性。为了填补这一差距,我们引入了一个非常简单的网络并对其进行分析。我们展示了训练响应根据训练阶段、激活函数或训练方法等不同因素而不同。此外,我们展示了特征空间的缩减作为随机训练动态的一个效果,这可能导致网络脆弱性。最后,我们讨论了深度网络的一些复杂现象。
更新时间: 2024-05-07 07:20:15
领域: cond-mat.dis-nn,cs.AI,cs.LG,nlin.AO
Quantum Unpredictability
Unpredictable functions (UPFs) play essential roles in classical cryptography, including message authentication codes (MACs) and digital signatures. In this paper, we introduce a quantum analog of UPFs, which we call unpredictable state generators (UPSGs). UPSGs are implied by pseudorandom function-like states generators (PRFSs), which are a quantum analog of pseudorandom functions (PRFs), and therefore UPSGs could exist even if one-way functions do not exist, similar to other recently introduced primitives like pseudorandom state generators (PRSGs), one-way state generators (OWSGs), and EFIs. In classical cryptography, UPFs are equivalent to PRFs, but in the quantum case, the equivalence is not clear, and UPSGs could be weaker than PRFSs. Despite this, we demonstrate that all known applications of PRFSs are also achievable with UPSGs. They include IND-CPA-secure secret-key encryption and EUF-CMA-secure MACs with unclonable tags. Our findings suggest that, for many applications, quantum unpredictability, rather than quantum pseudorandomness, is sufficient.
Updated: 2024-05-07 07:19:25
标题: 量子不可预测性
摘要: 不可预测函数(UPFs)在经典密码学中扮演着重要角色,包括消息认证码(MACs)和数字签名。在本文中,我们引入了UPFs的量子类比,称之为不可预测状态生成器(UPSGs)。UPSGs可由类伪随机函数状态生成器(PRFSs)推出,后者是伪随机函数(PRFs)的量子类比,因此即使单向函数不存在,UPSGs也可能存在,这与最近引入的其他原语(如伪随机状态生成器(PRSGs)、单向状态生成器(OWSGs)和EFIs)类似。在经典密码学中,UPFs等价于PRFs,但在量子情形下这种等价性并不明确,UPSGs可能弱于PRFSs。尽管如此,我们证明了PRFSs的所有已知应用同样可以用UPSGs实现,其中包括IND-CPA安全的私钥加密和带不可克隆标签的EUF-CMA安全的MACs。我们的发现表明,对许多应用而言,量子不可预测性(而非量子伪随机性)就已足够。
更新时间: 2024-05-07 07:19:25
领域: quant-ph,cs.CR
An Improved Reversible Data Hiding Algorithm Based on Reconstructed Mapping for PVO-k
Reversible Data Hiding (RDH) is a practical and efficient technique for information encryption. Among its methods, the Pixel-Value Ordering (PVO) algorithm and its variants primarily modify prediction errors to embed information. However, both the classic PVO and its improved versions, such as IPVO and PVO-k, share a common limitation: their maximum data embedding capacity for a given grayscale image is relatively low. This poses a challenge when large amounts of data need to be embedded into an image. In response to these issues, this paper proposes an improved design targeting the PVO-k algorithm. We have reconstructed the mapping scheme of the PVO-k algorithm to maximize the number of pixels that can embed encrypted information. Experimental validations show that our proposed scheme significantly surpasses previous algorithms in terms of the maximum data embedding capacity. For instance, when embedding information into a grayscale image of an airplane, our method's capacity exceeds that of PVO-k by 11,207 bits, PVO by 8,004 bits, and IPVO by 4,562 bits. The results demonstrate that our algorithm holds substantial advantages over existing methods and introduces innovative mapping ideas, laying a foundation for future research in reversible data hiding in images.
Updated: 2024-05-07 07:15:53
标题: 一种基于重建映射的改进型PVO-k可逆数据隐藏算法
摘要: 可逆数据隐藏(RDH)是一种实用和高效的信息加密技术。在其方法中,像素值排序(PVO)算法及其变种主要修改预测误差以嵌入信息。然而,经典的PVO及其改进版本,如IPVO和PVO-k,都存在一个共同的限制:对于给定的灰度图像,它们的最大数据嵌入容量相对较低。当需要将大量数据嵌入到图像中时,这就构成了一个挑战。为了应对这些问题,本文提出了一个针对PVO-k算法的改进设计。我们重新构建了PVO-k算法的映射方案,以最大化可以嵌入加密信息的像素数量。实验证实,我们提出的方案在最大数据嵌入容量方面显著超过了先前的算法。例如,在将信息嵌入到一幅飞机的灰度图像中时,我们的方法容量超过了PVO-k 11,207比特,PVO 8,004比特,以及IPVO 4,562比特。结果表明,我们的算法在现有方法上具有明显优势,并引入了创新的映射思想,为将来在图像中的可逆数据隐藏领域的研究奠定了基础。
更新时间: 2024-05-07 07:15:53
领域: cs.CR
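For context, the classic PVO embedding rule that PVO-k and this paper build on can be sketched for the block maximum. This is a simplified, max-side-only version of standard PVO, not the paper's reconstructed mapping; extraction assumes embedding was actually performed on the block.

```python
def pvo_embed_max(block, bit):
    """Classic PVO embedding on the block maximum (simplified sketch).

    With e = max - second_max: e == 1 embeds one bit, e > 1 shifts the
    maximum by 1, and e == 0 leaves the block unchanged.
    """
    out = list(block)
    i = out.index(max(out))           # position of the (first) maximum
    second = max(v for j, v in enumerate(out) if j != i)
    e = out[i] - second
    if e == 1:
        out[i] += bit                 # error becomes 1 (bit 0) or 2 (bit 1)
    elif e > 1:
        out[i] += 1                   # shift to keep error ranges separable
    return out

def pvo_extract_max(block):
    """Recover the bit (if any) and restore the original block."""
    out = list(block)
    i = out.index(max(out))
    second = max(v for j, v in enumerate(out) if j != i)
    e = out[i] - second
    bit = None
    if e == 2:
        bit, out[i] = 1, out[i] - 1
    elif e == 1:
        bit = 0
    elif e > 2:
        out[i] -= 1                   # undo the shift
    return bit, out

block = [3, 5, 6]
marked = pvo_embed_max(block, 1)      # -> [3, 5, 7]
bit, restored = pvo_extract_max(marked)
```

Only blocks with prediction error exactly 1 carry payload, which is why the maximum embedding capacity is low and why remapping the error values, as the paper proposes, can raise it.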
MFA-Net: Multi-Scale feature fusion attention network for liver tumor segmentation
Segmentation of organs of interest in medical CT images is beneficial for diagnosis of diseases. Though recent methods based on Fully Convolutional Neural Networks (F-CNNs) have shown success in many segmentation tasks, fusing features from images with different scales is still a challenge: (1) Due to the lack of spatial awareness, F-CNNs share the same weights at different spatial locations. (2) F-CNNs can only obtain surrounding information through local receptive fields. To address the above challenge, we propose a new segmentation framework based on attention mechanisms, named MFA-Net (Multi-Scale Feature Fusion Attention Network). The proposed framework can learn more meaningful feature maps among multiple scales and result in more accurate automatic segmentation. We compare our proposed MFA-Net with SOTA methods on two 2D liver CT datasets. The experimental results show that our MFA-Net produces more precise segmentation on images with different scales.
Updated: 2024-05-07 07:10:44
标题: MFA-Net:肝肿瘤分割的多尺度特征融合注意力网络
摘要: 在医学CT图像中感兴趣器官的分割有助于疾病的诊断。尽管基于全卷积神经网络(F-CNNs)的最新方法在许多分割任务中取得了成功,但融合不同尺度图像的特征仍然是一个挑战:(1)由于缺乏空间意识,F-CNNs在不同空间位置共享相同的权重。(2)F-CNNs只能通过局部感受野获取周围信息。为了解决上述挑战,我们提出了一种基于注意机制的新的分割框架,名为MFA-Net(Multi-Scale Feature Fusion Attention Network)。所提出的框架可以在多个尺度之间学习更有意义的特征图,并导致更准确的自动分割。我们将我们提出的MFA-Net与SOTA方法在两个2D肝脏CT数据集上进行了比较。实验结果表明,我们的MFA-Net在不同尺度的图像上产生更精确的分割。
更新时间: 2024-05-07 07:10:44
领域: cs.AI
Risk-anticipatory autonomous driving strategies considering vehicles' weights, based on hierarchical deep reinforcement learning
Autonomous vehicles (AVs) have the potential to prevent accidents caused by drivers' errors and reduce road traffic risks. Because collisions involving heavy vehicles cause more serious crashes, the weights of vehicles need to be considered when making driving strategies aimed at reducing the potential risks and their consequences in the context of autonomous driving. This study develops an autonomous driving strategy based on risk anticipation, considering the weights of surrounding vehicles and using hierarchical deep reinforcement learning. A risk indicator integrating surrounding vehicles' weights, based on the risk field theory, is proposed and incorporated into autonomous driving decisions. A hybrid action space is designed to allow for left lane changes, right lane changes and car-following, which enables AVs to act more freely and realistically whenever possible. To solve the above hybrid decision-making problem, a hierarchical proximal policy optimization (HPPO) algorithm with an attention mechanism (AT-HPPO) is developed, providing great advantages in maintaining stable performance with high robustness and generalization. An indicator, potential collision energy in conflicts (PCEC), is newly proposed to evaluate the performance of the developed AV driving strategy from the perspective of the consequences of potential accidents. The performance evaluation results in simulations and on datasets demonstrate that our model provides driving strategies that reduce both the likelihood and consequences of potential accidents, while maintaining driving efficiency. The developed method is especially meaningful for AVs driving on highways, where heavy vehicles make up a high proportion of the traffic.
Updated: 2024-05-07 07:07:59
标题: 考虑车辆重量的风险预测自动驾驶策略,基于分层深度强化学习
摘要: 自动驾驶车辆(AVs)有潜力预防由驾驶员错误引起的事故,并减少道路交通风险。由于重型车辆的性质,其碰撞导致更严重的事故,因此在制定旨在降低潜在风险及其后果的驾驶策略时,需要考虑车辆的重量。本研究基于风险预测开发了一种自动驾驶策略,考虑周围车辆的重量,并使用分层深度强化学习。提出了一个基于风险场理论的整合周围车辆重量的风险指标,并将其纳入自动驾驶决策中。设计了一个混合动作空间,允许左车道变道、右车道变道和跟车,这使得AVs在可能的情况下能够更自由、更现实地行动。为解决上述混合决策问题,开发了一种带有注意机制(AT-HPPO)的分层近端策略优化(HPPO)算法,具有在保持高稳定性和泛化性能方面的巨大优势。提出了一个新指标,冲突中潜在碰撞能量(PCEC),用于从潜在事故后果的角度评估所开发的AV驾驶策略的性能。通过模拟和数据集的性能评估结果表明,我们的模型提供了既降低潜在事故发生可能性又降低事故后果的驾驶策略,同时保持驾驶效率。所开发的方法对于高速公路上行驶的AVs尤为重要,因为重型车辆占交通量的很大比例。
更新时间: 2024-05-07 07:07:59
领域: cs.RO,cs.LG
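The abstract does not give the formula of the weight-integrated risk indicator, so the sketch below is purely illustrative: each surrounding vehicle contributes risk proportional to its mass, decaying with distance. The Gaussian decay, the `sigma` parameter, and the units are assumptions, not the paper's risk-field model.

```python
import math

def weighted_risk(ego_xy, vehicles, sigma=10.0):
    """Illustrative weight-aware risk indicator (form assumed, not from the
    paper): each vehicle contributes mass * exp(-d^2 / (2 * sigma^2))."""
    x, y = ego_xy
    risk = 0.0
    for vx, vy, mass in vehicles:          # (position_x, position_y, mass in t)
        d2 = (vx - x) ** 2 + (vy - y) ** 2
        risk += mass * math.exp(-d2 / (2 * sigma ** 2))
    return risk

# A 30 t truck at 15 m ahead reads as riskier than a 2 t car at the
# same distance, which is the qualitative point of weighting by mass.
truck = weighted_risk((0, 0), [(15, 0, 30.0)])
car = weighted_risk((0, 0), [(15, 0, 2.0)])
```

A scalar field of this kind can then be consumed as part of the reward or observation in the hierarchical RL policy.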
Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications
Divergence measures play a central role in machine learning and become increasingly essential in deep learning. However, valid and computationally efficient divergence measures for multiple (more than two) distributions are scarcely investigated. This becomes particularly crucial in areas where the simultaneous management of multiple distributions is both unavoidable and essential. Examples include clustering, multi-source domain adaptation or generalization, and multi-view learning, among others. Although calculating the mean of pairwise distances between any two distributions serves as a common way to quantify the total divergence among multiple distributions, it is crucial to acknowledge that this approach is not straightforward and requires significant computational resources. In this study, we introduce a new divergence measure for multiple distributions named the generalized Cauchy-Schwarz divergence (GCSD), which is inspired by the classic Cauchy-Schwarz divergence. Additionally, we provide a closed-form sample estimator based on kernel density estimation, making it convenient and straightforward to use in various machine-learning applications. Finally, we apply the proposed GCSD to two challenging machine learning tasks, namely deep learning-based clustering and the problem of multi-source domain adaptation. The experimental results showcase the impressive performance of GCSD in both tasks, highlighting its potential application in machine-learning areas that involve quantifying multiple distributions.
Updated: 2024-05-07 07:07:44
标题: 广义柯西-施瓦茨散度及其深度学习应用
摘要: 散度度量在机器学习中起着核心作用,并在深度学习中变得越来越重要。然而,针对多个(两个以上)分布的有效且计算高效的散度度量鲜有研究。这在不可避免且必须同时处理多个分布的领域中尤为关键,例如聚类、多源域自适应或泛化以及多视图学习等。尽管计算任意两个分布之间成对距离的均值是量化多个分布间总散度的常见方法,但必须认识到这种方法并不直接,且需要大量计算资源。在本研究中,我们引入了一种针对多个分布的新散度度量,称为广义柯西-施瓦茨散度(GCSD),其灵感来自经典的柯西-施瓦茨散度。此外,我们提供了基于核密度估计的闭式样本估计器,使其在各种机器学习应用中便于直接使用。最后,我们将所提出的GCSD应用于两个具有挑战性的机器学习任务,即基于深度学习的聚类和多源域自适应问题。实验结果展示了GCSD在这两个任务中的出色表现,突显了其在涉及量化多个分布的机器学习领域中的应用潜力。
更新时间: 2024-05-07 07:07:44
领域: cs.LG,cs.AI
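The classic two-distribution Cauchy-Schwarz divergence that GCSD generalizes admits a closed-form kernel-density estimator, since integrals of products of Gaussian KDEs reduce to sums of Gaussians. The sketch below implements that classic pairwise case in 1-D (not the paper's generalized multi-distribution form).

```python
import math

def gaussian_cross(X, Y, sigma):
    """Mean of Gaussian kernels N(x - y; 0, 2*sigma^2): estimates the
    integral of p*q when p, q are KDEs of X and Y with bandwidth sigma."""
    s2 = 2.0 * sigma ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * s2)
    return sum(norm * math.exp(-(x - y) ** 2 / (2.0 * s2))
               for x in X for y in Y) / (len(X) * len(Y))

def cs_divergence(X, Y, sigma=1.0):
    """Classic Cauchy-Schwarz divergence between two sample sets:
    D_CS = -log( (int pq)^2 / (int p^2 * int q^2) ) >= 0, zero iff p = q."""
    pq = gaussian_cross(X, Y, sigma)
    pp = gaussian_cross(X, X, sigma)
    qq = gaussian_cross(Y, Y, sigma)
    return -math.log(pq * pq / (pp * qq))

X = [0.0, 0.2, -0.1, 0.4]
Y = [3.0, 3.3, 2.8, 3.1]
```

Non-negativity follows from the Cauchy-Schwarz inequality applied to the densities, which is the property the generalized version extends to more than two distributions.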
Bidirectional Adversarial Autoencoders for the design of Plasmonic Metasurfaces
Deep Learning has been a critical part of designing inverse design methods that are computationally efficient and accurate. An example of this is the design of photonic metasurfaces by using their photoluminescent spectrum as the input data to predict their topology. One fundamental challenge of these systems is their ability to represent nonlinear relationships between sets of data that have different dimensionalities. Existing design methods often implement a conditional Generative Adversarial Network in order to solve this problem, but in many cases the solution is unable to generate structures that provide multiple peaks when validated. It is demonstrated that in response to the target spectrum, the Bidirectional Adversarial Autoencoder is able to generate structures that provide multiple peaks on several occasions. As a result the proposed model represents an important advance towards the generation of nonlinear photonic metasurfaces that can be used in advanced metasurface design.
Updated: 2024-05-07 06:57:42
标题: 双向对抗自编码器用于等离激元超表面设计
摘要: 深度学习已经成为设计反向设计方法中的关键部分,这些方法在计算效率和精确度方面都表现出色。一个例子是利用光致发光光谱作为输入数据来预测光子超表面的拓扑结构。这些系统面临的一个基本挑战是能够表示不同维度数据之间的非线性关系。现有的设计方法通常实现一个条件生成对抗网络来解决这个问题,但在许多情况下,解决方案无法生成验证时提供多个峰值的结构。实验证明,双向对抗自动编码器能够根据目标光谱生成在多个场合提供多个峰值的结构。因此,所提出的模型代表了朝着生成可用于先进超表面设计的非线性光子超表面的重要进展。
更新时间: 2024-05-07 06:57:42
领域: physics.optics,cs.LG
Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT
This research examines the effectiveness of OpenAI's GPT models as independent evaluators of text summaries generated by six transformer-based models from Hugging Face: DistilBART, BERT, ProphetNet, T5, BART, and PEGASUS. We evaluated these summaries based on essential properties of a high-quality summary - conciseness, relevance, coherence, and readability - using traditional metrics such as ROUGE and Latent Semantic Analysis (LSA). Uniquely, we also employed GPT not as a summarizer but as an evaluator, allowing it to independently assess summary quality without predefined metrics. Our analysis revealed significant correlations between GPT evaluations and traditional metrics, particularly in assessing relevance and coherence. The results demonstrate GPT's potential as a robust tool for evaluating text summaries, offering insights that complement established metrics and providing a basis for comparative analysis of transformer-based models in natural language processing tasks.
Updated: 2024-05-07 06:52:34
标题: 使用OpenAI的GPT评估由大型语言模型生成的文本摘要
摘要: 这项研究检验了OpenAI的GPT模型作为独立评估者的有效性,用于评估由Hugging Face的六个基于transformer的模型生成的文本摘要:DistilBART、BERT、ProphetNet、T5、BART和PEGASUS。我们基于高质量摘要的基本属性-简洁性、相关性、连贯性和可读性-使用传统的ROUGE和潜在语义分析(LSA)等指标对这些摘要进行了评估。独特的是,我们将GPT不作为摘要器,而是作为评估者,允许其独立评估摘要质量而无需预定义的指标。我们的分析揭示了GPT评估和传统指标之间的显著相关性,特别是在评估相关性和连贯性方面。结果表明GPT作为评估文本摘要的强大工具的潜力,提供了补充已建立指标的见解,并为自然语言处理任务中基于transformer的模型的比较分析提供了基础。
更新时间: 2024-05-07 06:52:34
领域: cs.CL,cs.AI,cs.LG
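Of the traditional metrics mentioned, ROUGE-1 is straightforward to compute from unigram overlap. A minimal F1 variant (whitespace tokenization, no stemming — simpler than standard ROUGE implementations):

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap ROUGE-1 F1 between a candidate summary and a reference."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())        # multiset intersection of unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
```

Metrics of this kind measure lexical overlap only, which is exactly the gap an LLM-based evaluator is meant to complement when judging relevance and coherence.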
Learning Linear Block Error Correction Codes
Error correction codes are a crucial part of the physical communication layer, ensuring the reliable transfer of data over noisy channels. The design of optimal linear block codes capable of being efficiently decoded is of major concern, especially for short block lengths. While neural decoders have recently demonstrated their advantage over classical decoding techniques, the neural design of the codes remains a challenge. In this work, we propose for the first time a unified encoder-decoder training of binary linear block codes. To this end, we adapt the coding setting to support efficient and differentiable training of the code for end-to-end optimization over the order two Galois field. We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient. Our results show that (i) the proposed decoder outperforms existing neural decoding on conventional codes, (ii) the suggested framework generates codes that outperform the analogous conventional codes, and (iii) the codes we developed not only excel with our decoder but also show enhanced performance with traditional decoding techniques.
Updated: 2024-05-07 06:47:12
标题: 学习线性分组纠错码
摘要: 纠错码是物理通信层的重要组成部分,确保数据在噪声信道上可靠传输。设计能够被高效译码的最优线性分组码至关重要,对短码长尤其如此。尽管神经译码器最近展示出优于经典译码技术的优势,码本身的神经设计仍然是一个挑战。在这项工作中,我们首次提出了二元线性分组码的统一编码器-译码器训练。为此,我们调整编码设置,以支持对码进行高效且可微的训练,从而在二阶伽罗瓦域GF(2)上进行端到端优化。我们还提出了一种新颖的Transformer模型,其中自注意力掩码以可微方式执行,以便对码的梯度进行高效反向传播。我们的结果表明:(i)所提出的译码器在传统码上优于现有的神经译码方法;(ii)所提出的框架生成的码优于对应的传统码;(iii)我们开发的码不仅与我们的译码器配合出色,在传统译码技术下也表现出更强的性能。
更新时间: 2024-05-07 06:47:12
领域: cs.IT,cs.AI,math.IT
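As textbook background on the linear block codes being learned (not the paper's method), a systematic Hamming(7,4) code encodes 4 message bits into 7 and corrects any single bit flip via syndrome decoding over GF(2):

```python
# Systematic Hamming(7,4): codeword c = [m | m*P] over GF(2).
P = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]

def encode(m):
    """4 message bits -> 7-bit codeword (message followed by parity bits)."""
    parity = [sum(m[i] * P[i][j] for i in range(4)) % 2 for j in range(3)]
    return list(m) + parity

def decode(r):
    """7 received bits -> corrected 4-bit message (single-error correction)."""
    # Parity-check matrix H = [P^T | I3]; syndrome s = H r^T.
    H = [[P[i][j] for i in range(4)] + [1 if k == j else 0 for k in range(3)]
         for j in range(3)]
    s = [sum(H[j][i] * r[i] for i in range(7)) % 2 for j in range(3)]
    r = list(r)
    if any(s):                        # syndrome equals the flipped bit's column
        for i in range(7):
            if [H[j][i] for j in range(3)] == s:
                r[i] ^= 1
                break
    return r[:4]

msg = [1, 0, 1, 1]
codeword = encode(msg)
corrupted = list(codeword)
corrupted[2] ^= 1                     # flip one bit in the channel
```

The paper's challenge is to make the discrete choice of a generator/parity-check structure like `P` differentiable so it can be optimized jointly with a neural decoder.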
FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction
Click-through rate (CTR) prediction plays as a core function module in various personalized online services. The traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality, which capture the collaborative signals via feature interaction modeling. But the one-hot encoding discards the semantic information conceived in the original feature texts. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality obtained by hard prompt templates and adopts PLMs to extract the semantic knowledge. However, PLMs generally tokenize the input text data into subword tokens and ignore field-wise collaborative signals. Therefore, these two lines of research focus on different characteristics of the same input data (i.e., textual and tabular modalities), forming a distinct complementary relationship with each other. In this paper, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. We design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment via sufficient mutual information extraction between dual modalities. Moreover, we propose to jointly finetune the ID-based model and PLM for downstream CTR prediction tasks, thus achieving superior performance by combining the advantages of both models. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines, and is highly compatible for various ID-based models and PLMs. The code is at \url{https://github.com/justarter/FLIP}.
Updated: 2024-05-07 06:44:46
标题: FLIP: 为CTR预测实现ID-based模型与预训练语言模型的细粒度对齐
摘要: 点击率(CTR)预测在各种个性化在线服务中扮演核心功能模块的角色。传统的基于ID的CTR预测模型将表格模态的one-hot编码ID特征作为输入,通过特征交互建模捕捉协同信号。但是,one-hot编码丢弃了原始特征文本中所包含的语义信息。最近,预训练语言模型(PLMs)的出现催生了另一种范式,即将通过硬提示模板获得的文本模态句子作为输入,并采用PLMs提取语义知识。然而,PLMs通常将输入文本数据标记为子词标记,并忽略了字段级的协同信号。因此,这两条研究线关注相同输入数据(即文本和表格模态)的不同特征,形成了明显的互补关系。在本文中,我们提出进行基于ID的模型和预训练语言模型(FLIP)的精细特征级对齐用于CTR预测。我们为掩码语言和表格建模设计了一项新颖的联合重建预训练任务。具体来说,一种模态的掩码数据(即标记或特征)必须在另一种模态的帮助下恢复,从而通过双模态之间的充分互信息提取建立特征级互动和对齐。此外,我们提出对基于ID的模型和PLM进行联合微调,用于下游CTR预测任务,从而通过结合两种模型的优势实现卓越性能。对三个真实数据集的大量实验表明,FLIP优于SOTA基线,并且非常兼容各种基于ID的模型和PLMs。代码位于\url{https://github.com/justarter/FLIP}。
更新时间: 2024-05-07 06:44:46
领域: cs.IR,cs.AI
Interpretable Geoscience Artificial Intelligence (XGeoS-AI): Application to Demystify Image Recognition
As Earth science enters the era of big data, artificial intelligence (AI) not only offers great potential for solving geoscience problems, but also plays a critical role in accelerating the understanding of the complex, interactive, and multiscale processes of Earth's behavior. As geoscience AI models are progressively utilized for significant predictions in crucial situations, geoscience researchers are increasingly demanding their interpretability and versatility. This study proposes an interpretable geoscience artificial intelligence (XGeoS-AI) framework to unravel the mystery of image recognition in the Earth sciences, and its effectiveness and versatility are demonstrated by taking computed tomography (CT) image recognition as an example. Inspired by the mechanism of human vision, the proposed XGeoS-AI framework generates a threshold value from a local region within the whole image to complete the recognition. Different kinds of artificial intelligence (AI) methods, such as Support Vector Regression (SVR), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN), can be adopted as the AI engines of the proposed XGeoS-AI framework to efficiently complete geoscience image recognition tasks. Experimental results demonstrate that the effectiveness, versatility, and heuristics of the proposed framework have great potential in solving geoscience image recognition problems. Interpretable AI should receive more and more attention in the field of the Earth sciences, as it is the key to promoting more rational and wider applications of AI in the field. In addition, the proposed interpretable framework may be the forerunner of technological innovation in the Earth sciences.
Updated: 2024-05-07 06:44:01
标题: 可解释的地球科学人工智能(XGeoS-AI):应用于解析图像识别
摘要: 随着地球科学进入大数据时代,人工智能不仅为解决地球科学问题提供了巨大潜力,而且在加速理解地球行为复杂、互动和多尺度过程中发挥了关键作用。随着地球科学人工智能模型逐渐用于关键情况下的重要预测,地球科学研究人员对其可解释性和多功能性的需求也在增加。本研究提出了一个可解释的地球科学人工智能(XGeoS-AI)框架,以揭示地球科学中图像识别的奥秘,并以计算机断层扫描(CT)图像识别为例展示其有效性和多功能性。受人类视觉机制启发,提出的XGeoS-AI框架从整个图像中的局部区域生成阈值以完成识别。不同类型的人工智能(AI)方法,如支持向量回归(SVR)、多层感知器(MLP)、卷积神经网络(CNN),可以作为提出的XGeoS-AI框架的AI引擎,高效完成地球科学图像识别任务。实验结果表明,提出的框架的有效性、多功能性和启发性在解决地球科学图像识别问题方面具有巨大潜力。在地球科学领域,可解释的人工智能应该受到越来越多的关注,这是促进人工智能在地球科学领域更加合理和广泛应用的关键。此外,提出的可解释框架可能是地球科学技术创新的先驱。
更新时间: 2024-05-07 06:44:01
领域: cs.CV,cs.AI,eess.IV
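The abstract states that recognition is completed by a threshold derived from a local region, but does not specify the thresholding rule. As one standard illustration of threshold-from-data, Otsu's method applied to a local patch picks the value maximizing between-class variance; this is an assumed stand-in, not necessarily the framework's actual rule.

```python
def otsu_threshold(pixels):
    """Otsu's method: choose the threshold maximizing between-class variance."""
    best_t, best_var = None, -1.0
    for t in sorted(set(pixels))[:-1]:          # candidate split points
        lo = [p for p in pixels if p <= t]
        hi = [p for p in pixels if p > t]
        w0, w1 = len(lo) / len(pixels), len(hi) / len(pixels)
        m0, m1 = sum(lo) / len(lo), sum(hi) / len(hi)
        var = w0 * w1 * (m0 - m1) ** 2          # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Bimodal CT-like patch: dark pore pixels near 20, bright matrix near 200.
patch = [18, 20, 22, 25, 19, 198, 200, 205, 202, 199]
t = otsu_threshold(patch)
mask = [p > t for p in patch]
```

In the framework's terms, an AI engine (SVR, MLP, CNN) would regress such a threshold from the local region rather than computing it by an analytic rule.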
Watermarking Neuromorphic Brains: Intellectual Property Protection in Spiking Neural Networks
As spiking neural networks (SNNs) gain traction in deploying neuromorphic computing solutions, protecting their intellectual property (IP) has become crucial. Without adequate safeguards, proprietary SNN architectures are at risk of theft, replication, or misuse, which could lead to significant financial losses for the owners. While IP protection techniques have been extensively explored for artificial neural networks (ANNs), their applicability and effectiveness for the unique characteristics of SNNs remain largely unexplored. In this work, we pioneer an investigation into adapting two prominent watermarking approaches, namely, fingerprint-based and backdoor-based mechanisms to secure proprietary SNN architectures. We conduct thorough experiments to evaluate the impact on fidelity, resilience against overwrite threats, and resistance to compression attacks when applying these watermarking techniques to SNNs, drawing comparisons with their ANN counterparts. This study lays the groundwork for developing neuromorphic-aware IP protection strategies tailored to the distinctive dynamics of SNNs.
Updated: 2024-05-07 06:42:30
标题: 为神经形态大脑加水印:脉冲神经网络中的知识产权保护
摘要: 随着尖峰神经网络(SNNs)在部署神经形态计算解决方案中得到推广,保护其知识产权(IP)变得至关重要。如果没有足够的保护措施,专有的SNN架构面临被盗窃、复制或滥用的风险,这可能给所有者带来重大的财务损失。虽然为人工神经网络(ANNs)探索了广泛的IP保护技术,但这些技术在SNNs的独特特征上的适用性和有效性仍然未被充分探讨。在这项工作中,我们首次对两种著名的水印技术进行了调查,即基于指纹和后门的机制,以保护专有的SNN架构。我们进行了彻底的实验,评估了在将这些水印技术应用于SNNs时对保真度、抗覆盖威胁的抵抗力以及抗压缩攻击的影响,并与其ANN对应物进行比较。这项研究为开发针对SNNs独特动态的神经形态感知IP保护策略奠定了基础。
更新时间: 2024-05-07 06:42:30
领域: cs.CR,cs.LG,cs.NE
MBCT: A Monero-Based Covert Transmission Approach with On-chain Dynamic Session Key Negotiation
The limitations of traditional covert transmission (CT) approaches have hindered the application of CT, while blockchain technology offers a new avenue. Current blockchain-based CT approaches require off-chain negotiation of critical information and often overlook dynamic session key updating, which increases the risk of message and key leakage. Additionally, in some approaches the covert transactions exhibit obvious characteristics that can be easily detected by third parties. Moreover, most approaches do not address the issue of decreased reliability of message transmission in blockchain attack scenarios. Bitcoin- and Ethereum-based approaches also have the issue of transaction linkability, which can be tackled by Monero-based approaches because of the privacy protection mechanisms in Monero. However, Monero-based CT has the problem of sender repudiation. In this paper, we propose a novel Monero-based CT approach (MBCT), which enables on-chain session keys to be dynamically updated without off-chain negotiation. MBCT can assure non-repudiation of transmission participants, confidentiality of keys, reliability of message transmission and less observable characteristics. These are achieved by the three components in MBCT, namely, a sender authentication method, a dynamic on-chain session key updating method and a state feedback method. We implement MBCT in Monero-0.18.1.0 and the experimental results demonstrate the high embedding capacity of MBCT.
Updated: 2024-05-07 06:32:16
标题: MBCT:基于门罗币的隐蔽传输方法,具有链上动态会话密钥协商
摘要: 传统隐蔽传输(CT)方法的局限一直阻碍着CT的应用,而区块链技术提供了新的途径。当前基于区块链的CT方法需要在链外协商关键信息,并且往往忽视会话密钥的动态更新,这增加了消息和密钥泄露的风险。此外,在一些方法中,隐蔽交易表现出明显的特征,容易被第三方检测出来。而且,大多数方法没有解决区块链攻击场景下消息传输可靠性下降的问题。基于比特币和以太坊的方法还存在交易可关联性的问题,而基于门罗币的方法凭借门罗币的隐私保护机制可以解决这一问题。然而,基于门罗币的CT存在发送方抵赖的问题。在本文中,我们提出了一种新颖的基于门罗币的CT方法(MBCT),无需链外协商即可在链上动态更新会话密钥。MBCT能够保证传输参与者的不可抵赖性、密钥的机密性、消息传输的可靠性以及较低的可观测特征。这通过MBCT中的三个组件实现,即发送方认证方法、链上会话密钥动态更新方法和状态反馈方法。我们在Monero-0.18.1.0中实现了MBCT,实验结果表明MBCT具有较高的嵌入容量。
更新时间: 2024-05-07 06:32:16
领域: cs.CR
Scalable Vertical Federated Learning via Data Augmentation and Amortized Inference
Vertical federated learning (VFL) has emerged as a paradigm for collaborative model estimation across multiple clients, each holding a distinct set of covariates. This paper introduces the first comprehensive framework for fitting Bayesian models in the VFL setting. We propose a novel approach that leverages data augmentation techniques to transform VFL problems into a form compatible with existing Bayesian federated learning algorithms. We present an innovative model formulation for specific VFL scenarios where the joint likelihood factorizes into a product of client-specific likelihoods. To mitigate the dimensionality challenge posed by data augmentation, which scales with the number of observations and clients, we develop a factorized amortized variational approximation that achieves scalability independent of the number of observations. We showcase the efficacy of our framework through extensive numerical experiments on logistic regression, multilevel regression, and a novel hierarchical Bayesian split neural net model. Our work paves the way for privacy-preserving, decentralized Bayesian inference in vertically partitioned data scenarios, opening up new avenues for research and applications in various domains.
Updated: 2024-05-07 06:29:06
标题: 可扩展的垂直联邦学习:通过数据增强和摊销推断
摘要: 垂直联邦学习(VFL)已经成为跨多个客户之间协作模型估计的范式,每个客户持有不同的协变量集的方法。本文介绍了在VFL环境中拟合贝叶斯模型的第一个全面框架。我们提出了一种新颖的方法,利用数据增强技术将VFL问题转化为与现有贝叶斯联邦学习算法兼容的形式。我们提出了一种创新的模型形式,适用于特定的VFL场景,其中联合似然分解为客户特定似然的乘积。为了减轻数据增强带来的维度挑战,它随着观测量和客户数量的增加而增加,我们开发了一种分解的摊销变分逼近方法,实现了与观测量数量无关的可伸缩性。通过对逻辑回归、多层回归和新颖的分层贝叶斯分割神经网络模型进行广泛的数值实验,展示了我们框架的有效性。我们的工作为在垂直分区数据情景下进行隐私保护、去中心化的贝叶斯推断铺平了道路,为各个领域的研究和应用开辟了新的途径。
更新时间: 2024-05-07 06:29:06
领域: stat.CO,cs.LG,stat.ME,stat.ML
Space-time Reinforcement Network for Video Object Segmentation
Recently, video object segmentation (VOS) networks have typically used memory-based methods: for each query frame, the mask is predicted by space-time matching against memory frames. Despite their superior performance, these methods suffer from two issues: 1) Challenging data can destroy the space-time coherence between adjacent video frames. 2) Pixel-level matching leads to undesired mismatches caused by noise or distractors. To address the aforementioned issues, we first propose to generate an auxiliary frame between adjacent frames, serving as an implicit short-temporal reference for the query one. Next, we learn a prototype for each video object, so that prototype-level matching can be performed between the query and memory. The experiments demonstrate that our network outperforms the state-of-the-art method on DAVIS 2017, achieving a J&F score of 86.4%, and attains a competitive result of 85.0% on YouTube-VOS 2018. In addition, our network exhibits a high inference speed of 32+ FPS.
Updated: 2024-05-07 06:26:30
标题: 时空强化网络用于视频目标分割
摘要: 最近,视频目标分割(VOS)网络通常采用基于记忆的方法:对于每个查询帧,通过与记忆帧进行时空匹配来预测掩模。尽管这些方法性能优越,但仍存在两个问题:1)具有挑战性的数据可能破坏相邻视频帧之间的时空一致性;2)像素级匹配会导致由噪声或干扰物引起的错误匹配。为解决上述问题,我们首先提出在相邻帧之间生成一个辅助帧,作为查询帧的隐式短时参考。接着,我们为每个视频目标学习一个原型,从而可以在查询与记忆之间进行原型级匹配。实验证明,我们的网络在DAVIS 2017上优于最先进的方法,取得了86.4%的J&F分数,并在YouTube-VOS 2018上取得了85.0%的有竞争力的结果。此外,我们的网络具有32+ FPS的高推理速度。
更新时间: 2024-05-07 06:26:30
领域: cs.CV,cs.AI
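Prototype-level matching, as opposed to pixel-level matching, can be illustrated minimally: a prototype is the mean feature over an object's pixels, and each query pixel is assigned to the most cosine-similar prototype. The paper learns such prototypes end-to-end; this sketch uses fixed toy features purely to show the matching step.

```python
def prototype(features, mask):
    """Mean feature vector over the pixels belonging to one object."""
    sel = [f for f, m in zip(features, mask) if m]
    d = len(sel[0])
    return [sum(f[i] for f in sel) / len(sel) for i in range(d)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def assign(query_features, prototypes):
    """Label each query pixel with the index of the most similar prototype."""
    return [max(range(len(prototypes)), key=lambda k: cosine(f, prototypes[k]))
            for f in query_features]

# Memory frame: object pixel features cluster near (1,0), background near (0,1).
mem = [(0.9, 0.1), (1.0, 0.0), (0.1, 0.9), (0.0, 1.0)]
obj_mask = [True, True, False, False]
protos = [prototype(mem, obj_mask),
          prototype(mem, [not m for m in obj_mask])]
labels = assign([(0.95, 0.05), (0.05, 0.95)], protos)
```

Matching against one prototype per object instead of every memory pixel is what makes the scheme robust to pixel-level noise and distractors.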
Feature Map Convergence Evaluation for Functional Module
Autonomous driving perception models are typically composed of multiple functional modules that interact through complex relationships to accomplish environment understanding. However, perception models are predominantly optimized as black boxes through end-to-end training, lacking independent evaluation of their functional modules, which poses difficulties for interpretability and optimization. As pioneering work on this issue, we propose an evaluation method based on feature map analysis to gauge the convergence of a model, thereby assessing the training maturity of its functional modules. We construct a quantitative metric named the Feature Map Convergence Score (FMCS) and develop the Feature Map Convergence Evaluation Network (FMCE-Net) to measure and predict the convergence degree of models, respectively. FMCE-Net achieves remarkable predictive accuracy for FMCS across multiple image classification experiments, validating the efficacy and robustness of the introduced approach. To the best of our knowledge, this is the first independent evaluation method for functional modules, offering a new paradigm for training assessment of perception models.
Updated: 2024-05-07 06:25:49
Subjects: cs.AI,cs.CV
Utilizing GPT to Enhance Text Summarization: A Strategy to Minimize Hallucinations
In this research, we use the DistilBERT model to generate extractive summaries and the T5 model to generate abstractive summaries. We also generate hybrid summaries by combining the DistilBERT and T5 models. Central to our research is the implementation of a GPT-based refining process to minimize the common problem of hallucinations in AI-generated summaries. We evaluate unrefined summaries and, after refining, assess the refined summaries using a range of traditional and novel metrics, demonstrating marked improvements in accuracy and reliability. The results highlight significant reductions in hallucinatory content, thereby increasing the factual integrity of the summaries.
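The extractive stage of such a pipeline can be illustrated with a toy stand-in: instead of DistilBERT, a simple word-frequency scorer ranks sentences and keeps the top ones in their original order. This is only a sketch of the extract-then-refine pipeline shape; the sentence splitter and scoring function are illustrative assumptions, not the models used in the paper.

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=1):
    # split into sentences, score each by summed word frequency, keep the top ones
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    ranked = sorted(
        sentences,
        key=lambda s: -sum(freq[w] for w in re.findall(r"\w+", s.lower())),
    )
    top = set(ranked[:n_sentences])
    # emit the chosen sentences in their original order
    return " ".join(s for s in sentences if s in top)
```

In the full pipeline, the output of this stage would be passed to an abstractive model and then to the GPT-based refiner for hallucination checking.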
Updated: 2024-05-07 06:23:02
Subjects: cs.CL,cs.AI,cs.LG
Creativity and Machine Learning: A Survey
There is a growing interest in the area of machine learning and creativity. This survey presents an overview of the history and the state of the art of computational creativity theories, key machine learning techniques (including generative deep learning), and corresponding automatic evaluation methods. After presenting a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in this field.
Updated: 2024-05-07 06:19:46
Subjects: cs.LG,cs.AI,cs.CY
Differentially Private Post-Processing for Fair Regression
This paper describes a differentially private post-processing algorithm for learning fair regressors satisfying statistical parity, addressing privacy concerns about machine learning models trained on sensitive data as well as fairness concerns about their potential to propagate historical biases. Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outputs. It consists of three steps: first, the output distributions are estimated privately via histogram density estimation and the Laplace mechanism; then their Wasserstein barycenter is computed; and finally the optimal transports to the barycenter are used for post-processing to satisfy fairness. We analyze the sample complexity of our algorithm and provide a fairness guarantee, revealing a trade-off between the statistical bias and variance induced by the choice of the number of histogram bins, in which using fewer bins always favors fairness at the expense of error.
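The first step, private histogram density estimation via the Laplace mechanism, can be sketched as follows. The noise scale 2/epsilon assumes replace-one adjacency (one record changes two bin counts by one each), and clipping negative noisy counts before renormalizing is an illustrative choice, not necessarily the paper's.

```python
import math
import random

def sample_laplace(scale, rng):
    # inverse-CDF sampling of the zero-mean Laplace distribution
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_density(values, bins, lo, hi, epsilon, seed=0):
    # histogram counts with Laplace noise, clipped to be nonnegative and
    # renormalized into a density estimate of the regressor's outputs
    rng = random.Random(seed)
    width = (hi - lo) / bins
    counts = [0.0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1.0
    noisy = [max(c + sample_laplace(2.0 / epsilon, rng), 0.0) for c in counts]
    total = sum(noisy) or 1.0
    return [c / total for c in noisy]
```

The bin count trade-off in the abstract shows up directly here: fewer bins mean less Laplace noise per probability mass (favoring fairness) but a coarser, more biased density estimate (costing accuracy).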
Updated: 2024-05-07 06:09:37
Subjects: cs.LG,cs.CR,cs.CY
Locally Differentially Private In-Context Learning
Large pretrained language models (LLMs) have shown surprising in-context learning (ICL) ability. An important application when deploying large language models is to augment LLMs with a private database for some specific task. The main problem with this promising commercial use is that LLMs have been shown to memorize their training data, and their prompt data are vulnerable to membership inference attacks (MIA) and prompt leaking attacks. To deal with this problem, we treat LLMs as untrusted with respect to privacy and propose a locally differentially private framework for in-context learning (LDP-ICL) in settings where labels are sensitive. Considering the mechanisms of in-context learning in Transformers via gradient descent, we provide an analysis of the trade-off between privacy and utility in such LDP-ICL for classification. Moreover, we apply LDP-ICL to the discrete distribution estimation problem. Finally, we perform several experiments to demonstrate our analysis results.
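When labels are the sensitive part, the standard epsilon-LDP building block is randomized response: each label is flipped with a calibrated probability before it ever enters a prompt, and aggregate statistics can be debiased afterwards. This is a generic LDP mechanism, sketched here as an illustration, not the paper's exact LDP-ICL construction.

```python
import math
import random

def randomized_response(label, epsilon, rng):
    # keep the true binary label w.p. e^eps / (e^eps + 1); flip otherwise (eps-LDP)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return label if rng.random() < p_keep else 1 - label

def privatize_demonstrations(examples, epsilon, seed=0):
    # privatize the sensitive labels of in-context demonstrations
    rng = random.Random(seed)
    return [(x, randomized_response(y, epsilon, rng)) for x, y in examples]

def debiased_mean(noisy_labels, epsilon):
    # unbiased estimate of the true label mean from randomized responses
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (sum(noisy_labels) / len(noisy_labels) - (1.0 - p)) / (2.0 * p - 1.0)
```

Smaller epsilon flips labels more often, which is exactly the privacy-utility trade-off the paper analyzes for classification accuracy.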
Updated: 2024-05-07 06:05:43
Subjects: cs.CR,cs.AI
Enabling Privacy-Preserving and Publicly Auditable Federated Learning
Federated learning (FL) has attracted widespread attention because it supports the joint training of models by multiple participants without moving their private datasets. However, there are still many security issues in FL that deserve discussion. In this paper, we consider three major issues: 1) how to ensure that the training process can be publicly audited by any third party; 2) how to avoid the influence of malicious participants on training; 3) how to ensure that private gradients and models are not leaked to third parties. Many solutions have been proposed to address these issues individually, but solving all three problems simultaneously has seldom been considered. In this paper, we propose a publicly auditable and privacy-preserving federated learning scheme that is resistant to malicious participants uploading gradients with wrong directions and that enables anyone to audit and verify the correctness of the training process. In particular, we design a robust aggregation algorithm capable of detecting gradients with wrong directions from malicious participants. We then design a random vector generation algorithm and combine it with zero sharing and blockchain technologies to make the joint training process publicly auditable, meaning anyone can verify the correctness of the training. Finally, we conduct a series of experiments, and the results show that the model generated by our protocol is comparable in accuracy to the original FL approach while retaining its security advantages.
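A wrong-direction filter can be sketched as a cosine test against a robust reference direction. The coordinate-wise median reference and the zero-similarity threshold below are illustrative assumptions, not the paper's exact aggregation rule.

```python
import math

def cosine(u, v):
    # cosine similarity; zero vectors are treated as orthogonal
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def coord_median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def robust_aggregate(gradients, threshold=0.0):
    # reference direction: coordinate-wise median of all submitted gradients;
    # keep only gradients pointing the same way (positive cosine) and average them
    dim = len(gradients[0])
    ref = [coord_median([g[d] for g in gradients]) for d in range(dim)]
    kept = [g for g in gradients if cosine(g, ref) > threshold]
    return [sum(g[d] for g in kept) / len(kept) for d in range(dim)]
```

Because every step here is deterministic given the submitted gradients, a third party replaying the inputs can re-derive the aggregate, which is the property public auditability builds on.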
Updated: 2024-05-07 06:03:10
Subjects: cs.CR,C.2.2; C.2.4; E.3
Federated Control in Markov Decision Processes
We study problems of federated control in Markov Decision Processes. To solve an MDP with a large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communicating locally collected experience. In our setting, these agents have limited capabilities, meaning they are restricted to different regions of the overall state space during the training process. In the face of the differences among restricted regions, we first introduce the concept of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol that we call the Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and accordingly modifies their learning problems for further training. In terms of theoretical analysis, we justify the correctness of FedQ as a communication protocol, give a general result on the sample complexity of the derived algorithms FedQ-X with an RL oracle, and finally conduct a thorough study of the sample complexity of FedQ-SynQ. Specifically, FedQ-X is shown to enjoy linear speedup in terms of sample complexity when the workload is uniformly distributed among agents. Moreover, we carry out experiments in various environments to demonstrate the efficiency of our methods.
Updated: 2024-05-07 05:59:10
Subjects: stat.ML,cs.LG
Optimal Group Fair Classifiers from Linear Post-Processing
We propose a post-processing algorithm for fair classification that mitigates model bias under a unified family of group fairness criteria covering statistical parity, equal opportunity, and equalized odds, applicable to multi-class problems and to both attribute-aware and attribute-blind settings. It achieves fairness by re-calibrating the output score of the given base model with a "fairness cost" -- a linear combination of the (predicted) group memberships. Our algorithm is based on a representation result showing that the optimal fair classifier can be expressed as a linear post-processing of the loss function and the group predictor, derived by using these as sufficient statistics to reformulate the fair classification problem as a linear program. The parameters of the post-processor are estimated by solving the empirical LP. Experiments on benchmark datasets show the efficiency and effectiveness of our algorithm at reducing disparity compared to existing algorithms, including in-processing ones, especially on larger problems.
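The "fairness cost" re-calibration can be illustrated in miniature for the binary, attribute-aware case: shift each group's scores by a group-dependent cost before thresholding. The mean-matching fit below is a toy stand-in for the paper's LP-derived costs, used only to show the shape of the post-processor.

```python
def fit_costs(scores, groups):
    # toy stand-in: pick a per-group shift so each group's mean adjusted score
    # matches the overall mean (a heuristic in the spirit of statistical parity)
    overall = sum(scores) / len(scores)
    costs = {}
    for g in set(groups):
        gs = [s for s, gg in zip(scores, groups) if gg == g]
        costs[g] = sum(gs) / len(gs) - overall
    return costs

def fair_predict(score, group, costs, threshold=0.5):
    # re-calibrate the base model's score with the group-dependent "fairness cost"
    return 1 if score - costs[group] >= threshold else 0
```

Note the key property: the base model is untouched; only its scores are linearly shifted per group at prediction time, which is what makes the approach applicable to any pre-trained regressor of scores.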
Updated: 2024-05-07 05:58:44
Subjects: cs.LG,cs.CY
The Role of Federated Learning in a Wireless World with Foundation Models
Foundation models (FMs) are general-purpose artificial intelligence (AI) models that have recently enabled multiple brand-new generative AI applications. The rapid advances in FMs serve as an important contextual backdrop for the vision of next-generation wireless networks, where federated learning (FL) is a key enabler of distributed network intelligence. Currently, the exploration of the interplay between FMs and FL is still in its nascent stage. Naturally, FMs are capable of boosting the performance of FL, and FL could also leverage decentralized data and computing resources to assist in the training of FMs. However, the exceptionally high requirements that FMs have for computing resources, storage, and communication overhead would pose critical challenges to FL-enabled wireless networks. In this article, we explore the extent to which FMs are suitable for FL over wireless networks, including a broad overview of research challenges and opportunities. In particular, we discuss multiple new paradigms for realizing future intelligent networks that integrate FMs and FL. We also consolidate several broad research directions associated with these paradigms.
Updated: 2024-05-07 05:55:46
Subjects: cs.NI,cs.DC,cs.LG,cs.SY,eess.SY
BrainLeaks: On the Privacy-Preserving Properties of Neuromorphic Architectures against Model Inversion Attacks
With the mainstream integration of machine learning into security-sensitive domains such as healthcare and finance, concerns about data privacy have intensified. Conventional artificial neural networks (ANNs) have been found vulnerable to several attacks that can leak sensitive data. In particular, model inversion (MI) attacks enable the reconstruction of data samples that have been used to train the model. Neuromorphic architectures have emerged as a paradigm shift in neural computing, enabling asynchronous and energy-efficient computation. However, little to no existing work has investigated the privacy of neuromorphic architectures against model inversion. Our study is motivated by the intuition that the non-differentiable aspect of spiking neural networks (SNNs) might result in inherent privacy-preserving properties, especially against gradient-based attacks. To investigate this hypothesis, we conduct a thorough exploration of SNNs' privacy-preserving capabilities. Specifically, we develop novel inversion attack strategies comprehensively designed to target SNNs, and offer a comparative analysis with their conventional ANN counterparts. Our experiments, conducted on diverse event-based and static datasets, demonstrate the effectiveness of the proposed attack strategies, thereby questioning the assumption of inherent privacy preservation in neuromorphic architectures.
Updated: 2024-05-07 05:53:46
Subjects: cs.CR,cs.LG,cs.NE
Robust and Reusable Fuzzy Extractors for Low-entropy Rate Randomness Sources
Fuzzy extractors (FEs) are cryptographic primitives that extract reliable cryptographic keys from noisy real-world random sources such as biometric sources. The FE generation algorithm takes a source sample, extracts a key, and generates some helper data that will be used by the reproduction algorithm to recover the key. Reusability of an FE guarantees that security holds when the FE is used multiple times with the same source, and robustness of an FE requires that tampering with the helper data be detectable. In this paper, we consider information-theoretic FEs, define a strong notion of reusability, and propose strongly robust and reusable FEs (srrFEs), which provide the strongest combined notion of reusability and robustness for FEs. We give two constructions with information-theoretic (IT) security for structured sources: one for reusable FEs and one for srrFEs. Both constructions use the sample-then-lock approach. We discuss each construction and show its unique properties in relation to existing work. Construction 2 is the first robust and reusable FE with IT security that does not assume a random oracle. Robustness is achieved by using an IT-secure MAC with security against key-shift attacks, which may be of independent interest.
Updated: 2024-05-07 05:48:02
Subjects: cs.CR
Delphi: Efficient Asynchronous Approximate Agreement for Distributed Oracles
Agreement protocols are crucial in various emerging applications, spanning from distributed (blockchain) oracles to fault-tolerant cyber-physical systems. In scenarios where sensor/oracle nodes measure a common source, maintaining the output within the convex range of correct inputs, known as convex validity, is imperative. Existing asynchronous convex agreement protocols employ either randomization, incurring substantial computation overhead, or approximate agreement techniques, leading to high $\tilde{\mathcal{O}}(n^3)$ communication for an $n$-node system. This paper introduces Delphi, a deterministic protocol with $\tilde{\mathcal{O}}(n^2)$ communication and minimal computation overhead. Delphi assumes that honest inputs are bounded, except with negligible probability, and integrates agreement primitives from the literature with a novel weighted averaging technique. Experimental results highlight Delphi's superior performance, showcasing significantly lower latency compared to state-of-the-art protocols. Specifically, for an $n=160$-node system, Delphi achieves an 8x and 3x improvement in latency within CPS and AWS environments, respectively.
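Convex validity under Byzantine inputs is commonly preserved by trimming: with at most f faulty senders, dropping the f smallest and f largest received values guarantees that any average of the remainder lies within the range of honest inputs. The sketch below shows this generic building block only; it is not Delphi's weighted averaging technique.

```python
def trimmed_mean(values, f):
    # drop the f smallest and f largest received values, then average the rest;
    # with at most f Byzantine senders the result stays within the honest range
    s = sorted(values)
    kept = s[f:len(s) - f]
    return sum(kept) / len(kept)
```

Here a single Byzantine outlier in either direction lands in the trimmed tail and cannot pull the aggregate outside the honest nodes' convex hull.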
Updated: 2024-05-07 05:47:42
Subjects: cs.DC,cs.CR
Deep Regression Representation Learning with Topology
Most works studying representation learning focus only on classification and neglect regression. Yet, the learning objectives, and therefore the representation topologies, of the two tasks are fundamentally different: classification targets class separation, leading to disconnected representations, whereas regression requires ordinality with respect to the target, leading to continuous representations. We thus ask how the effectiveness of a regression representation is influenced by its topology, with evaluation based on the Information Bottleneck (IB) principle. The IB principle is an important framework that provides principles for learning effective representations. We establish two connections between it and the topology of regression representations. The first connection reveals that a lower intrinsic dimension of the feature space implies a reduced complexity of the representation Z. This complexity can be quantified as the conditional entropy of Z given the target space Y, and serves as an upper bound on the generalization error. The second connection suggests that learning a feature space topologically similar to the target space will better align with the IB principle. Based on these two connections, we introduce PH-Reg, a regularizer specific to regression that matches the intrinsic dimension and topology of the feature space with the target space. Experiments on synthetic and real-world regression tasks demonstrate the benefits of PH-Reg.
Updated: 2024-05-07 05:32:55
Subjects: cs.LG,cs.CV
An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks
Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed and an improved new $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity is derived. To our best knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under the Markovian sampling, as opposed to the best known $\tilde{\mathcal{O}}(\epsilon^{-2})$ complexity in the existing literature.
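The update rule whose finite-time behavior is analyzed reduces, in the tabular case, to classic TD(0): nudge the value estimate of each visited state toward the bootstrapped target r + gamma * V(s'). The sketch below is the tabular illustration only, not the paper's neural, Markovian-sampling setting; the step size, discount, and sweep count are illustrative assumptions.

```python
def td0(transitions, n_states, alpha=0.1, gamma=0.9, sweeps=2000):
    # tabular TD(0): repeatedly move V(s) toward r + gamma * V(s')
    V = [0.0] * n_states
    for _ in range(sweeps):
        for s, r, s_next, done in transitions:
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])
    return V
```

Replacing the table V with an L-layer network and sampling transitions from a Markov chain is what makes the nonlinear analysis in the paper hard; the fixed point being approximated is the same.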
Updated: 2024-05-07 05:29:55
Subjects: cs.LG,math.OC
Certified Policy Verification and Synthesis for MDPs under Distributional Reach-avoidance Properties
Markov Decision Processes (MDPs) are a classical model for decision making in the presence of uncertainty. Often they are viewed as state transformers with planning objectives defined with respect to paths over MDP states. An increasingly popular alternative is to view them as distribution transformers, giving rise to a sequence of probability distributions over MDP states. For instance, reachability and safety properties in modeling robot swarms or chemical reaction networks are naturally defined in terms of probability distributions over states. Verifying such distributional properties is known to be hard and often beyond the reach of classical state-based verification techniques. In this work, we consider the problems of certified policy (i.e. controller) verification and synthesis in MDPs under distributional reach-avoidance specifications. By certified we mean that, along with a policy, we also aim to synthesize a (checkable) certificate ensuring that the MDP indeed satisfies the property. Thus, given the target set of distributions and an unsafe set of distributions over MDP states, our goal is to either synthesize a certificate for a given policy or synthesize a policy along with a certificate, proving that the target distribution can be reached while avoiding unsafe distributions. To solve this problem, we introduce the novel notion of distributional reach-avoid certificates and present automated procedures for (1) synthesizing a certificate for a given policy, and (2) synthesizing a policy together with the certificate, both providing formal guarantees on certificate correctness. Our experimental evaluation demonstrates the ability of our method to solve several non-trivial examples, including a multi-agent robot-swarm model, to synthesize certified policies and to certify existing policies.
Updated: 2024-05-07 05:23:56
Subjects: cs.AI,cs.LO
Explainability-Informed Targeted Malware Misclassification
In recent years, there has been a surge in malware attacks across critical infrastructures, requiring further research and development of appropriate response and remediation strategies for malware detection and classification. Several works have used machine learning models to classify malware into categories, and deep neural networks have shown promising results. However, these models have shown vulnerabilities to intentionally crafted adversarial attacks, which yield misclassification of a malicious file. Our paper explores such adversarial vulnerabilities of neural-network-based malware classification systems in dynamic and online analysis environments. To evaluate our approach, we trained feed-forward neural networks (FFNNs) to classify malware categories based on features obtained from dynamic and online analysis environments. We use the state-of-the-art method SHapley Additive exPlanations (SHAP) for feature attribution in malware classification, to inform adversarial attackers about the features with significant importance for the classification decision. Using the explainability-informed features, we perform targeted-misclassification adversarial white-box evasion attacks using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) against the trained classifier. Our results demonstrated high evasion rates for some instances of attacks, showing a clear vulnerability of the malware classifier to such attacks. We offer recommendations for a balanced approach and a benchmark for much-needed future research into evasion attacks against malware classifiers, and for developing more robust and trustworthy solutions.
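FGSM itself is a one-step attack: perturb the input by epsilon times the sign of the loss gradient with respect to the input. A minimal sketch on a toy logistic model follows; the model, its weights, and the closed-form gradient are illustrative assumptions standing in for the paper's trained FFNN classifier.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    # logistic model p = sigmoid(w.x + b); the input gradient of the log loss is (p - y) * w
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    # FGSM step: move each coordinate by eps in the direction of its gradient's sign
    return [xi + eps * math.copysign(1.0, g) if g != 0 else xi
            for xi, g in zip(x, grad)]
```

Restricting the perturbation to the features SHAP ranks as most important, as in the paper, simply means applying the step to that coordinate subset instead of every coordinate.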
Updated: 2024-05-07 04:59:19
Subjects: cs.CR
Structured Click Control in Transformer-based Interactive Segmentation
Click-point-based interactive segmentation has received widespread attention due to its efficiency. However, it is hard for existing algorithms to obtain precise and robust responses after multiple clicks; in such cases, the segmentation results tend to change little or even become worse than before. To improve the robustness of the response, we propose a structured click intent model based on graph neural networks, which adaptively obtains graph nodes via the global similarity of user-clicked Transformer tokens. The graph nodes are then aggregated to obtain structured interaction features. Finally, dual cross-attention is used to inject the structured interaction features into the vision Transformer features, thereby enhancing the control that clicks exert over segmentation results. Extensive experiments demonstrate that the proposed algorithm can serve as a general structure for improving the performance of Transformer-based interactive segmentation. The code and data will be released at https://github.com/hahamyt/scc.
Updated: 2024-05-07 04:57:25
Subjects: cs.CV,cs.AI
High Energy Density Radiative Transfer in the Diffusion Regime with Fourier Neural Operators
Radiative heat transfer is a fundamental process in high energy density physics and inertial fusion. Accurately predicting the behavior of Marshak waves across a wide range of material properties and drive conditions is crucial for design and analysis of these systems. Conventional numerical solvers and analytical approximations often face challenges in terms of accuracy and computational efficiency. In this work, we propose a novel approach to model Marshak waves using Fourier Neural Operators (FNO). We develop two FNO-based models: (1) a base model that learns the mapping between the drive condition and material properties to a solution approximation based on the widely used analytic model by Hammer & Rosen (2003), and (2) a model that corrects the inaccuracies of the analytic approximation by learning the mapping to a more accurate numerical solution. Our results demonstrate the strong generalization capabilities of the FNOs and show significant improvements in prediction accuracy compared to the base analytic model.
Updated: 2024-05-07 04:44:59
Subjects: physics.comp-ph,cs.LG
Sharpness-Aware Data Poisoning Attack
Recent research has highlighted the vulnerability of Deep Neural Networks (DNNs) to data poisoning attacks. These attacks aim to inject poisoning samples into the models' training dataset such that the trained models suffer inference failures. While previous studies have executed different types of attacks, one major challenge that greatly limits their effectiveness is the uncertainty of the re-training process after the injection of poisoning samples, including its initialization and algorithm. To address this challenge, we propose a novel attack method called ''Sharpness-Aware Data Poisoning Attack (SAPA)''. In particular, it leverages the concept of DNNs' loss landscape sharpness to optimize the poisoning effect on the worst re-trained model. This helps preserve the poisoning effect regardless of the specific retraining procedure employed. Extensive experiments demonstrate that SAPA offers a general and principled strategy that significantly enhances various types of poisoning attacks.
Updated: 2024-05-07 04:41:52
Subjects: cs.CR
A Case-Based Persistent Memory for a Large Language Model
Case-based reasoning (CBR) as a methodology for problem-solving can use any appropriate computational technique. This position paper argues that CBR researchers have somewhat overlooked recent developments in deep learning and large language models (LLMs). The underlying technical developments that have enabled the recent breakthroughs in AI have strong synergies with CBR and could be used to provide a persistent memory for LLMs to make progress towards Artificial General Intelligence.
Updated: 2024-05-07 04:36:42
Subjects: cs.AI,I.2.0
AC4: Algebraic Computation Checker for Circuit Constraints in ZKPs
ZKP systems have attracted surging attention and hold a fundamental role in contemporary cryptography. Zk-SNARK protocols dominate ZKP usage and are often implemented through the arithmetic circuit programming paradigm. However, underconstrained or overconstrained circuits may lead to bugs. Underconstrained circuits lack the necessary constraints, admitting unexpected solutions and causing the verifier to accept a bogus witness. Overconstrained circuits are constrained excessively, leaving the circuit without the necessary solutions and causing the verifier to accept no witness at all, rendering the circuit meaningless. This paper introduces a novel approach for pinpointing these two distinct types of bugs in ZKP circuits. The method encodes the arithmetic circuit constraints as polynomial equation systems and solves them over a finite field by algebraic computation. The classification of verification results is refined, greatly enhancing the expressive power of the system. We propose a tool, AC4, that implements this method. Experiments demonstrate that AC4 achieves a substantial 29% increase in the checked ratio compared to prior work. Within a solvable range, the checking time of AC4 also exhibits noticeable improvement, an order-of-magnitude gain over previous efforts.
Updated: 2024-05-07 04:29:29
Subjects: cs.SE,cs.CL,cs.CR
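The under/over-constrained distinction can be made concrete with a brute-force check over a tiny finite field (AC4 itself uses algebraic computation rather than enumeration; the field size and toy gadget below are assumptions for illustration):

```python
from itertools import product

P = 7  # a small prime field GF(7), purely for illustration

def count_witnesses(constraints, n_vars):
    """Count assignments in GF(P)^n_vars satisfying every constraint polynomial."""
    return sum(all(c(*v) % P == 0 for c in constraints)
               for v in product(range(P), repeat=n_vars))

# Toy "circuit" over (x, y) meant to enforce y = x^2.
correct = [lambda x, y: y - x * x]
under   = []                                                    # missing constraint
over    = [lambda x, y: y - x * x, lambda x, y: y - x * x - 1]  # contradictory pair

print(count_witnesses(correct, 2))  # 7: exactly one y per x, well constrained
print(count_witnesses(under, 2))    # 49: extra witnesses, underconstrained
print(count_witnesses(over, 2))     # 0: no witness at all, overconstrained
```

A correctly constrained gadget admits exactly one witness per input; any surplus or deficit is the bug class AC4 reports.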
Assemblage: Automatic Binary Dataset Construction for Machine Learning
Binary code is pervasive, and binary analysis is a key task in reverse engineering, malware classification, and vulnerability discovery. Unfortunately, while large corpora of malicious binaries exist, obtaining high-quality corpora of benign binaries for modern systems has proven challenging (e.g., due to licensing issues). Consequently, machine learning based pipelines for binary analysis utilize either costly commercial corpora (e.g., VirusTotal) or open-source binaries (e.g., coreutils) available in limited quantities. To address these issues, we present Assemblage: an extensible cloud-based distributed system that crawls, configures, and builds Windows PE binaries to obtain high-quality binary corpora suitable for training state-of-the-art models in binary analysis. We have run Assemblage on AWS over the past year, producing 890k Windows PE and 428k Linux ELF binaries across 29 configurations. Assemblage is designed to be both reproducible and extensible, enabling users to publish "recipes" for their datasets, and facilitating the extraction of a wide array of features. We evaluated Assemblage by using its data to train modern learning-based pipelines for compiler provenance and binary function similarity. Our results illustrate the practical need for robust corpora of high-quality Windows PE binaries in training modern learning-based binary analyses. Assemblage can be downloaded from https://assemblage-dataset.net
Updated: 2024-05-07 04:10:01
Subjects: cs.CR,cs.LG
TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks
Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $\left(1-\epsilon\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.
Updated: 2024-05-07 04:08:49
Subjects: cs.NI,cs.AI
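The greedy algorithm for the general case can be sketched as follows: the marginal storage cost of a model counts only the parameter blocks not already cached, which is the parameter-sharing insight. The data structures and tie-breaking below are illustrative assumptions, not the paper's exact algorithm:

```python
def greedy_trimcache(models, capacity, demand):
    """Greedily cache the model with the best demand-per-marginal-storage
    ratio; blocks shared with already-cached models cost nothing extra."""
    cached, blocks, used = set(), set(), 0
    while True:
        best, best_ratio, best_cost = None, 0.0, 0
        for m, m_blocks in models.items():
            if m in cached:
                continue
            cost = sum(s for b, s in m_blocks.items() if b not in blocks)
            if used + cost > capacity:
                continue
            ratio = demand[m] / max(cost, 1e-9)
            if ratio > best_ratio:
                best, best_ratio, best_cost = m, ratio, cost
        if best is None:
            return cached
        cached.add(best)
        blocks |= set(models[best])
        used += best_cost

# models as {name: {block_id: size}}; models A and B share block "b1"
models = {"A": {"b1": 4, "b2": 4}, "B": {"b1": 4, "b3": 2}, "C": {"b4": 9}}
result = greedy_trimcache(models, capacity=10, demand={"A": 5, "B": 5, "C": 6})
print(sorted(result))  # ['A', 'B']: sharing b1 lets both fit within 10 units
```

Without block sharing, A and B would need 14 units of storage; counting the shared block once brings them under the 10-unit budget, raising the served demand from 6 (caching C alone) to 10.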
L$^2$GC: Lorentzian Linear Graph Convolutional Networks For Node Classification
Linear Graph Convolutional Networks (GCNs) are used to classify nodes in graph data. However, we note that most existing linear GCN models perform neural network operations in Euclidean space, which does not explicitly capture the tree-like hierarchical structure exhibited in real-world datasets modeled as graphs. In this paper, we introduce hyperbolic space into linear GCNs and propose a novel framework for a Lorentzian linear GCN. Specifically, we map the learned features of graph nodes into hyperbolic space, and then perform a Lorentzian linear feature transformation to capture the underlying tree-like structure of the data. Experimental results on standard citation network datasets with semi-supervised learning show that our approach yields new state-of-the-art accuracy of 74.7$\%$ on Citeseer and 81.3$\%$ on PubMed. Furthermore, we observe that our approach can be trained up to two orders of magnitude faster than other nonlinear GCN models on the PubMed dataset. Our code is publicly available at https://github.com/llqy123/LLGC-master.
Updated: 2024-05-07 04:06:35
Subjects: cs.LG,cs.AI,cs.CL
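Mapping Euclidean node features onto the Lorentz (hyperboloid) model of hyperbolic space is typically done with the exponential map at the origin; a minimal numpy sketch (the paper's exact transformation may differ):

```python
import numpy as np

def lorentz_inner(u, v):
    # Minkowski inner product <u, v>_L = -u0*v0 + sum_i ui*vi
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def exp_map_origin(z):
    """Map a Euclidean feature z (a tangent vector at the Lorentz-model
    origin o = (1, 0, ..., 0)) onto the hyperboloid <x, x>_L = -1."""
    v = np.concatenate([[0.0], z])           # tangent vectors at o have v0 = 0
    n = np.linalg.norm(z) + 1e-12
    o = np.zeros(len(v)); o[0] = 1.0
    return np.cosh(n) * o + np.sinh(n) * v / n

x = exp_map_origin(np.array([0.3, -0.4]))
print(lorentz_inner(x, x))  # ≈ -1: the mapped point lies on the hyperboloid
```

Lorentzian linear layers then operate on points like `x` while keeping them on this manifold, which is what lets the model encode tree-like hierarchies with low distortion.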
Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID-embedding to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabilities of Large Language Models (LLMs) pretrained on massive text corpus presents a promising avenue for enhancing recommender systems by integrating open-world domain knowledge. In this paper, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. We address computational complexity concerns by utilizing pretrained LLMs as item encoders and freezing LLM parameters to avoid catastrophic forgetting and preserve open-world knowledge. To bridge the gap between the open-world and collaborative domains, we design a twin-tower structure supervised by the recommendation task and tailored for practical industrial application. Through offline experiments on the large-scale industrial dataset and online experiments on A/B tests, we demonstrate the efficacy of our approach.
Updated: 2024-05-07 04:00:30
Subjects: cs.IR,cs.AI
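A twin-tower head over frozen LLM embeddings can be sketched as below; the dimensions and random projections are assumptions, and in the paper the towers are trained under the recommendation objective while the LLM item encoder stays frozen:

```python
import numpy as np

rng = np.random.default_rng(0)
d_llm, d = 32, 8  # frozen-LLM embedding size -> compact recommendation space

W_user = rng.normal(size=(d, d_llm)) * 0.1  # user-tower projection (trainable)
W_item = rng.normal(size=(d, d_llm)) * 0.1  # item-tower projection (trainable)

def twin_tower_score(user_emb, item_emb):
    """Project each side into a shared space; the dot product is the score."""
    return float((W_user @ user_emb) @ (W_item @ item_emb))

score = twin_tower_score(rng.normal(size=d_llm), rng.normal(size=d_llm))
```

Keeping the LLM frozen avoids catastrophic forgetting of open-world knowledge; only the small tower projections adapt to the collaborative signal.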
Navigating Chemical Space with Latent Flows
Recent progress of deep generative models in the vision and language domains has stimulated significant interest in more structured data generation such as molecules. However, beyond generating new random molecules, efficient exploration and a comprehensive understanding of the vast chemical space are of great importance to molecular science and to applications in drug design and materials discovery. In this paper, we propose a new framework, ChemFlow, to traverse chemical space by navigating, with flows, the latent space learned by molecule generative models. We introduce a dynamical systems perspective that formulates the problem as learning a vector field that transports the mass of the molecular distribution to regions with desired molecular properties or structure diversity. Under this framework, we unify previous approaches to molecule latent space traversal and optimization and propose alternative competing methods incorporating different physical priors. We validate the efficacy of ChemFlow on molecule manipulation and on single- and multi-objective molecule optimization tasks under both supervised and unsupervised molecular discovery settings. Code and demos are publicly available on GitHub at https://github.com/garywei944/ChemFlow.
Updated: 2024-05-07 03:55:57
Subjects: cs.LG
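The "vector field that transports mass toward desired properties" can be pictured as gradient flow on a property predictor over the latent space. A toy numpy sketch with a quadratic stand-in property (the real predictor and the flow parameterization are the paper's learned components, not this closed form):

```python
import numpy as np

TARGET = np.array([1.0, 2.0])  # latent region with the desired property (assumed)

def property_fn(z):
    # stand-in differentiable property predictor: peaks at TARGET
    return -float(np.sum((z - TARGET) ** 2))

def traverse(z, steps=100, lr=0.05):
    """Follow dz/dt = grad property(z): the simplest flow transporting a
    latent point toward high-property regions of chemical space."""
    for _ in range(steps):
        z = z + lr * (-2.0) * (z - TARGET)  # analytic gradient of property_fn
    return z

z0 = np.zeros(2)
z1 = traverse(z0)
print(property_fn(z1) > property_fn(z0))  # True: the flow improved the property
```

Decoding the trajectory z0 -> z1 through the generative model's decoder would yield a path of molecules with progressively better properties; ChemFlow's physical priors shape how that vector field is learned.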
Factors Influencing User Willingness To Use SORA
Sora promises to redefine the way visual content is created. Despite its numerous forecasted benefits, the drivers of user willingness to use the text-to-video (T2V) model are unknown. This study extends the extended unified theory of acceptance and use of technology (UTAUT2) with perceived realism and novelty value. Using a purposive sampling method, we collected data from 940 respondents in the US and analyzed the sample using covariance-based structural equation modeling and fuzzy set qualitative comparative analysis (fsQCA). The findings reveal that all hypothesized relationships are supported, with perceived realism emerging as the most influential driver, followed by novelty value. Moreover, fsQCA identifies five configurations leading to high and low willingness to use, and the model demonstrates high predictive validity, contributing to theory advancement. Our study provides valuable insights for developers and marketers, offering guidance for strategic decisions to promote the widespread adoption of T2V models.
Updated: 2024-05-07 03:55:32
Subjects: cs.AI,cs.HC,62P225
Predicting Lung Disease Severity via Image-Based AQI Analysis using Deep Learning Techniques
Air pollution is a significant health concern worldwide, contributing to various respiratory diseases. Advances in air quality mapping, driven by the emergence of smart cities and the proliferation of Internet-of-Things sensor devices, have increased the available data, fueling momentum in air pollution forecasting. The objective of this study is to devise an integrated approach for predicting air quality from image data and subsequently assessing lung disease severity based on the Air Quality Index (AQI). The aim is to refine existing techniques to improve accuracy in predicting AQI and lung disease severity, and to forecast AQI along with additional atmospheric pollutants such as PM10, O3, CO, SO2, and NO2, in addition to PM2.5 levels. The study also compares the proposed approach with existing methods to show its effectiveness. The approach uses a VGG16 model for feature extraction from images and a neural network for predicting AQI. For predicting lung disease severity, Support Vector Classifier (SVC) and K-Nearest Neighbors (KNN) algorithms are utilized. The neural network model for predicting AQI achieved a training accuracy of 88.54% and a testing accuracy of 87.44%, measured using the loss function, while the KNN model used for predicting lung disease severity achieved a training accuracy of 98.4% and a testing accuracy of 97.5%. In conclusion, the integrated approach presented in this study forecasts air quality and evaluates lung disease severity, achieving high testing accuracies of 87.44% for AQI and 97.5% for lung disease severity using neural network, KNN, and SVC models. Future work involves applying transfer learning and advanced deep learning modules to enhance prediction capabilities. While the current study focuses on India, the objective is to expand its scope to global coverage.
Updated: 2024-05-07 03:42:49
Subjects: cs.CV,cs.LG
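For the severity classifier, a plain k-nearest-neighbours vote over AQI-style features is enough to illustrate the idea (the feature values and labels below are made up; the study's KNN is trained on its own dataset):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Plain k-nearest-neighbours majority vote for a severity class."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy (AQI, PM2.5) readings labelled with a severity class
X = np.array([[40, 10], [55, 15], [150, 80], [170, 95], [300, 180], [320, 200]])
y = np.array(["mild", "mild", "moderate", "moderate", "severe", "severe"])
print(knn_predict(X, y, np.array([160, 85])))  # -> moderate
```

In the study the input features come from the image-based AQI predictor rather than raw sensor readings, but the voting step is the same.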
Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration
Numerous studies have shown that existing Face Recognition Systems (FRS), including commercial ones, often exhibit biases toward certain ethnicities due to under-represented data. In this work, we explore ethnicity alteration and skin tone modification using synthetic face image generation methods to increase the diversity of datasets. We conduct a detailed analysis by first constructing a balanced face image dataset representing three ethnicities: Asian, Black, and Indian. We then make use of existing Generative Adversarial Network-based (GAN) image-to-image translation and manifold learning models to alter the ethnicity from one to another. A systematic analysis is further conducted to assess the suitability of such datasets for FRS by studying the realistic skin-tone representation using Individual Typology Angle (ITA). Further, we also analyze the quality characteristics using existing Face image quality assessment (FIQA) approaches. We then provide a holistic FRS performance analysis using four different systems. Our findings pave the way for future research works in (i) developing both specific ethnicity and general (any to any) ethnicity alteration models, (ii) expanding such approaches to create databases with diverse skin tones, (iii) creating datasets representing various ethnicities which further can help in mitigating bias while addressing privacy concerns.
Updated: 2024-05-07 03:31:22
Subjects: cs.CV,cs.AI
Can citations tell us about a paper's reproducibility? A case study of machine learning papers
The iterative character of work in machine learning (ML) and artificial intelligence (AI) and reliance on comparisons against benchmark datasets emphasize the importance of reproducibility in that literature. Yet, resource constraints and inadequate documentation can make running replications particularly challenging. Our work explores the potential of using downstream citation contexts as a signal of reproducibility. We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges in order to interpret the positive or negative outcomes of reproduction attempts. Our contributions include training classifiers for reproducibility-related contexts and sentiment analysis, and exploring correlations between citation context sentiment and reproducibility scores. Study data, software, and an artifact appendix are publicly available at https://github.com/lamps-lab/ccair-ai-reproducibility .
Updated: 2024-05-07 03:29:11
Subjects: cs.DL,cs.AI,cs.LG
Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation
The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of measuring spinal curvature is still carried out manually. Consequently, there is a considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is utilized within the vertebral segmentation area to illustrate the affinity relationship between candidate landmarks. Subsequently, we can efficiently perform line delineation from a clustered affinity map for UCA measurement. As our method is specifically designed for UCA calculation, this method outperforms other state-of-the-art methods for landmark and line detection tasks. The high correlation between the automatic UCA and Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment.
Updated: 2024-05-07 03:21:18
Subjects: eess.IV,cs.AI,cs.CV,physics.med-ph
TBNet: A Neural Architectural Defense Framework Facilitating DNN Model Protection in Trusted Execution Environments
Trusted Execution Environments (TEEs) have become a promising solution to secure DNN models on edge devices. However, the existing solutions either provide inadequate protection or introduce large performance overhead. Taking both security and performance into consideration, this paper presents TBNet, a TEE-based defense framework that protects DNN model from a neural architectural perspective. Specifically, TBNet generates a novel Two-Branch substitution model, to respectively exploit (1) the computational resources in the untrusted Rich Execution Environment (REE) for latency reduction and (2) the physically-isolated TEE for model protection. Experimental results on a Raspberry Pi across diverse DNN model architectures and datasets demonstrate that TBNet achieves efficient model protection at a low cost.
Updated: 2024-05-07 03:08:30
Subjects: cs.CR,cs.AI,cs.LG
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems
Reinforcement Learning (RL) trains agents to learn optimal behavior by maximizing reward signals from experience datasets. However, RL training often faces memory limitations, leading to execution latencies and prolonged training times. To overcome this, SwiftRL explores Processing-In-Memory (PIM) architectures to accelerate RL workloads. We achieve near-linear performance scaling by implementing RL algorithms like Tabular Q-learning and SARSA on UPMEM PIM systems and optimizing for hardware. Our experiments on OpenAI GYM environments using UPMEM hardware demonstrate superior performance compared to CPU and GPU implementations.
Updated: 2024-05-07 02:54:31
Subjects: cs.LG,cs.AR
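Tabular Q-learning, one of the workloads ported to PIM, is a single in-place table update per transition, which is exactly the memory-bound access pattern PIM accelerates; on UPMEM hardware each DPU would hold a shard of the table. A minimal sketch (hyperparameters are illustrative):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: Q[s,a] += alpha * (TD target - Q[s,a])."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((2, 2))                           # 2 states x 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])  # 0.5
```

SARSA differs only in using the action actually taken in `s_next` instead of the max, so both algorithms shard naturally across PIM banks.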
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
Various industries such as finance, meteorology, and energy produce vast amounts of heterogeneous data every day. There is a natural demand for humans to manage, process, and display data efficiently. However, it necessitates labor-intensive efforts and a high level of expertise for these data-related tasks. Considering large language models (LLMs) showcase promising capabilities in semantic understanding and reasoning, we advocate that the deployment of LLMs could autonomously manage and process massive amounts of data while interacting and displaying in a human-friendly manner. Based on this, we propose Data-Copilot, an LLM-based system that connects numerous data sources on one end and caters to diverse human demands on the other end. Acting as an experienced expert, Data-Copilot autonomously transforms raw data into multi-form output that best matches the user's intent. Specifically, it first designs multiple universal interfaces to satisfy diverse data-related requests, like querying, analysis, prediction, and visualization. In real-time response, it automatically deploys a concise workflow by invoking corresponding interfaces. The whole process is fully controlled by Data-Copilot, without human assistance. We release Data-Copilot-1.0 using massive Chinese financial data, e.g., stocks, funds, and news. Experiments indicate it achieves reliable performance with lower token consumption, showing promising application prospects.
Updated: 2024-05-07 02:53:28
Subjects: cs.CL,cs.AI,cs.CE
ERATTA: Extreme RAG for Table To Answers with Large Language Models
Large language models (LLMs) with retrieval-augmented generation (RAG) have been the optimal choice for scalable generative AI solutions in the recent past. However, use-cases that incorporate RAG with LLMs have been either generic or extremely domain specific, calling into question the scalability and generalizability of RAG-LLM approaches. In this work, we propose a unique LLM-based system in which multiple LLMs can be invoked to enable data authentication, user query routing, data retrieval, and custom prompting for question-answering over data tables that are highly varying and large in size. Our system is tuned to extract information from enterprise-level data products and furnish real-time responses in under 10 seconds. One prompt manages user-to-data authentication, followed by three prompts to route, fetch data, and generate customizable natural language responses. Additionally, we propose a five-metric scoring module that detects and reports hallucinations in the LLM responses. Our proposed system and scoring metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health, and social media domains. Extensions to the proposed extreme RAG architectures can enable heterogeneous source querying using LLMs.
Updated: 2024-05-07 02:49:59
Subjects: cs.AI,cs.LG
AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion
Determining the optimal configuration of adsorbates on a slab (adslab) is pivotal in the exploration of novel catalysts across diverse applications. Traditionally, the quest for the lowest-energy adslab configuration involves placing the adsorbate onto the slab followed by an optimization process. Prior methodologies have relied on heuristics, problem-specific intuitions, or brute-force approaches to guide adsorbate placement. In this work, we propose a novel framework for adsorbate placement using denoising diffusion. The model is designed to predict the optimal adsorbate site and orientation corresponding to the lowest-energy configuration. Further, we build an end-to-end evaluation framework in which the diffusion-predicted adslab configuration is optimized with a pretrained machine learning force field and finally evaluated with Density Functional Theory (DFT). Our findings demonstrate up to a 5x acceleration or a 3.5x improvement in accuracy compared to the previous best approach. Given the novelty of this framework and application, we provide insights into the impact of pre-training and model architectures, and conduct extensive experiments to underscore the significance of this approach.
Updated: 2024-05-07 02:49:21
Subjects: cs.LG,physics.chem-ph
Structure-based drug design by denoising voxel grids
We present VoxBind, a new score-based generative model for 3D molecules conditioned on protein structures. Our approach represents molecules as 3D atomic density grids and leverages a 3D voxel-denoising network for learning and generation. We extend the neural empirical Bayes formalism (Saremi & Hyvarinen, 2019) to the conditional setting and generate structure-conditioned molecules with a two-step procedure: (i) sample noisy molecules from the Gaussian-smoothed conditional distribution with underdamped Langevin MCMC using the learned score function and (ii) estimate clean molecules from the noisy samples with single-step denoising. Compared to the current state of the art, our model is simpler to train, significantly faster to sample from, and achieves better results on extensive in silico benchmarks -- the generated molecules are more diverse, exhibit fewer steric clashes, and bind with higher affinity to protein pockets.
Updated: 2024-05-07 02:48:15
Subjects: cs.LG
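The two-step sampling procedure can be sketched with a toy target whose smoothed score is analytic: step (i) runs underdamped Langevin on the Gaussian-smoothed distribution using the score, and step (ii) is single-step (Tweedie-style) denoising. Everything below (the point-mass target, step size, friction) is an illustrative assumption, not VoxBind's voxel-grid setup:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5                    # Gaussian smoothing level
mu = np.array([2.0, -1.0])     # the clean "molecule" (toy target)

def score(y):
    # score of the sigma-smoothed point mass at mu, i.e. of N(mu, sigma^2 I);
    # VoxBind learns this function with a conditional 3D voxel denoiser
    return (mu - y) / sigma ** 2

def underdamped_langevin(steps=2000, h=0.01, gamma=2.0):
    """Step (i): sample the smoothed distribution with underdamped Langevin,
    evolving position y and velocity v with the score as the force term."""
    y, v = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        v = v - h * (gamma * v - score(y)) + np.sqrt(2 * gamma * h) * rng.normal(size=2)
        y = y + h * v
    return y

y = underdamped_langevin()
x_hat = y + sigma ** 2 * score(y)  # step (ii): single-step denoising estimate
print(x_hat)  # recovers mu exactly in this degenerate toy case
```

With a learned score over protein-conditioned voxel grids, the same two steps yield diverse clean samples rather than a single point.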
Synapse: Learning Preferential Concepts from Visual Demonstrations
This paper addresses the problem of preference learning, which aims to learn user-specific preferences (e.g., "good parking spot", "convenient drop-off location") from visual input. Despite its similarity to learning factual concepts (e.g., "red cube"), preference learning is a fundamentally harder problem due to its subjective nature and the paucity of person-specific training data. We address this problem using a new framework called Synapse, which is a neuro-symbolic approach designed to efficiently learn preferential concepts from limited demonstrations. Synapse represents preferences as neuro-symbolic programs in a domain-specific language (DSL) that operates over images, and leverages a novel combination of visual parsing, large language models, and program synthesis to learn programs representing individual preferences. We evaluate Synapse through extensive experimentation including a user case study focusing on mobility-related concepts in mobile robotics and autonomous driving. Our evaluation demonstrates that Synapse significantly outperforms existing baselines as well as its own ablations. The code and other details can be found on the project website https://amrl.cs.utexas.edu/synapse .
Updated: 2024-05-07 02:47:55
Subjects: cs.RO,cs.CV,cs.LG,cs.PL
Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model
Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning applies scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.
Updated: 2024-05-07 02:45:28
Subjects: cs.CV,cs.AI,cs.LG
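A minimal NumPy sketch of the drop-in idea, under assumptions not stated in the abstract (the shapes, the zero-initialized B matrix as in standard LoRA, and one low-rank update selected per condition are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_classes = 8, 2, 3            # model dim, LoRA rank, number of conditions

W = rng.standard_normal((d, d))      # frozen base (qkv) projection, illustrative
# one low-rank update B @ A per condition; B starts at zero so the conditioned
# layer initially matches the unconditioned one (standard LoRA initialization)
A = rng.standard_normal((n_classes, r, d)) * 0.1
B = np.zeros((n_classes, d, r))

def qkv_with_lora(x, cond, alpha=1.0):
    """Base projection plus a condition-selected low-rank (LoRA) update."""
    return W @ x + alpha * (B[cond] @ (A[cond] @ x))

x = rng.standard_normal(d)
out = qkv_with_lora(x, cond=1)       # equals W @ x until B is trained
```

Because the update adds only 2rd parameters per condition, conditioning every attention projection this way stays cheap relative to the base layer.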
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Recently, considerable efforts have been directed towards compressing Large Language Models (LLMs), which showcase groundbreaking capabilities across diverse applications but entail significant deployment costs due to their large sizes. Meanwhile, much less attention has been given to mitigating the costs associated with deploying multiple LLMs of varying sizes despite its practical significance. Thus, this paper introduces \emph{any-precision LLM}, extending the concept of any-precision DNN to LLMs. Addressing challenges in any-precision LLM, we propose a lightweight method for any-precision quantization of LLMs, leveraging a post-training quantization framework, and develop a specialized software engine for its efficient serving. As a result, our solution significantly reduces the high costs of deploying multiple, different-sized LLMs by overlaying LLMs quantized to varying bit-widths, such as 3, 4, ..., $n$ bits, into a memory footprint comparable to a single $n$-bit LLM. All the supported LLMs with varying bit-widths demonstrate state-of-the-art model quality and inference throughput, making our solution a compelling option for the deployment of multiple, different-sized LLMs. Our code is open-sourced and available online.
Updated: 2024-05-07 02:44:25
Subjects: cs.LG
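One way to picture the overlay idea (this bit-plane reading is our own simplification, not necessarily the paper's exact scheme): store only the n-bit quantized weights and read their top k bits as the k-bit model, so every precision shares a single memory footprint:

```python
import numpy as np

def quantize(w, n_bits):
    """Uniform quantization of weights in [-1, 1) to unsigned n-bit codes."""
    levels = 2 ** n_bits
    codes = np.round((w + 1.0) / 2.0 * (levels - 1))
    return np.clip(codes, 0, levels - 1).astype(np.uint8)

def dequantize(codes, n_bits):
    levels = 2 ** n_bits
    return codes.astype(np.float64) / (levels - 1) * 2.0 - 1.0

w = np.linspace(-1.0, 0.99, 16)      # toy weight values
codes8 = quantize(w, 8)              # store only the 8-bit model ...
codes4 = codes8 >> 4                 # ... and read its top 4 bits as the 4-bit model
err8 = np.abs(dequantize(codes8, 8) - w).max()
err4 = np.abs(dequantize(codes4, 4) - w).max()
```

The lower-precision view costs no extra memory; it simply trades accuracy (larger err4) for a cheaper deployment point.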
IPFed: Identity protected federated learning for user authentication
With the development of laws and regulations related to privacy preservation, it has become difficult to collect personal data to perform machine learning. In this context, federated learning, which is distributed learning without sharing personal data, has been proposed. In this paper, we focus on federated learning for user authentication. We show that it is difficult to achieve both privacy preservation and high accuracy with existing methods. To address these challenges, we propose IPFed, a privacy-preserving federated learning method that uses random projection for class embedding. Furthermore, we prove that IPFed achieves learning equivalent to the state-of-the-art method. Experiments on face image datasets show that IPFed can protect the privacy of personal data while maintaining the accuracy of the state-of-the-art method.
Updated: 2024-05-07 02:29:41
Subjects: cs.CV,cs.LG
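A rough sketch of the random-projection step, with invented shapes and a generic Gaussian projection (the abstract does not specify these details). Inner products between class embeddings, which metric-based training objectives rely on, are approximately preserved while the raw embeddings are never shared:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_classes = 256, 128, 10

W = rng.standard_normal((n_classes, d))       # per-client class (identity) embeddings
P = rng.standard_normal((k, d)) / np.sqrt(k)  # random projection, kept client-side
W_shared = W @ P.T                            # only projected embeddings are shared

# the Gram matrix of inner products, which the training objective consumes,
# is approximately preserved under the projection (Johnson-Lindenstrauss)
G, G_proj = W @ W.T, W_shared @ W_shared.T
rel_err = np.linalg.norm(G - G_proj) / np.linalg.norm(G)
```

Recovering an identity embedding from its projection would require knowing P, which stays on the client, which is the intuition behind the privacy claim.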
Revisiting Few-Shot Learning from a Causal Perspective
Few-shot learning under the $N$-way $K$-shot scheme is an open challenge in machine learning. Many metric-based approaches have been proposed to tackle this problem, e.g., the Matching Networks and CLIP-Adapter. Although these approaches have shown significant progress, the mechanism of why these methods succeed has not been well explored. In this paper, we interpret these metric-based few-shot learning methods via a causal mechanism. We show that the existing approaches can be viewed as specific forms of front-door adjustment, which can alleviate the effect of spurious correlations and thus learn the causality. This causal interpretation provides a new perspective for better understanding these existing metric-based methods. Further, based on this causal interpretation, we introduce two causal methods for metric-based few-shot learning, which consider not only the relationship between examples but also the diversity of representations. Experimental results demonstrate the superiority of our proposed methods in few-shot classification on various benchmark datasets. Code is available at https://github.com/lingl1024/causalFewShot.
Updated: 2024-05-07 02:27:42
Subjects: cs.LG,cs.AI
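For reference, the front-door adjustment mentioned above is Pearl's formula: with treatment $x$, mediator $m$, and outcome $y$,

```latex
P(y \mid \mathrm{do}(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
```

The paper's claim is that metric-based few-shot methods can be read as specific instances of this adjustment.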
Spectral Heterogeneous Graph Convolutions via Positive Noncommutative Polynomials
Heterogeneous Graph Neural Networks (HGNNs) have gained significant popularity in various heterogeneous graph learning tasks. However, most existing HGNNs rely on spatial domain-based methods to aggregate information, i.e., manually selected meta-paths or some heuristic modules, lacking theoretical guarantees. Furthermore, these methods cannot learn arbitrary valid heterogeneous graph filters within the spectral domain, which limits their expressiveness. To tackle these issues, we present a positive spectral heterogeneous graph convolution via positive noncommutative polynomials. Then, using this convolution, we propose PSHGCN, a novel Positive Spectral Heterogeneous Graph Convolutional Network. PSHGCN offers a simple yet effective method for learning valid heterogeneous graph filters. Moreover, we demonstrate the rationale of PSHGCN in the graph optimization framework. We conducted an extensive experimental study showing that PSHGCN can learn diverse heterogeneous graph filters and outperforms all baselines on open benchmarks. Notably, PSHGCN exhibits remarkable scalability, efficiently handling large real-world graphs comprising millions of nodes and edges. Our code is available at https://github.com/ivam-he/PSHGCN.
Updated: 2024-05-07 02:20:18
Subjects: cs.LG
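The positivity trick behind such convolutions can be demonstrated in a few lines: any sum-of-squares form g^T g of a noncommutative matrix polynomial is positive semidefinite by construction. The matrices and coefficients below are random stand-ins, not the paper's actual propagation operators:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# two non-commuting propagation matrices, e.g. adjacencies of different
# edge types in a heterogeneous graph (random stand-ins here)
A1, A2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# a noncommutative polynomial g(A1, A2) with arbitrary coefficients
w = rng.standard_normal(4)
g = w[0] * np.eye(n) + w[1] * A1 + w[2] * A2 + w[3] * (A1 @ A2)

# the sum-of-squares filter g^T g is positive semidefinite by construction,
# i.e. a valid (positive) graph filter regardless of the coefficients
H = g.T @ g
min_eig = np.linalg.eigvalsh(H).min()
```

This is why the parameterization can be trained freely: validity of the learned filter never has to be enforced by constraints on the coefficients.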
Relating-Up: Advancing Graph Neural Networks through Inter-Graph Relationships
Graph Neural Networks (GNNs) have excelled in learning from graph-structured data, especially in understanding the relationships within a single graph, i.e., intra-graph relationships. Despite their successes, GNNs are limited by neglecting the context of relationships across graphs, i.e., inter-graph relationships. Recognizing the potential to extend this capability, we introduce Relating-Up, a plug-and-play module that enhances GNNs by exploiting inter-graph relationships. This module incorporates a relation-aware encoder and a feedback training strategy. The former enables GNNs to capture relationships across graphs, enriching relation-aware graph representations through collective context. The latter uses a feedback loop mechanism for the recursive refinement of these representations, leveraging insights from inter-graph dynamics to drive the loop. The synergy between these two innovations results in a robust and versatile module. Relating-Up enhances the expressiveness of GNNs, enabling them to encapsulate a wider spectrum of graph relationships with greater precision. Our evaluations across 16 benchmark datasets demonstrate that integrating Relating-Up into GNN architectures substantially improves performance, positioning Relating-Up as a formidable choice for a broad spectrum of graph representation learning tasks.
Updated: 2024-05-07 02:16:54
Subjects: cs.LG,cs.AI
FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data
Recent efforts have been made to integrate self-supervised learning (SSL) with the framework of federated learning (FL). One unique challenge of federated self-supervised learning (FedSSL) is that the global objective of FedSSL usually does not equal the weighted sum of local SSL objectives. Consequently, conventional approaches, such as federated averaging (FedAvg), fail to precisely minimize the FedSSL global objective, often resulting in suboptimal performance, especially when data is non-i.i.d. To fill this gap, we propose a provable FedSSL algorithm, named FedSC, based on the spectral contrastive objective. In FedSC, clients periodically share correlation matrices of data representations in addition to model weights, which enables inter-client contrast of data samples in addition to intra-client contrast and contraction, resulting in improved quality of data representations. Differential privacy (DP) protection is deployed to control the additional privacy leakage on local datasets when correlation matrices are shared. We also provide theoretical analysis on the convergence and extra privacy leakage. The experimental results validate the effectiveness of our proposed algorithm.
Updated: 2024-05-07 02:12:38
Subjects: cs.LG,eess.SP
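A sketch of how sharing correlation matrices enables inter-client contrast. It uses the identity E[(f(x)^T f(x'))^2] = tr(R R') for independent x, x' with correlation matrices R = E[f f^T] and R', so the repulsion term against another client's data needs only its d x d matrix, never the data itself. The function below is our illustrative reading of the mechanism, not the paper's exact objective:

```python
import numpy as np

def spectral_contrastive_loss(Z1, Z2, R_other):
    """Spectral contrastive objective with an inter-client repulsion term.

    Z1, Z2  : (n, d) representations of two augmented views of a local batch.
    R_other : (d, d) correlation matrix E[f f^T] shared by another client.
    """
    attract = -2.0 * np.mean(np.sum(Z1 * Z2, axis=1))      # pull views together
    R_local = (Z1.T @ Z1) / len(Z1)                        # local correlation
    repel_local = np.trace(R_local @ R_local)              # intra-client contrast
    repel_cross = np.trace(R_local @ R_other)              # inter-client contrast
    return attract + repel_local + repel_cross

Z = np.eye(3)                        # toy batch of 3 one-hot representations
loss = spectral_contrastive_loss(Z, Z, np.zeros((3, 3)))   # no peer term yet
```

In the federated setting, each client would sum the cross term over the (DP-protected) correlation matrices received from its peers.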
Predictive Modeling with Temporal Graphical Representation on Electronic Health Records
Deep learning-based predictive models, leveraging Electronic Health Records (EHR), are receiving increasing attention in healthcare. An effective representation of a patient's EHR should hierarchically encompass both the temporal relationships between historical visits and medical events, and the inherent structural information within these elements. Existing patient representation methods can be roughly categorized into sequential representation and graphical representation. The sequential representation methods focus only on the temporal relationships among longitudinal visits. On the other hand, the graphical representation approaches, while adept at extracting the graph-structured relationships between various medical events, fall short in effectively integrating temporal information. To capture both types of information, we model a patient's EHR as a novel temporal heterogeneous graph. This graph includes historical visit nodes and medical event nodes. It propagates structured information from medical event nodes to visit nodes and utilizes time-aware visit nodes to capture changes in the patient's health status. Furthermore, we introduce a novel temporal graph transformer (TRANS) that integrates temporal edge features, global positional encoding, and local structural encoding into heterogeneous graph convolution, capturing both temporal and structural information. We validate the effectiveness of TRANS through extensive experiments on three real-world datasets. The results show that our proposed approach achieves state-of-the-art performance.
Updated: 2024-05-07 02:05:30
Subjects: cs.LG,cs.AI
Collaborative Intelligence in Sequential Experiments: A Human-in-the-Loop Framework for Drug Discovery
Drug discovery is a complex process that involves sequentially screening and examining a vast array of molecules to identify those with the target properties. This process, also referred to as sequential experimentation, faces challenges due to the vast search space, the rarity of target molecules, and constraints imposed by limited data and experimental budgets. To address these challenges, we introduce a human-in-the-loop framework for sequential experiments in drug discovery. This collaborative approach combines human expert knowledge with deep learning algorithms, enhancing the discovery of target molecules within a specified experimental budget. The proposed algorithm processes experimental data to recommend both promising molecules and those that could improve its performance to human experts. Human experts retain the final decision-making authority based on these recommendations and their domain expertise, including the ability to override algorithmic recommendations. We applied our method to drug discovery tasks using real-world data and found that it consistently outperforms all baseline methods, including those which rely solely on human or algorithmic input. This demonstrates the complementarity between human experts and the algorithm. Our results provide key insights into the levels of humans' domain knowledge, the importance of meta-knowledge, and effective work delegation strategies. Our findings suggest that such a framework can significantly accelerate the development of new vaccines and drugs by leveraging the best of both human and artificial intelligence.
Updated: 2024-05-07 02:03:07
Subjects: cs.AI,cs.HC,cs.LG
Deep Reinforcement Learning for Modelling Protein Complexes
AlphaFold can be used for both single-chain and multi-chain protein structure prediction, while the latter becomes extremely challenging as the number of chains increases. In this work, by taking each chain as a node and assembly actions as edges, we show that an acyclic undirected connected graph can be used to predict the structure of multi-chain protein complexes (a.k.a., protein complex modelling, PCM). However, there are still two challenges: 1) The huge combinatorial optimization space of $N^{N-2}$ ($N$ is the number of chains) for the PCM problem can easily lead to high computational cost. 2) The scales of protein complexes exhibit distribution shift due to variance in chain numbers, which calls for generalization in modelling complexes of various scales. To address these challenges, we propose GAPN, a Generative Adversarial Policy Network powered by domain-specific rewards and adversarial loss through policy gradient for automatic PCM prediction. Specifically, GAPN learns to efficiently search through the immense assembly space and optimize the direct docking reward through policy gradient. Importantly, we design an adversarial reward function to enhance the receptive field of our model. In this way, GAPN simultaneously focuses on a specific batch of complexes and the global assembly rules learned from complexes with varied chain numbers. Empirically, we achieve significant improvements in both accuracy (measured by RMSD and TM-Score) and efficiency compared to leading PCM software.
Updated: 2024-05-07 02:00:58
Subjects: cs.CE,cs.LG
A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model
Time series data are ubiquitous across various domains, making time series analysis critically important. Traditional time series models are task-specific, featuring singular functionality and limited generalization capacity. Recently, large language foundation models have unveiled their remarkable capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explainability. This success has sparked interest in the exploration of foundation models to solve multiple time series challenges simultaneously. There are two main research lines, namely pre-training foundation models from scratch for time series and adapting large language foundation models for time series. Both contribute to the development of a unified model that is highly generalizable, versatile, and comprehensible for time series analysis. This survey offers a 3E analytical framework for comprehensive examination of related research. Specifically, we examine existing works from three dimensions, namely Effectiveness, Efficiency, and Explainability. In each dimension, we focus on discussing how related works devise tailored solutions by considering unique challenges in the realm of time series. Furthermore, we provide a domain taxonomy to help readers keep up with domain-specific advancements. In addition, we introduce extensive resources to facilitate the field's development, including datasets and open-source time series libraries. A GitHub repository is also maintained for resource updates (https://github.com/start2020/Awesome-TimeSeries-LLM-FM).
Updated: 2024-05-07 01:59:37
Subjects: cs.LG,cs.AI
DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training
Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy due to the inherent complexity and diversity, such as long trajectories, multiple scales and varying dimensions of partial differential equations (PDEs) data. In this paper, we present a new auto-regressive denoising pre-training strategy, which allows for more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. Moreover, by designing a flexible and scalable model architecture based on Fourier attention, we can easily scale up the model for large-scale pre-training. We train our PDE foundation model with up to 0.5B parameters on 10+ PDE datasets with more than 100k trajectories. Extensive experiments show that we achieve SOTA on these benchmarks and validate the strong generalizability of our model to significantly enhance performance on diverse downstream PDE tasks like 3D data. Code is available at \url{https://github.com/thu-ml/DPOT}.
Updated: 2024-05-07 01:57:00
Subjects: cs.LG,cs.NA,math.NA
Graph Diffusion Transformer for Multi-Conditional Molecular Generation
Inverse molecular design with diffusion models holds great potential for advancements in material and drug discovery. Despite success in unconditional molecule generation, integrating multiple properties such as synthetic score and gas permeability as condition constraints into diffusion models remains unexplored. We present the Graph Diffusion Transformer (Graph DiT) for multi-conditional molecular generation. Graph DiT has a condition encoder to learn the representation of numerical and categorical properties and utilizes a Transformer-based graph denoiser to achieve molecular graph denoising under conditions. Unlike previous graph diffusion models that add noise separately on the atoms and bonds in the forward diffusion process, we propose a graph-dependent noise model for training Graph DiT, designed to accurately estimate graph-related noise in molecules. We extensively validate the Graph DiT for multi-conditional polymer and small molecule generation. Results demonstrate our superiority across metrics from distribution learning to condition control for molecular properties. A polymer inverse design task for gas separation with feedback from domain experts further demonstrates its practical utility.
Updated: 2024-05-07 01:51:26
Subjects: cs.LG,q-bio.BM
Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models
Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in visual information understanding with human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual information. However, the underlying fundamental reasons for multimodal hallucination remain poorly explored. In this paper, we propose a new perspective, suggesting that the inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic shift bias related to paragraph breaks (\n\n), where the content before and after '\n\n' in the training data frequently exhibits significant semantic changes. This pattern leads the model to infer that the contents following '\n\n' should be obviously different from the preceding contents with less hallucinatory descriptions, thereby increasing the probability of hallucinatory descriptions subsequent to the '\n\n'. We have validated this hypothesis on multiple publicly available LVLMs. Besides, we find that deliberately inserting '\n\n' into the generated description can induce more hallucinations. A simple method is proposed to effectively mitigate the hallucination of LVLMs by skipping the output of '\n'.
Updated: 2024-05-07 01:46:15
Subjects: cs.CV,cs.AI,cs.CL,cs.LG
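The proposed mitigation amounts to masking paragraph-break tokens at decoding time. A minimal sketch with a toy vocabulary (the token inventory and greedy decoding are assumptions for illustration):

```python
import numpy as np

def skip_newline(logits, vocab, banned=("\n", "\n\n")):
    """Mask paragraph-break tokens before sampling so they are never emitted,
    removing the semantic-shift trigger described above."""
    masked = logits.copy()
    for tok, idx in vocab.items():
        if tok in banned:
            masked[idx] = -np.inf
    return masked

vocab = {"cat": 0, "\n": 1, "sat": 2, "\n\n": 3}
logits = np.array([1.0, 5.0, 2.0, 4.0])
masked = skip_newline(logits, vocab)
next_tok = max(vocab, key=lambda t: masked[vocab[t]])   # greedy decoding
```

In a real decoder this would sit in the sampling loop as a logits processor applied at every step.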
CleanGraph: Human-in-the-loop Knowledge Graph Refinement and Completion
This paper presents CleanGraph, an interactive web-based tool designed to facilitate the refinement and completion of knowledge graphs. Maintaining the reliability of knowledge graphs, which are grounded in high-quality and error-free facts, is crucial for real-world applications such as question-answering and information retrieval systems. These graphs are often automatically assembled from textual sources by extracting semantic triples via information extraction. However, assuring the quality of these extracted triples, especially when dealing with large or low-quality datasets, can pose a significant challenge and adversely affect the performance of downstream applications. CleanGraph allows users to perform Create, Read, Update, and Delete (CRUD) operations on their graphs, as well as apply models in the form of plugins for graph refinement and completion tasks. These functionalities enable users to enhance the integrity and reliability of their graph data. A demonstration of CleanGraph and its source code can be accessed at https://github.com/nlp-tlp/CleanGraph under the MIT License.
Updated: 2024-05-07 01:40:23
Subjects: cs.AI,cs.CL
Unicorn: U-Net for Sea Ice Forecasting with Convolutional Neural Ordinary Differential Equations
Sea ice at the North Pole is vital to global climate dynamics. However, accurately forecasting sea ice poses a significant challenge due to the intricate interaction among multiple variables. Leveraging their ability to seamlessly integrate multiple inputs and their strong performance, many studies have turned to neural networks for sea ice forecasting. This paper introduces a novel deep architecture named Unicorn, designed to forecast weekly sea ice. Our model integrates multiple time series images within its architecture to enhance its forecasting performance. Moreover, we incorporate a bottleneck layer within the U-Net architecture, serving as neural ordinary differential equations with convolution operations, to capture the spatiotemporal dynamics of latent variables. Through analysis of real datasets spanning from 1998 to 2021, our proposed model demonstrates significant improvements over state-of-the-art models in the sea ice concentration forecasting task, achieving an average MAE improvement of 12% over benchmark models. Additionally, our method outperforms existing approaches in sea ice extent forecasting, achieving a classification performance improvement of approximately 18%. These experimental results show the superiority of our proposed model.
Updated: 2024-05-07 01:17:06
Subjects: cs.AI,physics.ao-ph
On the power of graph neural networks and the role of the activation function
In this article, we present new results about the expressivity of Graph Neural Networks (GNNs). We prove that for any GNN with piecewise polynomial activations, whose architecture size does not grow with the graph input sizes, there exists a pair of non-isomorphic rooted trees of depth two such that the GNN cannot distinguish their root vertex up to an arbitrary number of iterations. The proof relies on tools from the algebra of symmetric polynomials. In contrast, it was already known that unbounded GNNs (those whose size is allowed to change with the graph sizes) with piecewise polynomial activations can distinguish these vertices in only two iterations. It was also known prior to our work that with ReLU (piecewise linear) activations, bounded GNNs are weaker than unbounded GNNs [Aamand et al., 2022]. Our approach adds to this result by extending it to handle any piecewise polynomial activation function, which goes towards answering an open question formulated by Grohe [Grohe, 2021] more completely. Our second result states that if one allows activations that are not piecewise polynomial, then in two iterations a single-neuron perceptron can distinguish the root vertices of any pair of non-isomorphic trees of depth two (our results hold for activations like the sigmoid, hyperbolic tangent, and others). This shows how the power of graph neural networks can change drastically if one changes the activation function of the neural networks. The proof of this result utilizes the Lindemann-Weierstrass theorem from transcendental number theory.
Updated: 2024-05-07 01:16:31
Subjects: cs.LG
Selective Prediction for Semantic Segmentation using Post-Hoc Confidence Estimation and Its Performance under Distribution Shift
Semantic segmentation plays a crucial role in various computer vision applications, yet its efficacy is often hindered by the lack of high-quality labeled data. To address this challenge, a common strategy is to leverage models trained on data from different populations, such as publicly available datasets. This approach, however, leads to the distribution-shift problem, resulting in degraded performance on the population of interest. In scenarios where model errors can have significant consequences, selective prediction methods offer a means to mitigate risk and reduce reliance on expert supervision. This paper investigates selective prediction for semantic segmentation in low-resource settings, focusing on post-hoc confidence estimators applied to pre-trained models operating under distribution shift. We propose a novel image-level confidence measure tailored to semantic segmentation and demonstrate its effectiveness through experiments on three medical imaging tasks. Our findings show that post-hoc confidence estimators offer a cost-effective approach to reducing the impact of distribution shift.
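As a sketch of how post-hoc selective prediction operates, the snippet below aggregates per-pixel max-softmax probabilities into an image-level score and abstains below a threshold. The mean-of-max-probability measure and the threshold are generic placeholder choices, not the confidence measure proposed in the paper.

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def image_confidence(logits):
    """Aggregate per-pixel max softmax probability into a single
    image-level score (a generic baseline measure)."""
    probs = softmax(logits, axis=0)          # shape (C, H, W)
    return float(probs.max(axis=0).mean())

def selective_predict(logits, threshold=0.8):
    """Return the segmentation mask, or None to abstain (defer to an
    expert) when image-level confidence falls below the threshold."""
    conf = image_confidence(logits)
    if conf < threshold:
        return None, conf
    return logits.argmax(axis=0), conf

rng = np.random.default_rng(0)
confident = rng.normal(0, 1, (3, 8, 8))
confident[0] += 6.0                          # strongly peaked on class 0
uncertain = rng.normal(0, 0.1, (3, 8, 8))    # near-uniform logits
print(selective_predict(confident)[1] > selective_predict(uncertain)[1])
```

Under distribution shift, thresholding such a score lets the system route only low-confidence images to human review.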
Updated: 2024-05-07 01:05:14
Subjects: cs.LG,cs.CV
NeurDB: An AI-powered Autonomous Data System
In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and databases (AIxDB) promises a new generation of data systems that will relieve the burden on end-users across all industry sectors through AI-enhanced functionalities, such as personalized, automated, in-database AI-powered analytics and self-driving capabilities for improved system performance. In this paper, we explore the evolution of data systems with a focus on deepening the fusion of AI and DB. We present NeurDB, our next-generation data system designed to fully embrace AI design in each major system component and to provide in-database AI-powered analytics. We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report on its current development and future plans.
Updated: 2024-05-07 00:51:48
Subjects: cs.DB,cs.AI
An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems
As the final critical stage of recommender systems (RSs), Multi-Task Fusion (MTF) is responsible for combining the multiple scores output by Multi-Task Learning (MTL) into a final score that maximizes user satisfaction and determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been used for MTF in industry. However, the off-policy RL algorithms used for MTF so far suffer from the following severe problems: 1) to avoid the out-of-distribution (OOD) problem, their constraints are overly strict, which seriously damages performance; 2) they are unaware of the exploration policy used to produce the training data and never interact with the real environment, so only a suboptimal policy can be learned; and 3) traditional exploration policies are inefficient and hurt user experience. To solve these problems, we propose a novel method named IntegratedRL-MTF, customized for MTF in large-scale RSs. IntegratedRL-MTF integrates an off-policy RL model with our online exploration policy to relax the overly strict and complicated constraints, which significantly improves performance. We also design an extremely efficient exploration policy that eliminates low-value exploration space and focuses on exploring potentially high-value state-action pairs. Moreover, we adopt a progressive training mode to further enhance our model's performance with the help of the exploration policy. We conduct extensive offline and online experiments in the short-video channel of Tencent News. The results demonstrate that our model remarkably outperforms other models. IntegratedRL-MTF has been fully deployed in our RS and other large-scale RSs at Tencent, where it has achieved significant improvements.
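A minimal sketch of the fusion step and of policy-local exploration, with hypothetical names and a weighted-product fusion form (the abstract specifies neither the actual fusion function nor the exploration details):

```python
import numpy as np

rng = np.random.default_rng(42)

def fuse(mtl_scores, weights):
    """Multi-Task Fusion: combine per-task MTL scores (e.g. click,
    watch-time, share predictions) into one ranking score. A weighted
    product is a common industrial choice; the paper's actual fusion
    form is not given in the abstract."""
    return float(np.prod(mtl_scores ** weights))

def explore(weights, sigma=0.05):
    """Exploration close to the current policy: perturb the fusion
    weights with small Gaussian noise rather than sampling the whole
    action space, so low-value regions are never visited and user
    experience is protected (a simplified stand-in for the paper's
    exploration policy)."""
    noise = rng.normal(0.0, sigma, size=weights.shape)
    return np.clip(weights + noise, 0.0, None)

mtl_scores = np.array([0.9, 0.4, 0.7])   # hypothetical MTL outputs in (0, 1]
weights = np.array([1.0, 0.5, 2.0])      # current fusion action
candidate = explore(weights)
print(fuse(mtl_scores, weights), fuse(mtl_scores, candidate))
```

Logged (state, weights, fused-score feedback) tuples from such near-policy exploration are then what the off-policy RL model is trained on.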
Updated: 2024-05-07 00:38:37
Subjects: cs.IR,cs.LG
A Roadmap for Multilingual, Multimodal Domain Independent Deception Detection
Deception, a prevalent aspect of human communication, has undergone a significant transformation in the digital age. With the globalization of online interactions, individuals communicate in multiple languages and mix languages on social media, with varied data becoming available in each language and dialect. At the same time, the techniques for detecting deception are similar across the board. Recent studies have shown that universal linguistic cues to deception may exist across domains within the English language; however, whether such cues exist in other languages remains unknown. Furthermore, the practical task of deception detection in low-resource languages is not well studied due to the lack of labeled data. Another dimension of deception is multimodality: fake news or disinformation, for example, may pair a picture with an altered caption. This paper calls for a comprehensive investigation into the complexities of deceptive language across linguistic boundaries and modalities within the realms of computer security and natural language processing, and into the possibility of using multilingual transformer models and labeled data in various languages to address the task of deception detection universally.
Updated: 2024-05-07 00:38:34
Subjects: cs.CL,cs.AI,cs.MM,I.2.6; I.2.7; I.2.10; K.4.4
Unlearning Backdoor Attacks through Gradient-Based Model Pruning
In an era of increasing concern over cybersecurity threats, defending against backdoor attacks is paramount to ensuring the integrity and reliability of machine learning models. However, many existing approaches require substantial amounts of data for effective mitigation, posing significant challenges for practical deployment. To address this, we propose a novel approach that counters backdoor attacks by treating their mitigation as an unlearning task. We tackle this challenge through a targeted model-pruning strategy, leveraging unlearning-loss gradients to identify and eliminate backdoor elements within the model. Built on solid theoretical insights, our approach offers simplicity and effectiveness, rendering it well suited to scenarios with limited data availability. Our methodology includes formulating a suitable unlearning loss and devising a model-pruning technique tailored to convolutional neural networks. Comprehensive evaluations demonstrate the efficacy of our proposed approach compared to state-of-the-art methods, particularly in realistic data settings.
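The pruning idea can be sketched on a toy logistic model: rank weights by the magnitude of the unlearning-loss gradient computed on suspected backdoor samples and zero out the top fraction. The loss, pruning ratio, and trigger construction here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_ce(w, X, y):
    """Gradient of binary cross-entropy for a logistic model p = sigmoid(Xw)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def prune_backdoor(w, X_bd, y_bd, ratio=0.1):
    """Gradient-based pruning sketch: weights with the largest-magnitude
    unlearning-loss gradient (here, the gradient of the *negated* loss
    on suspected backdoor samples) are treated as backdoor carriers and
    zeroed out."""
    g = -grad_ce(w, X_bd, y_bd)              # unlearning direction
    k = max(1, int(ratio * w.size))
    idx = np.argsort(np.abs(g))[-k:]         # top-k most implicated weights
    pruned = w.copy()
    pruned[idx] = 0.0
    return pruned, idx

w = rng.normal(size=20)
X_bd = np.zeros((16, 20))
X_bd[:, 3] = 5.0                             # trigger lives in feature 3
y_bd = np.ones(16)                           # attacker's target label
pruned, idx = prune_backdoor(w, X_bd, y_bd)
print(idx, pruned[idx])                      # feature-3 weight is zeroed
```

Because only a handful of trigger samples are needed to compute the gradient, the approach fits the limited-data regime the paper targets.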
Updated: 2024-05-07 00:36:56
Subjects: cs.LG
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
Efficient deployment of Large Language Models (LLMs) requires batching multiple requests together to improve throughput. As the batch size, context length, or model size increases, the size of the key and value (KV) cache can quickly become the main contributor to GPU memory usage and the bottleneck of inference latency. Quantization has emerged as an effective technique for KV cache compression, but existing methods still fail at very low bit widths. We observe that distinct channels of a key/value activation embedding are highly inter-dependent, and that the joint entropy of multiple channels grows more slowly than the sum of their marginal entropies. Based on this insight, we propose Coupled Quantization (CQ), which couples multiple key/value channels together to exploit their inter-dependency and encode the activations in a more information-efficient manner. Extensive experiments reveal that CQ outperforms, or is competitive with, existing baselines in preserving model quality. Furthermore, we demonstrate that CQ can preserve model quality even with the KV cache quantized down to 1 bit per channel.
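The entropy observation can be illustrated numerically: for two strongly correlated channels, a shared 4-entry codebook (2 bits per pair, i.e., 1 bit per channel) places its centroids along the joint distribution and beats independent 1-bit scalar quantization. This is a synthetic sketch of the coupling idea, not the paper's quantizer.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=50):
    """Tiny Lloyd's k-means with a deterministic init: initial centroids
    spread along the first channel."""
    order = np.argsort(X[:, 0])
    C = X[order[np.linspace(0, len(X) - 1, k).astype(int)]].copy()
    for _ in range(iters):
        a = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (a == j).any():
                C[j] = X[a == j].mean(axis=0)
    return C, a

# Two strongly coupled "KV-cache channels" (synthetic stand-in).
x1 = rng.normal(0, 1, 4000)
x2 = x1 + rng.normal(0, 0.1, 4000)           # high inter-channel dependency
X = np.stack([x1, x2], axis=1)

def scalar_1bit(v):
    """Independent 1-bit/channel baseline: 2 scalar levels per channel."""
    t = np.median(v)
    lo, hi = v[v <= t].mean(), v[v > t].mean()
    return np.where(v > t, hi, lo)

mse_indep = ((X - np.stack([scalar_1bit(x1), scalar_1bit(x2)], 1)) ** 2).mean()

# Coupled: 2 channels share one 4-entry (2-bit) codebook -> still
# 1 bit per channel, but centroids can follow the diagonal.
C, a = kmeans(X, 4)
mse_coupled = ((X - C[a]) ** 2).mean()
print(mse_coupled < mse_indep)               # True for correlated channels
```

The gap widens as the channels become more dependent, which is exactly the joint-entropy argument in the abstract.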
Updated: 2024-05-07 00:25:20
Subjects: cs.LG
Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process
Biomanufacturing innovation relies on an efficient design of experiments (DoE) to optimize processes and product quality. Traditional DoE methods, which ignore the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach that can guide sequential DoE for digital twin model calibration. In this study, we consider a multi-scale mechanistic model of the cell culture process, also known as a Biological System-of-Systems (Bio-SoS), as our digital twin. This model's modular design, composed of sub-models, allows us to integrate data across various production processes. To calibrate the Bio-SoS digital twin, we evaluate the mean squared error of the model's predictions and develop a computational approach to quantify how the parameter estimation error of individual sub-models impacts the prediction accuracy of the digital twin, which can guide sample-efficient and interpretable DoE.
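The sensitivity analysis can be sketched with a toy two-sub-model twin: perturb each sub-model's parameter and measure the induced change in prediction error. The sub-model forms and the finite-difference measure are simplifying assumptions, far removed from the paper's mechanistic Bio-SoS model.

```python
import numpy as np

# Toy Bio-SoS-style digital twin: two chained sub-models (hypothetical
# forms -- the paper's mechanistic model is far richer).
def sub1(x, theta1):          # e.g. a growth sub-model
    return theta1 * x

def sub2(z, theta2):          # e.g. a production sub-model
    return np.tanh(theta2 * z)

def twin(x, theta):
    return sub2(sub1(x, theta[0]), theta[1])

def parameter_impact(x, theta, eps=1e-4):
    """Finite-difference sensitivity of the twin's prediction to each
    sub-model's parameter: a crude stand-in for the paper's analysis of
    how sub-model estimation error propagates to prediction error,
    used to rank which sub-model a new experiment should target."""
    base = twin(x, theta)
    impacts = []
    for i in range(len(theta)):
        th = theta.copy()
        th[i] += eps
        impacts.append(np.mean((twin(x, th) - base) ** 2) / eps ** 2)
    return np.array(impacts)

x = np.linspace(0.1, 2.0, 50)
theta = np.array([1.5, 0.2])
print(parameter_impact(x, theta))   # larger entry -> calibrate that sub-model first
```

Ranking sub-models by such impact scores is one way to make a sequential DoE both sample-efficient and interpretable.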
Updated: 2024-05-07 00:22:13
Subjects: q-bio.QM,cs.LG,stat.ML
ZzzGPT: An Interactive GPT Approach to Enhance Sleep Quality
This paper explores the intersection of technology and sleep pattern comprehension, presenting a cutting-edge two-stage framework that harnesses the power of Large Language Models (LLMs). The primary objective is to deliver precise sleep predictions paired with actionable feedback, addressing the limitations of existing solutions. This innovative approach involves leveraging the GLOBEM dataset alongside synthetic data generated by LLMs. The results highlight significant improvements, underlining the efficacy of merging advanced machine-learning techniques with a user-centric design ethos. Through this exploration, we bridge the gap between technological sophistication and user-friendly design, ensuring that our framework yields accurate predictions and translates them into actionable insights.
Updated: 2024-05-07 00:20:30
Subjects: cs.LG,cs.CL
Federated Graph Condensation with Information Bottleneck Principles
Graph condensation, which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitute, has directly benefited various graph learning tasks. However, existing graph condensation methods rely on centralized data storage, which is infeasible for real-world decentralized data distributions, and overlook data holders' privacy-preserving requirements. To bridge this gap, we propose and study the novel problem of federated graph condensation for graph neural networks (GNNs). Specifically, we first propose a general framework for federated graph condensation in which we decouple the typical gradient-matching process for graph condensation into client-side gradient calculation and server-side gradient matching. In this way, the burdensome computational cost on the client side is largely alleviated. Moreover, our empirical studies show that, in the federated setting, the condensed graph consistently leaks data membership privacy; i.e., the condensed graph obtained during federated training can be used to steal the training data via membership inference attacks (MIA). To tackle this issue, we innovatively incorporate information-bottleneck principles into federated graph condensation, which only requires extracting partial node features in one local pre-training step and reusing these features during federated training. Extensive experiments on real-world datasets demonstrate that our framework consistently protects membership privacy during training while achieving comparable, and even superior, performance against existing centralized graph condensation and federated graph learning methods.
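The decoupling can be sketched with a linear surrogate: the client computes and sends gradients on its private data, and the server nudges a small synthetic dataset so that its gradient matches them. This is a hypothetical setup; the paper works with GNN gradients and adds the information-bottleneck protections described above, both omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_grad(w, X, y):
    """Gradient of squared loss for a linear surrogate model (the real
    framework matches GNN gradients; a linear model keeps the sketch tiny)."""
    return 2 * X.T @ (X @ w - y) / len(y)

# --- client side: compute gradients on private local data; only the
# gradients (not the raw graph or features) leave the client ---
X_local = rng.normal(size=(64, 8))
y_local = X_local @ rng.normal(size=8)
w = rng.normal(size=8)
client_grad = model_grad(w, X_local, y_local)

# --- server side: optimize a tiny condensed dataset so that training on
# it produces matching gradients (one numerical-gradient step shown) ---
X_syn = rng.normal(size=(4, 8))
y_syn = rng.normal(size=4)

def matching_loss(Xs):
    return float(((model_grad(w, Xs, y_syn) - client_grad) ** 2).sum())

before, eps, lr = matching_loss(X_syn), 1e-6, 1e-3
G = np.zeros_like(X_syn)
for i in range(X_syn.shape[0]):
    for j in range(X_syn.shape[1]):
        X_p = X_syn.copy()
        X_p[i, j] += eps
        G[i, j] = (matching_loss(X_p) - before) / eps
after = matching_loss(X_syn - lr * G)
print(after < before)    # the condensed data moved toward gradient agreement
```

Because the heavy inner loop runs on the server, the client's cost reduces to one gradient evaluation per round, matching the framework's motivation.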
Updated: 2024-05-07 00:08:15
Subjects: cs.LG,cs.AI
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback
The discovery of new catalysts is essential for designing new, more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework that unifies linguistic reasoning with quantum-chemistry-based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment in which an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Catalysts identified in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. We introduce planning methods that automatically guide the exploration without human input, providing performance competitive with expert-enumerated, chemical-descriptor-based implementations. By integrating language-guided reasoning with computational chemistry feedback, our work pioneers AI-accelerated, trustworthy catalyst discovery.
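The search loop can be sketched as best-first search over candidates, with mocked stand-ins for the LLM hypothesis generator and the GNN-based reward; both functions below are hypothetical placeholders, not the paper's models.

```python
import heapq

def llm_propose(catalyst):
    """Mock LLM hypothesis step: propose modified catalyst candidates
    (placeholder for LLM-derived hypotheses)."""
    return [catalyst + m for m in ("-Ni", "-Pt", "-Cu")]

def gnn_score(catalyst):
    """Mock GNN reward: higher = more energetically favorable.
    (Here a toy rule that prefers Pt and penalizes complexity.)"""
    return catalyst.count("Pt") * 2.0 - 0.1 * len(catalyst)

def best_first_search(root, budget=20):
    """Heuristic search over the LLM's knowledge space: repeatedly
    expand the highest-scoring candidate with new hypotheses, keeping
    the best catalyst seen (the paper's planning methods are omitted)."""
    frontier = [(-gnn_score(root), root)]
    best = (gnn_score(root), root)
    seen = {root}
    for _ in range(budget):
        if not frontier:
            break
        _, cand = heapq.heappop(frontier)
        for nxt in llm_propose(cand):
            if nxt in seen:
                continue
            seen.add(nxt)
            s = gnn_score(nxt)
            best = max(best, (s, nxt))
            heapq.heappush(frontier, (-s, nxt))
    return best

print(best_first_search("Zn"))   # drifts toward Pt-rich candidates
```

In the actual framework, the scoring call would wrap a relaxation plus adsorption-energy prediction, which is why steering the search with it keeps the exploration energetically grounded.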
Updated: 2024-05-07 00:00:06
Subjects: physics.chem-ph,cs.AI,cs.CE,cs.LG