    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 

Articles: 34

Last Updated: 2024-04-10 23:47:34 (+00:00)

Concept-based Analysis of Neural Networks via Vision-Language Models

The analysis of vision-based deep neural networks (DNNs) is highly desirable but very challenging due to the difficulty of expressing formal specifications for vision tasks and the lack of efficient verification procedures. In this paper, we propose to leverage emerging multimodal, vision-language, foundation models (VLMs) as a lens through which we can reason about vision models. VLMs have been trained on a large body of images accompanied by their textual descriptions, and are thus implicitly aware of high-level, human-understandable concepts describing the images. We describe a logical specification language $\texttt{Con}_{\texttt{spec}}$ designed to facilitate writing specifications in terms of these concepts. To define and formally check $\texttt{Con}_{\texttt{spec}}$ specifications, we build a map between the internal representations of a given vision model and a VLM, leading to an efficient verification procedure of natural-language properties for vision models. We demonstrate our techniques on a ResNet-based classifier trained on the RIVAL-10 dataset using CLIP as the multimodal model.
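
As a minimal sketch of the VLM side of this idea, the snippet below scores human-readable concepts against an image with an off-the-shelf CLIP checkpoint via Hugging Face transformers and applies a toy concept-level check. The checkpoint name, concept set, and check are illustrative assumptions; the paper's $\texttt{Con}_{\texttt{spec}}$ procedure additionally maps the vision model's internal representation into the VLM's embedding space.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Illustrative checkpoint; any CLIP variant exposes the same interface.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    concepts = ["wheels", "wings", "metallic"]   # assumed concept vocabulary
    image = Image.open("example.png")            # hypothetical input image

    inputs = processor(text=concepts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)

    # Cosine similarity between the image embedding and each concept embedding.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    score = dict(zip(concepts, (img @ txt.T).squeeze(0).tolist()))

    # Toy concept-level specification: a car-like image should relate to
    # "wheels" more strongly than to "wings".
    assert score["wheels"] > score["wings"], "specification violated"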

Updated: 2024-04-10 23:47:34

Categories: cs.LG,cs.AI,cs.CL,cs.CV,cs.LO

Download: http://arxiv.org/abs/2403.19837v3

Learning to Predict 3D Rotational Dynamics from Images of a Rigid Body with Unknown Mass Distribution

In many real-world settings, image observations of freely rotating 3D rigid bodies may be available when low-dimensional measurements are not. However, the high dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics. The usefulness of standard deep learning methods is also limited, because an image of a rigid body reveals nothing about the distribution of mass inside the body, which, together with the initial angular velocity, is what determines how the body will rotate. We present a physics-based neural network model to estimate and predict 3D rotational dynamics from image sequences. We achieve this using a multi-stage prediction pipeline that maps individual images to a latent representation homeomorphic to $\mathbf{SO}(3)$, computes angular velocities from latent pairs, and predicts future latent states using the Hamiltonian equations of motion. We demonstrate the efficacy of our approach on new rigid-body datasets of synthetic image sequences of rotating objects, including cubes, prisms, and satellites, with unknown uniform and non-uniform mass distributions. Our model outperforms competing baselines on our datasets, producing better qualitative predictions and reducing the error observed for the state-of-the-art Hamiltonian Generative Network by a factor of 2.
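
The final pipeline stage integrates Hamiltonian equations of motion; the underlying physics is torque-free rigid-body rotation. Below is a minimal sketch of that dynamics in the body frame, with an assumed inertia tensor and initial condition (the paper integrates in a learned latent space rather than directly on $\omega$).

    import numpy as np

    I = np.diag([1.0, 2.0, 3.0])      # assumed principal moments of inertia
    I_inv = np.linalg.inv(I)

    def omega_dot(omega):
        # Euler's equations for torque-free rotation: I dw/dt = (I w) x w
        return I_inv @ np.cross(I @ omega, omega)

    def rk4_step(omega, dt):
        k1 = omega_dot(omega)
        k2 = omega_dot(omega + 0.5 * dt * k1)
        k3 = omega_dot(omega + 0.5 * dt * k2)
        k4 = omega_dot(omega + dt * k3)
        return omega + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

    omega = np.array([0.1, 2.0, 0.1])  # near the unstable intermediate axis
    for _ in range(1000):
        omega = rk4_step(omega, dt=0.01)

    # The true dynamics conserve kinetic energy 0.5 * w^T I w, so energy
    # drift is a sanity check on the integrator.
    print("kinetic energy:", 0.5 * omega @ I @ omega)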

Updated: 2024-04-10 23:39:38

Categories: cs.CV,cs.CE,cs.LG

Download: http://arxiv.org/abs/2308.14666v2

Less is More: Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits

While graph neural networks (GNNs) have gained popularity for learning circuit representations in various electronic design automation (EDA) tasks, they face challenges in scalability when applied to large graphs and exhibit limited generalizability to new designs. These limitations make them less practical for addressing large-scale, complex circuit problems. In this work, we propose HOGA, a novel attention-based model for learning circuit representations in a scalable and generalizable manner. HOGA first computes hop-wise features per node prior to model training. Subsequently, the hop-wise features are solely used to produce node representations through a gated self-attention module, which adaptively learns important features among different hops without involving the graph topology. As a result, HOGA is adaptive to various structures across different circuits and can be efficiently trained in a distributed manner. To demonstrate the efficacy of HOGA, we consider two representative EDA tasks: quality of results (QoR) prediction and functional reasoning. Our experimental results indicate that (1) HOGA reduces estimation error over conventional GNNs by 46.76% for predicting QoR after logic synthesis; (2) HOGA improves reasoning accuracy by 10.0% over GNNs for identifying functional blocks on unseen gate-level netlists after complex technology mapping; (3) the training time for HOGA decreases almost linearly with an increase in computing resources.
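
A minimal sketch of the hop-wise precomputation step is shown below: the k-th hop feature of a node is obtained by k rounds of normalized-adjacency propagation, computed once before training. The symmetric normalization is an assumed choice, and HOGA's gated self-attention over the resulting hop tokens is not shown.

    import numpy as np
    import scipy.sparse as sp

    def hop_features(adj: sp.csr_matrix, X: np.ndarray, num_hops: int):
        # Symmetrically normalized adjacency: D^{-1/2} A D^{-1/2}
        deg = np.asarray(adj.sum(axis=1)).ravel()
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1))
        D = sp.diags(d_inv_sqrt)
        A_norm = D @ adj @ D
        hops = [X]
        for _ in range(num_hops):
            hops.append(A_norm @ hops[-1])   # one further hop of propagation
        # (num_nodes, num_hops + 1, feat_dim): a fixed token sequence per node
        return np.stack(hops, axis=1)

Since every node then carries its own fixed-length sequence of hop tokens, minibatches can be sharded across nodes with no graph sampling at training time, which is what makes distributed training straightforward.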

Updated: 2024-04-10 23:31:08

Categories: cs.LG,cs.AR

Download: http://arxiv.org/abs/2403.01317v4

Driving Everywhere with Large Language Model Policy Adaptation

Adapting driving behavior to new environments, customs, and laws is a long-standing problem in autonomous driving, precluding the widespread deployment of autonomous vehicles (AVs). In this paper, we present LLaDA, a simple yet powerful tool that enables human drivers and autonomous vehicles alike to drive everywhere by adapting their tasks and motion plans to traffic rules in new locations. LLaDA achieves this by leveraging the impressive zero-shot generalizability of large language models (LLMs) in interpreting the traffic rules in the local driver handbook. Through an extensive user study, we show that LLaDA's instructions are useful in disambiguating in-the-wild unexpected situations. We also demonstrate LLaDA's ability to adapt AV motion planning policies in real-world datasets; LLaDA outperforms baseline planning approaches on all our metrics. Please check our website for more details: https://boyiliee.github.io/llada.

Updated: 2024-04-10 23:29:18

Categories: cs.RO,cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.05932v2

BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks

Novices frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI-based scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through 10 user studies where novices used BISCUIT for machine learning tutorials, we discover that BISCUIT offers users a semantic representation of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas. We discuss the implications of our findings for a UI-centric interactive paradigm in code generation LLMs.

Updated: 2024-04-10 23:28:09

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2404.07387v1

StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows

It is a notable trend to use Large Language Models (LLMs) to tackle complex tasks, e.g., tasks that require a sequence of actions and dynamic interaction with tools and external environments. In this paper, we propose StateFlow, a novel LLM-based task-solving paradigm that conceptualizes complex task-solving processes as state machines. In StateFlow, we distinguish between "process grounding" (via states and state transitions) and "sub-task solving" (through actions within a state), enhancing control and interpretability of the task-solving procedure. A state represents the status of a running process. The transitions between states are controlled by heuristic rules or decisions made by the LLM, allowing for a dynamic and adaptive progression. Upon entering a state, a series of actions is executed, involving not only calling LLMs guided by different prompts, but also the utilization of external tools as needed. Our results show that StateFlow significantly enhances LLMs' efficiency. For instance, StateFlow achieves 13% and 28% higher success rates than ReAct on the InterCode SQL and ALFWorld benchmarks, at 5x and 3x lower cost, respectively. We also show that StateFlow can be combined with iterative refining methods like Reflexion to further improve performance.
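
A minimal sketch of the paradigm follows: each state runs its own actions (LLM calls, possibly tools), and heuristic rules drive the transitions. The three states and the `call_llm` stub are hypothetical placeholders, not the paper's actual prompts or state design.

    def call_llm(prompt: str) -> str:
        # Stub standing in for a real LLM client (assumption).
        return "yes" if prompt.startswith("Is this correct") else f"<reply to: {prompt[:30]}>"

    def solve(task: str, max_steps: int = 20) -> str:
        state, ctx = "PLAN", {"task": task, "result": ""}
        for _ in range(max_steps):
            if state == "PLAN":                  # sub-task solving within a state
                ctx["plan"] = call_llm(f"Plan steps for: {ctx['task']}")
                state = "EXECUTE"                # process grounding via transitions
            elif state == "EXECUTE":
                ctx["result"] = call_llm(f"Execute: {ctx['plan']}")
                state = "VERIFY"
            elif state == "VERIFY":
                verdict = call_llm(f"Is this correct (yes/no)? {ctx['result']}")
                # heuristic transition rule: loop back on failure
                state = "DONE" if "yes" in verdict.lower() else "EXECUTE"
            elif state == "DONE":
                return ctx["result"]
        return ctx["result"]

    print(solve("write a SQL query counting users per country"))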

Updated: 2024-04-10 23:04:48

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.11322v3

Incorporating Explanations into Human-Machine Interfaces for Trust and Situation Awareness in Autonomous Vehicles

Autonomous vehicles often make complex decisions via machine learning-based predictive models applied to collected sensor data. While this combination of methods provides a foundation for real-time actions, self-driving behavior primarily remains opaque to end users. In this sense, explainability of real-time decisions is a crucial and natural requirement for building trust in autonomous vehicles. Moreover, as autonomous vehicles still cause serious traffic accidents for various reasons, timely conveyance of upcoming hazards to road users can help improve scene understanding and prevent potential risks. Hence, there is also a need to supply autonomous vehicles with user-friendly interfaces for effective human-machine teaming. Motivated by this problem, we study the role of explainable AI and human-machine interface jointly in building trust in vehicle autonomy. We first present a broad context of the explanatory human-machine systems with the "3W1H" (what, whom, when, how) approach. Based on these findings, we present a situation awareness framework for calibrating users' trust in self-driving behavior. Finally, we perform an experiment on our framework, conduct a user study on it, and validate the empirical findings with hypothesis testing.

Updated: 2024-04-10 23:02:13

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2404.07383v1

Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its training, which does not incorporate learning from failed attempts. Intuitively, a tactic that leads to a failed search path would indicate that similar tactics should receive less attention during the following trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems and formalize it in Lean, such that we can reliably check the correctness of proofs. We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on the correct paths and discover that the former solves more unseen theorems with fewer trial searches.

Updated: 2024-04-10 23:01:45

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2404.07382v1

Analyzing the Performance of Large Language Models on Code Summarization

Large language models (LLMs) such as Llama 2 perform very well on tasks that involve both natural language and source code, particularly code summarization and code generation. We show that for the task of code summarization, the performance of these models on individual examples often depends on the amount of (subword) token overlap between the code and the corresponding reference natural language descriptions in the dataset. This token overlap arises because the reference descriptions in standard datasets (corresponding to docstrings in large code bases) are often highly similar to the names of the functions they describe. We also show that this token overlap occurs largely in the function names of the code and compare the relative performance of these models after removing function names versus removing code structure. We also show that using multiple evaluation metrics like BLEU and BERTScore gives us very little additional insight since these metrics are highly correlated with each other.
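
A minimal sketch of the overlap measurement discussed above: the fraction of reference-summary subword tokens that also occur in the code. The GPT-2 tokenizer is an illustrative stand-in for whichever subword vocabulary a given LLM uses.

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")

    def subword_overlap(code: str, reference: str) -> float:
        # Share of reference tokens that appear among the code's tokens.
        code_tokens = set(tok.tokenize(code))
        ref_tokens = tok.tokenize(reference)
        if not ref_tokens:
            return 0.0
        return sum(t in code_tokens for t in ref_tokens) / len(ref_tokens)

    code = "def get_user_name(user):\n    return user.name"
    # Nonzero because the summary shares subwords with the function name.
    print(subword_overlap(code, "Get the user name."))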

Updated: 2024-04-10 22:42:18

Categories: cs.SE,cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.08018v1

Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI

Building on the remarkable achievements in generative sampling of natural images, we propose an innovative challenge, potentially overly ambitious, which involves generating samples of entire multivariate time series that resemble images. However, the statistical challenge lies in the small sample size, sometimes consisting of a few hundred subjects. This issue is especially problematic for deep generative models that follow the conventional approach of generating samples from a canonical distribution and then decoding or denoising them to match the true data distribution. In contrast, our method is grounded in information theory and aims to implicitly characterize the distribution of images, particularly the (global and local) dependency structure between pixels. We achieve this by empirically estimating its KL-divergence in the dual form with respect to the respective marginal distribution. This enables us to perform generative sampling directly in the optimized 1-D dual divergence space. Specifically, in the dual space, training samples representing the data distribution are embedded in the form of various clusters between two end points. In theory, any sample embedded between those two end points is in-distribution w.r.t. the data distribution. Our key idea for generating novel samples of images is to interpolate between the clusters via a walk as per gradients of the dual function w.r.t. the data dimensions. In addition to the data efficiency gained from direct sampling, we propose an algorithm that offers a significant reduction in sample complexity for estimating the divergence of the data distribution with respect to the marginal distribution. We provide strong theoretical guarantees along with an extensive empirical evaluation using many real-world datasets from diverse domains, establishing the superiority of our approach w.r.t. state-of-the-art deep learning methods.
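
A standard way to estimate a KL divergence in its dual form is the Donsker-Varadhan representation $KL(P\|Q) = \sup_f \mathbb{E}_P[f] - \log \mathbb{E}_Q[e^{f}]$. The sketch below trains a small critic to tighten that bound between two toy Gaussians; it shows only the divergence-estimation ingredient, not the paper's 1-D dual-space embedding or the gradient walk used for sampling.

    import math
    import torch
    import torch.nn as nn

    # Critic f for the Donsker-Varadhan bound (architecture is arbitrary).
    f = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(f.parameters(), lr=1e-3)

    for step in range(2000):
        p = torch.randn(256, 2) + 1.0   # samples from P = N((1,1), I)
        q = torch.randn(256, 2)         # samples from the marginal Q = N(0, I)
        # KL(P||Q) >= E_P[f] - log E_Q[exp f]; maximize the bound over f.
        log_mean_exp = torch.logsumexp(f(q).squeeze(-1), dim=0) - math.log(len(q))
        dv_bound = f(p).mean() - log_mean_exp
        opt.zero_grad()
        (-dv_bound).backward()
        opt.step()

    # For these Gaussians the true divergence is 0.5 * ||mu||^2 = 1.0.
    print(float(dv_bound))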

Updated: 2024-04-10 22:35:06

Categories: cs.LG,cs.AI,cs.CL,cs.CV,cs.IT,math.IT

Download: http://arxiv.org/abs/2404.07377v1

Improving Multi-Center Generalizability of GAN-Based Fat Suppression using Federated Learning

Generative Adversarial Network (GAN)-based synthesis of fat suppressed (FS) MRIs from non-FS proton density sequences has the potential to accelerate acquisition of knee MRIs. However, GANs trained on single-site data have poor generalizability to external data. We show that federated learning can improve multi-center generalizability of GANs for synthesizing FS MRIs, while facilitating privacy-preserving multi-institutional collaborations.

Updated: 2024-04-10 22:16:20

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.07374v1

Synthesizing Neural Network Controllers with Closed-Loop Dissipativity Guarantees

In this paper, a method is presented to synthesize neural network controllers such that the feedback system of plant and controller is dissipative, certifying performance requirements such as L2 gain bounds. The class of plants considered is that of linear time-invariant (LTI) systems interconnected with an uncertainty, including nonlinearities treated as an uncertainty for convenience of analysis. The uncertainty of the plant and the nonlinearities of the neural network are both described using integral quadratic constraints (IQCs). First, a dissipativity condition is derived for uncertain LTI systems. Second, this condition is used to construct a linear matrix inequality (LMI) which can be used to synthesize neural network controllers. Finally, this convex condition is used in a projection-based training method to synthesize neural network controllers with dissipativity guarantees. Numerical examples on an inverted pendulum and a flexible rod on a cart are provided to demonstrate the effectiveness of this approach.
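
The synthesis procedure rests on posing convex feasibility problems over linear matrix inequalities. As a minimal sketch of that machinery, the snippet below checks the simpler Lyapunov condition $P \succ 0$, $A^\top P + P A \prec 0$ for an illustrative stable plant with cvxpy; the paper's dissipativity LMI with IQC multipliers has the same convex structure but more blocks.

    import cvxpy as cp
    import numpy as np

    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])        # illustrative stable LTI plant
    n = A.shape[0]
    eps = 1e-6

    P = cp.Variable((n, n), symmetric=True)
    constraints = [P >> eps * np.eye(n),                 # P positive definite
                   A.T @ P + P @ A << -eps * np.eye(n)]  # Lyapunov decrease
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()

    print(prob.status)   # 'optimal' means the LMI is feasible
    print(P.value)       # a certificate matrix P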

Updated: 2024-04-10 22:15:28

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2404.07373v1

Differentially Private GANs for Generating Synthetic Indoor Location Data

The advent of location-based services has led to the widespread adoption of indoor localization systems, which enable location tracking of individuals within enclosed spaces such as buildings. While these systems provide numerous benefits such as improved security and personalized services, they also raise concerns regarding privacy violations. As such, there is a growing need for privacy-preserving solutions that can protect users' sensitive location information while still enabling the functionality of indoor localization systems. In recent years, Differentially Private Generative Adversarial Networks (DPGANs) have emerged as a powerful methodology that aims to protect the privacy of individual data points while generating realistic synthetic data similar to original data. DPGANs combine the power of generative adversarial networks (GANs) with the privacy-preserving technique of differential privacy (DP). In this paper, we introduce an indoor localization framework employing DPGANs in order to generate privacy-preserving indoor location data. We evaluate the performance of our framework on a real-world indoor localization dataset and demonstrate its effectiveness in preserving privacy while maintaining the accuracy of the localization system.

Updated: 2024-04-10 21:43:27

Categories: cs.CR,cs.AI,eess.SP

Download: http://arxiv.org/abs/2404.07366v1

Gradient Networks

Directly parameterizing and learning gradients of functions has widespread significance, with specific applications in optimization, generative modeling, and optimal transport. This paper introduces gradient networks (GradNets): novel neural network architectures that parameterize gradients of various function classes. GradNets exhibit specialized architectural constraints that ensure correspondence to gradient functions. We provide a comprehensive GradNet design framework that includes methods for transforming GradNets into monotone gradient networks (mGradNets), which are guaranteed to represent gradients of convex functions. We establish the approximation capabilities of the proposed GradNet and mGradNet. Our results demonstrate that these networks universally approximate the gradients of (convex) functions. Furthermore, these networks can be customized to correspond to specific spaces of (monotone) gradient functions, including gradients of transformed sums of (convex) ridge functions. Our analysis leads to two distinct GradNet architectures, GradNet-C and GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and mGradNet-M. Our empirical results show that these architectures offer efficient parameterizations and outperform popular methods in gradient field learning tasks.
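
A minimal single-layer sketch of the construction: the map $x \mapsto W^\top \sigma(Wx + b)$ is the exact gradient of the convex potential $g(x) = \sum_i \mathrm{softplus}(w_i^\top x + b_i)$ when $\sigma$ is the sigmoid (the derivative of softplus), so it is a monotone gradient field by construction. The paper's GradNet-C/M and mGradNet-C/M architectures are richer; this shows only the underlying idea.

    import torch
    import torch.nn as nn

    class MonotoneGradientNet(nn.Module):
        """x -> W^T sigmoid(Wx + b): the gradient of sum(softplus(Wx + b))."""

        def __init__(self, dim: int, hidden: int):
            super().__init__()
            self.W = nn.Parameter(torch.randn(hidden, dim) / dim ** 0.5)
            self.b = nn.Parameter(torch.zeros(hidden))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return torch.sigmoid(x @ self.W.T + self.b) @ self.W

    net = MonotoneGradientNet(dim=2, hidden=32)
    x, y = torch.randn(5, 2), torch.randn(5, 2)
    # Monotonicity: (f(x) - f(y)) . (x - y) >= 0, since the Jacobian
    # W^T diag(sigmoid') W is positive semidefinite everywhere.
    gap = ((net(x) - net(y)) * (x - y)).sum(dim=1)
    assert (gap >= 0).all()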

Updated: 2024-04-10 21:36:59

Categories: cs.LG,cs.NE,eess.SP,math.OC

Download: http://arxiv.org/abs/2404.07361v1

GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data

Microplastic particle ingestion or inhalation by humans is a problem of growing concern. Unfortunately, current research methods that use machine learning to understand their potential harms are obstructed by a lack of available data. Deep learning techniques in particular are challenged by such domains where only small or imbalanced data sets are available. Overcoming this challenge often involves oversampling underrepresented classes or augmenting the existing data to improve model performance. This paper proposes GANsemble: a two-module framework connecting data augmentation with conditional generative adversarial networks (cGANs) to generate class-conditioned synthetic data. First, the data chooser module automates augmentation strategy selection by searching for the best data augmentation strategy. Next, the cGAN module uses this strategy to train a cGAN for generating enhanced synthetic data. We experiment with the GANsemble framework on a small and imbalanced microplastics data set. A Microplastic-cGAN (MPcGAN) algorithm is introduced, and baselines for synthetic microplastics (SYMP) data are established in terms of Frechet Inception Distance (FID) and Inception Scores (IS). We also provide a synthetic microplastics filter (SYMP-Filter) algorithm to increase the quality of generated SYMP. Additionally, we show the best amount of oversampling with augmentation to fix class imbalance in small microplastics data sets. To our knowledge, this study is the first application of generative AI to synthetically create microplastics data.

Updated: 2024-04-10 21:23:13

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2404.07356v1

FairEM360: A Suite for Responsible Entity Matching

Entity matching is one of the earliest tasks that occur in the big data pipeline and is alarmingly exposed to unintentional biases that affect the quality of data. Identifying and mitigating the biases that exist in the data or are introduced by the matcher at this stage can contribute to promoting fairness in downstream tasks. This demonstration showcases FairEM360, a framework for 1) auditing the output of entity matchers across a wide range of fairness measures and paradigms, 2) providing potential explanations for the underlying reasons for unfairness, and 3) providing resolutions for the unfairness issues through an exploratory process with human-in-the-loop feedback, utilizing an ensemble of matchers. We aspire for FairEM360 to contribute to the prioritization of fairness as a key consideration in the evaluation of EM pipelines.

Updated: 2024-04-10 21:19:33

Categories: cs.DB,cs.CY,cs.LG

Download: http://arxiv.org/abs/2404.07354v1

Addressing the Abstraction and Reasoning Corpus via Procedural Example Generation

This work presents code to procedurally generate examples for the ARC training tasks. For each of the 400 tasks, an example generator following the transformation logic of the original examples was created. In effect, the assumed underlying distribution of examples for any given task was reverse-engineered by implementing a means to sample from it. An attempt was made to cover as large a space of possible examples for each task as is reasonable. That is, wherever the original examples of a given task were limited in their diversity, e.g. by keeping the grid dimensions, the set of symbols, or the number of objects constant or within tight bounds even though the transformation does not require it, such constraints were lifted. Having access to not just a few examples per task, as is the case for ARC, but instead very many, should enable a wide range of experiments that may be important stepping stones towards making leaps on the benchmark.
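
A minimal sketch of the pattern, for a hypothetical mirror-the-grid task: the transformation logic stays fixed while the grid dimensions and the symbol palette, which hand-made examples might have kept constant, are sampled freely.

    import random

    def generate_example(rng: random.Random):
        h, w = rng.randint(2, 10), rng.randint(2, 10)        # free grid size
        palette = rng.sample(range(10), rng.randint(2, 5))   # free symbol set
        grid = [[rng.choice(palette) for _ in range(w)] for _ in range(h)]
        # Fixed transformation logic of this (hypothetical) task:
        mirrored = [row[::-1] for row in grid]
        return {"input": grid, "output": mirrored}

    rng = random.Random(0)
    examples = [generate_example(rng) for _ in range(1000)]  # many, not few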

Updated: 2024-04-10 21:16:59

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.07353v1

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos

Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important. To effectively automate the process of video analysis based on eye-tracking data, it is important to accurately replicate human gaze behavior. However, this task presents significant challenges due to the inherent complexity and ambiguity of human gaze patterns. In this work, we introduce a novel method for simulating human gaze behavior. Our approach uses a transformer-based reinforcement learning algorithm to train an agent that acts as a human observer, with the primary role of watching videos and simulating human gaze behavior. We employed an eye-tracking dataset gathered from videos generated by the VirtualHome simulator, with a primary focus on activity recognition. Our experimental results demonstrate the effectiveness of our gaze prediction method by highlighting its capability to replicate human gaze behavior and its applicability for downstream tasks where real human-gaze is used as input.

Updated: 2024-04-10 21:14:33

Categories: cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2404.07351v1

Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention

Humans utilize their gaze to concentrate on essential information while perceiving and interpreting intentions in videos. Incorporating human gaze into computational algorithms can significantly enhance model performance in video understanding tasks. In this work, we address a challenging and innovative task in video understanding: predicting the actions of an agent in a video based on a partial video. We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input. Our method utilizes a Graph Neural Network to recognize the agent's intention and predict the action sequence to fulfill this intention. To assess the efficiency of our approach, we collect a dataset containing household activities generated in the VirtualHome environment, accompanied by human gaze data of viewing videos. Our method outperforms state-of-the-art techniques, achieving a 7% improvement in accuracy for 18-class intention recognition. This highlights the efficiency of our method in learning important features from human gaze data.

Updated: 2024-04-10 21:03:23

Categories: cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2404.07347v1

Indoor Location Fingerprinting Privacy: A Comprehensive Survey

The pervasive integration of Indoor Positioning Systems (IPS) arises from the limitations of Global Navigation Satellite Systems (GNSS) in indoor environments, leading to the widespread adoption of Location-Based Services (LBS). Specifically, indoor location fingerprinting employs diverse signal fingerprints from user devices, enabling precise location identification by Location Service Providers (LSP). Despite its broad applications across various domains, indoor location fingerprinting introduces a notable privacy risk, as both LSP and potential adversaries inherently have access to this sensitive information, compromising users' privacy. Consequently, concerns regarding privacy vulnerabilities in this context necessitate a focused exploration of privacy-preserving mechanisms. In response to these concerns, this survey presents a comprehensive review of Privacy-Preserving Mechanisms in Indoor Location Fingerprinting (ILFPPM) based on cryptographic, anonymization, differential privacy (DP), and federated learning (FL) techniques. We also propose a distinctive and novel grouping of privacy vulnerabilities, adversary and attack models, and available evaluation metrics specific to indoor location fingerprinting systems. Given the identified limitations and research gaps in this survey, we highlight numerous prospective opportunities for future investigation, aiming to motivate researchers interested in advancing this field. This survey serves as a valuable reference for researchers and provides a clear overview for those beyond this specific research domain.

Updated: 2024-04-10 21:02:58

Categories: cs.CR,eess.SP

Download: http://arxiv.org/abs/2404.07345v1

Who Said What? An Automated Approach to Analyzing Speech in Preschool Classrooms

Young children spend substantial portions of their waking hours in noisy preschool classrooms. In these environments, children's vocal interactions with teachers are critical contributors to their language outcomes, but manually transcribing these interactions is prohibitive. Using audio from child- and teacher-worn recorders, we propose an automated framework that uses open source software both to classify speakers (ALICE) and to transcribe their utterances (Whisper). We compare results from our framework to those from a human expert for 110 minutes of classroom recordings, including 85 minutes from child-worn microphones (n=4 children) and 25 minutes from teacher-worn microphones (n=2 teachers). The overall proportion of agreement, that is, the proportion of correctly classified teacher and child utterances, was .76, with an error-corrected kappa of .50 and a weighted F1 of .76. The word error rate for both teacher and child transcriptions was .15, meaning that 15% of words would need to be deleted, added, or changed to equate the Whisper and expert transcriptions. Moreover, speech features such as the mean length of utterances in words, the proportion of teacher and child utterances that were questions, and the proportion of utterances that were responded to within 2.5 seconds were similar when calculated separately from expert and automated transcriptions. The results suggest substantial progress in analyzing classroom speech that may support children's language development. Future research using natural language processing is under way to improve speaker classification and to analyze results from the application of the automated framework to a larger dataset containing classroom recordings from 13 children and 3 teachers observed on 17 occasions over one year.
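
For reference, the word error rate reported above is the word-level Levenshtein distance (substitutions, insertions, and deletions) divided by the reference length; a minimal sketch:

    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate via dynamic-programming edit distance."""
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    print(wer("the cat sat on the mat", "the cat sat on a mat"))  # ~0.17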

Updated: 2024-04-10 21:02:41

Categories: eess.AS,cs.LG

Download: http://arxiv.org/abs/2401.07342v3

Benchmarking Algorithms for Federated Domain Generalization

While prior domain generalization (DG) benchmarks consider train-test dataset heterogeneity, we evaluate Federated DG, which introduces federated learning (FL) specific challenges. Additionally, we explore domain-based heterogeneity in clients' local datasets - a realistic Federated DG scenario. Prior Federated DG evaluations are limited in terms of the number or heterogeneity of clients and dataset diversity. To address this gap, we propose a Federated DG benchmark methodology that enables control of the number and heterogeneity of clients and provides metrics for dataset difficulty. We then apply our methodology to evaluate 14 Federated DG methods, which include centralized DG methods adapted to the FL context, FL methods that handle client heterogeneity, and methods designed specifically for Federated DG. Our results suggest that despite some progress, there remain significant performance gaps in Federated DG, particularly when evaluating with a large number of clients, high client heterogeneity, or more realistic datasets. Please check our extendable benchmark code here: https://github.com/inouye-lab/FedDG_Benchmark.

Updated: 2024-04-10 21:01:44

Categories: cs.LG

Download: http://arxiv.org/abs/2307.04942v2

Interactive Learning of Physical Object Properties Through Robot Manipulation and Database of Object Measurements

This work presents a framework for automatically extracting physical object properties, such as material composition, mass, volume, and stiffness, through robot manipulation and a database of object measurements. The framework involves exploratory action selection to maximize learning about objects on a table. A Bayesian network models conditional dependencies between object properties, incorporating prior probability distributions and uncertainty associated with measurement actions. The algorithm selects optimal exploratory actions based on expected information gain and updates object properties through Bayesian inference. Experimental evaluation demonstrates effective action selection compared to a baseline and correct termination of the experiments if there is nothing more to be learned. The algorithm proved to behave intelligently when presented with trick objects with material properties in conflict with their appearance. The robot pipeline integrates with a logging module and an online database of objects, containing over 24,000 measurements of 63 objects with different grippers. All code and data are publicly available, facilitating automatic digitization of objects and their physical properties through exploratory manipulations.
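
A minimal sketch of the action-selection criterion for a single discrete property: choose the action whose observation model is expected to shrink belief entropy the most. The two actions and their likelihood tables are hypothetical; the paper's Bayesian network couples several properties and measurement uncertainties.

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    def best_action(belief, likelihoods):
        # likelihoods[a][o, s] = P(observation o | state s, action a)
        gains = []
        for lik in likelihoods:
            p_obs = lik @ belief                       # predictive P(o)
            gain = entropy(belief)
            for o, po in enumerate(p_obs):
                if po > 0:
                    posterior = lik[o] * belief / po   # Bayes update
                    gain -= po * entropy(posterior)    # expected posterior entropy
            gains.append(gain)
        return int(np.argmax(gains))

    belief = np.array([0.5, 0.3, 0.2])                 # P(material class)
    squeeze = np.array([[0.8, 0.3, 0.1],               # hypothetical informative action
                        [0.2, 0.7, 0.9]])
    shake = np.array([[0.5, 0.5, 0.5],                 # hypothetical uninformative action
                      [0.5, 0.5, 0.5]])
    print(best_action(belief, [squeeze, shake]))       # 0: squeeze wins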

Updated: 2024-04-10 20:59:59

Categories: cs.RO,cs.AI,cs.IT,math.IT,I.2.9

Download: http://arxiv.org/abs/2404.07344v1

Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources. To achieve this, we perform Noisy Student Training after generating pseudo-labels for the unlabeled public data using a strong Conformer RNN-T baseline model. The addition of these pseudo-labeled data results in remarkable improvements in relative Word Error Rate (WER) by 11.5% and 24.3% for our asynchronous and realtime models, respectively. Additionally, the model is more robust to background noise owing to the addition of these data. The results obtained in this study demonstrate that the incorporation of pseudo-labeled publicly available data is a highly effective strategy for improving ASR accuracy and noise robustness.

Updated: 2024-04-10 20:40:24

Categories: eess.AS,cs.CL,cs.LG,cs.SD

Download: http://arxiv.org/abs/2404.07341v1

Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks

We introduce Syntax-Aware Fill-In-the-Middle (SAFIM), a new benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task. This benchmark focuses on syntax-aware completions of program structures such as code blocks and conditional expressions, and includes 17,720 examples from multiple programming languages, sourced from recent code submissions after April 2022 to minimize data contamination. SAFIM provides a robust framework with various prompt designs and novel syntax-aware post-processing techniques, facilitating accurate and fair comparisons across LLMs. Our comprehensive evaluation of 15 LLMs shows that FIM pretraining not only enhances FIM proficiency but also improves Left-to-Right (L2R) inference using LLMs. Our findings challenge conventional beliefs and suggest that pretraining methods and data quality have more impact than model size. SAFIM thus serves as a foundational platform for future research in effective pretraining strategies for code LLMs. The evaluation toolkit and dataset are available at https://github.com/gonglinyuan/safim, and the leaderboard is available at https://safimbenchmark.com.

Updated: 2024-04-10 20:26:31

Categories: cs.CL,cs.AI,cs.LG,cs.SE

Download: http://arxiv.org/abs/2403.04814v2

A Modified Depolarization Approach for Efficient Quantum Machine Learning

Quantum Computing in the Noisy Intermediate-Scale Quantum (NISQ) era has shown promising applications in machine learning, optimization, and cryptography. Despite the progress, challenges persist due to system noise, errors, and decoherence that complicate the simulation of quantum systems. The depolarization channel is a standard tool for simulating a quantum system's noise. However, modeling such noise for practical applications is computationally expensive when we have limited hardware resources, as is the case in the NISQ era. We propose a modified representation for a single-qubit depolarization channel with two Kraus operators based only on X and Z Pauli matrices. Our approach reduces the computational complexity from six to four matrix multiplications per execution of a channel. Experiments on a Quantum Machine Learning (QML) model on the Iris dataset across various circuit depths and depolarization rates validate that our approach maintains the model's accuracy while improving efficiency. This simplified noise model enables more scalable simulations of quantum circuits under depolarization, advancing capabilities in the NISQ era.
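
For context, the standard single-qubit depolarizing channel applies four Kraus operators, $\rho \mapsto \sum_k K_k \rho K_k^\dagger$; a minimal sketch of that baseline is below. The paper's modified representation instead uses two Kraus operators built from X and Z alone (note $Y = iXZ$, so the Y term equals $(XZ)\rho(XZ)^\dagger$), reducing the per-channel matrix multiplications from six to four; the exact two-operator construction is not reproduced here.

    import numpy as np

    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    def depolarize(rho: np.ndarray, p: float) -> np.ndarray:
        # Standard four-operator Kraus decomposition of depolarization.
        kraus = [np.sqrt(1 - p) * I2,
                 np.sqrt(p / 3) * X,
                 np.sqrt(p / 3) * Y,
                 np.sqrt(p / 3) * Z]
        return sum(K @ rho @ K.conj().T for K in kraus)

    rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
    out = depolarize(rho, p=0.3)
    print(np.trace(out).real)   # 1.0: the channel is trace-preserving
    print(out.real)             # state mixed toward I/2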

Updated: 2024-04-10 20:17:40

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2404.07330v1

A Quantitative Autonomy Quantification Framework for Fully Autonomous Robotic Systems

Although autonomous functioning facilitates deployment of robotic systems in domains that admit limited human oversight on our planet and beyond, finding correspondence between task requirements and autonomous capability is still an open challenge. Consequently, a number of methods for quantifying autonomy have been proposed over the last three decades, but to our knowledge none of them discerns sub-mode features of autonomy variation, and some are based on metrics that violate Goodhart's law. This paper focuses on the fully autonomous mode and proposes a quantitative autonomy assessment framework based on task requirements. The framework starts by establishing robot task characteristics from which three autonomy metrics, namely requisite capability set, reliability and responsiveness, are derived. These characteristics were founded on the realization that robots ultimately replace skilled human workers, from which a relationship between human job and robot task characteristics was established. Additionally, mathematical functions mapping metrics to autonomy as a two-part measure, namely level and degree of autonomy, are also presented. The distinction between level and degree of autonomy stemmed from the acknowledgment that autonomy is not just a question of existence, but also one of performance of requisite capability. The framework has been demonstrated on two case studies, namely an autonomous vehicle performing an on-road dynamic driving task and an analysis of the DARPA Subterranean Challenge rules. The framework provides not only a tool for quantifying autonomy, but also a regulatory interface and common language for autonomous systems developers and users. Its greatest feature is the ability to monitor system integrity when implemented online.

Updated: 2024-04-10 20:04:59

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2311.01939v2

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.

Updated: 2024-04-10 19:54:28

Categories: cs.LG

Download: http://arxiv.org/abs/2309.14597v3

Rethinking Perceptual Metrics for Medical Image Translation

Modern medical image translation methods use generative models for tasks such as the conversion of CT images to MRI. Evaluating these methods typically relies on some chosen downstream task in the target domain, such as segmentation. On the other hand, task-agnostic metrics are attractive, such as the network-feature-based perceptual metrics (e.g., FID) that are common to image translation in general computer vision. In this paper, we investigate evaluation metrics for medical image translation on two medical image translation tasks (GE breast MRI to Siemens breast MRI and lumbar spine MRI to CT), tested on various state-of-the-art translation methods. We show that perceptual metrics do not generally correlate with segmentation metrics because they extend poorly to the anatomical constraints of this sub-field, with FID being especially inconsistent. However, we find that the lesser-used pixel-level SWD metric may be useful for subtle intra-modality translation. Our results demonstrate the need for further research into helpful metrics for medical image translation.
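
A minimal sketch of the pixel-level sliced Wasserstein distance (SWD): project both sample sets onto random unit directions and average the 1-D Wasserstein distances, which for equal-size samples reduce to means of absolute sorted differences. How pixels or patches are extracted from the images is an assumption left out here.

    import numpy as np

    def swd(a: np.ndarray, b: np.ndarray, n_proj: int = 128, seed: int = 0) -> float:
        # a, b: (n_samples, dim) matrices of pixels/patches/features.
        rng = np.random.default_rng(seed)
        dirs = rng.standard_normal((n_proj, a.shape[1]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
        pa = np.sort(a @ dirs.T, axis=0)   # sorted 1-D projections
        pb = np.sort(b @ dirs.T, axis=0)
        return float(np.abs(pa - pb).mean())

    a = np.random.default_rng(1).standard_normal((500, 64))
    b = np.random.default_rng(2).standard_normal((500, 64)) + 0.5
    print(swd(a, b))   # grows with the distributional shift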

Updated: 2024-04-10 19:39:43

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.07318v1

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license. Models at: https://huggingface.co/RWKV Training code at: https://github.com/RWKV/RWKV-LM Inference code at: https://github.com/RWKV/ChatRWKV Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer

Updated: 2024-04-10 19:34:38

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.05892v2

Structured Reinforcement Learning for Media Streaming at the Wireless Edge

Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be dynamically taken to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies for optimal decision making to determine which clients to dynamically prioritize in a video streaming setting. We formulate the policy design question as a constrained Markov decision problem (CMDP), and observe that by using a Lagrangian relaxation we can decompose it into single-client problems. Further, the optimal policy takes a threshold form in the video buffer length, which enables us to design an efficient constrained reinforcement learning (CRL) algorithm to learn it. Specifically, we show that a natural policy gradient (NPG) based algorithm that is derived using the structure of our problem converges to the globally optimal policy. We then develop a simulation environment for training, and a real-world intelligent controller attached to a WiFi access point for evaluation. We empirically show that the structured learning approach enables fast learning. Furthermore, such a structured policy can be easily deployed due to low computational complexity, leading to policy execution taking only about 15$\mu$s. Using YouTube streaming experiments in a resource constrained scenario, we demonstrate that the CRL approach can increase QoE by over 30%.

Updated: 2024-04-10 19:25:51

Categories: eess.SY,cs.AI,cs.LG,cs.SY

Download: http://arxiv.org/abs/2404.07315v1

AI-Guided Feature Segmentation Techniques to Model Features from Single Crystal Diamond Growth

Process refinement to consistently produce high-quality material over a large area of the grown crystal, enabling various applications from optics crystals to quantum detectors, has long been a goal for diamond growth. Machine learning offers a promising path toward this goal, but faces challenges such as the complexity of features within datasets, their time-dependency, and the volume of data produced per growth run. Accurate spatial feature extraction from image to image for real-time monitoring of diamond growth is crucial yet complicated, because the datasets are low-volume and their features highly complex. This paper compares various traditional and machine learning-driven approaches for feature extraction in the diamond growth domain, proposing a novel deep learning-driven semantic segmentation approach to isolate and classify accurate pixel masks of geometric features like diamond, pocket holder, and background, along with their derivative features based on shape and size. Using an annotation-focused human-in-the-loop software architecture for training datasets, with modules for selective data labeling using active learning, data augmentations, and model-assisted labeling, our approach achieves effective annotation accuracy and drastically reduces labeling time and cost. Deep learning algorithms prove highly efficient in accurately learning complex representations from datasets with many features. Our top-performing model, based on the DeeplabV3plus architecture, achieves outstanding accuracy in classifying features of interest, with accuracies of 96.31% for pocket holder, 98.60% for diamond top, and 91.64% for diamond side features.

Updated: 2024-04-10 19:16:08

标题: 基于AI引导的特征分割技术用于对单晶金刚石生长特征进行建模

摘要: 长期以来,钻石生长的目标之一是通过工艺优化,在生长晶体的大面积范围内持续产出高质量材料,从而实现从光学晶体到量子探测器等各种应用。机器学习为实现这一目标提供了一条有前途的途径,但面临诸如数据集中特征的复杂性、特征的时间依赖性以及每次生长运行产生的数据量等挑战。在图像之间准确提取空间特征对于实时监测钻石生长至关重要,但由于数据集数据量小、特征复杂度高,这项任务十分复杂。本文比较了钻石生长领域中各种传统和机器学习驱动的特征提取方法,提出了一种新颖的基于深度学习的语义分割方法,用于分离并分类钻石、袋座和背景等几何特征的精确像素掩码,以及基于形状和大小的派生特征。通过使用以注释为重点的人机协同软件架构构建训练数据集,其中包含利用主动学习进行选择性数据标注、数据增强和模型辅助标注的模块,我们的方法实现了有效的注释准确性,并大幅减少了标注时间和成本。深度学习算法在从具有众多特征的数据集中准确学习复杂表示方面表现出很高的效率。我们表现最佳的模型基于DeeplabV3plus架构,在分类感兴趣特征方面取得了出色的准确性:袋座为96.31%,钻石顶部为98.60%,钻石侧面特征为91.64%。

更新时间: 2024-04-10 19:16:08

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.08017v1

Non-ergodicity in reinforcement learning: robustness via ergodicity transformations

Envisioned application areas for reinforcement learning (RL) include autonomous driving, precision agriculture, and finance, which all require RL agents to make decisions in the real world. A significant challenge hindering the adoption of RL methods in these domains is the non-robustness of conventional algorithms. In this paper, we argue that a fundamental issue contributing to this lack of robustness lies in the focus on the expected value of the return as the sole ``correct'' optimization objective. The expected value is the average over the statistical ensemble of infinitely many trajectories. For non-ergodic returns, this average differs from the average over a single but infinitely long trajectory. Consequently, optimizing the expected value can lead to policies that yield exceptionally high returns with probability zero but almost surely result in catastrophic outcomes. This problem can be circumvented by transforming the time series of collected returns into one with ergodic increments. This transformation enables learning robust policies by optimizing the long-term return for individual agents rather than the average across infinitely many trajectories. We propose an algorithm for learning ergodicity transformations from data and demonstrate its effectiveness in an instructive, non-ergodic environment and on standard RL benchmarks.
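
The following self-contained sketch illustrates the non-ergodicity the abstract refers to with the classic multiplicative coin toss, where the log is the ergodicity transformation; the paper learns such transformations from data rather than assuming them:

import numpy as np

rng = np.random.default_rng(0)

# Multiplicative dynamics: wealth is multiplied by 1.5 on heads, 0.6 on tails.
# The ensemble average grows (0.5*1.5 + 0.5*0.6 = 1.05 > 1), but the process
# is non-ergodic: the time-average growth rate of a single trajectory is
# E[log factor] = 0.5*(log 1.5 + log 0.6) < 0, i.e. almost sure decay.
factors = rng.choice([1.5, 0.6], size=100_000)

# The ergodicity transformation for multiplicative returns is the log:
# increments of log-wealth ARE ergodic, so their time average is meaningful.
log_increments = np.log(factors)
print("ensemble growth factor :", 0.5 * 1.5 + 0.5 * 0.6)   # 1.05
print("time-average growth    :", log_increments.mean())   # ~ -0.053 < 0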

Updated: 2024-04-10 19:15:07

标题: 强化学习中的非遍历性:通过遍历性转换实现的稳健性

摘要: 强化学习(RL)的设想应用领域包括自动驾驶、精准农业和金融,这些领域都需要RL代理在现实世界中做出决策。阻碍RL方法在这些领域中被采用的一个重要挑战是传统算法的非鲁棒性。本文认为,导致这种缺乏鲁棒性的一个根本问题在于将回报的期望值作为唯一“正确”的优化目标。期望值是对无穷多条轨迹构成的统计系综的平均。对于非遍历性回报,这种平均值与单条但无穷长轨迹上的平均值不同。因此,优化期望值可能导致这样的策略:它以零概率产生极高的回报,却几乎必然导致灾难性结果。通过将收集到的回报时间序列转换为具有遍历增量的时间序列,可以规避这个问题。这种转换使我们能够通过优化单个代理的长期回报(而非无穷多条轨迹上的平均值)来学习鲁棒策略。我们提出了一种从数据中学习遍历性转换的算法,并在一个有启发性的非遍历环境和标准RL基准上展示了其有效性。

更新时间: 2024-04-10 19:15:07

领域: cs.LG

下载: http://arxiv.org/abs/2310.11335v2

A benchmark for computational analysis of animal behavior, using animal-borne tags

Animal-borne sensors ('bio-loggers') can record a suite of kinematic and environmental data, which can elucidate animal ecophysiology and improve conservation efforts. Machine learning techniques are used for interpreting the large amounts of data recorded by bio-loggers, but there exists no common framework for comparing the different machine learning techniques in this domain. To address this, we present the Bio-logger Ethogram Benchmark (BEBE), a collection of datasets with behavioral annotations, as well as a modeling task and evaluation metrics. BEBE is to date the largest, most taxonomically diverse, publicly available benchmark of this type, and includes 1654 hours of data collected from 149 individuals across nine taxa. In addition, using BEBE, we test a novel self-supervised learning approach to identifying animal behaviors based on bio-logger data, using a deep neural network pre-trained with self-supervision on data collected from human wrist-worn accelerometers. We show that this approach out-performs common alternatives, especially in a setting with a low amount of training data. Datasets, models, and evaluation code are made publicly available at https://github.com/earthspecies/BEBE, to enable community use of BEBE as a point of comparison in methods development.

Updated: 2024-04-10 19:13:09

标题: 使用动物携带标签进行动物行为计算分析的基准测试

摘要: 动物携带的传感器(“生物记录器”)可以记录一系列运动和环境数据,这些数据可以阐明动物生态生理学并改进保护工作。机器学习技术被用于解释生物记录器记录的大量数据,但在这个领域还没有一个共同的框架来比较不同的机器学习技术。为了解决这个问题,我们提出了“生物记录器行为表征基准”(BEBE),这是一个带有行为注释的数据集合,以及一个建模任务和评估指标。到目前为止,BEBE是最大的、分类学上最多样化的、公开可用的此类基准,包括来自九个分类单元的149个个体所收集的1654小时的数据。此外,使用BEBE,我们测试了一种新颖的自监督学习方法,用于基于生物记录器数据识别动物行为,该方法使用了一个在人类手腕佩戴的加速度计数据上进行自监督预训练的深度神经网络。我们展示了这种方法优于常见的替代方法,尤其是在训练数据量较少的情况下。数据集、模型和评估代码都可以在https://github.com/earthspecies/BEBE上公开获取,以便社区使用BEBE作为方法开发中的比较基准。

更新时间: 2024-04-10 19:13:09

领域: cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2305.10740v2

Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length

Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used to model various real-life phenomena: earthquakes, operations on stock markets, neuronal activity, virus propagation and many others. In this paper, we focus on MHPs with exponential decay kernels and estimate connectivity graphs, which represent the Granger causal relations between their components. We approach this inference problem by proposing an optimization criterion and model selection algorithm based on the minimum message length (MML) principle. MML compares Granger causal models using the Occam's razor principle in the following way: even when models have a comparable goodness-of-fit to the observed data, the one generating the most concise explanation of the data is preferred. While most of the state-of-art methods using lasso-type penalization tend to overfitting in scenarios with short time horizons, the proposed MML-based method achieves high F1 scores in these settings. We conduct a numerical study comparing the proposed algorithm to other related classical and state-of-art methods, where we achieve the highest F1 scores in specific sparse graph settings. We illustrate the proposed method also on G7 sovereign bond data and obtain causal connections, which are in agreement with the expert knowledge available in the literature.
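
For reference, a minimal sketch of the exponential-kernel intensity these models are built on; the MML selection criterion itself is not reproduced here, and all parameter values are hypothetical:

import numpy as np

def intensity(t, mu, alpha, beta, events):
    # Conditional intensity of an exponential-kernel multivariate Hawkes
    # process: lambda_i(t) = mu_i + sum_j alpha[i, j] * beta *
    # sum_{t_k in events[j], t_k < t} exp(-beta * (t - t_k)).
    # Component j Granger-causes component i iff alpha[i, j] != 0.
    lam = mu.copy()
    for j, times in enumerate(events):
        past = times[times < t]
        lam += alpha[:, j] * beta * np.exp(-beta * (t - past)).sum()
    return lam

# Hypothetical 2-component process in which only 0 -> 1 is causal.
mu = np.array([0.2, 0.1])
alpha = np.array([[0.0, 0.0],
                  [0.5, 0.0]])
events = [np.array([1.0, 2.5]), np.array([1.2])]
print(intensity(3.0, mu, alpha, beta=1.0, events=events))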

Updated: 2024-04-10 19:03:58

标题: 基于最小消息长度的多变量霍克斯过程Granger因果推断

摘要: 多元霍克斯过程(MHPs)是一种多功能概率工具,用于建模各种真实生活现象:地震、股票市场操作、神经元活动、病毒传播等等。在本文中,我们重点关注具有指数衰减核的MHPs,并估计连接图,这些图代表它们组成部分之间的Granger因果关系。我们通过提出一个基于最小消息长度(MML)原则的优化标准和模型选择算法来解决这个推理问题。MML使用奥卡姆剃刀原则比较Granger因果模型,即使模型对观察数据的拟合程度相当,也会首选生成对数据最简洁说明的模型。尽管大多数使用套索类型惩罚的最先进方法在短时间范围内倾向于过拟合,但所提出的基于MML的方法在这些场景中实现了高F1分数。我们进行了数值研究,将所提出的算法与其他相关的经典和最先进的方法进行比较,在特定的稀疏图设置中获得最高的F1分数。我们还在G7主权债券数据上演示了所提出的方法,并获得了与文献中专家知识一致的因果连接。

更新时间: 2024-04-10 19:03:58

领域: cs.LG

下载: http://arxiv.org/abs/2309.02027v2

Transfer Learning via Latent Dependency Factor for Estimating PM 2.5

Air pollution, especially particulate matter 2.5 (PM 2.5), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can be leveraged to solve this problem, as they use alternate data sources to gain knowledge (i.e., data from data-rich regions). However, current transfer learning methodologies do not account for dependencies between the source and the target domains. We recognize this transfer problem as spatial transfer learning and propose a new feature named Latent Dependency Factor (LDF) that captures spatial and semantic dependencies of both domains and is subsequently added to the datasets. We generate LDF using a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Our experiments show that transfer models using LDF have a $19.34\%$ improvement over the best-performing baselines. We additionally support our experiments with qualitative results.
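
A bare-bones sketch of the core idea, assuming a single autoencoder whose bottleneck plays the role of the LDF; the paper's actual model is a two-stage design trained on clusters of similar source/target data:

import torch
import torch.nn as nn

class LDFAutoencoder(nn.Module):
    # The 1-D bottleneck acts as the Latent Dependency Factor that is
    # appended to the dataset as an extra feature.
    def __init__(self, n_features):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, 1))
        self.decoder = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))

    def forward(self, x):
        ldf = self.encoder(x)
        return self.decoder(ldf), ldf

model = LDFAutoencoder(n_features=8)
x = torch.randn(32, 8)                     # stand-in for pooled source+target rows
recon, ldf = model(x)
x_augmented = torch.cat([x, ldf], dim=1)   # dataset with the LDF column added
print(x_augmented.shape)                   # torch.Size([32, 9])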

Updated: 2024-04-10 19:01:44

标题: 通过潜在依赖因子进行迁移学习以估算PM 2.5

摘要: 空气污染,特别是细颗粒物(PM 2.5),是公共卫生的一个紧迫问题,由于缺乏地面传感器,在发展中国家(数据匮乏地区)很难估计。迁移学习模型可以利用替代数据来源(即来自数据丰富地区的数据)获取知识,从而解决这一问题。然而,当前的迁移学习方法并未考虑源域和目标域之间的依赖关系。我们将这一迁移问题视为空间迁移学习,并提出了一个名为潜在依赖因子(LDF)的新特征,它捕获了两个域的空间和语义依赖关系,并随后被添加到数据集中。我们使用一种新颖的两阶段自编码器模型生成LDF,该模型从相似的源域和目标域数据聚类中学习。我们的实验表明,使用LDF的迁移模型比表现最佳的基线模型提高了19.34%。我们还通过定性结果支持我们的实验。

更新时间: 2024-04-10 19:01:44

领域: cs.LG

下载: http://arxiv.org/abs/2404.07308v1

AI-Guided Defect Detection Techniques to Model Single Crystal Diamond Growth

From a process development perspective, diamond growth via chemical vapor deposition has made significant strides. However, challenges persist in achieving high quality and large-area material production. These difficulties include controlling conditions to maintain uniform growth rates for the entire growth surface. As growth progresses, various factors or defect states emerge, altering the uniform conditions. These changes affect the growth rate and result in the formation of crystalline defects at the microscale. However, there is a distinct lack of methods to identify these defect states and their geometry using images taken during the growth process. This paper details seminal work on defect segmentation pipeline using in-situ optical images to identify features that indicate defective states that are visible at the macroscale. Using a semantic segmentation approach as applied in our previous work, these defect states and corresponding derivative features are isolated and classified by their pixel masks. Using an annotation focused human-in-the-loop software architecture to produce training datasets, with modules for selective data labeling using active learning, data augmentations, and model-assisted labeling, our approach achieves effective annotation accuracy and drastically reduces the time and cost of labeling by orders of magnitude. On the model development front, we found that deep learning-based algorithms are the most efficient. They can accurately learn complex representations from feature-rich datasets. Our best-performing model, based on the YOLOV3 and DeeplabV3plus architectures, achieved excellent accuracy for specific features of interest. Specifically, it reached 93.35% accuracy for center defects, 92.83% for polycrystalline defects, and 91.98% for edge defects.

Updated: 2024-04-10 18:58:05

标题: 人工智能引导的缺陷检测技术用于建模单晶金刚石生长

摘要: 从工艺发展的角度来看,化学气相沉积法生长金刚石取得了显著进展。然而,实现高质量和大面积材料生产仍然面临挑战。这些困难包括控制条件以维持整个生长表面的均匀生长速度。随着生长的进行,各种因素或缺陷状态出现,改变了均匀的条件。这些变化影响了生长速率,并导致微观尺度上形成晶体缺陷。然而,目前明显缺乏利用生长过程中拍摄的图像来识别这些缺陷状态及其几何形状的方法。本文详细介绍了利用原位光学图像进行缺陷分割管道的开创性工作,以识别表明宏观可见缺陷状态的特征。利用我们之前工作中应用的语义分割方法,这些缺陷状态和相应的派生特征通过其像素掩码被分离和分类。利用以注释为重点的人机协同软件架构生成训练数据集,其中包含用于主动学习选择性数据标注、数据增强和模型辅助标注的模块,我们的方法实现了有效的注释准确性,并将标注的时间和成本降低了数个数量级。在模型开发方面,我们发现基于深度学习的算法是最有效的。它们可以从特征丰富的数据集中准确学习复杂的表示。我们表现最佳的模型基于YOLOV3和DeeplabV3plus架构,对感兴趣的特定特征达到了优秀的准确性。具体而言,对于中心缺陷达到了93.35%的准确率,对于多晶缺陷达到了92.83%,对于边缘缺陷达到了91.98%。

更新时间: 2024-04-10 18:58:05

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.07306v1

Provable Privacy with Non-Private Pre-Processing

When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.

Updated: 2024-04-10 18:50:26

标题: 具有非私有预处理的可证明隐私

摘要: 在分析差分隐私(DP)机器学习流程时,经常忽视数据相关预处理的潜在隐私成本。在这项工作中,我们提出了一个通用框架,用于评估非私有数据相关预处理算法所引起的额外隐私成本。我们的框架通过利用两个新的技术概念来建立整体隐私保证的上限:一种名为平滑DP的DP变体和预处理算法的有界灵敏度。除了通用框架外,我们还为多个数据相关预处理算法(如数据填充、量化、去重和PCA)提供明确的整体隐私保证,当它们与几个DP算法结合使用时。值得注意的是,这个框架也很简单实现,可以直接集成到现有的DP流程中。

更新时间: 2024-04-10 18:50:26

领域: cs.CR,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2403.13041v3

An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning

Various tasks in data science are modeled utilizing the variational regularization approach, where manually selecting regularization parameters presents a challenge. The difficulty gets exacerbated when employing regularizers involving a large number of hyperparameters. To overcome this challenge, bilevel learning can be employed to learn such parameters from data. However, neither exact function values nor exact gradients with respect to the hyperparameters are attainable, necessitating methods that only rely on inexact evaluation of such quantities. State-of-the-art inexact gradient-based methods a priori select a sequence of the required accuracies and cannot identify an appropriate step size since the Lipschitz constant of the hypergradient is unknown. In this work, we propose an algorithm with backtracking line search that only relies on inexact function evaluations and hypergradients and show convergence to a stationary point. Furthermore, the proposed algorithm determines the required accuracy dynamically rather than manually selected before running it. Our numerical experiments demonstrate the efficiency and feasibility of our approach for hyperparameter estimation on a range of relevant problems in imaging and data science such as total variation and field of experts denoising and multinomial logistic regression. Particularly, the results show that the algorithm is robust to its own hyperparameters such as the initial accuracies and step size.
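
A rough sketch of what an accuracy-adaptive backtracking step can look like, with inexact oracles f(x, eps) and grad(x, eps); the acceptance margins and update rules here are illustrative assumptions, not the paper's exact algorithm:

import numpy as np

def inexact_backtracking_step(x, f, grad, eps0=1e-2, rho=0.5, c=1e-4, t0=1.0):
    # One Armijo backtracking step using only inexact oracles: f(x, eps) and
    # grad(x, eps) return values accurate to eps, and the accuracy is
    # tightened adaptively instead of being fixed in advance.
    eps, t = eps0, t0
    g = grad(x, eps)
    while True:
        lhs = f(x - t * g, eps)
        rhs = f(x, eps) - c * t * np.dot(g, g)
        if lhs <= rhs - 2 * eps:       # decrease certified despite the errors
            return x - t * g, eps
        if lhs > rhs + 2 * eps:        # failure certified: shrink the step
            t *= rho
        else:                          # inconclusive: tighten the accuracy
            eps *= 0.5
            g = grad(x, eps)

# Hypothetical noisy quadratic oracle: accuracy eps is simulated with noise.
rng = np.random.default_rng(0)
f = lambda x, eps: float(np.dot(x, x)) + rng.uniform(-eps, eps)
grad = lambda x, eps: 2 * x + rng.uniform(-eps, eps, size=x.shape)
x_new, _ = inexact_backtracking_step(np.array([1.0, -2.0]), f, grad)
print(x_new)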

Updated: 2024-04-10 18:49:08

标题: 一个自适应不精确的一阶方法用于双层优化,并应用于超参数学习

摘要: 数据科学中的各种任务利用变分正则化方法建模,其中手动选择正则化参数是一项挑战。当使用涉及大量超参数的正则化器时,困难会加剧。为了克服这一挑战,可以使用双层学习来从数据中学习这些参数。然而,既不能获得准确的函数值,也不能获得关于超参数的准确梯度,这就需要仅依赖这些量的不精确评估的方法。目前最先进的基于不精确梯度的方法会事先选定所需精度的序列,并且由于超梯度的Lipschitz常数未知而无法确定合适的步长。在这项工作中,我们提出了一种使用回溯线搜索的算法,该算法仅依赖于不精确的函数评估和超梯度,并证明了其收敛到一个稳定点。此外,所提出的算法动态地确定所需的精度,而不是在运行之前手动选择。我们的数值实验展示了我们的方法在成像和数据科学中一系列相关问题(如总变差去噪、专家场去噪和多项逻辑回归)上进行超参数估计的效率和可行性。特别是,结果表明该算法对其自身的超参数(如初始精度和步长)是鲁棒的。

更新时间: 2024-04-10 18:49:08

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2308.10098v2

Predicting Mergers and Acquisitions in Competitive Industries: A Model Based on Temporal Dynamics and Industry Networks

M&A activities are pivotal for market consolidation, enabling firms to augment market power through strategic complementarities. Existing research often overlooks the peer effect, the mutual influence of M&A behaviors among firms, and fails to capture complex interdependencies within industry networks. Common approaches suffer from reliance on ad-hoc feature engineering, data truncation leading to significant information loss, reduced predictive accuracy, and challenges in real-world application. Additionally, the rarity of M&A events necessitates data rebalancing in conventional models, introducing bias and undermining prediction reliability. We propose an innovative M&A predictive model utilizing the Temporal Dynamic Industry Network (TDIN), leveraging temporal point processes and deep learning to adeptly capture industry-wide M&A dynamics. This model facilitates accurate, detailed deal-level predictions without arbitrary data manipulation or rebalancing, demonstrated through superior evaluation results from M&A cases between January 1997 and December 2020. Our approach marks a significant improvement over traditional models by providing detailed insights into M&A activities and strategic recommendations for specific firms.

Updated: 2024-04-10 18:48:19

标题: 预测竞争性行业中的兼并与收购:基于时间动态和行业网络的模型

摘要: 并购活动对市场整合至关重要,使得企业能够通过战略互补增强市场实力。现有研究经常忽视同行效应,即企业之间并购行为的相互影响,并未捕捉到行业网络内复杂的相互依赖关系。常见的方法受到对特征工程的临时依赖、数据截断导致重要信息的丢失、降低的预测准确性以及在实际应用中的挑战等问题的影响。此外,并购事件的稀缺性需要在传统模型中进行数据再平衡,引入偏见并削弱预测可靠性。我们提出了一种创新的并购预测模型,利用时间动态产业网络(TDIN),利用时间点过程和深度学习灵活捕捉全行业并购动态。该模型能够准确、详细地预测交易级别,无需任意数据处理或再平衡,通过对1997年1月至2020年12月的并购案例的优越评估结果加以证明。我们的方法相比传统模型有了显著改进,能够提供对并购活动和特定企业的战略建议的详细见解。

更新时间: 2024-04-10 18:48:19

领域: q-fin.ST,cs.LG,cs.SI,q-fin.GN

下载: http://arxiv.org/abs/2404.07298v1

ONNXPruner: ONNX-Based General Model Pruning Adapter

Recent advancements in model pruning have focused on developing new algorithms and improving upon benchmarks. However, the practical application of these algorithms across various models and platforms remains a significant challenge. To address this challenge, we propose ONNXPruner, a versatile pruning adapter designed for the ONNX format models. ONNXPruner streamlines the adaptation process across diverse deep learning frameworks and hardware platforms. A novel aspect of ONNXPruner is its use of node association trees, which automatically adapt to various model architectures. These trees clarify the structural relationships between nodes, guiding the pruning process, particularly highlighting the impact on interconnected nodes. Furthermore, we introduce a tree-level evaluation method. By leveraging node association trees, this method allows for a comprehensive analysis beyond traditional single-node evaluations, enhancing pruning performance without the need for extra operations. Experiments across multiple models and datasets confirm ONNXPruner's strong adaptability and increased efficacy. Our work aims to advance the practical application of model pruning.
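
To make the graph-traversal idea concrete, the sketch below builds a toy ONNX model and extracts the producer/consumer connectivity from which a node association structure could be grown; ONNXPruner's actual trees are richer than this raw adjacency:

from collections import defaultdict
from onnx import TensorProto, helper

# Build a tiny Conv -> Relu -> Conv graph in memory (stand-in for a real model).
X = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 8, 8])
Y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 3, 8, 8])
nodes = [
    helper.make_node("Conv", ["x", "w1"], ["a"], name="conv1"),
    helper.make_node("Relu", ["a"], ["b"], name="relu1"),
    helper.make_node("Conv", ["b", "w2"], ["y"], name="conv2"),
]
model = helper.make_model(helper.make_graph(nodes, "toy", [X], [Y]))

# Map each tensor to its producer, then list downstream nodes per node --
# the structural relationships a node association tree would be built from.
producer = {out: n.name for n in model.graph.node for out in n.output}
children = defaultdict(list)
for n in model.graph.node:
    for inp in n.input:
        if inp in producer:
            children[producer[inp]].append(n.name)
print(dict(children))   # {'conv1': ['relu1'], 'relu1': ['conv2']}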

Updated: 2024-04-10 18:36:25

标题: ONNXPruner:基于ONNX的通用模型剪枝适配器

摘要: 最近在模型修剪方面取得的进展主要集中在开发新算法和改进基准。然而,这些算法在各种模型和平台上的实际应用仍然是一个重大挑战。为了解决这一挑战,我们提出了ONNXPruner,这是一个专为ONNX格式模型设计的多功能修剪适配器。ONNXPruner简化了在不同深度学习框架和硬件平台上的适配过程。ONNXPruner的一个创新之处在于其使用节点关联树,这些树可以自动适应各种模型架构。这些树澄清了节点之间的结构关系,指导修剪过程,特别突出了对互相关联节点的影响。此外,我们引入了一种基于树级别的评估方法。通过利用节点关联树,这种方法允许进行超越传统单节点评估的全面分析,提高修剪性能而无需额外操作。跨多个模型和数据集的实验证实了ONNXPruner的强大适应性和增强效果。我们的工作旨在推动模型修剪的实际应用。

更新时间: 2024-04-10 18:36:25

领域: cs.LG

下载: http://arxiv.org/abs/2404.08016v1

Certifying almost all quantum states with few single-qubit measurements

Certifying that an n-qubit state synthesized in the lab is close to the target state is a fundamental task in quantum information science. However, existing rigorous protocols either require deep quantum circuits or exponentially many single-qubit measurements. In this work, we prove that almost all n-qubit target states, including those with exponential circuit complexity, can be certified from only O(n^2) single-qubit measurements. This result is established by a new technique that relates certification to the mixing time of a random walk. Our protocol has applications for benchmarking quantum systems, for optimizing quantum circuits to generate a desired target state, and for learning and verifying neural networks, tensor networks, and various other representations of quantum states using only single-qubit measurements. We show that such verified representations can be used to efficiently predict highly non-local properties that would otherwise require an exponential number of measurements. We demonstrate these applications in numerical experiments with up to 120 qubits, and observe advantage over existing methods such as cross-entropy benchmarking (XEB).

Updated: 2024-04-10 18:21:11

标题: 用少量单量子比特测量认证几乎所有量子态

摘要: 认证在实验室中合成的n量子比特态接近目标态,是量子信息科学中的基本任务。然而,现有的严格协议要么需要深度量子电路,要么需要指数多次单量子比特测量。在这项工作中,我们证明几乎所有n量子比特目标态,包括那些具有指数电路复杂度的目标态,都可以仅通过O(n^2)次单量子比特测量来认证。这一结果是通过一种将认证与随机游走的混合时间相关联的新技术建立的。我们的协议可用于量子系统基准测试、优化量子电路以生成所需目标态,以及仅使用单量子比特测量来学习和验证神经网络、张量网络及其他各种量子态表示。我们展示了这些经过验证的表示可以用于高效预测高度非局域的性质,而这些性质原本需要指数数量的测量。我们在多达120个量子比特的数值实验中展示了这些应用,并观察到相对于现有方法(如交叉熵基准测试XEB)的优势。

更新时间: 2024-04-10 18:21:11

领域: quant-ph,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2404.07281v1

Elucidating the Exposure Bias in Diffusion Models

Diffusion models have demonstrated impressive generative capabilities, but their \textit{exposure bias} problem, described as the input mismatch between training and sampling, lacks in-depth exploration. In this paper, we systematically investigate the exposure bias problem in diffusion models by first analytically modelling the sampling distribution, based on which we then attribute the prediction error at each sampling step as the root cause of the exposure bias issue. Furthermore, we discuss potential solutions to this issue and propose an intuitive metric for it. Along with the elucidation of exposure bias, we propose a simple, yet effective, training-free method called Epsilon Scaling to alleviate the exposure bias. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned in the training phase by scaling down the network output, mitigating the input mismatch between training and sampling. Experiments on various diffusion frameworks (ADM, DDIM, EDM, LDM, DiT, PFGM++) verify the effectiveness of our method. Remarkably, our ADM-ES, as a state-of-the-art stochastic sampler, obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation. The code is available at \url{https://github.com/forever208/ADM-ES} and \url{https://github.com/forever208/EDM-ES}.
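
A minimal sketch of Epsilon Scaling inside one deterministic DDIM update, where the predicted noise is divided by a factor slightly above 1; the scale value and the dummy model are assumptions for illustration:

import torch

def ddim_step_with_eps_scaling(x_t, t, model, alphas_cumprod, t_prev, scale=1.004):
    # Scaling down the network output moves the sampling trajectory closer
    # to the training-time vector field (the paper tunes `scale` per model).
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    eps = model(x_t, t) / scale                       # <- Epsilon Scaling
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps

# Smoke test with a dummy noise predictor.
alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
model = lambda x, t: torch.randn_like(x)
x = torch.randn(1, 3, 32, 32)
x = ddim_step_with_eps_scaling(x, 999, model, alphas_cumprod, t_prev=949)
print(x.shape)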

Updated: 2024-04-10 18:13:00

标题: 阐明扩散模型中的暴露偏差

摘要: 扩散模型展示了令人印象深刻的生成能力,但它们的“曝光偏差”问题,即训练和抽样之间的输入不匹配,缺乏深入探讨。本文系统地研究了扩散模型中的曝光偏差问题,首先通过分析建模抽样分布,然后将每个抽样步骤的预测误差归因为曝光偏差问题的根本原因。此外,我们讨论了这个问题的潜在解决方案,并提出了一个直观的度量标准。除了阐明曝光偏差,我们提出了一种简单而有效的无需训练的方法,称为Epsilon Scaling,以缓解曝光偏差。我们展示了Epsilon Scaling明确地将抽样轨迹移到训练阶段学习的矢量场附近,通过缩小网络输出来减轻训练和抽样之间的输入不匹配。在各种扩散框架(ADM、DDIM、EDM、LDM、DiT、PFGM++)上的实验验证了我们方法的有效性。值得注意的是,我们的ADM-ES作为最先进的随机采样器,在CIFAR-10上的100步无条件生成中获得了2.17的FID。代码可在\url{https://github.com/forever208/ADM-ES}和\url{https://github.com/forever208/EDM-ES}找到。

更新时间: 2024-04-10 18:13:00

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2308.15321v6

Is Your LLM Outdated? Benchmarking LLMs & Alignment Algorithms for Time-Sensitive Knowledge

We study the appropriateness of Large Language Models (LLMs) as knowledge repositories. We focus on the challenge of maintaining LLMs' factual knowledge up-to-date over time. Motivated by the lack of studies on identifying outdated knowledge within LLMs, we design and develop a dynamic benchmark with up-to-date ground truth answers for each target factual question. We evaluate eighteen open-source and closed-source state-of-the-art LLMs on time-sensitive knowledge retrieved in real-time from Wikidata. We select time-sensitive domain facts in politics, sports, and organizations, and estimate the recency of the information learned by the model during pre-training/fine-tuning. In the second contribution, we evaluate the effectiveness of knowledge editing methods for aligning LLMs with up-to-date factual knowledge and compare their performance with Retrieval Augmented Generation. The dynamic benchmark is designed to be used as-is to assess LLMs' up-to-dateness, as well as to be extended to other domains by sharing the code, the dataset, as well as evaluation and visualization scripts.
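
As an illustration of pulling an up-to-date ground-truth answer in real time, the sketch below queries the public Wikidata SPARQL endpoint for a current office holder; it requires network access, and the benchmark's actual pipeline is more elaborate than this single query:

import requests

# Q183 = Germany, P6 = head of government; the FILTER keeps only the
# statement with no end date (P582), i.e. the current office holder.
query = """
SELECT ?leaderLabel WHERE {
  wd:Q183 p:P6 ?st .
  ?st ps:P6 ?leader .
  FILTER NOT EXISTS { ?st pq:P582 ?end . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": query, "format": "json"},
                 headers={"User-Agent": "llm-staleness-benchmark-demo"})
for row in r.json()["results"]["bindings"]:
    print(row["leaderLabel"]["value"])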

Updated: 2024-04-10 18:08:59

标题: 您的LLM是否过时?针对时效性知识的LLM和对齐算法的基准测试

摘要: 我们研究大型语言模型(LLMs)作为知识仓库的适用性。我们关注的挑战是随着时间推移保持LLMs的事实知识最新。受到缺乏关于识别LLMs中过时知识的研究的启发,我们设计并开发了一个动态基准,为每个目标事实问题提供最新的真实答案。我们评估了十八个开源和闭源的最先进的LLMs,用实时从Wikidata检索的时间敏感知识。我们选择政治、体育和组织领域的时间敏感领域事实,并估计模型在预训练/微调过程中学习的信息的新近性。在第二个贡献中,我们评估了知识编辑方法对齐LLMs与最新事实知识的有效性,并将它们的性能与检索增强生成进行了比较。动态基准设计为可以原样使用,以评估LLMs的最新性,并通过共享代码、数据集、评估和可视化脚本来扩展到其他领域。

更新时间: 2024-04-10 18:08:59

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.08700v1

AI and Identity

AI-empowered technologies' impact on the world is undeniable, reshaping industries, revolutionizing how humans interact with technology, transforming educational paradigms, and redefining social codes. However, this rapid growth is accompanied by two notable challenges: a lack of diversity within the AI field and a widening AI divide. In this context, this paper examines the intersection of AI and identity as a pathway to understand biases, inequalities, and ethical considerations in AI development and deployment. We present a multifaceted definition of AI identity, which encompasses its creators, applications, and their broader impacts. Understanding AI's identity involves understanding the associations between the individuals involved in AI's development, the technologies produced, and the social, ethical, and psychological implications. After exploring the AI identity ecosystem and its societal dynamics, we propose a framework that highlights the need for diversity in AI across three dimensions: Creators, Creations, and Consequences through the lens of identity. This paper proposes the need for a comprehensive approach to fostering a more inclusive and responsible AI ecosystem through the lens of identity.

Updated: 2024-04-10 18:08:57

标题: 人工智能与身份

摘要: 人工智能赋能技术对世界的影响是不可否认的,它重塑了各行各业,彻底改变了人类与技术的交互方式,转变了教育范式,并重新定义了社会规范。然而,这种快速增长伴随着两个显著的挑战:人工智能领域缺乏多样性,以及人工智能鸿沟的不断扩大。在此背景下,本文将人工智能与身份的交叉作为理解人工智能开发和部署中的偏见、不平等和伦理考量的途径加以研究。我们提出了人工智能身份的多层面定义,涵盖其创造者、应用及其更广泛的影响。理解人工智能的身份,需要理解参与人工智能开发的个人、所产生的技术,以及社会、伦理和心理影响之间的关联。在探讨人工智能身份生态系统及其社会动态之后,我们提出了一个框架,从身份的视角强调人工智能在三个维度上对多样性的需求:创造者(Creators)、创造物(Creations)和后果(Consequences)。本文提出,需要一种全面的方法,从身份的视角培育一个更具包容性和责任感的人工智能生态系统。

更新时间: 2024-04-10 18:08:57

领域: cs.CY,cs.AI,cs.HC

下载: http://arxiv.org/abs/2403.07924v2

Predicting Side Effect of Drug Molecules using Recurrent Neural Networks

Identification and verification of molecular properties such as side effects is one of the most important and time-consuming steps in the process of molecule synthesis. For example, failure to identify side effects before submission to regulatory groups can cost millions of dollars and months of additional research to the companies. Failure to identify side effects during the regulatory review can also cost lives. The complexity and expense of this task have made it a candidate for a machine learning-based solution. Prior approaches rely on complex model designs and excessive parameter counts for side effect predictions. We believe reliance on complex models only shifts the difficulty away from chemists rather than alleviating the issue. Implementing large models is also expensive without prior access to high-performance computers. We propose a heuristic approach that allows for the utilization of simple neural networks, specifically the recurrent neural network, with a 98+% reduction in the number of required parameters compared to available large language models while still obtaining near identical results as top-performing models.
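
A small sketch in the spirit of the proposed model: a GRU over tokenized SMILES strings with a multi-label head; the vocabulary size, sequence length, and 27 side-effect classes are illustrative assumptions:

import torch
import torch.nn as nn

class SideEffectRNN(nn.Module):
    # A compact recurrent classifier in place of a large language model.
    def __init__(self, vocab_size=64, embed_dim=32, hidden=64, n_effects=27):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_effects)

    def forward(self, tokens):
        _, h = self.gru(self.embed(tokens))
        return self.head(h[-1])            # logits for each side-effect class

model = SideEffectRNN()
tokens = torch.randint(1, 64, (8, 120))    # 8 molecules, 120 SMILES tokens each
probs = torch.sigmoid(model(tokens))       # multi-label side-effect probabilities
print(probs.shape)                         # torch.Size([8, 27])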

Updated: 2024-04-10 18:07:20

标题: 使用循环神经网络预测药物分子的副作用

摘要: 分子属性的识别和验证,如副作用,是分子合成过程中最重要且耗时的步骤之一。例如,在提交给监管组织之前未能识别出副作用可能会给公司造成数百万美元的损失,需要额外数月的研究时间。在监管审查过程中未能识别出副作用也可能导致生命损失。这项任务的复杂性和成本使其成为基于机器学习的解决方案的候选。先前的方法依赖于复杂的模型设计和过多的参数用于副作用预测。我们认为依赖复杂模型只会将困难转移到化学家身上,而不是缓解问题。在没有事先访问高性能计算机的情况下,实施大型模型也是昂贵的。我们提出了一种启发式方法,允许利用简单的神经网络,特别是循环神经网络,与现有大型语言模型相比,所需参数数量减少98+%,同时仍获得与表现最佳的模型几乎相同的结果。

更新时间: 2024-04-10 18:07:20

领域: q-bio.QM,cs.LG

下载: http://arxiv.org/abs/2305.10473v2

Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity

We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information. These demonstrations can be viewed as solving related but slightly different tasks than what the learner faces. This setting arises in many application domains, such as self-driving cars, healthcare, and finance, where expert demonstrations are made using contextual information, which is not recorded in the data available to the learning agent. We model the problem as a zero-shot meta-reinforcement learning setting with an unknown task distribution and a Bayesian regret minimization objective, where the unobserved tasks are encoded as parameters with an unknown prior. We propose the Experts-as-Priors algorithm (ExPerior), a non-parametric empirical Bayes approach that utilizes the principle of maximum entropy to establish an informative prior over the learner's decision-making problem. This prior enables the application of any Bayesian approach for online decision-making, such as posterior sampling. We demonstrate that our strategy surpasses existing behaviour cloning and online algorithms for multi-armed bandits and reinforcement learning, showcasing the utility of our approach in leveraging expert demonstrations across different decision-making setups.

Updated: 2024-04-10 18:00:17

标题: 未观察到异质性下基于专家演示的顺序决策

摘要: 我们研究在线顺序决策问题,其中可以获得来自专家的辅助演示,这些专家根据未观察到的情境信息做出决策。这些演示可以被视为在解决与学习者所面临任务相关但略有不同的任务。这种情况出现在许多应用领域,如自动驾驶汽车、医疗保健和金融领域,其中专家演示利用了学习代理可用数据中未记录的情境信息。我们将该问题建模为具有未知任务分布和贝叶斯遗憾最小化目标的零样本元强化学习设置,其中未观察到的任务被编码为具有未知先验的参数。我们提出了Experts-as-Priors算法(ExPerior),这是一种非参数经验贝叶斯方法,利用最大熵原理在学习者的决策问题上建立信息丰富的先验。这个先验使得任何基于贝叶斯的在线决策方法(如后验采样)都可以应用。我们证明了我们的策略在多臂赌博机和强化学习上超越了现有的行为克隆和在线算法,展示了我们的方法在不同决策设置中利用专家演示的实用性。

更新时间: 2024-04-10 18:00:17

领域: cs.LG

下载: http://arxiv.org/abs/2404.07266v1

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

In this paper, we introduce GoodDrag, a novel approach to improve the stability and image quality of drag editing. Unlike existing methods that struggle with accumulated perturbations and often result in distortions, GoodDrag introduces an AlDD framework that alternates between drag and denoising operations within the diffusion process, effectively improving the fidelity of the result. We also propose an information-preserving motion supervision operation that maintains the original features of the starting point for precise manipulation and artifact reduction. In addition, we contribute to the benchmarking of drag editing by introducing a new dataset, Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy Index and Gemini Score, utilizing Large Multimodal Models. Extensive experiments demonstrate that the proposed GoodDrag compares favorably against the state-of-the-art approaches both qualitatively and quantitatively. The project page is https://gooddrag.github.io.

Updated: 2024-04-10 17:59:59

标题: GoodDrag:面向扩散模型拖动编辑的良好实践

摘要: 在本文中,我们介绍了GoodDrag,这是一种改进拖动编辑稳定性和图像质量的新方法。与现有的方法不同,这些方法往往在累积扰动中挣扎,并且通常导致失真,GoodDrag引入了一个AlDD框架,该框架在扩散过程中在拖动和去噪操作之间交替进行,有效地提高了结果的保真度。我们还提出了一个保持信息的运动监督操作,以保持起始点的原始特征,以便进行精确操作和减少伪影。此外,我们通过引入新的数据集Drag100,并开发专用的质量评估指标Dragging Accuracy Index和Gemini Score,利用大型多模态模型,为拖动编辑的基准测试做出了贡献。大量实验表明,所提出的GoodDrag在质量和数量上都比现有技术方法表现优越。项目页面是https://gooddrag.github.io。

更新时间: 2024-04-10 17:59:59

领域: cs.CV,cs.AI,cs.GR,cs.LG,cs.MM

下载: http://arxiv.org/abs/2404.07206v1

BRAVE: Broadening the visual encoding of vision-language models

Vision-language models (VLMs) are typically composed of a vision encoder, e.g. CLIP, and a language model (LM) that interprets the encoded features to solve downstream tasks. Despite remarkable progress, VLMs are subject to several shortcomings due to the limited capabilities of vision encoders, e.g. "blindness" to certain image features, visual hallucination, etc. To address these issues, we study broadening the visual encoding capabilities of VLMs. We first comprehensively benchmark several vision encoders with different inductive biases for solving VLM tasks. We observe that there is no single encoding configuration that consistently achieves top performance across different tasks, and encoders with different biases can perform surprisingly similarly. Motivated by this, we introduce a method, named BRAVE, that consolidates features from multiple frozen encoders into a more versatile representation that can be directly fed as the input to a frozen LM. BRAVE achieves state-of-the-art performance on a broad range of captioning and VQA benchmarks and significantly reduces the aforementioned issues of VLMs, while requiring a smaller number of trainable parameters than existing methods and having a more compressed representation. Our results highlight the potential of incorporating different visual biases for a more broad and contextualized visual understanding of VLMs.
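
A rough sketch of the consolidation idea, assuming features from three frozen encoders are linearly projected to a shared width and concatenated into one sequence for the frozen LM; BRAVE additionally learns to compress the combined sequence, which is omitted here:

import torch
import torch.nn as nn

class MultiEncoderBridge(nn.Module):
    # Per-encoder linear projections into the language model's width.
    def __init__(self, encoder_dims, lm_dim=1024):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, lm_dim) for d in encoder_dims)

    def forward(self, features):           # list of (N, tokens_k, dim_k)
        return torch.cat([p(f) for p, f in zip(self.proj, features)], dim=1)

bridge = MultiEncoderBridge(encoder_dims=[768, 1024, 1408])
feats = [torch.randn(2, 196, 768), torch.randn(2, 256, 1024),
         torch.randn(2, 144, 1408)]        # stand-ins for three frozen encoders
lm_inputs = bridge(feats)
print(lm_inputs.shape)                     # torch.Size([2, 596, 1024])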

Updated: 2024-04-10 17:59:45

标题: BRAVE:拓展视觉-语言模型的视觉编码

摘要: 视觉语言模型(VLMs)通常由视觉编码器(例如CLIP)和一个语言模型(LM)组成,后者解释编码特征以解决下游任务。尽管取得了显著进展,但由于视觉编码器的能力有限,VLMs存在一些缺点,例如对某些图像特征的“盲目”,视觉幻觉等。为了解决这些问题,我们研究了扩展VLMs视觉编码能力。我们首先全面评估了几种具有不同归纳偏见的视觉编码器,用于解决VLM任务。我们观察到没有一种编码配置在不同任务中始终表现出色,具有不同偏见的编码器可以表现出令人惊讶的相似性。在此基础上,我们引入了一种名为BRAVE的方法,将多个冻结编码器的特征整合成更具多功能性的表示形式,可以直接作为冻结LM的输入。BRAVE在广泛的字幕和VQA基准上实现了最先进的性能,并显著减少了VLMs的前述问题,同时需要比现有方法更少的可训练参数,并具有更紧凑的表示形式。我们的结果突显了将不同的视觉偏见纳入VLMs以实现更广泛和语境化的视觉理解的潜力。

更新时间: 2024-04-10 17:59:45

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.07204v1

UMBRAE: Unified Multimodal Decoding of Brain Signals

We address prevailing challenges of brain-powered research, departing from the observation that the literature hardly recovers accurate spatial information and requires subject-specific models. To address these challenges, we propose UMBRAE, a unified multimodal decoding of brain signals. First, to extract instance-level conceptual and spatial details from neural signals, we introduce an efficient universal brain encoder for multimodal-brain alignment and recover object descriptions at multiple levels of granularity from subsequent multimodal large language model (MLLM). Second, we introduce a cross-subject training strategy mapping subject-specific features to a common feature space. This allows a model to be trained on multiple subjects without extra resources, even yielding superior results compared to subject-specific models. Further, we demonstrate this supports weakly-supervised adaptation to new subjects, with only a fraction of the total training data. Experiments demonstrate that UMBRAE not only achieves superior results in the newly introduced tasks but also outperforms methods in well established tasks. To assess our method, we construct and share with the community a comprehensive brain understanding benchmark BrainHub. Our code and benchmark are available at https://weihaox.github.io/UMBRAE.

Updated: 2024-04-10 17:59:20

标题: UMBRAE:大脑信号的统一多模态解码

摘要: 我们解决了大脑驱动研究中存在的挑战,观察到文献几乎不恢复准确的空间信息,并需要特定主体模型。为了解决这些挑战,我们提出了UMBRAE,一种统一的多模态大脑信号解码方法。首先,为了从神经信号中提取实例级概念和空间细节,我们引入了一种高效的通用大脑编码器,用于多模态大脑对齐,并从后续的多模态大型语言模型(MLLM)中恢复多个层次的对象描述。其次,我们引入了一种跨主体训练策略,将特定主体特征映射到一个共同的特征空间。这使得模型可以在多个主体上进行训练而无需额外资源,甚至比特定主体模型产生更好的结果。此外,我们展示了这种方法支持对新主体的弱监督适应,仅需使用总训练数据的一小部分。实验证明,UMBRAE不仅在新引入的任务中取得卓越成果,而且在已建立的任务中也胜过其他方法。为了评估我们的方法,我们构建并与社区分享了一个全面的大脑理解基准BrainHub。我们的代码和基准可在https://weihaox.github.io/UMBRAE 上获得。

更新时间: 2024-04-10 17:59:20

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.07202v1

Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective

In solving partial differential equations (PDEs), Fourier Neural Operators (FNOs) have exhibited notable effectiveness compared to Convolutional Neural Networks (CNNs). This paper presents clear empirical evidence through spectral analysis to elucidate the superiority of FNO over CNNs: FNO is significantly more capable of learning low-frequencies. This empirical evidence also unveils FNO's distinct low-frequency bias, which limits FNO's effectiveness in learning high-frequency information from PDE data. To tackle this challenge, we introduce SpecBoost, an ensemble learning framework that employs multiple FNOs to better capture high-frequency information. Specifically, a secondary FNO is utilized to learn the overlooked high-frequency information from the prediction residual of the initial FNO. Experiments demonstrate that SpecBoost noticeably enhances FNO's prediction accuracy on diverse PDE applications, achieving an up to 71% improvement.
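
A compact sketch of the two-stage residual ensemble, with linear models standing in for the two FNOs; the training routine is a generic least-squares fit, not the paper's setup:

import torch

def specboost_fit(model1, model2, fit, x, y):
    # SpecBoost-style ensemble: train model1 on the data, then train model2
    # on the residual model1 leaves behind (where the overlooked
    # high-frequency content lives). Final prediction is the sum.
    fit(model1, x, y)
    residual = y - model1(x).detach()
    fit(model2, x, residual)
    return lambda inp: model1(inp) + model2(inp)

def fit(model, x, y, steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()
        opt.step()

# Toy smoke test.
x, y = torch.randn(128, 16), torch.randn(128, 16)
predict = specboost_fit(torch.nn.Linear(16, 16), torch.nn.Linear(16, 16), fit, x, y)
print(predict(x).shape)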

Updated: 2024-04-10 17:58:04

标题: 朝着更好地理解傅立叶神经算子:从频谱角度进行分析和改进

摘要: 在解决偏微分方程(PDEs)时,傅立叶神经算子(FNOs)相比于卷积神经网络(CNNs)展现出显著的有效性。本文通过谱分析提供清晰的实证证据,阐明了FNO相对于CNN的优越性:FNO能够显著更好地学习低频信息。这些实证证据还揭示了FNO具有明显的低频偏差,这限制了FNO从PDE数据中学习高频信息的效果。为了应对这一挑战,我们引入了SpecBoost,一个利用多个FNOs更好地捕获高频信息的集成学习框架。具体地,利用第二个FNO来学习初始FNO预测残差中被忽视的高频信息。实验证明,SpecBoost显著提高了FNO在各种PDE应用中的预测准确性,最高可提高71%。

更新时间: 2024-04-10 17:58:04

领域: cs.LG

下载: http://arxiv.org/abs/2404.07200v1

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

We introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing the state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image.

Updated: 2024-04-10 17:57:41

标题: RealmDreamer:基于文本驱动、结合图像修补与深度扩散的3D场景生成

摘要: 我们介绍了RealmDreamer,一种从文本描述生成通用前向3D场景的技术。我们的技术优化了一个3D高斯Splatting表示,以匹配复杂的文本提示。我们通过利用最先进的文本到图像生成器来初始化这些splat,将它们提升到3D,并计算遮挡体积。然后,我们将这个表示在多个视图上进行优化,作为一个带有图像条件扩散模型的3D修补任务。为了学习正确的几何结构,我们结合了一个深度扩散模型,通过对修补模型的样本进行条件化,提供丰富的几何结构。最后,我们使用来自图像生成器的锐化样本对模型进行微调。值得注意的是,我们的技术不需要视频或多视图数据,可以合成包含多个对象、风格各异的高质量3D场景。其通用性还支持从单张图像进行3D合成。

更新时间: 2024-04-10 17:57:41

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2404.07199v1

Zero-shot Logical Query Reasoning on any Knowledge Graph

Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on which requires substantial training time before being deployed on a new graph. Here we present UltraQuery, an inductive reasoning model that can zero-shot answer logical queries on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG even if it is only finetuned on a single dataset. Experimenting on 23 datasets, UltraQuery in the zero-shot inference mode shows competitive or better query answering performance than best available baselines and sets a new state of the art on 14 of them.

Updated: 2024-04-10 17:56:07

标题: 任意知识图上的零样本逻辑查询推理

摘要: 知识图谱(KGs)中的复杂逻辑查询回答(CLQA)超越了简单的KG补全,旨在回答由多个投影和逻辑操作组成的组合查询。现有的CLQA方法学习绑定到特定实体或关系词汇表的参数,因而只能应用于其训练所用的图,在部署到新图之前需要大量的训练时间。在这里,我们提出了UltraQuery,这是一种归纳推理模型,可以在任何KG上零样本回答逻辑查询。UltraQuery的核心思想是将投影和逻辑操作都推导为与词汇表无关的函数,这些函数可以泛化到任何KG中的新实体和关系。通过用预训练的归纳KG推理模型初始化投影操作,UltraQuery即使只在单个数据集上进行微调,也可以在任何KG上解决CLQA。在23个数据集上的实验表明,UltraQuery在零样本推理模式下显示出与最佳可用基线相当或更好的查询回答性能,并在其中14个数据集上创造了新的最高水平。

更新时间: 2024-04-10 17:56:07

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.07198v1

VN-EGNN: E(3)-Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification

Being able to identify regions within or around proteins, to which ligands can potentially bind, is an essential step to develop new drugs. Binding site identification methods can now profit from the availability of large amounts of 3D structures in protein structure databases or from AlphaFold predictions. Current binding site identification methods heavily rely on graph neural networks (GNNs), usually designed to output E(3)-equivariant predictions. Such methods turned out to be very beneficial for physics-related tasks like binding energy or motion trajectory prediction. However, the performance of GNNs at binding site identification is still limited potentially due to the lack of dedicated nodes that model hidden geometric entities, such as binding pockets. In this work, we extend E(n)-Equivariant Graph Neural Networks (EGNNs) by adding virtual nodes and applying an extended message passing scheme. The virtual nodes in these graphs are dedicated quantities to learn representations of binding sites, which leads to improved predictive performance. In our experiments, we show that our proposed method VN-EGNN sets a new state-of-the-art at locating binding site centers on COACH420, HOLO4K and PDBbind2020.

Updated: 2024-04-10 17:50:29

标题: VN-EGNN: 带有虚拟节点的E(3)-等变图神经网络增强蛋白结合位点识别

摘要: 能够识别蛋白质内部或周围可能结合的区域,是开发新药物的关键步骤。结合位点识别方法现在可以从蛋白质结构数据库中获得大量的3D结构或者AlphaFold预测。当前的结合位点识别方法严重依赖于图神经网络(GNNs),通常设计用于输出E(3)-等变预测。这些方法在物理相关任务(如结合能量或运动轨迹预测)中表现出非常有益的效果。然而,GNNs在结合位点识别方面的表现仍然受限,可能是由于缺乏专门模拟隐藏几何实体(如结合口袋)的节点。在这项工作中,我们通过添加虚拟节点和应用扩展的消息传递方案来扩展E(n)-等变图神经网络(EGNNs)。这些图中的虚拟节点是专门用来学习结合位点表示的量,这导致了改进的预测性能。在我们的实验中,我们展示了我们提出的VN-EGNN方法在COACH420、HOLO4K和PDBbind2020上定位结合位点中心的最新技术水平。

更新时间: 2024-04-10 17:50:29

领域: cs.LG,cs.AI,q-bio.BM

下载: http://arxiv.org/abs/2404.07194v1

Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery

Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds.
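
One standard way to learn a reward from ranked suboptimal demonstrations is a Bradley-Terry style pairwise loss over trajectory returns; the sketch below assumes 32-D feature vectors standing in for the paper's partial-view point-cloud observations:

import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def ranking_loss(traj_lo, traj_hi):
    # The summed predicted reward of the higher-ranked trajectory should
    # exceed that of the lower-ranked one.
    r_lo = reward_net(traj_lo).sum()
    r_hi = reward_net(traj_hi).sum()
    return -torch.log_softmax(torch.stack([r_lo, r_hi]), dim=0)[1]

traj_lo = torch.randn(50, 32)      # observations of the worse demonstration
traj_hi = torch.randn(50, 32)      # observations of the better demonstration
opt.zero_grad()
loss = ranking_loss(traj_lo, traj_hi)
loss.backward()
opt.step()
print(float(loss))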

Updated: 2024-04-10 17:40:27

标题: 从次优演示中学习奖励及其在外科电灼中的应用

摘要: 通过从演示中学习(LfD)技术实现机器人手术自动化极具挑战性。这是因为手术任务通常涉及与物理对象复杂交互的顺序决策过程,并且对错误的容忍度很低。先前的研究假设所有演示都是完全可观察且最优的,这在现实世界中可能并不实际。本文介绍了一种样本高效的方法,该方法从有限数量的、由部分视角点云观测组成的带排序次优演示中学习一个稳健的奖励函数。然后,该方法通过强化学习(RL)优化所学的奖励函数来学习策略。我们展示了使用学习到的奖励函数来获得策略比纯模仿学习更加稳健。我们将我们的方法应用于一个真实的外科电灼任务,并展示了即使提供的演示是次优的且观测是高维点云,我们的方法也能表现良好。

更新时间: 2024-04-10 17:40:27

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.07185v1

Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection

Advances in Tiny Machine Learning (TinyML) have bolstered the creation of smart industry solutions, including smart agriculture, healthcare and smart cities. Whilst related research contributes to enabling TinyML solutions on constrained hardware, there is a need to amplify real-world applications by optimising energy consumption in battery-powered systems. The work presented extends and contributes to TinyML research by optimising battery-powered image-based anomaly detection Internet of Things (IoT) systems. Whilst previous work in this area has yielded the capabilities of on-device inferencing and training, there has yet to be an investigation into optimising the management of such capabilities using machine learning approaches, such as Reinforcement Learning (RL), to improve the deployment battery life of such systems. Using modelled simulations, the battery life effects of an RL algorithm are benchmarked against static and dynamic optimisation approaches, with the foundation laid for a hardware benchmark to follow. It is shown that using RL within a TinyML-enabled IoT system to optimise the system operations, including cloud anomaly processing and on-device training, yields an improved battery life of 22.86% and 10.86% compared to static and dynamic optimisation approaches respectively. The proposed solution can be deployed to resource-constrained hardware, given its low memory footprint of 800 B, which could be further reduced. This further facilitates the real-world deployment of such systems, including key sectors such as smart agriculture.

Updated: 2024-04-10 17:39:53

标题: 使用强化学习优化的电池供电的TinyML系统在基于图像的异常检测中的模拟

摘要: 微型机器学习(TinyML)的进步推动了智能产业解决方案的创建,包括智能农业、医疗保健和智能城市。尽管相关研究有助于在受限硬件上实现TinyML解决方案,但有必要通过优化电池供电系统中的能耗来推动真实世界应用。本研究通过优化基于图像的异常检测物联网(IoT)电池供电系统,扩展并贡献了TinyML研究。尽管该领域的先前工作已经实现了设备端推理和训练的能力,但尚未有研究探讨使用机器学习方法(如强化学习(RL))来优化这些能力的管理,以改善此类系统部署后的电池寿命。通过建模仿真,将RL算法对电池寿命的影响与静态和动态优化方法进行了基准比较,并为后续的硬件基准测试奠定了基础。研究表明,在支持TinyML的IoT系统中使用RL来优化系统操作(包括云端异常处理和设备端训练),与静态和动态优化方法相比,电池寿命分别提高了22.86%和10.86%。所提出的解决方案可以部署到资源受限的硬件上,因为其内存占用仅为800 B,且还可以进一步降低。这进一步促进了此类系统在智能农业等关键领域的真实世界部署。

更新时间: 2024-04-10 17:39:53

领域: cs.LG

下载: http://arxiv.org/abs/2403.05106v2

Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. Explanations often take the form of a heatmap identifying input features (e.g. pixels) that are relevant to the model's decision. These explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by extracting at some intermediate layer of a neural network, subspaces that capture the multiple and distinct activation patterns (e.g. visual concepts) that are relevant to the prediction. To automatically extract these subspaces, we propose two new analyses, extending principles found in PCA or ICA to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of e.g. variance or kurtosis. This allows for a much stronger focus of the analysis on what the ML model actually uses for predicting, ignoring activations or concepts to which the model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods show to be practically useful and compare favorably to the state of the art as demonstrated on benchmarks and three use cases.
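
A toy sketch of the relevance-maximizing idea: given activations and matching context vectors whose inner products carry the relevance, take the top eigenvector of the symmetrized cross-covariance instead of the covariance that PCA would use; this is a simplification of PRCA on synthetic data, not the paper's full method:

import numpy as np

def prca_direction(A, C):
    # Find the unit direction u maximizing sum_n (u . a_n)(u . c_n), i.e.
    # relevance along u, via the top eigenvector of 0.5*(A^T C + C^T A)/N.
    M = 0.5 * (A.T @ C + C.T @ A) / len(A)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, -1]                  # direction of maximal relevance

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 16))                            # activations
C = A @ np.diag(np.r_[np.ones(2) * 3.0, np.zeros(14)])    # relevance in 2 dims
u = prca_direction(A, C)
print(np.round(u, 2))   # mass concentrated on the first two coordinates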

Updated: 2024-04-10 17:35:16

标题: 通过寻找相关子空间对神经网络预测进行解耦解释

摘要: 可解释人工智能旨在通过为预测生成解释来克服神经网络等复杂机器学习模型的黑箱特性。解释通常采用热图的形式,标识与模型决策相关的输入特征(例如像素)。然而,这些解释将可能进入整体复杂决策策略的多个因素纠缠在一起。我们提出通过在神经网络的某个中间层提取子空间来解耦解释,这些子空间捕捉与预测相关的多个不同的激活模式(例如视觉概念)。为了自动提取这些子空间,我们提出了两种新的分析方法,将PCA或ICA中的原理扩展到解释领域。这些新颖的分析方法,我们称之为主相关成分分析(PRCA)和解耦相关子空间分析(DRSA),最大化的是相关性,而非方差或峰度等。这使得分析能够更加聚焦于机器学习模型实际用于预测的内容,而忽略模型对其保持不变的激活或概念。我们的方法足够通用,可以与Shapley Value、Integrated Gradients或LRP等常见归因技术配合使用。我们提出的方法被证明具有实用价值,并且如基准测试和三个用例所示,与现有技术相比表现良好。

更新时间: 2024-04-10 17:35:16

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2212.14855v2

BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development

Despite the widespread applications of machine learning force field (MLFF) on solids and small molecules, there is a notable gap in applying MLFF to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for lithium batteries. We design a physics-inspired graph equivariant transformer architecture as the backbone of BAMBOO to learn from quantum mechanical simulations. Additionally, we pioneer an ensemble knowledge distillation approach and apply it on MLFFs to improve the stability of MD simulations. Finally, we propose the density alignment algorithm to align BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art accuracy in predicting key electrolyte properties such as density, viscosity, and ionic conductivity across various solvents and salt combinations. Our current model, trained on more than 15 chemical species, achieves the average density error of 0.01 g/cm^3 on various compositions compared with experimental data. Moreover, our model demonstrates transferability to molecules not included in the quantum mechanical dataset. We envision this work as paving the way to a ''universal MLFF'' capable of simulating properties of common organic liquids.

Updated: 2024-04-10 17:31:49

标题: BAMBOO:一种用于液态电解质开发的可预测、可迁移的机器学习力场框架

摘要: 尽管机器学习力场(MLFF)在固体和小分子上有着广泛的应用,但在复杂液体电解质中应用MLFF存在明显的差距。在这项工作中,我们介绍了BAMBOO(字节跳动AI分子模拟增强器),这是一个新颖的分子动力学(MD)模拟框架,并展示了其在液体电解质(用于锂电池)环境中的能力。我们设计了一个物理启发的图等变换器架构作为BAMBOO的主干,以学习量子力学模拟结果。此外,我们开创了一种集成知识蒸馏方法,并将其应用于MLFF,以提高MD模拟的稳定性。最后,我们提出了密度对齐算法,以将BAMBOO与实验测量结果进行对齐。BAMBOO在预测关键电解质性质(如密度、粘度和离子导电性)方面表现出最先进的准确性,涵盖了各种溶剂和盐的组合。我们目前的模型在超过15种化学物种上进行了训练,与实验数据相比,其平均密度误差为0.01 g/cm^3。此外,我们的模型展示了对未包含在量子力学数据集中的分子的可转移性。我们希望这项工作为开发能够模拟常见有机液体性质的“通用MLFF”铺平道路。

更新时间: 2024-04-10 17:31:49

领域: cond-mat.mtrl-sci,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2404.07181v1

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets. In recent times, data curation has gained prominence with several works developing strategies to retain 'high-quality' subsets of 'raw' scraped data. For instance, the LAION public dataset retained only 10% of the total crawled data. However, these strategies are typically developed agnostic of the available compute for training. In this paper, we first demonstrate that making filtering decisions independent of training compute is often suboptimal: the limited high-quality data rapidly loses its utility when repeated, eventually requiring the inclusion of 'unseen' but 'lower-quality' data. To address this quality-quantity tradeoff ($\texttt{QQT}$), we introduce neural scaling laws that account for the non-homogeneous nature of web data, an angle ignored in existing literature. Our scaling laws (i) characterize the $\textit{differing}$ 'utility' of various quality subsets of web data; (ii) account for how utility diminishes for a data point at its 'nth' repetition; and (iii) formulate the mutual interaction of various data pools when combined, enabling the estimation of model performance on a combination of multiple data pools without ever jointly training on them. Our key message is that data curation $\textit{cannot}$ be agnostic of the total compute that a model will be trained for. Our scaling laws allow us to curate the best possible pool for achieving top performance on Datacomp at various compute budgets, carving out a pareto-frontier for data curation. Code is available at https://github.com/locuslab/scaling_laws_data_filtering.
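
The diminishing-returns mechanism can be illustrated with a toy utility model in which the k-th repetition of a sample is worth a constant factor less than the previous one; the functional form and all numbers below are illustrative, not the paper's fitted scaling laws:

import numpy as np

def pool_utility(n_seen, pool_size, u0, delta):
    # The k-th repetition of a point is worth u0 * delta**(k-1), so a pool
    # of `pool_size` unique samples seen `n_seen` times in total yields
    # diminishing returns once n_seen exceeds pool_size.
    full_epochs, rem = divmod(n_seen, pool_size)
    per_point = u0 * (1 - delta**full_epochs) / (1 - delta)
    return pool_size * per_point + rem * u0 * delta**full_epochs

budget = np.array([1e6, 4e6, 16e6])    # total samples seen during training
high_q = [pool_utility(b, 1e6, u0=1.0, delta=0.6) for b in budget]   # small, clean
low_q = [pool_utility(b, 16e6, u0=0.4, delta=0.6) for b in budget]   # big, noisy
for b, h, l in zip(budget, high_q, low_q):
    print(f"compute {b:.0e}: high-quality {h:.2e}  vs  low-quality {l:.2e}")
# At small compute the small high-quality pool wins; at large compute its
# repetitions lose value and the bigger, lower-quality pool overtakes it.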

Updated: 2024-04-10 17:27:54

标题: 数据过滤的缩放定律:数据筛选不能与计算无关

摘要: 视觉语言模型(VLMs)在精心策划的网络数据集上进行了数千个GPU小时的训练。最近,数据筛选在许多研究中变得重要,这些研究开发了保留“高质量”子集的“原始”抓取数据的策略。例如,LAION公共数据集仅保留了总爬取数据的10%。然而,这些策略通常是在不考虑训练可用计算资源的情况下开发的。本文首先证明,独立于训练计算做出过滤决策通常是次优的:有限的高质量数据在重复使用时迅速失去其效用,最终需要包含“未见过”但“低质量”的数据。为了解决质量-数量权衡(QQT),我们引入了考虑网络数据非均匀性质的神经缩放定律,这是现有文献中忽略的一个角度。我们的缩放定律(i)表征了各种质量子集的“效用”差异;(ii)考虑了数据点在其第“n”次重复时效用的减少;(iii)制定了不同数据池的相互作用,使得能够在多个数据池的组合上估计模型性能,而无需同时对其进行训练。我们的关键信息是,数据筛选不能对模型将要进行训练的总计算资源视而不见。我们的缩放定律使我们能够策划出在各种计算预算下实现Datacomp上最佳性能的最佳数据池,为数据筛选开辟了帕累托前沿。代码可在https://github.com/locuslab/scaling_laws_data_filtering找到。

更新时间: 2024-04-10 17:27:54

领域: cs.LG

下载: http://arxiv.org/abs/2404.07177v1

Deep Learning for Inertial Sensor Alignment

Accurate alignment of a fixed mobile device equipped with inertial sensors inside a moving vehicle is important for navigation, activity recognition, and other applications. Accurate estimation of the device mounting angle is required to rotate the inertial measurement from the sensor frame to the moving platform frame to standardize measurements and improve the performance of the target task. In this work, a data-driven approach using deep neural networks (DNNs) is proposed to learn the yaw mounting angle of a smartphone equipped with an inertial measurement unit (IMU) and strapped to a car. The proposed model uses only the accelerometer and gyroscope readings from an IMU as input and, in contrast to existing solutions, does not require global position inputs from global navigation satellite systems (GNSS). To train the model in a supervised manner, IMU data is collected for training and validation with the sensor mounted at a known yaw mounting angle, and a range of ground truth labels is generated by applying a random rotation in a bounded range to the measurements. The trained model is tested on data with real rotations showing similar performance as with synthetic rotations. The trained model is deployed on an Android device and evaluated in real-time to test the accuracy of the estimated yaw mounting angle. The model is shown to find the mounting angle at an accuracy of 8 degrees within 5 seconds, and 4 degrees within 27 seconds. An experiment is conducted to compare the proposed model with an existing off-the-shelf solution.
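
The label-generation step described above is easy to sketch: rotate a window of IMU readings by a random yaw about the vertical axis and keep that yaw as the supervision target; the data here is synthetic noise standing in for real IMU windows:

import numpy as np

def make_training_pair(acc, gyro, rng, max_yaw=np.pi):
    # Rotate readings recorded at a known mounting angle by a random yaw
    # and use that yaw as the ground-truth label the network must recover.
    yaw = rng.uniform(-max_yaw, max_yaw)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # about z
    return acc @ R.T, gyro @ R.T, yaw

rng = np.random.default_rng(0)
acc = rng.normal(size=(200, 3))    # stand-in accelerometer window (200 samples)
gyro = rng.normal(size=(200, 3))   # stand-in gyroscope window
acc_rot, gyro_rot, label = make_training_pair(acc, gyro, rng)
print(label, acc_rot.shape)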

Updated: 2024-04-10 17:15:23

标题: 深度学习用于惯性传感器校准

摘要: 在移动车辆内,对装有惯性传感器的固定移动设备进行精确对准,对于导航、活动识别和其他应用至关重要。需要准确估计设备的安装角度,以便将惯性测量值从传感器坐标系旋转到移动平台坐标系,从而标准化测量并提高目标任务的性能。在这项工作中,提出了一种使用深度神经网络(DNNs)的数据驱动方法,用于学习配备惯性测量单元(IMU)并固定在汽车上的智能手机的偏航安装角。所提出的模型仅使用IMU的加速度计和陀螺仪读数作为输入,与现有解决方案相比,不需要来自全球导航卫星系统(GNSS)的全局位置输入。为了以监督方式训练模型,在已知偏航安装角下采集IMU数据用于训练和验证,并通过对测量值施加有界范围内的随机旋转来生成一系列真实标签。训练好的模型在具有真实旋转的数据上进行测试,表现出与合成旋转相似的性能。训练好的模型被部署在Android设备上并进行实时评估,以测试所估计偏航安装角的准确性。结果表明,该模型能在5秒内以8度的精度、在27秒内以4度的精度找到安装角。我们还进行了一项实验,将所提出的模型与现有的现成解决方案进行比较。

更新时间: 2024-04-10 17:15:23

领域: cs.CV,cs.LG,cs.RO

下载: http://arxiv.org/abs/2212.11120v2

A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks

A novel first-order method is proposed for training generative adversarial networks (GANs). It modifies the Gauss-Newton method to approximate the min-max Hessian and uses the Sherman-Morrison inversion formula to calculate the inverse. The method corresponds to a fixed-point method that ensures necessary contraction. To evaluate its effectiveness, numerical experiments are conducted on various datasets commonly used in image generation tasks, such as MNIST, Fashion MNIST, CIFAR10, FFHQ, and LSUN. Our method is capable of generating high-fidelity images with greater diversity across multiple datasets. It also achieves the highest inception score for CIFAR10 among all compared methods, including state-of-the-art second-order methods. Additionally, its execution time is comparable to that of first-order min-max methods.
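
For reference, the Sherman-Morrison identity the method relies on, verified numerically on a toy matrix; this shows only the inversion step, not the full min-max training procedure:

import numpy as np

def sherman_morrison_inverse(A_inv, u, v):
    # (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u),
    # which makes a rank-one-corrected Hessian approximation cheap to invert.
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(0)
A = np.eye(4) * 2.0
u, v = rng.normal(size=4), rng.normal(size=4)
lhs = sherman_morrison_inverse(np.linalg.inv(A), u, v)
rhs = np.linalg.inv(A + np.outer(u, v))
print(np.allclose(lhs, rhs))   # True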

Updated: 2024-04-10 17:08:46

标题: 一种用于生成对抗网络中最小-最大优化的高斯-牛顿方法

摘要: 提出了一种用于训练生成对抗网络(GANs)的新颖一阶方法。它修改了高斯-牛顿方法以近似极小极大海森矩阵,并使用谢尔曼-莫里森逆矩阵公式计算逆矩阵。该方法对应于一个确保必要收缩的不动点方法。为了评估其有效性,在各种常用于图像生成任务的数据集上进行了数值实验,如MNIST、时尚MNIST、CIFAR10、FFHQ和LSUN。我们的方法能够在多个数据集上生成具有更大多样性的高保真度图像。它还在所有比较方法中(包括最先进的二阶方法)在CIFAR10上实现了最高的Inception分数。此外,其执行时间与一阶极小极大方法相当。

更新时间: 2024-04-10 17:08:46

领域: cs.LG,cs.NA,math.NA,math.OC

下载: http://arxiv.org/abs/2404.07172v1

Worst-Case Convergence Time of ML Algorithms via Extreme Value Theory

This paper leverages the statistics of extreme values to predict the worst-case convergence times of machine learning algorithms. Timing is a critical non-functional property of ML systems, and providing the worst-case converge times is essential to guarantee the availability of ML and its services. However, timing properties such as worst-case convergence times (WCCT) are difficult to verify since (1) they are not encoded in the syntax or semantics of underlying programming languages of AI, (2) their evaluations depend on both algorithmic implementations and underlying systems, and (3) their measurements involve uncertainty and noise. Therefore, prevalent formal methods and statistical models fail to provide rich information on the amounts and likelihood of WCCT. Our key observation is that the timing information we seek represents the extreme tail of execution times. Therefore, extreme value theory (EVT), a statistical discipline that focuses on understanding and predicting the distribution of extreme values in the tail of outcomes, provides an ideal framework to model and analyze WCCT in the training and inference phases of ML paradigm. Building upon the mathematical tools from EVT, we propose a practical framework to predict the worst-case timing properties of ML. Over a set of linear ML training algorithms, we show that EVT achieves a better accuracy for predicting WCCTs than relevant statistical methods such as the Bayesian factor. On the set of larger machine learning training algorithms and deep neural network inference, we show the feasibility and usefulness of EVT models to accurately predict WCCTs, their expected return periods, and their likelihood.
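
A minimal peaks-over-threshold sketch of the EVT workflow on synthetic timing data: fit a generalized Pareto distribution to the tail of observed convergence times and read off an extreme quantile as a WCCT estimate; the threshold and quantile choices are illustrative:

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
times = rng.lognormal(mean=1.0, sigma=0.5, size=5000)  # stand-in run times

# Peaks-over-threshold: model exceedances above a high quantile with a GPD.
threshold = np.quantile(times, 0.95)
excess = times[times > threshold] - threshold
shape, loc, scale = genpareto.fit(excess, floc=0.0)

# WCCT estimate exceeded on only 1 run in 10,000:
# P(T > t) = 0.05 * P(T - threshold > t - threshold | T > threshold).
p_exceed = 1e-4 / (1 - 0.95)
wcct = threshold + genpareto.ppf(1 - p_exceed, shape, loc=0.0, scale=scale)
print(f"tail shape {shape:.3f}, 1-in-10k WCCT ~ {wcct:.2f}")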

Updated: 2024-04-10 17:05:12

标题: 基于极值理论的机器学习算法最坏情况收敛时间

摘要: 本文利用极值统计学来预测机器学习算法的最坏情况收敛时间。时间是ML系统的关键非功能属性,提供最坏情况收敛时间对于保证ML及其服务的可用性至关重要。然而,诸如最坏情况收敛时间(WCCT)之类的时间属性很难验证,因为(1)它们并未编码在AI底层编程语言的语法或语义中,(2)它们的评估取决于算法实现和底层系统,(3)它们的测量涉及不确定性和噪声。因此,常见的形式化方法和统计模型无法提供关于WCCT大小及其可能性的丰富信息。 我们的关键观察是,我们寻求的时间信息代表了执行时间的极端尾部。因此,极值理论(EVT)——一门专注于理解和预测结果尾部极值分布的统计学科——为在ML范式的训练和推断阶段建模和分析WCCT提供了理想的框架。借助EVT的数学工具,我们提出了一个预测ML最坏情况时间属性的实用框架。在一组线性ML训练算法上,我们展示了EVT相比贝叶斯因子等相关统计方法能更准确地预测WCCT。在更大规模的机器学习训练算法和深度神经网络推断上,我们展示了EVT模型的可行性和有用性,能够准确预测WCCT、其预期重现期及其可能性。

更新时间: 2024-04-10 17:05:12

领域: cs.SE,cs.AI,cs.LG,cs.PF,cs.PL

下载: http://arxiv.org/abs/2404.07170v1

Using Neural Networks to Model Hysteretic Kinematics in Tendon-Actuated Continuum Robots

The ability to accurately model mechanical hysteretic behavior in tendon-actuated continuum robots using deep learning approaches is a growing area of interest. In this paper, we investigate the hysteretic response of two types of tendon-actuated continuum robots and, ultimately, compare three types of neural network modeling approaches with both forward and inverse kinematic mappings: feedforward neural network (FNN), FNN with a history input buffer, and long short-term memory (LSTM) network. We seek to determine which model best captures temporal dependent behavior. We find that, depending on the robot's design, choosing different kinematic inputs can alter whether hysteresis is exhibited by the system. Furthermore, we present the results of the model fittings, revealing that, in contrast to the standard FNN, both FNN with a history input buffer and the LSTM model exhibit the capacity to model historical dependence with comparable performance in capturing rate-dependent hysteresis.
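
The "FNN with a history input buffer" variant is easy to picture: stack the last k kinematic inputs so a plain feedforward network can see recent history. A small sketch, with illustrative shapes and k (not the paper's configuration):

```python
import numpy as np

def history_buffer(inputs, k):
    """inputs: (T, d) time series -> (T - k + 1, k * d) stacked windows."""
    T, d = inputs.shape
    return np.stack([inputs[t - k + 1 : t + 1].ravel()
                     for t in range(k - 1, T)])

u = np.random.randn(100, 2)        # e.g. two tendon displacement commands
X = history_buffer(u, k=10)        # each row now carries 10 past steps
print(X.shape)                     # (91, 20) -> input to a plain FNN
```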

Updated: 2024-04-10 17:04:06

标题: 使用神经网络模拟肌腱驱动连续机器人的滞后运动学

摘要: 使用深度学习方法准确建模腱驱动连续机器人的机械滞后行为的能力是一个备受关注的领域。在本文中,我们研究了两种类型的腱驱动连续机器人的滞后响应,并最终比较了三种神经网络建模方法,包括正向和反向运动学映射:前馈神经网络(FNN)、具有历史输入缓冲区的FNN和长短期记忆(LSTM)网络。我们试图确定哪种模型最好捕捉时间相关行为。我们发现,根据机器人的设计,选择不同的运动学输入可以改变系统是否展示滞后现象。此外,我们呈现了模型拟合的结果,揭示了与标准FNN相比,具有历史输入缓冲区的FNN和LSTM模型展示了在捕捉速率相关滞后方面性能相当的能力。

更新时间: 2024-04-10 17:04:06

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.07168v1

Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System

Machine Learning (ML) training on large-scale datasets is a very expensive and time-consuming workload. Processor-centric architectures (e.g., CPU, GPU) commonly used for modern ML training workloads are limited by the data movement bottleneck, i.e., due to repeatedly accessing the training dataset. As a result, processor-centric systems suffer from performance degradation and high energy consumption. Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck by placing the computation mechanisms inside or near memory. Our goal is to understand the capabilities and characteristics of popular distributed optimization algorithms on real-world PIM architectures to accelerate data-intensive ML training workloads. To this end, we 1) implement several representative centralized distributed optimization algorithms on UPMEM's real-world general-purpose PIM system, 2) rigorously evaluate these algorithms for ML training on large-scale datasets in terms of performance, accuracy, and scalability, 3) compare to conventional CPU and GPU baselines, and 4) discuss implications for future PIM hardware and the need to shift to an algorithm-hardware codesign perspective to accommodate decentralized distributed optimization algorithms. Our results demonstrate three major findings: 1) Modern general-purpose PIM architectures can be a viable alternative to state-of-the-art CPUs and GPUs for many memory-bound ML training workloads, when operations and datatypes are natively supported by PIM hardware, 2) the importance of carefully choosing the optimization algorithm that best fit PIM, and 3) contrary to popular belief, contemporary PIM architectures do not scale approximately linearly with the number of nodes for many data-intensive ML training workloads. To facilitate future research, we aim to open-source our complete codebase.

Updated: 2024-04-10 17:00:04

标题: 在一个真实的处理内存系统上分布式优化算法的分析

摘要: 在大规模数据集上进行机器学习(ML)训练是一项非常昂贵且耗时的工作负载。用于现代ML训练工作负载的以处理器为中心的架构(例如CPU、GPU)受到数据移动瓶颈的限制,即需要反复访问训练数据集。因此,以处理器为中心的系统会遭受性能下降和高能耗的问题。内存内计算(PIM)通过将计算机制置于内存内部或附近,是缓解数据移动瓶颈的一种有前途的解决方案。 我们的目标是了解流行的分布式优化算法在真实PIM架构上的能力和特性,以加速数据密集型ML训练工作负载。为此,我们1)在UPMEM的真实通用PIM系统上实现了几种有代表性的集中式分布式优化算法,2)在大规模数据集上从性能、准确性和可扩展性方面严格评估这些算法的ML训练表现,3)与传统CPU和GPU基线进行比较,4)讨论对未来PIM硬件的启示,以及为适应去中心化分布式优化算法而转向算法-硬件协同设计视角的必要性。 我们的结果揭示了三个主要发现:1)当PIM硬件原生支持所需的操作和数据类型时,现代通用PIM架构对于许多受内存限制的ML训练工作负载而言,可以成为最先进CPU和GPU的可行替代方案;2)精心选择最适合PIM的优化算法非常重要;3)与普遍看法相反,对于许多数据密集型ML训练工作负载,当代PIM架构并不会随节点数量近似线性扩展。为了促进未来研究,我们计划开源完整的代码库。

更新时间: 2024-04-10 17:00:04

领域: cs.AR,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2404.07164v1

Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning

We consider the scenario of supervised learning in Deep Learning (DL) networks, and exploit the arbitrariness of choice in the Riemannian metric relative to which the gradient descent flow can be defined (a general fact of differential geometry). In the standard approach to DL, the gradient flow on the space of parameters (weights and biases) is defined with respect to the Euclidean metric. Here instead, we choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network. This naturally induces two modified versions of the gradient descent flow in the parameter space, one adapted for the overparametrized setting, and the other for the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the ${\mathcal L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry. Moreover, we generalize the above framework to the situation in which the rank condition does not hold; in particular, we show that local equilibria can only exist if a rank loss occurs, and that generically, they are not isolated points, but elements of a critical submanifold of parameter space.

Updated: 2024-04-10 16:55:52

标题: 深度学习中通过几何适应梯度下降实现全局$\mathcal{L}^2$最小化的均匀指数速率

摘要: 我们考虑深度学习(DL)网络中的监督学习场景,并利用定义梯度下降流时黎曼度量选择的任意性(这是微分几何中的一般事实)。在DL的标准方法中,参数空间(权重和偏置)上的梯度流是相对于欧几里德度量定义的。而在这里,我们改为相对于DL网络输出层中的欧几里德度量来定义梯度流。这自然地在参数空间中诱导出两个修改版本的梯度下降流,一个适用于过参数化设置,另一个适用于欠参数化设置。在过参数化情况下,我们证明,只要一个秩条件成立,修改后的梯度下降的所有轨道都会以均匀指数收敛速度将 ${\mathcal L}^2$ 成本驱向其全局最小值;由此可针对任意预定的接近全局最小值的程度获得一个先验停止时间。我们指出了后者与次黎曼几何的关系。此外,我们将上述框架推广到秩条件不成立的情形;特别是,我们证明局部平衡点仅在发生秩损失时才可能存在,并且在一般情况下它们不是孤立点,而是参数空间中某个临界子流形的元素。

更新时间: 2024-04-10 16:55:52

领域: cs.LG,cs.AI,math-ph,math.MP,math.OC,stat.ML,57R70, 62M45

下载: http://arxiv.org/abs/2311.15487v4

A Large-Scale Exploration of $μ$-Transfer

Large neural network models have become a mainstay of natural language processing and computer vision, yet their initialization and learning rates are set in a largely heuristic fashion, potentially varying from paper to paper and one model size to the next. The $\mu$-Parameterization ($\mu$P) offers a potential solution to these challenges, yielding scaling rules for model initialization and learning rates, and reportedly enabling zero-shot hyperparameter transfer from small to large models in a variety of cases. Despite the evident promise, the $\mu$P scaling rules are not yet widely adopted, perhaps due to higher implementation complexity, many variations, or complex theoretical background. This work investigates $\mu$P empirically, focusing on the ubiquitous transformer architecture, and aims to answer a simple question: does $\mu$-Transfer yield optimal learning rates in practice? From models with 2M to 10B parameters, we show that $\mu$-Transfer works as intended for the majority of important cases, but also identify some surprising cases where it may not. Our experiment codebase is available at https://github.com/lucaslingle/mu_transformer/

Updated: 2024-04-10 16:55:37

标题: 一个大规模的 $μ$-Transfer 探索

摘要: 大型神经网络模型已经成为自然语言处理和计算机视觉的主要工具,但它们的初始化和学习率在很大程度上是以启发式方式设置的,可能因论文而异,也可能因模型规模而异。μ-参数化(μP)为这些挑战提供了潜在的解决方案,给出了模型初始化和学习率的缩放规则,并据称在许多情况下能够实现从小型模型到大型模型的零样本超参数迁移。 尽管前景显而易见,μP缩放规则尚未被广泛采用,这可能是由于实现复杂度较高、变体众多,或理论背景复杂。本研究从实证角度考察μP,聚焦于普遍使用的Transformer架构,旨在回答一个简单的问题:μ-迁移在实践中是否能产生最优学习率?在参数量从2M到10B的模型上,我们展示了μ-迁移在大多数重要情形下按预期工作,但也发现了一些它可能失效的令人意外的情形。 我们的实验代码库可在以下链接找到:https://github.com/lucaslingle/mu_transformer/

更新时间: 2024-04-10 16:55:37

领域: cs.LG

下载: http://arxiv.org/abs/2404.05728v2

Exploring Physiological Responses in Virtual Reality-based Interventions for Autism Spectrum Disorder: A Data-Driven Investigation

Virtual Reality (VR) has emerged as a promising tool for enhancing social skills and emotional well-being in individuals with Autism Spectrum Disorder (ASD). Through a technical exploration, this study employs a multiplayer serious gaming environment within VR, engaging 34 individuals diagnosed with ASD and employing high-precision biosensors for a comprehensive view of the participants' arousal and responses during the VR sessions. Participants were subjected to a series of 3 virtual scenarios designed in collaboration with stakeholders and clinical experts to promote socio-cognitive skills and emotional regulation in a controlled and structured virtual environment. We combined the framework with wearable non-invasive sensors for bio-signal acquisition, focusing on the collection of heart rate variability, and respiratory patterns to monitor participants behaviors. Further, behavioral assessments were conducted using observation and semi-structured interviews, with the data analyzed in conjunction with physiological measures to identify correlations and explore digital-intervention efficacy. Preliminary analysis revealed significant correlations between physiological responses and behavioral outcomes, indicating the potential of physiological feedback to enhance VR-based interventions for ASD. The study demonstrated the feasibility of using real-time data to adapt virtual scenarios, suggesting a promising avenue to support personalized therapy. The integration of quantitative physiological feedback into digital platforms represents a forward step in the personalized intervention for ASD. By leveraging real-time data to adjust therapeutic content, this approach promises to enhance the efficacy and engagement of digital-based therapies.

Updated: 2024-04-10 16:50:07

标题: 探索基于虚拟现实的自闭症谱系障碍干预中的生理反应:基于数据驱动的调查

摘要: 虚拟现实(VR)已经成为一种有望增强自闭症谱系障碍(ASD)患者社交技能和情绪健康的工具。通过技术探索,本研究在VR中采用了一个多人严肃游戏环境,共有34名被诊断为ASD的个体参与,并使用高精度生物传感器全面了解参与者在VR会话期间的唤醒水平和反应。参与者体验了三个与利益相关者和临床专家合作设计的虚拟场景,旨在在受控且结构化的虚拟环境中促进社会认知技能和情绪调节。我们将该框架与可穿戴非侵入式传感器相结合进行生物信号采集,重点收集心率变异性和呼吸模式,以监测参与者的行为。此外,通过观察和半结构化访谈进行了行为评估,并将这些数据与生理指标结合分析,以识别相关性并探索数字干预的有效性。初步分析显示生理反应与行为结果之间存在显著相关性,表明生理反馈有潜力增强基于VR的ASD干预。该研究证明了使用实时数据调整虚拟场景的可行性,为支持个性化治疗提供了一条有希望的途径。将定量生理反馈整合到数字平台中,是迈向ASD个性化干预的一大进步。通过利用实时数据调整治疗内容,这种方法有望提升基于数字手段的治疗的效果和参与度。

更新时间: 2024-04-10 16:50:07

领域: cs.HC,cs.LG,92C30 (Primary) 92C55, 68T99 (Secondary)

下载: http://arxiv.org/abs/2404.07159v1

Designing Interpretable ML System to Enhance Trust in Healthcare: A Systematic Review to Proposed Responsible Clinician-AI-Collaboration Framework

This paper explores the significant impact of AI-based medical devices, including wearables, telemedicine, large language models, and digital twins, on clinical decision support systems. It emphasizes the importance of producing outcomes that are not only accurate but also interpretable and understandable to clinicians, addressing the risk that lack of interpretability poses in terms of mistrust and reluctance to adopt these technologies in healthcare. The paper reviews interpretable AI processes, methods, applications, and the challenges of implementation in healthcare, focusing on quality control to facilitate responsible communication between AI systems and clinicians. It breaks down the interpretability process into data pre-processing, model selection, and post-processing, aiming to foster a comprehensive understanding of the crucial role of a robust interpretability approach in healthcare and to guide future research in this area, with insights for creating responsible clinician-AI tools for healthcare, as well as a deeper understanding of the challenges they might face. Our research questions, eligibility criteria and primary goals were identified using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guideline and the PICO method; the PubMed, Scopus and Web of Science databases were systematically searched using sensitive and specific search strings. In the end, 52 publications were selected for data extraction, which included 8 existing reviews and 44 related experimental studies. The paper offers general concepts of interpretable AI in healthcare and discusses a three-level interpretability process. Additionally, it provides a comprehensive discussion of evaluating robust interpretability AI in healthcare. Moreover, this survey introduces a step-by-step roadmap for implementing responsible AI in healthcare.

Updated: 2024-04-10 16:46:59

标题: 设计可解释的机器学习系统以增强医疗信任:提出负责临床医师-人工智能合作框架的系统性审查

摘要: 本文探讨了基于人工智能的医疗设备(包括可穿戴设备、远程医疗、大型语言模型和数字孪生)对临床决策支持系统的重大影响。它强调产生不仅准确、而且对临床医生可解释、可理解的结果的重要性,以应对可解释性缺失所带来的不信任以及在医疗保健中不愿采用这些技术的风险。本文回顾了可解释人工智能的过程、方法、应用及其在医疗保健中实施的挑战,重点关注质量控制,以促进人工智能系统与临床医生之间负责任的沟通。它将可解释性过程分解为数据预处理、模型选择和后处理,旨在促进对稳健可解释性方法在医疗保健中关键作用的全面理解,并指导该领域的未来研究,为创建负责任的临床医生-人工智能工具提供见解,同时更深入地了解它们可能面临的挑战。我们使用系统综述与Meta分析的首选报告条目(PRISMA)指南以及PICO方法确定了研究问题、纳入标准和主要目标;使用敏感且具体的检索式系统地检索了PubMed、Scopus和Web of Science数据库。最终,筛选出52篇出版物用于数据提取,其中包括8篇现有综述和44篇相关实验研究。本文给出了医疗保健中可解释人工智能的一般概念,并讨论了三个层级的可解释性过程。此外,它对如何评估医疗保健中稳健的可解释人工智能进行了全面讨论。最后,本综述给出了在医疗保健中实施负责任人工智能的分步路线图。

更新时间: 2024-04-10 16:46:59

领域: cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2311.11055v2

Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach

Training an effective deep learning model to learn ocean processes involves careful choices of various hyperparameters. We leverage the advanced search algorithms for multiobjective optimization in DeepHyper, a scalable hyperparameter optimization software, to streamline the development of neural networks tailored for ocean modeling. The focus is on optimizing Fourier neural operators (FNOs), a data-driven model capable of simulating complex ocean behaviors. Selecting the correct model and tuning the hyperparameters are challenging tasks, requiring much effort to ensure model accuracy. DeepHyper allows efficient exploration of hyperparameters associated with data preprocessing, FNO architecture-related hyperparameters, and various model training strategies. We aim to obtain an optimal set of hyperparameters leading to the most performant model. Moreover, on top of the commonly used mean squared error for model training, we propose adopting the negative anomaly correlation coefficient as the additional loss term to improve model performance and investigate the potential trade-off between the two terms. The experimental results show that the optimal set of hyperparameters enhanced model performance in single timestepping forecasting and greatly exceeded the baseline configuration in the autoregressive rollout for long-horizon forecasting up to 30 days. Utilizing DeepHyper, we demonstrate an approach to enhance the use of FNOs in ocean dynamics forecasting, offering a scalable solution with improved precision.
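
The proposed extra loss term is concrete enough to sketch: the anomaly correlation coefficient (ACC) is the correlation of forecast and target anomalies about a climatology, and its negative is added to the MSE. A hedged numpy version, where the zero climatology and the weight `lam` are illustrative assumptions:

```python
import numpy as np

def acc(pred, target, climatology):
    fa, oa = pred - climatology, target - climatology   # anomalies
    return (fa * oa).sum() / np.sqrt((fa**2).sum() * (oa**2).sum())

def combined_loss(pred, target, climatology, lam=0.5):
    mse = np.mean((pred - target) ** 2)
    return mse - lam * acc(pred, target, climatology)   # minimizing raises ACC

pred = np.random.randn(64, 32, 32)
target = pred + 0.1 * np.random.randn(64, 32, 32)
clim = np.zeros_like(target)            # placeholder climatology
print(combined_loss(pred, target, clim))
```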

Updated: 2024-04-10 16:41:49

标题: 用傅里叶神经算子优化海洋动力学建模:一种多目标超参数和架构优化方法

摘要: 训练一个有效的深度学习模型来学习海洋过程,需要仔细选择各种超参数。我们利用可扩展超参数优化软件DeepHyper中用于多目标优化的先进搜索算法,来简化为海洋建模量身定制的神经网络的开发。重点是优化傅里叶神经算子(FNO),这是一种能够模拟复杂海洋行为的数据驱动模型。选择正确的模型和调整超参数是具有挑战性的任务,需要大量努力来确保模型准确性。DeepHyper允许高效地探索与数据预处理相关的超参数、与FNO架构相关的超参数以及各种模型训练策略。我们的目标是获得一组最优超参数,从而得到性能最佳的模型。此外,在常用于模型训练的均方误差之外,我们提出采用负异常相关系数作为额外的损失项来改善模型性能,并研究这两项之间潜在的权衡。实验结果显示,最优超参数组合提升了模型在单步预测中的性能,并在长达30天的长时程自回归滚动预测中大幅超越基线配置。利用DeepHyper,我们展示了一种增强FNO在海洋动力学预测中应用的方法,提供了一个精度更高的可扩展解决方案。

更新时间: 2024-04-10 16:41:49

领域: cs.LG,physics.ao-ph,stat.ML

下载: http://arxiv.org/abs/2404.05768v2

Algorithms for Caching and MTS with reduced number of predictions

ML-augmented algorithms utilize predictions to achieve performance beyond their worst-case bounds. Producing these predictions might be a costly operation -- this motivated Im et al. '22 to introduce the study of algorithms which use predictions parsimoniously. We design parsimonious algorithms for caching and MTS with action predictions, proposed by Antoniadis et al. '20, focusing on the parameters of consistency (performance with perfect predictions) and smoothness (dependence of their performance on the prediction error). Our algorithm for caching is 1-consistent, robust, and its smoothness deteriorates with the decreasing number of available predictions. We propose an algorithm for general MTS whose consistency and smoothness both scale linearly with the decreasing number of predictions. Without the restriction on the number of available predictions, both algorithms match the earlier guarantees achieved by Antoniadis et al. '20.
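
As a toy picture of the action-prediction setting studied here (an illustration of the setting, not the paper's parsimonious algorithm), a predictor-guided cache can evict the page whose predicted next request lies farthest in the future:

```python
def run_cache(requests, predict_next, capacity):
    cache, misses = set(), 0
    for t, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            # use the (possibly erroneous) predictions to pick a victim
            victim = max(cache, key=lambda p: predict_next(p, t))
            cache.remove(victim)
        cache.add(page)
    return misses

requests = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4]
# a perfect predictor for this trace returns the true next-use time
predict_next = lambda page, t: next(
    (i for i in range(t + 1, len(requests)) if requests[i] == page),
    float("inf"))
print(run_cache(requests, predict_next, capacity=3))
```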

Updated: 2024-04-10 16:30:07

标题: 使用更少预测的缓存与度量任务系统(MTS)算法

摘要: 机器学习增强的算法利用预测来获得超越最坏情况界限的性能。生成这些预测可能代价高昂,这促使Im等人('22)开始研究节约使用预测的算法。我们针对Antoniadis等人('20)提出的动作预测设定,为缓存和度量任务系统(MTS)设计了节约使用预测的算法,重点关注一致性(预测完美时的性能)和平滑性(性能对预测误差的依赖程度)这两个参数。我们的缓存算法是1-一致且鲁棒的,其平滑性随着可用预测数量的减少而变差。我们还为一般MTS提出了一种算法,其一致性和平滑性均随可用预测数量的减少而线性退化。在不限制可用预测数量的情况下,这两种算法均能达到Antoniadis等人('20)此前给出的保证。

更新时间: 2024-04-10 16:30:07

领域: cs.LG,cs.DS

下载: http://arxiv.org/abs/2404.06280v2

How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models

Reinforcement learning (RL) is a promising approach to generate treatment policies for sepsis patients in intensive care. While retrospective evaluation metrics show decreased mortality when these policies are followed, studies with clinicians suggest their recommendations are often spurious. We propose that these shortcomings may be due to lack of diversity in observed actions and outcomes in the training data, and we construct experiments to investigate the feasibility of predicting sepsis disease severity changes due to clinician actions. Preliminary results suggest incorporating action information does not significantly improve model performance, indicating that clinician actions may not be sufficiently variable to yield measurable effects on disease progression. We discuss the implications of these findings for optimizing sepsis treatment.

Updated: 2024-04-10 16:29:21

标题: 临床医生的一致性如何?评估动力学模型对脓毒症疾病进展的可预测性

摘要: 强化学习(RL)是为重症监护中的脓毒症患者生成治疗策略的一种有前途的方法。尽管回顾性评估指标显示遵循这些策略时死亡率有所下降,但针对临床医生的研究表明,这些建议往往站不住脚。我们提出,这些缺陷可能源于训练数据中观察到的行动和结果缺乏多样性,并构建实验来考察预测临床医生行动所引起的脓毒症疾病严重程度变化的可行性。初步结果表明,纳入行动信息并不能显著改善模型性能,这说明临床医生的行动可能缺乏足够的变化性,不足以对疾病进展产生可测量的影响。我们讨论了这些发现对优化脓毒症治疗的启示。

更新时间: 2024-04-10 16:29:21

领域: cs.LG,cs.HC

下载: http://arxiv.org/abs/2404.07148v1

Local Causal Discovery for Estimating Causal Effects

Even when the causal graph underlying our data is unknown, we can use observational data to narrow down the possible values that an average treatment effect (ATE) can take by (1) identifying the graph up to a Markov equivalence class; and (2) estimating that ATE for each graph in the class. While the PC algorithm can identify this class under strong faithfulness assumptions, it can be computationally prohibitive. Fortunately, only the local graph structure around the treatment is required to identify the set of possible ATE values, a fact exploited by local discovery algorithms to improve computational efficiency. In this paper, we introduce Local Discovery using Eager Collider Checks (LDECC), a new local causal discovery algorithm that leverages unshielded colliders to orient the treatment's parents differently from existing methods. We show that there exist graphs where LDECC exponentially outperforms existing local discovery algorithms and vice versa. Moreover, we show that LDECC and existing algorithms rely on different faithfulness assumptions, leveraging this insight to weaken the assumptions for identifying the set of possible ATE values.

Updated: 2024-04-10 16:22:16

标题: 本地因果发现用于估计因果效应

摘要: 即使我们不知道数据背后的因果图,我们仍然可以利用观察数据来缩小平均处理效应(ATE)可能取值的范围,方法是(1)将因果图识别到马尔可夫等价类;(2)对该等价类中的每个图估计ATE。虽然PC算法可以在强忠实性假设下识别该等价类,但其计算代价可能高得令人望而却步。幸运的是,确定可能的ATE取值集合只需要处理变量周围的局部图结构,局部发现算法正是利用这一事实来提高计算效率。在本文中,我们提出了基于急切对撞结构检查的局部发现算法(Local Discovery using Eager Collider Checks, LDECC),这是一种新的局部因果发现算法,它利用未屏蔽的对撞结构,以不同于现有方法的方式确定处理变量父节点的方向。我们证明,存在某些图使LDECC指数级地优于现有局部发现算法,反之亦然。此外,我们证明LDECC与现有算法依赖于不同的忠实性假设,并利用这一洞见弱化了确定可能ATE取值集合所需的假设。

更新时间: 2024-04-10 16:22:16

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2302.08070v4

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
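
A minimal numpy sketch of the compressive-memory mechanism that such linear-attention designs build on: segments stream through a fixed-size state (M, z), and queries retrieve from everything seen so far. The dimensions and the elu+1 feature map are common choices assumed here for illustration, not the exact Infini-attention implementation:

```python
import numpy as np

def phi(x):                      # elu(x) + 1, a positive feature map
    return np.where(x > 0, x + 1.0, np.exp(x))

d_k, d_v = 8, 8
M = np.zeros((d_k, d_v))         # compressive memory
z = np.zeros(d_k)                # normalization term

def memory_retrieve(Q):
    return (phi(Q) @ M) / (phi(Q) @ z + 1e-6)[:, None]

def memory_update(K, V):
    global M, z
    M = M + phi(K).T @ V         # fold the segment's K/V into fixed state
    z = z + phi(K).sum(axis=0)

for _ in range(4):               # stream segments with bounded memory
    K, V, Q = (np.random.randn(16, d) for d in (d_k, d_v, d_k))
    out = memory_retrieve(Q)     # attend over everything seen so far
    memory_update(K, V)
print(out.shape)                 # (16, 8)
```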

Updated: 2024-04-10 16:18:42

标题: 不留下任何上下文:具有无限关注机制的高效无限上下文变换器

摘要: 这项工作介绍了一种有效的方法,将基于Transformer的大型语言模型(LLMs)扩展到无限长的输入,同时保持有限的内存和计算。我们提出的方法的关键组成部分是一种称为Infini-attention的新的注意力技术。Infini-attention将一种压缩式记忆引入到原始的注意力机制中,并在单个Transformer块中构建了遮罩局部注意力和长期线性注意力机制。我们在长文本语言建模基准测试、1M序列长度的密码上下文块检索和500K长度的书籍摘要任务中,使用1B和8B的LLMs展示了我们方法的有效性。我们的方法引入了最小的有界内存参数,并为LLMs实现了快速的流式推断。

更新时间: 2024-04-10 16:18:42

领域: cs.CL,cs.AI,cs.LG,cs.NE

下载: http://arxiv.org/abs/2404.07143v1

Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks

Model explanations improve the transparency of black-box machine learning (ML) models and their decisions; however, they can also be exploited to carry out privacy threats such as membership inference attacks (MIA). Existing works have only analyzed MIA in a single "what if" interaction scenario between an adversary and the target ML model; thus, it does not discern the factors impacting the capabilities of an adversary in launching MIA in repeated interaction settings. Additionally, these works rely on assumptions about the adversary's knowledge of the target model's structure and, thus, do not guarantee the optimality of the predefined threshold required to distinguish the members from non-members. In this paper, we delve into the domain of explanation-based threshold attacks, where the adversary endeavors to carry out MIA attacks by leveraging the variance of explanations through iterative interactions with the system comprising of the target ML model and its corresponding explanation method. We model such interactions by employing a continuous-time stochastic signaling game framework. In our framework, an adversary plays a stopping game, interacting with the system (having imperfect information about the type of an adversary, i.e., honest or malicious) to obtain explanation variance information and computing an optimal threshold to determine the membership of a datapoint accurately. First, we propose a sound mathematical formulation to prove that such an optimal threshold exists, which can be used to launch MIA. Then, we characterize the conditions under which a unique Markov perfect equilibrium (or steady state) exists in this dynamic system. By means of a comprehensive set of simulations of the proposed game model, we assess different factors that can impact the capability of an adversary to launch MIA in such repeated interaction settings.

Updated: 2024-04-10 16:14:05

标题: 朝向对基于解释的成员推理攻击的博弈论理解

摘要: 模型解释提高了黑盒机器学习(ML)模型及其决策的透明度;然而,它们也可能被利用来实施隐私威胁,例如成员推理攻击(MIA)。现有研究仅在对手与目标ML模型之间单一的“假设性”交互场景下分析MIA;因此,无法辨别在重复交互环境中影响对手发动MIA能力的因素。此外,这些研究依赖于对手了解目标模型结构的假设,因而无法保证区分成员与非成员所需的预定义阈值是最优的。在本文中,我们深入探讨基于解释的阈值攻击,其中对手通过与由目标ML模型及其相应解释方法组成的系统进行迭代交互,利用解释的方差来实施MIA攻击。我们采用连续时间随机信号博弈框架对这种交互进行建模。在我们的框架中,对手进行一个停止博弈,与系统(系统对对手的类型,即诚实或恶意,只有不完全信息)交互以获取解释方差信息,并计算一个最优阈值来准确判定数据点的成员身份。首先,我们给出严谨的数学形式化,证明存在这样一个可用于发动MIA的最优阈值。然后,我们刻画了该动态系统中存在唯一马尔可夫完美均衡(即稳态)的条件。通过对所提出的博弈模型进行全面的仿真,我们评估了在这种重复交互环境中可能影响对手发动MIA能力的不同因素。

更新时间: 2024-04-10 16:14:05

领域: cs.AI,cs.GT

下载: http://arxiv.org/abs/2404.07139v1

Towards Robustness of Text-to-Visualization Translation against Lexical and Phrasal Variability

Text-to-Vis is an emerging task in the natural language processing (NLP) area that aims to automatically generate data visualizations from natural language questions (NLQs). Despite their progress, existing text-to-vis models often heavily rely on lexical matching between words in the questions and tokens in data schemas. This overreliance on lexical matching may lead to a diminished level of model robustness against input variations. In this study, we thoroughly examine the robustness of current text-to-vis models, an area that has not previously been explored. In particular, we construct the first robustness dataset nvBench-Rob, which contains diverse lexical and phrasal variations based on the original text-to-vis benchmark nvBench. We then find that the performance of existing text-to-vis models on this new dataset drops dramatically, implying that these methods exhibit inadequate robustness overall. Finally, we propose a novel framework based on the Retrieval-Augmented Generation (RAG) technique, named GRED, specifically designed to address input perturbations in these two variants. The framework consists of three parts: NLQ-Retrieval Generator, Visualization Query-Retrieval Retuner and Annotation-based Debugger, which are used to tackle the challenges posed by natural language variants, programming style differences and data schema variants, respectively. Extensive experimental evaluations show that, compared to the state-of-the-art model RGVisNet in the Text-to-Vis field, GRED performs better in terms of model robustness, with a 32% increase in accuracy on the proposed nvBench-Rob dataset.
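
As a sketch of the retrieval step a RAG-based framework like GRED would rely on, the toy below embeds an NLQ, scores stored examples by cosine similarity, and returns the top matches; the bag-of-words embedder and the tiny corpus are placeholder assumptions standing in for a real encoder and example store:

```python
import numpy as np

corpus = ["show total sales per region as a bar chart",
          "plot average temperature by month as a line chart",
          "pie chart of users per country"]
vocab = sorted({w for q in corpus for w in q.split()})

def embed(text):
    toks = set(text.lower().split())
    v = np.array([float(w in toks) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n else v

vectors = np.stack([embed(q) for q in corpus])

def retrieve(nlq, k=2):
    scores = vectors @ embed(nlq)     # cosine similarity on unit vectors
    return [corpus[i] for i in np.argsort(-scores)[:k]]

print(retrieve("bar chart of total revenue for each region"))
```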

Updated: 2024-04-10 16:12:50

标题: 朝向文本到可视化翻译的稳健性:抵抗词汇和短语变化

摘要: 文本到可视化是自然语言处理领域的一个新兴任务,旨在从自然语言问题(NLQs)中自动生成数据可视化。尽管取得了进展,现有的文本到可视化模型通常过度依赖问题中的单词和数据模式中的标记之间的词汇匹配。这种对词汇匹配的过度依赖可能会导致模型对输入变化的鲁棒性下降。在本研究中,我们彻底研究了当前文本到可视化模型的鲁棒性,这是以前未被探索过的领域。特别是,我们构建了第一个鲁棒性数据集nvBench-Rob,其中包含基于原始文本到可视化基准nvBench的各种词汇和短语变化。随后我们发现,现有文本到可视化模型在这个新数据集上的性能急剧下降,表明这些方法整体上鲁棒性不足。最后,我们提出了一种基于检索增强生成(RAG)技术的新框架,名为GRED,专门设计用于应对这两类变体带来的输入扰动。该框架包括三个部分:NLQ-检索生成器、可视化查询-检索调整器和基于注释的调试器,分别用于应对自然语言变体、编程风格差异和数据模式变体带来的挑战。广泛的实验评估表明,与文本到可视化领域中的最先进模型RGVisNet相比,GRED在模型鲁棒性方面表现更好,在所提出的nvBench-Rob数据集上准确率提高了32%。

更新时间: 2024-04-10 16:12:50

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.07135v1

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

In-context learning is a powerful emergent ability in transformer models. Prior work in mechanistic interpretability has identified a circuit element that may be critical for in-context learning -- the induction head (IH), which performs a match-and-copy operation. During training of large transformers on natural language data, IHs emerge around the same time as a notable phase change in the loss. Despite the robust evidence for IHs and this interesting coincidence with the phase change, relatively little is known about the diversity and emergence dynamics of IHs. Why is there more than one IH, and how are they dependent on each other? Why do IHs appear all of a sudden, and what are the subcircuits that enable them to emerge? We answer these questions by studying IH emergence dynamics in a controlled setting by training on synthetic data. In doing so, we develop and share a novel optogenetics-inspired causal framework for modifying activations throughout training. Using this framework, we delineate the diverse and additive nature of IHs. By clamping subsets of activations throughout training, we then identify three underlying subcircuits that interact to drive IH formation, yielding the phase change. Furthermore, these subcircuits shed light on data-dependent properties of formation, such as phase change timing, already showing the promise of this more in-depth understanding of subcircuits that need to "go right" for an induction head.

Updated: 2024-04-10 16:07:38

标题: 归纳头的形成需要哪些条件?上下文学习回路及其形成的机制研究

摘要: 上下文学习是Transformer模型中一种强大的涌现能力。机制可解释性方面的先前工作已经确定了一个可能对上下文学习至关重要的回路元件——归纳头(induction head, IH),它执行匹配并复制的操作。在大型Transformer于自然语言数据上的训练过程中,IH的出现与损失曲线上一次显著的相变大致同时发生。尽管已有关于IH的有力证据,以及这一与相变的有趣巧合,人们对IH的多样性和涌现动态仍知之甚少。为什么会有不止一个IH?它们之间如何相互依赖?为什么IH会突然出现?使其得以涌现的子回路又是什么?我们通过在合成数据上训练、在受控环境中研究IH的涌现动态来回答这些问题。在此过程中,我们开发并分享了一个受光遗传学启发的新颖因果框架,用于在整个训练过程中修改激活值。利用该框架,我们刻画了IH的多样性与可加性。随后,通过在训练全程钳制激活值的子集,我们识别出三个相互作用、共同驱动IH形成并产生相变的底层子回路。此外,这些子回路还揭示了形成过程中依赖于数据的性质,例如相变的时机,这已经展现出更深入理解哪些子回路必须“到位”才能形成归纳头的前景。

更新时间: 2024-04-10 16:07:38

领域: cs.LG

下载: http://arxiv.org/abs/2404.07129v1

Measuring proximity to standard planes during fetal brain ultrasound scanning

This paper introduces a novel pipeline designed to bring ultrasound (US) plane pose estimation closer to clinical use for more effective navigation to the standard planes (SPs) in the fetal brain. We propose a semi-supervised segmentation model utilizing both labeled SPs and unlabeled 3D US volume slices. Our model enables reliable segmentation across a diverse set of fetal brain images. Furthermore, the model incorporates a classification mechanism to identify the fetal brain precisely. Our model not only filters out frames lacking the brain but also generates masks for those containing it, enhancing the relevance of plane pose regression in clinical settings. We focus on fetal brain navigation from 2D ultrasound (US) video analysis and combine this model with a US plane pose regression network to provide sensorless proximity detection to SPs and non-SPs planes; we emphasize the importance of proximity detection to SPs for guiding sonographers, offering a substantial advantage over traditional methods by allowing earlier and more precise adjustments during scanning. We demonstrate the practical applicability of our approach through validation on real fetal scan videos obtained from sonographers of varying expertise levels. Our findings demonstrate the potential of our approach to complement existing fetal US technologies and advance prenatal diagnostic practices.

Updated: 2024-04-10 16:04:21

标题: 测量胎儿脑超声扫描中与标准平面的接近度

摘要: 本文介绍了一种新颖的流程,旨在使超声(US)平面位姿估计更贴近临床应用,以便更有效地导航到胎儿脑部的标准平面(SPs)。我们提出了一个同时利用有标注SPs和无标注3D US体积切片的半监督分割模型。该模型能够在多样的胎儿脑图像上进行可靠分割。此外,该模型还整合了分类机制以精确识别胎儿脑部。我们的模型不仅能过滤掉不包含脑部的帧,还能为包含脑部的帧生成掩膜,提升了平面位姿回归在临床环境中的相关性。我们专注于基于2D超声(US)视频分析的胎儿脑导航,并将该模型与US平面位姿回归网络相结合,实现对SPs和非SPs平面的无传感器接近度检测;我们强调SPs接近度检测对指导超声医师的重要性,它允许在扫描过程中更早、更精确地进行调整,相比传统方法具有显著优势。我们通过在不同经验水平的超声医师所采集的真实胎儿扫描视频上进行验证,展示了该方法的实际适用性。我们的研究结果表明,该方法有潜力补充现有的胎儿US技术并推进产前诊断实践。

更新时间: 2024-04-10 16:04:21

领域: cs.CV,cs.AI,I.2.0; I.4.0; J.2.0; J.3.0

下载: http://arxiv.org/abs/2404.07124v1

Semantically-correlated memories in a dense associative model

I introduce a novel associative memory model named Correlated Dense Associative Memory (CDAM), which integrates both auto- and hetero-association in a unified framework for continuous-valued memory patterns. Employing an arbitrary graph structure to semantically link memory patterns, CDAM is theoretically and numerically analysed, revealing four distinct dynamical modes: auto-association, narrow hetero-association, wide hetero-association, and neutral quiescence. Drawing inspiration from inhibitory modulation studies, I employ anti-Hebbian learning rules to control the range of hetero-association, extract multi-scale representations of community structures in graphs, and stabilise the recall of temporal sequences. Experimental demonstrations showcase CDAM's efficacy in handling real-world data, replicating a classical neuroscience experiment, performing image retrieval, and simulating arbitrary finite automata.
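
For readers unfamiliar with dense associative memories, the generic softmax-attention recall step below grounds the auto-association case; it is a textbook-style sketch, and CDAM's graph-structured hetero-association and anti-Hebbian control are not reproduced here:

```python
import numpy as np

def dam_recall(query, patterns, beta=8.0, steps=5):
    """Iteratively sharpen `query` toward the nearest stored pattern."""
    x = query.copy()
    for _ in range(steps):
        sims = beta * patterns @ x            # similarity to each memory
        w = np.exp(sims - sims.max())         # stable softmax weights
        x = (w / w.sum()) @ patterns          # convex recombination
    return x

patterns = np.sign(np.random.randn(10, 50))   # 10 stored +/-1 patterns
noisy = patterns[3] + 0.8 * np.random.randn(50)
recalled = dam_recall(noisy, patterns)
print(np.corrcoef(recalled, patterns[3])[0, 1])  # typically close to 1
```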

Updated: 2024-04-10 16:04:07

标题: 密集关联模型中的语义相关记忆

摘要: 我介绍了一种名为相关稠密联想记忆(CDAM)的新型联想记忆模型,它在一个统一框架中为连续值记忆模式整合了自联想和异联想。CDAM利用任意图结构对记忆模式进行语义关联,经过理论和数值分析,揭示出四种不同的动力学模式:自联想、窄异联想、宽异联想和中性静息。受抑制性调制研究的启发,我采用反赫布学习规则来控制异联想的范围,提取图中社区结构的多尺度表示,并稳定时间序列的回忆。实验演示展示了CDAM在处理真实世界数据、复现一项经典神经科学实验、执行图像检索以及模拟任意有限自动机方面的有效性。

更新时间: 2024-04-10 16:04:07

领域: cs.NE,cs.AI,cs.LG,q-bio.NC,68T07, 92B20, 68T01, 00A69,I.2; I.5; I.4; J.2; J.3

下载: http://arxiv.org/abs/2404.07123v1

Continuous Language Model Interpolation for Dynamic and Controllable Text Generation

As large language models (LLMs) have gained popularity for a variety of use cases, making them adaptable and controllable has become increasingly important, especially for user-facing applications. While the existing literature on LLM adaptation primarily focuses on finding a model (or models) that optimizes a single predefined objective, here we focus on the challenging case where the model must dynamically adapt to diverse -- and often changing -- user preferences. For this, we leverage adaptation methods based on linear weight interpolation, casting them as continuous multi-domain interpolators that produce models with specific prescribed generation characteristics on-the-fly. Specifically, we use low-rank updates to fine-tune a base model to various different domains, yielding a set of anchor models with distinct generation profiles. Then, we use the weight updates of these anchor models to parametrize the entire (infinite) class of models contained within their convex hull. We empirically show that varying the interpolation weights yields predictable and consistent change in the model outputs with respect to all of the controlled attributes. We find that there is little entanglement between most attributes and identify and discuss the pairs of attributes for which this is not the case. Our results suggest that linearly interpolating between the weights of fine-tuned models facilitates predictable, fine-grained control of model outputs with respect to multiple stylistic characteristics simultaneously.
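
The interpolation scheme itself is simple to sketch: keep a base model plus per-domain fine-tuned weight deltas, and mix them with user-chosen coefficients at inference time. The names and shapes below are illustrative assumptions (tiny stand-ins for low-rank updates on a real network):

```python
import numpy as np

base = {"W": np.random.randn(16, 16)}
# anchor updates from fine-tuning on two styles (stand-ins for LoRA deltas)
delta_formal = {"W": 0.1 * np.random.randn(16, 16)}
delta_humor  = {"W": 0.1 * np.random.randn(16, 16)}

def interpolate(alphas):
    """alphas = (a_formal, a_humor), a_i >= 0, sum <= 1."""
    a1, a2 = alphas
    return {k: base[k] + a1 * delta_formal[k] + a2 * delta_humor[k]
            for k in base}

model_weights = interpolate((0.7, 0.2))   # mostly formal, slightly humorous
print(model_weights["W"].shape)
```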

Updated: 2024-04-10 15:55:07

标题: 连续语言模型插值用于动态和可控文本生成

摘要: 随着大型语言模型(LLMs)在各种用例中日益受到欢迎,使其具备适应性和可控性变得越来越重要,特别是对于面向用户的应用程序。现有关于LLM适应的文献主要集中于寻找能优化单一预定义目标的一个(或多个)模型,而我们在这里关注一个更具挑战性的情形:模型必须动态适应多样的、且常常变化的用户偏好。为此,我们利用基于线性权重插值的适应方法,将其视为连续的多域插值器,能够即时产生具有特定指定生成特征的模型。具体而言,我们使用低秩更新将基础模型微调到多个不同领域,得到一组具有不同生成特征的锚点模型;然后利用这些锚点模型的权重更新,对其凸包内包含的整个(无限)模型类进行参数化。我们通过实验表明,改变插值权重会使模型输出在所有受控属性上产生可预测且一致的变化。我们发现大多数属性之间几乎没有纠缠,并识别并讨论了不满足这一点的属性对。我们的结果表明,在微调模型的权重之间进行线性插值,有助于对模型输出在多个风格特征上同时进行可预测的细粒度控制。

更新时间: 2024-04-10 15:55:07

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.07117v1

Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision

Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes. In this work, we present Wild Visual Navigation (WVN), an online self-supervised learning system for visual traversability estimation. The system is able to continuously adapt from a short human demonstration in the field, only using onboard sensing and computing. One of the key ideas to achieve this is the use of high-dimensional features from pre-trained self-supervised models, which implicitly encode semantic information that massively simplifies the learning task. Further, the development of an online scheme for supervision generator enables concurrent training and inference of the learned model in the wild. We demonstrate our approach through diverse real-world deployments in forests, parks, and grasslands. Our system is able to bootstrap the traversable terrain segmentation in less than 5 min of in-field training time, enabling the robot to navigate in complex, previously unseen outdoor terrains. Code: https://bit.ly/498b0CV - Project page:https://bit.ly/3M6nMHH

Updated: 2024-04-10 15:47:35

标题: 野外视觉导航:通过预训练模型和在线自我监督实现快速可穿越性学习

摘要: 自然环境,如森林和草地,对机器人导航具有挑战性,因为高草、树枝或灌木造成了刚性障碍物的错误感知。在这项工作中,我们提出了野外视觉导航(WVN),这是一个用于视觉可穿越性估计的在线自监督学习系统。该系统能够通过现场简短的人类演示不断适应,仅使用板载传感和计算。实现这一点的关键思想之一是使用来自预训练的自监督模型的高维特征,这些特征隐含地编码了语义信息,大大简化了学习任务。此外,监督生成器的在线方案的开发使得在野外同时训练和推断学习模型成为可能。我们通过在森林、公园和草地进行各种真实世界部署来展示我们的方法。我们的系统能够在不到5分钟的现场训练时间内引导可穿越地形的分割,使机器人能够在复杂的、以前未见过的室外地形中导航。 代码:https://bit.ly/498b0CV - 项目页面:https://bit.ly/3M6nMHH

更新时间: 2024-04-10 15:47:35

领域: cs.RO,cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.07110v1

Dual Prompt Tuning for Domain-Aware Federated Learning

Federated learning is a distributed machine learning paradigm that allows multiple clients to collaboratively train a shared model with their local data. Nonetheless, conventional federated learning algorithms often struggle to generalize well due to the ubiquitous domain shift across clients. In this work, we consider a challenging yet realistic federated learning scenario where the training data of each client originates from different domains. We address the challenges of domain shift by leveraging the technique of prompt learning, and propose a novel method called Federated Dual Prompt Tuning (Fed-DPT). Specifically, Fed-DPT employs a pre-trained vision-language model and then applies both visual and textual prompt tuning to facilitate domain adaptation over decentralized data. Extensive experiments of Fed-DPT demonstrate its significant effectiveness in domain-aware federated learning. With a pre-trained CLIP model (ViT-Base as image encoder), the proposed Fed-DPT attains 68.4% average accuracy over six domains in the DomainNet dataset, which improves the original CLIP by a large margin of 14.8%.

Updated: 2024-04-10 15:44:27

标题: 领域感知联邦学习的双重提示调整

摘要: 联邦学习是一种分布式机器学习范式,允许多个客户端共同使用本地数据训练共享模型。然而,传统的联邦学习算法常常由于客户端之间普遍存在的领域偏移而难以很好地泛化。在这项工作中,我们考虑了一个具有挑战性但又现实的联邦学习场景,其中每个客户端的训练数据来自不同的领域。我们通过利用提示学习技术来解决领域偏移的挑战,并提出了一种名为联邦双提示调整(Fed-DPT)的新方法。具体来说,Fed-DPT利用预训练的视觉-语言模型,然后同时应用视觉和文本提示调整来促进去中心化数据上的领域适应性。Fed-DPT的大量实验表明其在领域感知的联邦学习中具有显著的有效性。使用预训练的CLIP模型(ViT-Base作为图像编码器),所提出的Fed-DPT在DomainNet数据集的六个领域上获得了68.4%的平均准确率,比原始的CLIP提高了14.8%。

更新时间: 2024-04-10 15:44:27

领域: cs.LG

下载: http://arxiv.org/abs/2310.03103v4

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs

Large language models (LLMs), while exhibiting exceptional performance, suffer from hallucinations, especially on knowledge-intensive tasks. Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora to alleviate the issue. However, in many domains, texts are interconnected (e.g., academic papers in a bibliographic graph are linked by citations and co-authorships) which form a (text-attributed) graph. The knowledge in such graphs is encoded not only in single texts/nodes but also in their associated connections. To facilitate the research of augmenting LLMs with graphs, we manually construct a Graph Reasoning Benchmark dataset called GRBench, containing 1,740 questions that can be answered with the knowledge from 10 domain graphs. Then, we propose a simple and effective framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively. Each Graph-CoT iteration consists of three sub-steps: LLM reasoning, LLM-graph interaction, and graph execution. We conduct systematic experiments with three LLM backbones on GRBench, where Graph-CoT outperforms the baselines consistently. The code is available at https://github.com/PeterGriffinJin/Graph-CoT.
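
The three sub-steps per iteration suggest a simple control loop. The runnable toy below sketches that flow with a scripted stand-in for the LLM and a hypothetical one-function graph API; it illustrates the loop structure, not the authors' implementation:

```python
def graph_cot(question, graph_fns, llm, max_iters=5):
    context = [f"Question: {question}"]
    for _ in range(max_iters):
        step = llm("\n".join(context))            # 1) LLM reasoning
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        fn, arg = step.split(" ", 1)              # 2) LLM-graph interaction
        result = graph_fns[fn](arg)               # 3) graph execution
        context.append(f"Observation: {result}")
    return "no answer within budget"

# toy bibliographic graph and a scripted "LLM" for demonstration only
papers = {"GraphCoT": {"authors": ["A", "B"]}}
graph_fns = {"GetAuthors": lambda p: papers[p]["authors"]}
script = iter(["GetAuthors GraphCoT", "ANSWER: A and B"])
print(graph_cot("Who wrote GraphCoT?", graph_fns, lambda _: next(script)))
```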

Updated: 2024-04-10 15:41:53

标题: 图思维链:通过在图上推理增强大型语言模型

摘要: 大型语言模型(LLMs)表现出色,但在知识密集型任务中容易出现幻觉。现有作品建议通过从外部知识语料库中检索的个别文本单元来增强LLMs以缓解这一问题。然而,在许多领域中,文本是相互关联的(例如,文献图中的学术论文通过引用和合著关系相连),形成了一个(文本属性)图。这种图中的知识不仅编码在单个文本/节点中,还编码在它们的关联连接中。为了促进LLMs与图的增强研究,我们手动构建了一个名为GRBench的图推理基准数据集,其中包含1,740个可通过10个领域图中的知识回答的问题。然后,我们提出了一个简单且有效的框架,称为图思维链(Graph-CoT),通过鼓励LLMs迭代地在图上进行推理来增强LLMs与图的结合。每个Graph-CoT迭代包括三个子步骤:LLM推理、LLM-图交互和图执行。我们在GRBench上对三种LLM骨干进行了系统实验,其中Graph-CoT始终优于基线。代码可在https://github.com/PeterGriffinJin/Graph-CoT获取。

更新时间: 2024-04-10 15:41:53

领域: cs.CL,cs.IR,cs.LG

下载: http://arxiv.org/abs/2404.07103v1

Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection

While reinforcement learning (RL) algorithms have been successfully applied across numerous sequential decision-making problems, their generalization to unforeseen testing environments remains a significant concern. In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their training environments. We first propose a clarification of terminology for OOD detection in RL, which aligns it with the literature from other machine learning domains. We then present new benchmark scenarios for OOD detection, which introduce anomalies with temporal autocorrelation into different components of the agent-environment loop. We argue that such scenarios have been understudied in the current literature, despite their relevance to real-world situations. Confirming our theoretical predictions, our experimental results suggest that state-of-the-art OOD detectors are not able to identify such anomalies. To address this problem, we propose a novel method for OOD detection, which we call DEXTER (Detection via Extraction of Time Series Representations). By treating environment observations as time series data, DEXTER extracts salient time series features, and then leverages an ensemble of isolation forest algorithms to detect anomalies. We find that DEXTER can reliably identify anomalies across benchmark scenarios, exhibiting superior performance compared to both state-of-the-art OOD detectors and high-dimensional changepoint detectors adopted from statistics.
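
In the spirit of DEXTER, the hedged sketch below extracts a few simple time-series features per observation window and scores anomalies with an ensemble of isolation forests; the feature set and ensemble size are illustrative choices, not the paper's:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def ts_features(window):
    lag1 = np.corrcoef(window[:-1], window[1:])[0, 1]  # lag-1 autocorrelation
    return [window.mean(), window.std(), lag1]

rng = np.random.default_rng(0)
train = np.stack([ts_features(rng.standard_normal(100)) for _ in range(300)])
forests = [IsolationForest(random_state=s).fit(train) for s in range(5)]

def anomaly_score(window):
    x = np.array([ts_features(window)])
    return float(np.mean([f.score_samples(x)[0] for f in forests]))

iid = rng.standard_normal(100)
corr = np.cumsum(rng.standard_normal(100)) * 0.3   # temporally correlated
print(anomaly_score(iid), anomaly_score(corr))     # correlated scores lower
```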

Updated: 2024-04-10 15:39:49

标题: 重新思考强化学习中的分布外检测:推进评估与检测方法

摘要: 尽管强化学习(RL)算法已成功应用于众多序贯决策问题,但它们对未预见的测试环境的泛化能力仍然是一个重大问题。本文研究RL中的分布外(OOD)检测问题,其核心是在测试时识别RL智能体在训练环境中未曾遇到的情形。我们首先对RL中OOD检测的术语进行了澄清,使其与其他机器学习领域的文献保持一致。随后,我们提出了用于OOD检测的新基准场景,在智能体-环境回路的不同组件中引入具有时间自相关性的异常。我们认为,尽管此类场景与现实世界的情况密切相关,但在现有文献中尚未得到充分研究。与我们的理论预测一致,实验结果表明,最先进的OOD检测器无法识别此类异常。为了解决这个问题,我们提出了一种新颖的OOD检测方法,称为DEXTER(通过提取时间序列表示进行检测)。DEXTER将环境观测视为时间序列数据,提取显著的时间序列特征,然后利用孤立森林算法的集成来检测异常。我们发现,DEXTER能够在各基准场景中可靠地识别异常,其性能优于最先进的OOD检测器以及取自统计学的高维变点检测器。

更新时间: 2024-04-10 15:39:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.07099v1

Enhanced Cooperative Perception for Autonomous Vehicles Using Imperfect Communication

Sharing and joint processing of camera feeds and sensor measurements, known as Cooperative Perception (CP), has emerged as a new technique to achieve higher perception qualities. CP can enhance the safety of Autonomous Vehicles (AVs) where their individual visual perception quality is compromised by adverse weather conditions (such as haze and fog), low illumination, winding roads, and crowded traffic. To address the limitations of prior methods, in this paper, we propose a novel approach to realize an optimized CP under constrained communications. At the core of our approach is recruiting the best helper from the available list of front vehicles to augment the visual range and enhance the Object Detection (OD) accuracy of the ego vehicle. In this two-step process, we first select the helper vehicles that contribute the most to CP based on their visual range and lowest motion blur. Next, we implement a radio block optimization among the candidate vehicles to further improve communication efficiency. We specifically focus on pedestrian detection as an exemplary scenario. To validate our approach, we used the CARLA simulator to create a dataset of annotated videos for different driving scenarios where pedestrian detection is challenging for an AV with compromised vision. Our results demonstrate the efficacy of our two-step optimization process in improving the overall performance of cooperative perception in challenging scenarios, substantially improving driving safety under adverse conditions. Finally, we note that the networking assumptions are adopted from LTE Release 14 Mode 4 sidelink communication, commonly used for Vehicle-to-Vehicle (V2V) communication. Nonetheless, our method is flexible and applicable to arbitrary V2V communications.

Updated: 2024-04-10 15:37:15

标题: 使用不完善的通信增强自动驾驶车辆的协同感知

摘要: 共享并联合处理摄像头画面和传感器测量数据,即合作感知(CP),已成为实现更高感知质量的一种新技术。当自动驾驶汽车(AVs)的个体视觉感知质量因不利天气条件(如薄雾和大雾)、低照度、蜿蜒道路和拥挤交通而受损时,CP可以增强其安全性。为了克服以往方法的局限性,本文提出一种新颖方法,在受限通信条件下实现优化的CP。我们方法的核心,是从可用的前方车辆列表中招募最佳协助者,以扩展自车的视觉范围并提升其目标检测(OD)准确性。在这一两步过程中,我们首先根据视觉范围和最低运动模糊,选择对CP贡献最大的协助车辆;接着在候选车辆之间实施无线电块优化,进一步提高通信效率。我们特别以行人检测作为示范场景。为了验证该方法,我们使用CARLA模拟器为不同驾驶场景创建了带标注的视频数据集,这些场景中行人检测对视觉受损的AV而言具有挑战性。结果证明了我们的两步优化过程在具有挑战性的场景中提升合作感知整体性能的有效性,在不利条件下显著提高了驾驶安全性。最后需要说明的是,网络假设采用常用于车对车(V2V)通信的LTE Release 14 Mode 4侧行链路通信;不过,我们的方法具有灵活性,适用于任意V2V通信。

更新时间: 2024-04-10 15:37:15

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.08013v1

TransTARec: Time-Adaptive Translating Embedding Model for Next POI Recommendation

The rapid growth of location acquisition technologies makes Point-of-Interest(POI) recommendation possible due to redundant user check-in records. In this paper, we focus on next POI recommendation in which next POI is based on previous POI. We observe that time plays an important role in next POI recommendation but is neglected in the recent proposed translating embedding methods. To tackle this shortage, we propose a time-adaptive translating embedding model (TransTARec) for next POI recommendation that naturally incorporates temporal influence, sequential dynamics, and user preference within a single component. Methodologically, we treat a (previous timestamp, user, next timestamp) triplet as a union translation vector and develop a neural-based fusion operation to fuse user preference and temporal influence. The superiority of TransTARec, which is confirmed by extensive experiments on real-world datasets, comes from not only the introduction of temporal influence but also the direct unification with user preference and sequential dynamics.
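
Translation-based scoring of the kind described ranks a candidate next POI by how close `prev_poi + translation` lands to its embedding. A minimal numpy sketch, where the simple averaging fusion is a placeholder for the paper's learned neural fusion of user preference and temporal influence:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_poi = 16, 100
poi_emb = rng.standard_normal((n_poi, d))   # POI embeddings
user_vec = rng.standard_normal(d)           # user preference embedding
time_vec = rng.standard_normal(d)           # encodes (prev_ts, next_ts)

translation = 0.5 * (user_vec + time_vec)   # placeholder fusion
prev_poi = poi_emb[42]

# higher score = smaller distance from prev_poi + translation
scores = -np.linalg.norm(prev_poi + translation - poi_emb, axis=1)
print(np.argsort(-scores)[:5])              # top-5 recommended next POIs
```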

Updated: 2024-04-10 15:36:59

标题: TransTARec:面向下一个POI推荐的时间自适应翻译嵌入模型

摘要: 定位获取技术的快速发展带来了大量冗余的用户签到记录,使兴趣点(POI)推荐成为可能。本文关注下一个POI推荐,即基于前一个POI预测下一个POI。我们观察到,时间在下一个POI推荐中起着重要作用,但在近期提出的翻译嵌入方法中被忽视了。为弥补这一不足,我们提出了一种用于下一个POI推荐的时间自适应翻译嵌入模型(TransTARec),在单一组件中自然地融合了时间影响、序列动态和用户偏好。在方法上,我们将(前一时间戳、用户、下一时间戳)三元组视为一个联合翻译向量,并设计了基于神经网络的融合操作来融合用户偏好与时间影响。在真实数据集上的大量实验证实了TransTARec的优越性,这不仅来自时间影响的引入,也来自其与用户偏好和序列动态的直接统一。

更新时间: 2024-04-10 15:36:59

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2404.07096v1

LaTiM: Longitudinal representation learning in continuous-time models to predict disease progression

This work proposes a novel framework for analyzing disease progression using time-aware neural ordinary differential equations (NODE). We introduce a "time-aware head" in a framework trained through self-supervised learning (SSL) to leverage temporal information in latent space for data augmentation. This approach effectively integrates NODEs with SSL, offering significant performance improvements compared to traditional methods that lack explicit temporal integration. We demonstrate the effectiveness of our strategy for diabetic retinopathy progression prediction using the OPHDIAT database. Compared to the baseline, all NODE architectures achieve statistically significant improvements in area under the ROC curve (AUC) and Kappa metrics, highlighting the efficacy of pre-training with SSL-inspired approaches. Additionally, our framework promotes stable training for NODEs, a commonly encountered challenge in time-aware modeling.

Updated: 2024-04-10 15:29:29

标题: LaTiM: 在连续时间模型中进行纵向表征学习,以预测疾病进展

摘要: 这项工作提出了一种新颖的框架,用于使用时间感知的神经常微分方程(NODE)分析疾病进展。我们在通过自监督学习(SSL)训练的框架中引入了一个“时间感知头”,以利用潜在空间中的时间信息进行数据增强。这种方法有效地将NODE与SSL相结合,相比缺乏显式时间整合的传统方法带来了显著的性能提升。我们使用OPHDIAT数据库,在糖尿病视网膜病变进展预测任务上验证了该策略的有效性。与基线相比,所有NODE架构在ROC曲线下面积(AUC)和Kappa指标上均取得了统计显著的改进,凸显了采用SSL式方法进行预训练的有效性。此外,我们的框架还促进了NODE的稳定训练,而这是时间感知建模中常见的难题。

更新时间: 2024-04-10 15:29:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.07091v1

M-HOF-Opt: Multi-Objective Hierarchical Output Feedback Optimization via Multiplier Induced Loss Landscape Scheduling

We address the online combinatorial choice of weight multipliers for multi-objective optimization of many loss terms parameterized by neural works via a probabilistic graphical model (PGM) for the joint model parameter and multiplier evolution process, with a hypervolume based likelihood promoting multi-objective descent. The corresponding parameter and multiplier estimation as a sequential decision process is then cast into an optimal control problem, where the multi-objective descent goal is dispatched hierarchically into a series of constraint optimization sub-problems. The subproblem constraint automatically adapts itself according to Pareto dominance and serves as the setpoint for the low level multiplier controller to schedule loss landscapes via output feedback of each loss term. Our method is multiplier-free and operates at the timescale of epochs, thus saves tremendous computational resources compared to full training cycle multiplier tuning. It also circumvents the excessive memory requirements and heavy computational burden of existing multi-objective deep learning methods. We applied it to domain invariant variational auto-encoding with 6 loss terms on the PACS domain generalization task, and observed robust performance across a range of controller hyperparameters, as well as different multiplier initial conditions, outperforming other multiplier scheduling methods. We offered modular implementation of our method, admitting extension to custom definition of many loss terms.

Updated: 2024-04-10 15:25:00

标题: M-HOF-Opt: 多目标层次输出反馈优化通过乘数诱导损失景观调度

摘要: 我们通过一个针对模型参数与乘数联合演化过程的概率图模型(PGM),并配以促进多目标下降的基于超体积的似然,来解决由神经网络参数化的多个损失项的多目标优化中权重乘数的在线组合选择问题。随后,相应的参数和乘数估计作为一个序贯决策过程被转化为最优控制问题,其中多目标下降的目标被分层地分解为一系列约束优化子问题。子问题的约束根据Pareto支配关系自动自适应调整,并作为低层乘数控制器的设定点,后者通过各损失项的输出反馈来调度损失地形。我们的方法无需手工设定乘数,并以epoch为时间尺度运行,因此与覆盖完整训练周期的乘数调优相比可节省大量计算资源。它还规避了现有多目标深度学习方法过高的内存需求和沉重的计算负担。我们将其应用于PACS领域泛化任务上带有6个损失项的域不变变分自编码,观察到在一系列控制器超参数以及不同乘数初始条件下的稳健表现,优于其他乘数调度方法。我们提供了该方法的模块化实现,支持扩展到自定义的多个损失项。

更新时间: 2024-04-10 15:25:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.13728v2

Understanding Video Transformers via Universal Concept Discovery

This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered. Prior research on concept-based interpretability has concentrated solely on image-level tasks. Comparatively, video models deal with the added temporal dimension, increasing complexity and posing challenges in identifying dynamic concepts over time. In this work, we systematically address these challenges by introducing the first Video Transformer Concept Discovery (VTCD) algorithm. To this end, we propose an efficient approach for unsupervised identification of units of video transformer representations - concepts, and ranking their importance to the output of a model. The resulting concepts are highly interpretable, revealing spatio-temporal reasoning mechanisms and object-centric representations in unstructured video models. Performing this analysis jointly over a diverse set of supervised and self-supervised representations, we discover that some of these mechanism are universal in video transformers. Finally, we show that VTCD can be used for fine-grained action recognition and video object segmentation.

Updated: 2024-04-10 15:19:07

标题: 通过通用概念发现理解视频Transformer

摘要: 本文研究基于概念的视频Transformer表示的可解释性问题。具体而言,我们试图基于自动发现的高级时空概念来解释视频Transformer的决策过程。先前关于基于概念的可解释性的研究仅集中在图像级任务上。相比之下,视频模型涉及额外的时间维度,复杂性更高,并在识别随时间变化的动态概念方面提出了挑战。在这项工作中,我们通过引入第一个视频Transformer概念发现(VTCD)算法系统地解决了这些挑战。为此,我们提出了一种高效方法,无监督地识别视频Transformer表示的基本单元——概念,并对它们对模型输出的重要性进行排序。所得到的概念具有很高的可解释性,揭示了非结构化视频模型中的时空推理机制和以物体为中心的表示。通过在多样的监督和自监督表示上联合进行这种分析,我们发现其中一些机制在视频Transformer中是普遍存在的。最后,我们展示了VTCD可用于细粒度动作识别和视频对象分割。

更新时间: 2024-04-10 15:19:07

领域: cs.CV,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2401.10831v3

Dynamic Generation of Personalities with Large Language Models

In the realm of mimicking human deliberation, large language models (LLMs) show promising performance, thereby amplifying the importance of this research area. Deliberation is influenced by both logic and personality. However, previous studies predominantly focused on the logic of LLMs, neglecting the exploration of personality aspects. In this work, we introduce Dynamic Personality Generation (DPG), a dynamic personality generation method based on Hypernetworks. Initially, we embed the Big Five personality theory into GPT-4 to form a personality assessment machine, enabling it to evaluate characters' personality traits from dialogues automatically. We propose a new metric to assess personality generation capability based on this evaluation method. Then, we use this personality assessment machine to evaluate dialogues in script data, resulting in a personality-dialogue dataset. Finally, we fine-tune DPG on the personality-dialogue dataset. Experiments prove that DPG's personality generation capability is stronger after fine-tuning on this dataset than traditional fine-tuning methods, surpassing prompt-based GPT-4.

Updated: 2024-04-10 15:17:17

标题: 大型语言模型动态生成个性

摘要: 在模拟人类思考的领域中,大型语言模型(LLMs)展现出可观的性能,从而放大了这一研究方向的重要性。思考过程同时受逻辑和人格的影响。然而,先前的研究主要集中于LLMs的逻辑层面,忽视了对人格层面的探索。在这项工作中,我们提出了基于超网络的动态人格生成方法(DPG)。首先,我们将大五人格理论嵌入GPT-4,构建一个人格评估机器,使其能够从对话中自动评估角色的人格特质。基于该评估方法,我们提出了一个衡量人格生成能力的新指标。然后,我们使用该人格评估机器评估剧本数据中的对话,得到一个人格-对话数据集。最后,我们在该人格-对话数据集上对DPG进行微调。实验证明,在该数据集上微调后,DPG的人格生成能力强于传统微调方法,并超越了基于提示的GPT-4。

更新时间: 2024-04-10 15:17:17

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.07084v1

Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting

Overparameterized deep neural networks (DNNs), if not sufficiently regularized, are susceptible to overfitting their training examples and not generalizing well to test data. To discourage overfitting, researchers have developed multicomponent loss functions that reduce intra-class feature correlation and maximize inter-class feature distance in one or more layers of the network. By analyzing the penultimate feature layer activations output by a DNN's feature extraction section prior to the linear classifier, we find that modified forms of the intra-class feature covariance and inter-class prototype separation are key components of a fundamental Chebyshev upper bound on the probability of misclassification, which we designate the Chebyshev Prototype Risk (CPR). While previous approaches' covariance loss terms scale quadratically with the number of network features, our CPR bound indicates that an approximate covariance loss in log-linear time is sufficient to reduce the bound and is scalable to large architectures. We implement the terms of the CPR bound into our Explicit CPR (exCPR) loss function and observe from empirical results on multiple datasets and network architectures that our training algorithm reduces overfitting and improves upon previous approaches in many settings. Our code is available $\href{https://github.com/Deano1718/Regularization_exCPR}{here}$.
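
The two quantities the bound trades off are straightforward to compute on a batch of penultimate-layer features: intra-class covariance (to shrink) and inter-class prototype separation (to grow). The numpy sketch below illustrates those terms only; it is not the exact exCPR loss:

```python
import numpy as np

def cpr_terms(features, labels):
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # intra-class covariance magnitude, averaged over classes
    intra = np.mean([np.abs(np.cov(features[labels == c].T)).mean()
                     for c in classes])
    # minimum pairwise prototype distance (inter-class separation)
    dists = np.linalg.norm(protos[:, None] - protos[None, :], axis=-1)
    inter = dists[np.triu_indices(len(classes), k=1)].min()
    return intra, inter

X = np.random.randn(200, 8)              # stand-in penultimate features
y = np.random.randint(0, 4, size=200)    # stand-in labels
intra, inter = cpr_terms(X, y)
print(f"penalize intra={intra:.3f}, reward inter-separation={inter:.3f}")
```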

Updated: 2024-04-10 15:16:04

标题: 最小化切比雪夫原型风险神奇地缓解了过拟合的危险

摘要: 过参数化的深度神经网络(DNNs)如果缺乏足够的正则化,很容易过拟合训练样本,无法很好地泛化到测试数据。为了抑制过拟合,研究人员开发了多成分损失函数,在网络的一个或多个层中减少类内特征相关性并最大化类间特征距离。通过分析DNN特征提取部分在线性分类器之前输出的倒数第二层特征激活,我们发现,经过修改的类内特征协方差和类间原型间隔,是误分类概率的一个基本切比雪夫上界的关键组成部分,我们将该上界称为切比雪夫原型风险(Chebyshev Prototype Risk, CPR)。先前方法的协方差损失项随网络特征数量呈二次增长,而我们的CPR界表明,以对数线性时间计算的近似协方差损失就足以降低该界,并可扩展到大型架构。我们将CPR界中的各项实现到我们的显式CPR(exCPR)损失函数中,在多个数据集和网络架构上的实验结果表明,我们的训练算法减少了过拟合,并在许多设置中优于先前方法。我们的代码可在此处获取:https://github.com/Deano1718/Regularization_exCPR。

更新时间: 2024-04-10 15:16:04

领域: cs.LG,I.5.1

下载: http://arxiv.org/abs/2404.07083v1

Public-private funding models in open source software development: A case study on scikit-learn

Governments are increasingly allocating funding for open source software (OSS) development to address concerns related to software security, digital sovereignty, and national competitiveness in science and innovation, amongst others. While announcements of governmental funding are generally well-received by OSS developers, we still have a limited understanding of how OSS developers evaluate the relative benefits and drawbacks of such funding compared to other types of funding. This paper explores this question through a case study on scikit-learn, a Python library for machine learning, whose funding model combines research grants, commercial sponsorship, community donations, and a 32 million euro grant from France's artificial intelligence strategy. Through 25 interviews with scikit-learn's maintainers and funders, this study makes two key contributions to research and practice. First, the study illustrates how the maintainers have weaved public and private funding into their project to ensure the continued provision of scikit-learn as a digital public good, as well as the importance of diversified funding and governance protocols for funding to safeguard the community ethos of the project. Second, it offers practical recommendations to various stakeholders. For OSS developer communities, it illustrates the benefits of a diversified funding model in balancing the merits and drawbacks of different funding sources. For companies, it serves as a reminder that sponsoring developers or OSS projects can significantly support OSS maintainers, who often struggle with limited resources and towering workloads. For governments, it emphasises the importance of funding the maintenance of existing OSS in addition to or exclusively funding the development of new OSS libraries or features. The paper concludes with suggestions for future research directions.

Updated: 2024-04-10 15:12:32

标题: 公共-私人资金模式在开源软件开发中的应用:以scikit-learn为例的案例研究

摘要: 政府越来越多地为开源软件(OSS)开发分配资金,以解决与软件安全、数字主权和科学创新国家竞争力等相关的问题。尽管政府资金的宣布通常受到OSS开发者的欢迎,但我们对OSS开发者如何评估此类资金相对于其他类型资金的优势和劣势仍知之甚少。本文通过对scikit-learn的案例研究探讨了这个问题,scikit-learn是一款用于机器学习的Python库,其资金模式包括研究资助、商业赞助、社区捐赠以及法国人工智能战略的3200万欧元资助。通过对scikit-learn的维护者和资助者进行的25次访谈,本研究对研究和实践做出了两个关键贡献。首先,研究说明了维护者如何将公共和私人资金融合到他们的项目中,以确保持续提供scikit-learn作为数字公共产品,以及多样化资金和治理协议对资金的重要性,以保护项目的社区理念。其次,它为各利益相关者提供了实用建议。对于OSS开发者社区,它展示了多样化资金模式在平衡不同资金来源的优点和缺点方面的好处。对于公司来说,它提醒赞助开发者或OSS项目可以显著支持经常面临资源有限和工作量巨大的OSS维护者。对于政府来说,它强调资助现有OSS的维护的重要性,除了或者专门资助新的OSS库或功能的开发。文章最后提出了未来研究方向的建议。

更新时间: 2024-04-10 15:12:32

领域: cs.SE,cs.AI,cs.CY,cs.LG,K.4.1

下载: http://arxiv.org/abs/2404.06484v2

MuPT: A Generative Symbolic Music Pretrained Transformer

In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.
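
A hedged toy of the synchronization idea (not the paper's actual SMT-ABC grammar): interleave two ABC voices bar by bar so that tokens from the same measure stay adjacent, using ABC's `&` voice-overlay symbol. Real scores carry voice headers, key/meter fields, and metadata that this sketch skips.

def interleave_tracks(tracks):
    """tracks: list of ABC voice strings with bars separated by '|'."""
    bars = [t.strip("|").split("|") for t in tracks]
    assert len({len(b) for b in bars}) == 1, "tracks must have equal bar counts"
    merged = []
    for i in range(len(bars[0])):
        # '&' is ABC's voice-overlay marker within one measure.
        merged.append(" & ".join(b[i].strip() for b in bars))
    return " | ".join(merged)

melody = "C2 E2 | G4 | E2 C2 |"
bass   = "C,4  | G,4 | C,4   |"
print(interleave_tracks([melody, bass]))
# -> C2 E2 & C,4 | G4 & G,4 | E2 C2 & C,4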

Updated: 2024-04-10 15:09:52

标题: MuPT:一种生成符号音乐预训练变压器

摘要: 在本文中,我们探讨了大型语言模型(LLMs)在音乐预训练中的应用。尽管MIDI在音乐建模中的普遍使用已得到公认,但我们的研究结果表明,LLMs与ABC记谱法天然更兼容,ABC记谱法更符合它们的设计和优势,从而提升了模型在音乐创作中的表现。为了解决生成过程中不同音轨之间小节错位带来的挑战,我们提出开发一种同步多轨ABC记谱法(SMT-ABC记谱法),旨在保持多个音轨之间的连贯性。我们的贡献包括一系列能够处理多达8192个标记的模型,覆盖了我们训练集中90%的符号音乐数据。此外,我们探讨了符号音乐缩放定律(SMS Law)对模型性能的影响。结果为未来音乐生成研究指明了一个有前途的方向,并通过我们的开源贡献为社区主导的研究提供了广泛的资源。

更新时间: 2024-04-10 15:09:52

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2404.06393v2

Location-guided Head Pose Estimation for Fisheye Image

A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by the perspective projection. Serious fisheye lens distortion in the peripheral region of the image leads to degraded performance of existing head pose estimation models trained on undistorted images. This paper presents a new approach for head pose estimation that uses the knowledge of head location in the image to reduce the negative effect of fisheye distortion. We develop an end-to-end convolutional neural network to estimate the head pose with the multi-task learning of head pose and head location. Our proposed network estimates the head pose directly from the fisheye image without the operation of rectification or calibration. We also created a fisheye-distorted version of the three popular head pose estimation datasets, BIWI, 300W-LP, and AFLW2000 for our experiments. Experimental results show that our network remarkably improves the accuracy of head pose estimation compared with other state-of-the-art one-stage and two-stage methods.
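
A minimal multi-task sketch in the spirit described above: a shared backbone with one head regressing head pose (yaw, pitch, roll) and one regressing head location in the fisheye image. The architecture, loss choice, and weights here are illustrative, not the paper's.

import torch
import torch.nn as nn

class PoseLocationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose_head = nn.Linear(64, 3)   # yaw, pitch, roll
        self.loc_head = nn.Linear(64, 2)    # normalized (x, y) head position

    def forward(self, img):
        h = self.backbone(img)
        return self.pose_head(h), self.loc_head(h)

def multitask_loss(pose_pred, loc_pred, pose_gt, loc_gt, w_loc=0.5):
    # Joint objective: pose regression plus an auxiliary location task, so the
    # network can account for position-dependent fisheye distortion.
    return nn.functional.l1_loss(pose_pred, pose_gt) + \
           w_loc * nn.functional.l1_loss(loc_pred, loc_gt)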

Updated: 2024-04-10 15:09:22

标题: 基于位置引导的鱼眼图像头部姿态估计

摘要: 配备鱼眼或超广角镜头的相机可以覆盖透视投影无法建模的广阔视野。图像边缘区域严重的鱼眼镜头失真会导致在无失真图像上训练的现有头部姿态估计模型性能下降。本文提出了一种新的头部姿态估计方法,利用图像中头部位置的知识来减少鱼眼失真的负面影响。我们开发了一个端到端的卷积神经网络,通过头部姿态和头部位置的多任务学习来估计头部姿态。我们提出的网络直接从鱼眼图像中估计头部姿态,无需矫正或校准操作。我们还为实验创建了三个流行的头部姿态估计数据集BIWI、300W-LP和AFLW2000的鱼眼失真版本。实验结果显示,与其他最新的一阶段和两阶段方法相比,我们的网络明显提高了头部姿态估计的准确性。

更新时间: 2024-04-10 15:09:22

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2402.18320v2

Is Learning in Biological Neural Networks based on Stochastic Gradient Descent? An analysis using stochastic processes

In recent years, there has been an intense debate about how learning in biological neural networks (BNNs) differs from learning in artificial neural networks. It is often argued that the updating of connections in the brain relies only on local information, and therefore a stochastic gradient-descent type optimization method cannot be used. In this paper, we study a stochastic model for supervised learning in BNNs. We show that a (continuous) gradient step occurs approximately when each learning opportunity is processed by many local updates. This result suggests that stochastic gradient descent may indeed play a role in optimizing BNNs.
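
A toy numerical cartoon of the claim, under strong simplifying assumptions of our own (quadratic loss, Gaussian local perturbations retained in proportion to their locally observable improvement): the accumulated effect of many local updates approximates a single scaled gradient step. This illustrates the flavor of the result, not the paper's stochastic-process analysis.

import numpy as np

rng = np.random.default_rng(0)
target = np.array([3.0, 1.0])
w = np.array([1.0, -2.0])
loss = lambda w: 0.5 * np.sum((w - target) ** 2)

eta, K = 0.05, 5000
delta = np.zeros_like(w)
for _ in range(K):
    xi = rng.normal(size=2)                     # zero-mean local perturbation
    improvement = loss(w) - loss(w + eta * xi)  # locally observable signal
    delta += (improvement / K) * xi             # keep it in proportion

print("accumulated local updates:", delta)
print("one scaled gradient step :", -eta * (w - target))  # should be close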

Updated: 2024-04-10 15:02:35

标题: 生物神经网络中的学习是否基于随机梯度下降?使用随机过程进行分析

摘要: 近年来,关于生物神经网络(BNNs)中学习与人工神经网络中学习的区别存在激烈的争论。人们常常认为大脑中连接的更新仅依赖于局部信息,因此无法使用随机梯度下降类型的优化方法。本文研究了BNNs中监督学习的随机模型。我们展示了每次学习机会被许多局部更新处理时,(连续)梯度步骤近似发生。这一结果表明随机梯度下降确实可能在优化BNNs中发挥作用。

更新时间: 2024-04-10 15:02:35

领域: q-bio.NC,cs.LG,cs.NE,math.PR,92C20, 68T07

下载: http://arxiv.org/abs/2309.05102v3

AI-Enabled System for Efficient and Effective Cyber Incident Detection and Response in Cloud Environments

The escalating sophistication and volume of cyber threats in cloud environments necessitate a paradigm shift in strategies. Recognising the need for an automated and precise response to cyber threats, this research explores the application of AI and ML and proposes an AI-powered cyber incident response system for cloud environments. This system, encompassing Network Traffic Classification, Web Intrusion Detection, and post-incident Malware Analysis (built as a Flask application), achieves seamless integration across platforms like Google Cloud and Microsoft Azure. The findings from this research highlight the effectiveness of the Random Forest model, achieving an accuracy of 90% for the Network Traffic Classifier and 96% for the Malware Analysis Dual Model application. Our research highlights the strengths of AI-powered cyber security. The Random Forest model excels at classifying cyber threats, offering an efficient and robust solution. Deep learning models significantly improve accuracy, and their resource demands can be managed using cloud-based TPUs and GPUs. Cloud environments themselves provide a perfect platform for hosting these AI/ML systems, while container technology ensures both efficiency and scalability. These findings demonstrate the contribution of the AI-led system in guaranteeing a robust and scalable cyber incident response solution in the cloud.
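
A hedged sketch of the network-traffic classification component: a scikit-learn random forest on tabular flow features. The features, labels, and data loading are placeholders; the paper's actual pipeline, datasets, and reported accuracies differ.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))       # stand-in network-flow features
y = rng.integers(0, 2, size=5000)     # 0 = benign, 1 = malicious (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))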

Updated: 2024-04-10 15:01:40

标题: 云环境中高效有效的网络事件检测和响应的AI系统

摘要: 云环境中不断升级的网络威胁的复杂性和数量使得战略上需要进行范式转变。认识到对网络威胁需要自动化和精确的响应,本研究探讨了人工智能(AI)和机器学习(ML)的应用,并提出了一种AI驱动的云环境网络事件响应系统。该系统包括网络流量分类、Web入侵检测和事后恶意软件分析(构建为Flask应用程序),实现了在Google Cloud和Microsoft Azure等平台上的无缝集成。本研究的发现突出了随机森林模型的有效性,网络流量分类器的准确率达到90%,恶意软件分析双模型应用的准确率达到96%。我们的研究强调了AI驱动的网络安全的优势。随机森林模型在分类网络威胁方面表现出色,提供了一种高效且强大的解决方案。深度学习模型显著提高了准确性,其资源需求可以通过基于云的TPU和GPU来管理。云环境本身为托管这些AI/ML系统提供了一个完美的平台,而容器技术则确保了效率和可扩展性。这些发现证明了AI主导的系统在云中保证了一个强大和可扩展的网络事件响应解决方案。

更新时间: 2024-04-10 15:01:40

领域: cs.CR,cs.ET,cs.NI

下载: http://arxiv.org/abs/2404.05602v2

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

This paper studies the phenomenon that different concepts are learned in different layers of large language models, i.e., more difficult concepts are fully acquired at deeper layers. We define the difficulty of concepts by their level of abstraction, here crudely categorized into factual, emotional, and inferential. Each category contains a spectrum of tasks, arranged from simple to complex. For example, within the factual dimension, tasks range from lie detection to categorizing mathematical problems. We employ a probing technique to extract representations from different layers of the model and apply these to classification tasks. Our findings reveal that models tend to efficiently classify simpler tasks, indicating that these concepts are learned in shallower layers. Conversely, more complex tasks may only be discernible at deeper layers, if at all. This paper explores the implications of these findings for our understanding of model learning processes and internal representations. Our implementation is available at \url{https://github.com/Luckfort/CD}.
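
A minimal layer-probing sketch of the technique described above: collect hidden states from each layer of a frozen model and fit a linear probe per layer; the depth at which probe accuracy saturates indicates where the concept is acquired. The synthetic "activations" below are only a stand-in for real model states.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_layers(hidden_states_per_layer, labels):
    """hidden_states_per_layer: list of (n_examples, d) arrays, one per layer."""
    scores = []
    for h in hidden_states_per_layer:
        probe = LogisticRegression(max_iter=1000)
        scores.append(cross_val_score(probe, h, labels, cv=5).mean())
    return scores  # probe accuracy as a function of depth

# Toy demonstration: deeper "layers" encode the label more strongly.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 200)
layers = [rng.normal(size=(200, 64)) + 0.1 * k * labels[:, None]
          for k in range(6)]
print(probe_layers(layers, labels))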

Updated: 2024-04-10 14:56:40

标题: 探讨概念深度:大型语言模型在不同层次上获取知识的方式?

摘要: 本文研究了不同概念在大型语言模型的不同层中学习的现象,即更难的概念是在更深层次上完全获得的。我们通过抽象级别来定义概念的难度,并在这里通过事实、情感和推理粗略地进行了分类。每个类别包含一系列从简单到复杂的任务。例如,在事实维度内,任务范围从检测谎言到对数学问题分类。我们采用一种探测技术从模型的不同层中提取表示,并将这些应用于分类任务。我们的研究结果表明,模型倾向于高效地分类更简单的任务,表明这些概念是在较浅的层次学习的。相反,更复杂的任务可能只在更深的层次上,如果有的话,才能被辨别出来。本文探讨了这些发现对我们对模型学习过程和内部表示的理解的影响。我们的实现可在\url{https://github.com/Luckfort/CD}上找到。

更新时间: 2024-04-10 14:56:40

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.07066v1

LaPlaSS: Latent Space Planning for Stochastic Systems

Autonomous mobile agents often operate in hazardous environments, necessitating an awareness of safety. These agents can have non-linear, stochastic dynamics that must be considered during planning to guarantee bounded risk. Most state-of-the-art methods require closed-form dynamics to verify plan correctness and safety; however, modern robotic systems often have dynamics that are learned from data. Thus, there is a need to perform efficient trajectory planning with guarantees on risk for agents without known dynamics models. We propose a "generate-and-test" approach to risk-bounded planning in which a planner generates a candidate trajectory using an approximate linear dynamics model and a validator assesses the risk of the trajectory, computing additional safety constraints for the planner if the candidate does not satisfy the desired risk bound. To acquire the approximate model, we use a variational autoencoder to learn a latent linear dynamics model and encode the planning problem into the latent space to generate the candidate trajectory. The VAE also serves to sample trajectories around the candidate for use in the validator. We demonstrate that our algorithm, LaPlaSS, is able to generate trajectory plans with bounded risk for a real-world agent with learned dynamics and is an order of magnitude more efficient than the state of the art.
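
A schematic sketch of the generate-and-test control flow described above. The planner, risk estimator, and constraint-tightening step are hypothetical callables; the point is only the loop structure: plan with the approximate model, estimate risk (e.g., by Monte Carlo over VAE samples), and tighten constraints until the risk bound holds.

def plan_with_risk_bound(plan, estimate_risk, tighten, risk_bound, max_iters=20):
    constraints = []
    for _ in range(max_iters):
        traj = plan(constraints)        # candidate from latent linear model
        risk = estimate_risk(traj)      # e.g. Monte Carlo over VAE samples
        if risk <= risk_bound:
            return traj                 # candidate satisfies the bound
        constraints = tighten(constraints, traj, risk)
    raise RuntimeError("no trajectory satisfying the risk bound found")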

Updated: 2024-04-10 14:52:35

标题: LaPlaSS:隐空间规划用于随机系统

摘要: 自主移动代理通常在危险环境中运行,因此需要具备安全意识。这些代理可能具有非线性、随机的动力学特性,在规划过程中必须加以考虑,以保证风险有界。大多数现有方法需要闭式动力学模型来验证计划的正确性和安全性,然而现代机器人系统的动力学通常是从数据中学习得到的。因此,有必要为没有已知动力学模型的代理执行具有风险保证的高效轨迹规划。我们提出了一种"生成-测试"的风险有界规划方法:规划器使用近似线性动力学模型生成候选轨迹,验证器评估该轨迹的风险,如果候选轨迹不满足所需的风险界限,则为规划器计算额外的安全约束。为了获得近似模型,我们使用变分自编码器(VAE)学习潜在线性动力学模型,并将规划问题编码到潜在空间中以生成候选轨迹。VAE还用于在候选轨迹周围采样轨迹,供验证器使用。我们证明我们的算法LaPlaSS能够为具有学习动力学的现实世界代理生成风险有界的轨迹计划,并且比现有技术高效一个数量级。

更新时间: 2024-04-10 14:52:35

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.07063v1

On Regression in Extreme Regions

The statistical learning problem consists in building a predictive function $\hat{f}$ based on independent copies of $(X,Y)$ so that $Y$ is approximated by $\hat{f}(X)$ with minimum (squared) error. Motivated by various applications, special attention is paid here to the case of extreme (i.e. very large) observations $X$. Because of their rarity, the contributions of such observations to the (empirical) error is negligible, and the predictive performance of empirical risk minimizers can be consequently very poor in extreme regions. In this paper, we develop a general framework for regression on extremes. Under appropriate regular variation assumptions regarding the pair $(X,Y)$, we show that an asymptotic notion of risk can be tailored to summarize appropriately predictive performance in extreme regions. It is also proved that minimization of an empirical and nonasymptotic version of this 'extreme risk', based on a fraction of the largest observations solely, yields good generalization capacity. In addition, numerical results providing strong empirical evidence of the relevance of the approach proposed are displayed.
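
A hedged sketch of the paper's core recipe: minimize empirical risk using only the fraction of training points with the largest ||X||, so the fitted predictor is tailored to extreme regions. The estimator and the heavy-tailed toy data are illustrative choices, not the paper's.

import numpy as np
from sklearn.linear_model import LinearRegression

def fit_on_extremes(X, y, k_frac=0.1):
    norms = np.linalg.norm(X, axis=1)
    k = max(2, int(k_frac * len(X)))
    idx = np.argsort(norms)[-k:]        # indices of the k largest inputs
    return LinearRegression().fit(X[idx], y[idx])

rng = np.random.default_rng(0)
X = rng.pareto(a=2.5, size=(2000, 3))   # heavy-tailed covariates
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=2000)
model = fit_on_extremes(X, y, k_frac=0.05)
print(model.coef_)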

Updated: 2024-04-10 14:52:19

标题: 在极端地区的回归分析

摘要: 统计学习问题在于基于独立的$(X,Y)$副本构建一个预测函数$\hat{f}$,使得$Y$被$\hat{f}(X)$以最小(平方)误差逼近。受各种应用的启发,这里特别关注极端(即非常大)观测值$X$的情况。由于它们的稀缺性,这些观测对(经验)误差的贡献可以忽略不计,因此经验风险最小化器在极端区域的预测性能可能非常差。在本文中,我们开发了一个用于极端回归的通用框架。在关于$(X,Y)$对的适当正则变化假设下,我们展示了一个渐近风险概念可以被量身定制,以适当总结在极端区域的预测性能。同时证明了基于仅有最大观测值的一部分的经验和非渐近版本的'极端风险'的最小化可以产生良好的泛化能力。此外,提供了强有力的数值结果,展示了所提出方法的相关性。

更新时间: 2024-04-10 14:52:19

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2303.03084v2

Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study

We present an empirical study of groundedness in long-form question answering (LFQA) by retrieval-augmented large language models (LLMs). In particular, we evaluate whether every generated sentence is grounded in the retrieved documents or the model's pre-training data. Across 3 datasets and 4 model families, our findings reveal that a significant fraction of generated sentences are consistently ungrounded, even when those sentences contain correct ground-truth answers. Additionally, we examine the impacts of factors such as model size, decoding strategy, and instruction tuning on groundedness. Our results show that while larger models tend to ground their outputs more effectively, a significant portion of correct answers remains compromised by hallucinations. This study provides novel insights into the groundedness challenges in LFQA and underscores the necessity for more robust mechanisms in LLMs to mitigate the generation of ungrounded content.

Updated: 2024-04-10 14:50:10

标题: 检索增强长文本生成中的有据性:一项实证研究

摘要: 我们对检索增强的大型语言模型(LLMs)在长篇问答(LFQA)中的有据性进行了实证研究。具体来说,我们评估每个生成的句子是否有检索到的文档或模型的预训练数据作为依据。在3个数据集和4个模型系列中,我们的研究结果显示,相当一部分生成的句子始终缺乏依据,即使这些句子包含正确的真实答案。此外,我们还研究了模型大小、解码策略和指令微调等因素对有据性的影响。我们的结果表明,尽管更大的模型倾向于更有效地为其输出提供依据,但仍有相当一部分正确答案受到幻觉的影响。这项研究为LFQA中的有据性挑战提供了新的见解,并强调了LLMs需要更强大的机制来减少无依据内容的生成。

更新时间: 2024-04-10 14:50:10

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.07060v1

Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation

Metaphors, although occasionally unperceived, are ubiquitous in our everyday language. Thus, it is crucial for Language Models to be able to grasp the underlying meaning of this kind of figurative language. In this work, we present Meta4XNLI, a novel parallel dataset for the tasks of metaphor detection and interpretation that contains metaphor annotations in both Spanish and English. We investigate language models' metaphor identification and understanding abilities through a series of monolingual and cross-lingual experiments by leveraging our proposed corpus. In order to comprehend how these non-literal expressions affect models' performance, we look over the results and perform an error analysis. Additionally, parallel data offers many potential opportunities to investigate metaphor transferability between these languages and the impact of translation on the development of multilingual annotated resources.

Updated: 2024-04-10 14:44:48

标题: Meta4XNLI:一个用于隐喻检测和解释的跨语言平行语料库

摘要: 隐喻虽然有时不被察觉,但在我们日常语言中无处不在。因此,语言模型能够理解这种比喻语言的潜在含义非常重要。在这项工作中,我们提出了Meta4XNLI,这是一个新颖的并行数据集,用于隐喻检测和解释任务,其中包含西班牙语和英语的隐喻标注。我们通过利用我们提出的语料库,通过一系列单语和跨语言实验来研究语言模型对隐喻识别和理解的能力。为了理解这些非字面表达如何影响模型的性能,我们审查结果并进行错误分析。此外,平行数据提供了许多潜在机会,可以研究这些语言之间的隐喻可转移性,以及翻译对多语言注释资源发展的影响。

更新时间: 2024-04-10 14:44:48

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.07053v1

Towards Learning Stochastic Population Models by Gradient Descent

Increasing effort is put into the development of methods for learning mechanistic models from data. This task entails not only the accurate estimation of parameters, but also a suitable model structure. Recent work on the discovery of dynamical systems formulates this problem as a linear equation system. Here, we explore several simulation-based optimization approaches, which allow much greater freedom in the objective formulation and weaker conditions on the available data. We show that even for relatively small stochastic population models, simultaneous estimation of parameters and structure poses major challenges for optimization procedures. Particularly, we investigate the application of the local stochastic gradient descent method, commonly used for training machine learning models. We demonstrate accurate estimation of models but find that enforcing the inference of parsimonious, interpretable models drastically increases the difficulty. We give an outlook on how this challenge can be overcome.
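
A heavily simplified sketch of parameter estimation by gradient descent: we replace the stochastic birth-death model by its deterministic mean dynamics and fit the rate parameters with PyTorch. The paper's setting (stochastic simulation, joint structure inference) is substantially harder; this only illustrates the optimization loop.

import torch

torch.manual_seed(0)
true_birth, true_death = 0.7, 0.4
t = torch.linspace(0.0, 5.0, 50)
data = 10.0 * torch.exp((true_birth - true_death) * t)
data = data + 0.5 * torch.randn_like(data)              # noisy observations

params = torch.tensor([0.2, 0.1], requires_grad=True)   # [birth, death] guesses
opt = torch.optim.Adam([params], lr=0.01)
for _ in range(2000):
    opt.zero_grad()
    pred = 10.0 * torch.exp((params[0] - params[1]) * t)  # mean-field solution
    loss = torch.mean((pred - data) ** 2)
    loss.backward()
    opt.step()

# Only the net rate (birth - death) is identifiable from this signal, which
# already hints at why recovering interpretable structure is hard.
print(params.detach(), (params[0] - params[1]).item())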

Updated: 2024-04-10 14:38:58

标题: 迈向通过梯度下降学习随机种群模型

摘要: 越来越多的工作致力于开发从数据中学习机制模型的方法。这项任务不仅涉及参数的准确估计,还涉及合适的模型结构。最近关于动力系统发现的研究将这一问题表述为线性方程组。在这里,我们探讨了几种基于模拟的优化方法,这些方法在目标函数的表述上具有大得多的自由度,并对可用数据的条件要求更弱。我们发现,即使对于相对较小的随机种群模型,同时估计参数和结构也会对优化程序构成重大挑战。特别地,我们研究了通常用于训练机器学习模型的局部随机梯度下降方法的应用。我们展示了对模型的准确估计,但发现强制推断出简约、可解释的模型会极大增加难度。我们展望了如何克服这一挑战。

更新时间: 2024-04-10 14:38:58

领域: cs.LG

下载: http://arxiv.org/abs/2404.07049v1

Comparison of decision trees with Local Interpretable Model-Agnostic Explanations (LIME) technique and multi-linear regression for explaining support vector regression model in terms of root mean square error (RMSE) values

In this work, decision trees are used for explanation of a support vector regression model. The decision trees act as a global technique as well as a local technique. They are compared against the popular technique of LIME, which is a local explanatory technique, and with multi-linear regression. It is observed that decision trees give a lower RMSE value when fitted to support vector regression as compared to LIME in 87% of the runs over 5 datasets. The comparison of results is statistically significant. Multi-linear regression also gives a lower RMSE value when fitted to the support vector regression model as compared to LIME in 73% of the runs over 5 datasets, but the comparison of results is not statistically significant. Also, when used as a local explanatory technique, decision trees give better performance than LIME, and the comparison of results is statistically significant.
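
A runnable sketch of the global-surrogate comparison described above: fit an SVR, then fit a decision tree and a linear model to the SVR's own predictions, and compare surrogate RMSE. The dataset and hyperparameters are illustrative, not those of the paper's experiments.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
svr = SVR().fit(X, y)
y_svr = svr.predict(X)                  # surrogate target: the SVR output

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y_svr)
lin = LinearRegression().fit(X, y_svr)

for name, m in [("tree", tree), ("linear", lin)]:
    rmse = np.sqrt(mean_squared_error(y_svr, m.predict(X)))
    print(name, "surrogate RMSE:", round(rmse, 3))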

Updated: 2024-04-10 14:36:35

标题: 决策树与局部可解释的模型无关解释(LIME)技术和多元线性回归在解释支持向量回归模型的根均方误差(RMSE)值方面的比较

摘要: 在这项工作中,决策树被用于解释支持向量回归模型。决策树被用作全局技术和局部技术。它们与流行的LIME技术进行了比较,后者是一种局部解释技术,同时也与多元线性回归进行了比较。观察到,与LIME相比,决策树在适配支持向量回归时给出了更低的RMSE值,这在5个数据集中的87%的运行中都是如此。结果的比较具有统计学意义。在5个数据集的73%的运行中,多元线性回归在适配支持向量回归模型时也给出了更低的RMSE值,但结果的比较并不具有统计学意义。此外,当作为局部解释技术时,决策树比LIME表现更好,结果的比较具有统计学意义。

更新时间: 2024-04-10 14:36:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.07046v1

Remote Scheduler Contention Attacks

In this paper, we investigate unexplored aspects of scheduler contention: We systematically study the leakage of all scheduler queues on AMD Zen 3 and show that all queues leak. We mount the first scheduler contention attacks on Zen 4, with a novel measurement method evoking an out-of-order race condition, more precise than the state of the art. We demonstrate the first inter-keystroke timing attacks based on scheduler contention, with an F1 score of $\geq$ 99.5 % and a standard deviation below 4 ms from the ground truth. Our end-to-end JavaScript attack transmits across Firefox instances, bypassing cross-origin policies and site isolation, with 891.9 bit/s (Zen 3) and 940.7 bit/s (Zen 4).

Updated: 2024-04-10 14:32:30

标题: 远程调度器争用攻击

摘要: 在这篇论文中,我们研究了调度器争用中尚未被探索的方面:我们系统地研究了 AMD Zen 3 上所有调度器队列的泄漏,并证明所有队列都存在泄漏。我们首次在 Zen 4 上发起调度器争用攻击,采用一种引发乱序执行竞争条件的新颖测量方法,比现有技术更精确。我们展示了首个基于调度器争用的按键间定时攻击,F1 得分≥99.5%,与真实值相比标准差低于 4 毫秒。我们的端到端 JavaScript 攻击可在 Firefox 实例之间传输数据,绕过跨源策略和站点隔离,速率分别为 891.9 bit/s(Zen 3)和 940.7 bit/s(Zen 4)。

更新时间: 2024-04-10 14:32:30

领域: cs.CR

下载: http://arxiv.org/abs/2404.07042v1

PLAN: Variance-Aware Private Mean Estimation

Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning. Though the trade-off between privacy and utility is well understood in the worst case, many datasets exhibit structure that could potentially be exploited to yield better algorithms. In this paper we present $\textit{Private Limit Adapted Noise}$ (PLAN), a family of differentially private algorithms for mean estimation in the setting where inputs are independently sampled from a distribution $\mathcal{D}$ over $\mathbf{R}^d$, with coordinate-wise standard deviations $\boldsymbol{\sigma} \in \mathbf{R}^d$. Similar to mean estimation under Mahalanobis distance, PLAN tailors the shape of the noise to the shape of the data, but unlike previous algorithms the privacy budget is spent non-uniformly over the coordinates. Under a concentration assumption on $\mathcal{D}$, we show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with $\ell_2$ error proportional to $\|\boldsymbol{\sigma}\|_1$. Previous work has either not taken $\boldsymbol{\sigma}$ into account, or measured error in Mahalanobis distance; in both cases this results in $\ell_2$ error proportional to $\sqrt{d}\|\boldsymbol{\sigma}\|_2$, which can be up to a factor $\sqrt{d}$ larger. To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real world data.
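
A hedged sketch of the non-uniform budget idea only: splitting the budget proportionally to sigma_i makes the per-coordinate noise scale like sqrt(sigma_i * ||sigma||_1), so the total L2 noise scales with ||sigma||_1 rather than sqrt(d)*||sigma||_2. The calibration below is illustrative and omits PLAN's clipping, variance estimation, and formal privacy accounting, all of which are essential in the actual algorithm.

import numpy as np

def variance_aware_private_mean(X, sigma, epsilon, rng):
    n, d = X.shape
    # Budget split proportional to sigma_i: coordinate i gets noise of scale
    # sqrt(sigma_i * ||sigma||_1), so sqrt(sum_i scale_i^2) = ||sigma||_1.
    scale = np.sqrt(sigma * np.sum(sigma)) / (n * epsilon)
    return X.mean(axis=0) + rng.normal(scale=np.maximum(scale, 1e-12), size=d)

rng = np.random.default_rng(0)
sigma = np.array([5.0, 1.0, 0.1, 0.1])
X = rng.normal(loc=2.0, scale=sigma, size=(10_000, 4))
print(variance_aware_private_mean(X, sigma, epsilon=1.0, rng=rng))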

Updated: 2024-04-10 14:30:58

标题: 计划:方差感知的私密均值估计

摘要: 差分隐私均值估计是数据分析和机器学习中隐私保护算法的重要基础。尽管最坏情况下隐私与效用之间的权衡已被充分理解,但许多数据集展现出可能被利用以产生更好算法的结构。在本文中,我们提出了"私有极限调整噪声"(PLAN),这是一组差分隐私均值估计算法,适用于输入从$\mathbf{R}^d$上的分布$\mathcal{D}$独立采样、各坐标标准差为$\boldsymbol{\sigma} \in \mathbf{R}^d$的情形。类似于马氏距离下的均值估计,PLAN使噪声的形状与数据的形状相匹配,但与以前的算法不同,隐私预算在各坐标上是非均匀分配的。在对$\mathcal{D}$的集中性假设下,我们展示了如何利用向量$\boldsymbol{\sigma}$中的偏斜,得到一个(零集中)差分隐私均值估计,其$\ell_2$误差与$\|\boldsymbol{\sigma}\|_1$成正比。以前的工作要么没有考虑$\boldsymbol{\sigma}$,要么以马氏距离衡量误差;在这两种情况下,$\ell_2$误差都与$\sqrt{d}\|\boldsymbol{\sigma}\|_2$成正比,可能高出一个因子$\sqrt{d}$。为了验证PLAN的有效性,我们在合成数据和真实数据上对其精度进行了实证评估。

更新时间: 2024-04-10 14:30:58

领域: cs.CR,cs.DS,cs.LG

下载: http://arxiv.org/abs/2306.08745v3

Characterizing and Classifying Developer Forum Posts with their Intentions

With the rapid growth of the developer community, the amount of posts on online technical forums has been growing rapidly, which poses difficulties for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate their interested posts and for search engines to index the most relevant posts according to the queries. However, most tags are only focused on the technical perspective (e.g., program language, platform, tool). In most cases, forum posts in online developer communities reveal the author's intentions to solve a problem, ask for advice, share information, etc. The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis on a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and codes in our supplementary material package.

Updated: 2024-04-10 14:25:30

标题: 对开发者论坛帖子的意图进行表征和分类

摘要: 随着开发者社区的迅速增长,线上技术论坛上的帖子数量也在迅速增长,这给用户过滤有用帖子和找到重要信息带来了困难。标签为用户提供了一个简洁的特征维度,让他们可以定位自己感兴趣的帖子,同时也让搜索引擎可以根据查询索引最相关的帖子。然而,大多数标签只关注技术角度(例如,编程语言、平台、工具)。在线开发者社区的论坛帖子通常会展示作者解决问题、寻求建议、分享信息等意图。对帖子意图的建模可以为当前标签分类法提供额外的维度。通过参考以往研究并从业界角度学习,我们为技术论坛帖子意图创建了一个精细的分类法。通过手动标记和分析从在线论坛中提取的样本帖子数据集,我们了解了帖子构成(代码、错误信息)与其意图之间的相关性。受我们手动研究的启发,我们设计了一个基于预训练transformer的模型,可以自动预测帖子意图。我们的意图预测框架的最佳变体,在Micro F1得分为0.589,Top 1-3准确率为62.6%至87.8%,平均AUC为0.787的情况下,优于现有基线方法。我们关于论坛帖子意图的特征化和自动分类可能有助于论坛维护者或第三方工具开发者改善技术论坛上帖子的组织和检索。我们已在附加材料包中发布了我们的标注数据集和代码。

更新时间: 2024-04-10 14:25:30

领域: cs.SE,cs.CL,cs.LG

下载: http://arxiv.org/abs/2312.14279v2

Non-Degenerate One-Time Pad and the integrity of perfectly secret messages

We present a new construction of a One Time Pad (OTP) with inherent diffusive properties and a redundancy injection mechanism that benefits from them. The construction is based on interpreting the plaintext and key as members of a permutation group in the Lehmer code representation after conversion to factoradic. The so constructed OTP translates any perturbation of the ciphertext to an unpredictable, metrically large random perturbation of the plaintext. This allows us to provide unconditional integrity assurance without extra key material. The redundancy is injected using Foata's "pun": the reading of the one-line representation as the cyclic one; we call this Pseudo Foata Injection. We obtain algorithms of quadratic complexity that implement both mechanisms.
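
A small, self-contained sketch of the two ingredients named above: converting an integer to factoradic (factorial base) and reading the resulting digits as a Lehmer code, i.e., as a permutation. The OTP construction built on top of these is the paper's contribution and is not reproduced here.

def to_factoradic(n, width):
    digits = []
    for base in range(1, width + 1):
        n, d = divmod(n, base)
        digits.append(d)          # digit for base i is in range 0..i-1
    return digits[::-1]           # most significant digit first

def lehmer_to_permutation(code):
    pool = list(range(len(code)))
    return [pool.pop(d) for d in code]

code = to_factoradic(2024, width=7)
print(code)                        # [2, 4, 4, 1, 1, 0, 0]
print(lehmer_to_permutation(code)) # [2, 5, 6, 1, 3, 0, 4]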

Updated: 2024-04-10 14:16:44

标题: 非退化一次性密码和完全保密消息的完整性

摘要: 我们提出了一种新的一次性密码本(OTP)构造,它具有固有的扩散特性,并带有一种受益于这些特性的冗余注入机制。该构造基于将明文和密钥转换为阶乘进制(factoradic)后,解释为Lehmer编码表示下置换群的成员。如此构造的OTP将密文的任何扰动转化为明文上不可预测的、度量上较大的随机扰动。这使我们能够在不需要额外密钥材料的情况下提供无条件的完整性保证。冗余是通过Foata的"双关"注入的:将单行表示读作循环表示;我们称之为伪Foata注入。我们给出了实现这两种机制的二次复杂度算法。

更新时间: 2024-04-10 14:16:44

领域: cs.CR

下载: http://arxiv.org/abs/2404.07022v1

I still know it's you! On Challenges in Anonymizing Source Code

The source code of a program not only defines its semantics but also contains subtle clues that can identify its author. Several studies have shown that these clues can be automatically extracted using machine learning and allow for determining a program's author among hundreds of programmers. This attribution poses a significant threat to developers of anti-censorship and privacy-enhancing technologies, as they become identifiable and may be prosecuted. An ideal protection from this threat would be the anonymization of source code. However, neither theoretical nor practical principles of such an anonymization have been explored so far. In this paper, we tackle this problem and develop a framework for reasoning about code anonymization. We prove that the task of generating a $k$-anonymous program -- a program that cannot be attributed to one of $k$ authors -- is not computable in the general case. As a remedy, we introduce a relaxed concept called $k$-uncertainty, which enables us to measure the protection of developers. Based on this concept, we empirically study candidate techniques for anonymization, such as code normalization, coding style imitation, and code obfuscation. We find that none of the techniques provides sufficient protection when the attacker is aware of the anonymization. While we observe a notable reduction in attribution performance on real-world code, a reliable protection is not achieved for all developers. We conclude that code anonymization is a hard problem that requires further attention from the research community.

Updated: 2024-04-10 14:16:11

标题: 我仍然知道是你!关于匿名化源代码的挑战

摘要: 程序的源代码不仅定义了其语义,还包含可以识别其作者的微妙线索。几项研究表明,这些线索可以通过机器学习自动提取,并允许在数百名程序员中确定程序的作者。这种归属性对反审查和增强隐私技术的开发人员构成了重大威胁,因为他们变得可识别并可能受到起诉。对这种威胁的理想保护将是源代码的匿名化。然而,迄今为止,尚未探讨过这种匿名化的理论或实践原则。 在本文中,我们解决了这个问题,并开发了一个用于推理代码匿名化的框架。我们证明在一般情况下生成一个$k$-anonymous程序的任务--一个不能归属于$k$个作者之一的程序--是不可计算的。作为补救措施,我们引入了一个名为$k$-uncertainty的放松概念,它使我们能够衡量开发人员的保护。基于这个概念,我们经验性地研究了匿名化的候选技术,如代码规范化,编码风格模仿和代码混淆。我们发现当攻击者意识到匿名化时,这些技术都无法提供足够的保护。虽然我们观察到在真实代码上对归属性能力的显着降低,但并非所有开发人员都能获得可靠的保护。我们得出结论,代码匿名化是一个难题,需要研究界进一步关注。

更新时间: 2024-04-10 14:16:11

领域: cs.CR,cs.LG,cs.PL,cs.SE

下载: http://arxiv.org/abs/2208.12553v2

Using Persuasive Writing Strategies to Explain and Detect Health Misinformation

Nowadays, the spread of misinformation is a prominent problem in society. Our research focuses on aiding the automatic identification of misinformation by analyzing the persuasive strategies employed in textual documents. We introduce a novel annotation scheme encompassing common persuasive writing tactics to achieve our objective. Additionally, we provide a dataset on health misinformation, thoroughly annotated by experts utilizing our proposed scheme. Our contribution includes proposing a new task of annotating pieces of text with their persuasive writing strategy types. We evaluate fine-tuning and prompt-engineering techniques with pre-trained language models of the BERT family and the generative large language models of the GPT family using persuasive strategies as an additional source of information. We evaluate the effects of employing persuasive strategies as intermediate labels in the context of misinformation detection. Our results show that those strategies enhance accuracy and improve the explainability of misinformation detection models. The persuasive strategies can serve as valuable insights and explanations, enabling other models or even humans to make more informed decisions regarding the trustworthiness of the information.

Updated: 2024-04-10 14:13:29

标题: 使用说服性写作策略来解释和检测健康虚假信息

摘要: 当今,误传信息的传播在社会中是一个突出的问题。我们的研究重点是通过分析文本文档中采用的说服策略,帮助自动识别错误信息。我们引入了一个新颖的注释方案,涵盖常见的说服写作策略,以实现我们的目标。此外,我们提供了一个健康误传信息的数据集,由专家利用我们提出的方案进行了全面注释。我们的贡献包括提出一个新的任务,即用说服写作策略类型注释文本片段。我们评估了使用BERT系列的预训练语言模型和GPT系列的生成大型语言模型的微调和提示工程技术,使用说服策略作为额外信息源。我们评估了在误传信息检测上下文中使用说服策略作为中间标签的效果。我们的结果表明,这些策略提高了准确性,并提高了误传信息检测模型的可解释性。说服策略可以作为有价值的见解和解释,使其他模型甚至人类能够更明智地做出关于信息可信度的决定。

更新时间: 2024-04-10 14:13:29

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2211.05985v4

CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing

Data-driven approaches have emerged as a popular tool for addressing challenges in urban computing. However, current research efforts have primarily focused on limited data sources, which fail to capture the complexity of urban data arising from multiple entities and their interconnections. Therefore, a comprehensive and multifaceted dataset is required to enable more extensive studies in urban computing. In this paper, we present CityNet, a multi-modal urban dataset that incorporates various data, including taxi trajectory, traffic speed, point of interest (POI), road network, wind, rain, temperature, and more, from seven cities. We categorize this comprehensive data into three streams: mobility data, geographical data, and meteorological data. We begin by detailing the generation process and basic properties of CityNet. Additionally, we conduct extensive data mining and machine learning experiments, including spatio-temporal predictions, transfer learning, and reinforcement learning, to facilitate the use of CityNet. Our experimental results provide benchmarks for various tasks and methods, and also reveal internal correlations among cities and tasks within CityNet that can be leveraged to improve spatiotemporal forecasting performance. Based on our benchmarking results and the correlations uncovered, we believe that CityNet can significantly contribute to the field of urban computing by enabling research on advanced topics.

Updated: 2024-04-10 14:11:50

标题: CityNet:一份为城市计算研究提供的综合多模式城市数据集

摘要: 数据驱动方法已经成为解决城市计算挑战的一种流行工具。然而,目前的研究工作主要集中在有限的数据来源上,这些数据源无法捕捉由多个实体及其相互关联产生的城市数据的复杂性。因此,需要一个全面多方面的数据集来实现更广泛的城市计算研究。在本文中,我们介绍了CityNet,一个多模式城市数据集,包括来自七个城市的出租车轨迹、交通速度、兴趣点(POI)、道路网络、风、雨、温度等各种数据。我们将这些全面的数据分类为三个流:移动数据、地理数据和气象数据。我们首先详细介绍了CityNet的生成过程和基本属性。此外,我们进行了广泛的数据挖掘和机器学习实验,包括时空预测、迁移学习和强化学习,以促进CityNet的使用。我们的实验结果为各种任务和方法提供了基准,并揭示了可以利用来提高时空预测性能的城市和CityNet内任务之间的内部相关性。根据我们的基准结果和发现的相关性,我们相信CityNet可以通过促进对高级主题的研究,显著推动城市计算领域的发展。

更新时间: 2024-04-10 14:11:50

领域: cs.AI

下载: http://arxiv.org/abs/2106.15802v2

Improving Language Model Reasoning with Self-motivated Learning

Large-scale high-quality training data is important for improving the performance of models. After being trained with data that has rationales (reasoning steps), models gain reasoning capability. However, datasets with high-quality rationales are relatively scarce due to the high annotation cost. To address this issue, we propose the \textit{Self-motivated Learning} framework. The framework motivates the model itself to automatically generate rationales on existing datasets. Based on the inherent rank from correctness across multiple rationales, the model learns to generate better rationales, leading to higher reasoning capability. Specifically, we train a reward model with the rank to evaluate the quality of rationales, and improve the performance of reasoning through reinforcement learning. Experimental results of Llama2 7B on multiple reasoning datasets show that our method significantly improves the reasoning ability of models, even outperforming text-davinci-002 on some datasets.

Updated: 2024-04-10 14:05:44

标题: 通过自我激励学习来提升语言模型的推理能力

摘要: 大规模高质量的训练数据对于提升模型性能至关重要。在使用具有合理性(推理步骤)的数据进行训练后,模型获得了推理能力。然而,由于高昂的标注成本,具有高质量合理性的数据集相对稀缺。为了解决这个问题,我们提出了“自我激励学习”框架。该框架激励模型自动生成现有数据集的合理性。根据多个合理性的正确性排名,模型学习生成更好的合理性,从而提高推理能力。具体来说,我们训练一个奖励模型来评估合理性的质量,并通过强化学习改善推理性能。在多个推理数据集上的实验结果表明,我们的方法显著提高了模型的推理能力,甚至在某些数据集上超过了text-davinci-002。

更新时间: 2024-04-10 14:05:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.07017v1

Trajectory-Oriented Policy Optimization with Sparse Rewards

Mastering deep reinforcement learning (DRL) proves challenging in tasks featuring scant rewards. These limited rewards merely signify whether the task is partially or entirely accomplished, necessitating various exploration actions before the agent garners meaningful feedback. Consequently, the majority of existing DRL exploration algorithms struggle to acquire practical policies within a reasonable timeframe. To address this challenge, we introduce an approach leveraging offline demonstration trajectories for swifter and more efficient online RL in environments with sparse rewards. Our pivotal insight involves treating offline demonstration trajectories as guidance, rather than mere imitation, allowing our method to learn a policy whose marginal distribution of state-action visitation matches that of the offline demonstrations. We specifically introduce a novel trajectory distance relying on maximum mean discrepancy (MMD) and cast policy optimization as a distance-constrained optimization problem. We then illustrate that this optimization problem can be streamlined into a policy-gradient algorithm, integrating rewards shaped by insights from offline demonstrations. The proposed algorithm undergoes evaluation across extensive discrete and continuous control tasks with sparse and misleading rewards. The experimental findings demonstrate the significant superiority of our proposed algorithm over baseline methods concerning diverse exploration and the acquisition of an optimal policy.
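
A minimal sketch of the trajectory-distance ingredient: a biased (V-statistic) estimate of squared maximum mean discrepancy with an RBF kernel between two sets of state-action pairs, one from the agent and one from the demonstrations. How the paper folds this distance into a constrained policy-gradient update is not shown.

import numpy as np

def rbf_mmd2(X, Y, bandwidth=1.0):
    """Biased (V-statistic) squared MMD between samples X (n,d) and Y (m,d)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
agent_sa = rng.normal(0.0, 1.0, size=(256, 6))  # agent state-action visits
demo_sa = rng.normal(0.5, 1.0, size=(256, 6))   # demonstration visits
print(rbf_mmd2(agent_sa, demo_sa))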

Updated: 2024-04-10 14:05:38

标题: 具有稀疏奖励的轨迹导向策略优化

摘要: 在任务中掌握深度强化学习(DRL)在奖励稀缺的情况下变得具有挑战性。这些有限的奖励仅表示任务是否部分或完全完成,需要代理在获得有意义的反馈之前进行各种探索动作。因此,现有的大多数DRL探索算法很难在合理的时间范围内获得实用策略。为了解决这一挑战,我们提出了一种利用离线演示轨迹加速和更高效的在线RL的方法,适用于奖励稀疏的环境。我们的关键见解是将离线演示轨迹视为指导,而不仅仅是模仿,使我们的方法能够学习一个策略,其状态-动作访问分布基本与离线演示相匹配。我们特别介绍了一种依赖于最大平均差异(MMD)的新型轨迹距离,并将策略优化建模为一个受距离约束的优化问题。然后我们证明这个优化问题可以简化为一个策略梯度算法,集成了根据离线演示洞察力塑造的奖励。提出的算法在具有稀疏和误导性奖励的广泛离散和连续控制任务中进行评估。实验结果表明,我们提出的算法在多样化探索和获取最优策略方面显著优于基线方法。

更新时间: 2024-04-10 14:05:38

领域: cs.LG

下载: http://arxiv.org/abs/2401.02225v3

A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification

Cross-domain text classification aims to adapt models to a target domain that lacks labeled data. It leverages or reuses rich labeled data from the different but related source domain(s) and unlabeled data from the target domain. To this end, previous work focuses on either extracting domain-invariant features or task-agnostic features, ignoring domain-aware features that may be present in the target domain and could be useful for the downstream task. In this paper, we propose a two-stage framework for cross-domain text classification. In the first stage, we finetune the model with mask language modeling (MLM) and labeled data from the source domain. In the second stage, we further fine-tune the model with self-supervised distillation (SSD) and unlabeled data from the target domain. We evaluate its performance on a public cross-domain text classification benchmark and the experiment results show that our method achieves new state-of-the-art results for both single-source domain adaptations (94.17% $\uparrow$1.03%) and multi-source domain adaptations (95.09% $\uparrow$1.34%).

Updated: 2024-04-10 14:03:01

标题: 一个具有自监督蒸馏的两阶段框架用于跨领域文本分类

摘要: 跨领域文本分类旨在将模型调整到缺乏标记数据的目标领域。它利用或重复来自不同但相关源领域的丰富标记数据以及目标领域的未标记数据。为此,先前的工作集中于提取域不变特征或任务不可知特征,忽略了目标领域可能存在的并且对下游任务有用的领域感知特征。在本文中,我们提出了一个用于跨领域文本分类的两阶段框架。在第一阶段,我们使用掩码语言建模(MLM)和来自源领域的标记数据对模型进行微调。在第二阶段,我们进一步使用自监督蒸馏(SSD)和来自目标领域的未标记数据对模型进行微调。我们在公开的跨领域文本分类基准上评估了其性能,实验结果显示我们的方法在单源领域适应(94.17% $\uparrow$1.03%)和多源领域适应(95.09% $\uparrow$1.34%)方面实现了新的最先进结果。

更新时间: 2024-04-10 14:03:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2304.09820v2

Data-Efficient Multimodal Fusion on a Single GPU

The goal of multimodal alignment is to learn a single latent space that is shared between multimodal inputs. The most powerful models in this space have been trained using massive datasets of paired inputs and large-scale computational resources, making them prohibitively expensive to train in many practical scenarios. We surmise that existing unimodal encoders pre-trained on large amounts of unimodal data should provide an effective bootstrap to create multimodal models from unimodal ones at much lower costs. We therefore propose FuseMix, a multimodal augmentation scheme that operates on the latent spaces of arbitrary pre-trained unimodal encoders. Using FuseMix for multimodal alignment, we achieve competitive performance -- and in certain cases outperform state-of-the-art methods -- in both image-text and audio-text retrieval, with orders of magnitude less compute and data: for example, we outperform CLIP on the Flickr30K text-to-image retrieval task with $\sim \! 600\times$ fewer GPU days and $\sim \! 80\times$ fewer image-text pairs. Additionally, we show how our method can be applied to convert pre-trained text-to-image generative models into audio-to-image ones. Code is available at: https://github.com/layer6ai-labs/fusemix.
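
A hedged sketch of the core idea: mixup-style interpolation applied in the latent spaces of frozen unimodal encoders, with small trainable adapters aligned by a contrastive loss. The shapes, adapter design, and hyperparameters are illustrative; see the linked repository for the actual method.

import torch
import torch.nn.functional as F

def fusemix_step(z_img, z_txt, adapter_img, adapter_txt, alpha=1.0, tau=0.07):
    """z_img, z_txt: pre-computed latent batches (B, d) from frozen encoders."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(z_img.shape[0])
    # Interpolate *paired* latents with the same lambda and permutation, so
    # the mixed image latent still corresponds to the mixed text latent.
    z_img = lam * z_img + (1 - lam) * z_img[perm]
    z_txt = lam * z_txt + (1 - lam) * z_txt[perm]
    a = F.normalize(adapter_img(z_img), dim=-1)
    b = F.normalize(adapter_txt(z_txt), dim=-1)
    logits = a @ b.T / tau                         # InfoNCE-style alignment
    targets = torch.arange(len(a))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

adapter_img = torch.nn.Linear(512, 256)
adapter_txt = torch.nn.Linear(384, 256)
loss = fusemix_step(torch.randn(32, 512), torch.randn(32, 384),
                    adapter_img, adapter_txt)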

Updated: 2024-04-10 13:58:08

标题: 在单个GPU上进行数据高效的多模态融合

摘要: 多模态对齐的目标是学习一个共享于多模态输入之间的单一潜在空间。在这个领域中最强大的模型是使用大规模数据集和大规模计算资源进行训练的,这使得它们在许多实际场景中训练成本过高。我们推断,现有的在大量单模态数据上预训练的单模态编码器应该提供一个有效的引导,以更低的成本从单模态模型创建多模态模型。因此,我们提出了FuseMix,一种在任意预训练单模态编码器的潜在空间上操作的多模态增强方案。使用FuseMix进行多模态对齐,我们在图像文本和音频文本检索中取得了竞争性的性能,而且在某些情况下超过了最先进的方法--例如,我们在Flickr30K文本到图像检索任务上超过了CLIP,GPU天数减少了约600倍,图像文本对减少了约80倍。此外,我们展示了我们的方法如何应用于将预训练的文本到图像生成模型转换为音频到图像模型。代码可在以下链接获取:https://github.com/layer6ai-labs/fusemix.

更新时间: 2024-04-10 13:58:08

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2312.10144v4

Visibility into AI Agents

Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work into understanding the measures and mitigating their negative impacts can help to build a foundation for the governance of AI agents.

Updated: 2024-04-10 13:57:06

标题: AI代理的可见性

摘要: 将商业、科学、政府和个人活动委托给人工智能代理——能够在有限监督下追求复杂目标的系统——可能会加剧现有社会风险,并引入新的风险。理解和减轻这些风险涉及对现有治理结构进行批判性评估,在需要时修订和调整这些结构,并确保关键利益相关者的问责制。关于某些人工智能代理被使用的地点、原因、方式和由谁使用的信息,我们称之为可见性,对于实现这些目标至关重要。在本文中,我们评估了三类增加对人工智能代理可见性的措施:代理标识符、实时监控和活动日志记录。对于每种措施,我们概述了可能的实施方式,这些方式在侵入性和信息性上有所不同。我们分析了这些措施在从集中化到分散化部署环境的光谱上的应用情况,考虑了供应链中的各种参与者,包括硬件和软件服务提供商。最后,我们讨论了我们的措施对隐私和权力集中的影响。进一步研究了解这些措施并减轻它们的负面影响,有助于为人工智能代理的治理奠定基础。

更新时间: 2024-04-10 13:57:06

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2401.13138v4

A Mathematical Theory for Learning Semantic Languages by Abstract Learners

Recent advances in Large Language Models (LLMs) have demonstrated the emergence of capabilities (learned skills) when the number of system parameters and the size of training data surpass certain thresholds. The exact mechanisms behind such phenomena are not fully understood and remain a topic of active research. Inspired by the skill-text bipartite graph model presented in [1] for modeling semantic language, we develop a mathematical theory to explain the emergence of learned skills, taking the learning (or training) process into account. Our approach models the learning process for skills in the skill-text bipartite graph as an iterative decoding process in Low-Density Parity Check (LDPC) codes and Irregular Repetition Slotted ALOHA (IRSA). Using density evolution analysis, we demonstrate the emergence of learned skills when the ratio of the size of training texts to the number of skills exceeds a certain threshold. Our analysis also yields a scaling law for testing errors relative to the size of training texts. Upon completion of the training, we propose a method for semantic compression and discuss its application in semantic communication.
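
A toy peeling simulation in the spirit of the analysis above: skills and texts form a random bipartite graph, and a text "teaches" a skill once all but one of the skills it touches are already learned, the iterative-decoding step in LDPC/IRSA. Sweeping the text-to-skill ratio shows how the learned fraction jumps as the ratio grows. The degree distribution and parameters are illustrative, not the paper's.

import numpy as np

def learned_fraction(n_skills, ratio, seed=0):
    rng = np.random.default_rng(seed)
    n_texts = int(ratio * n_skills)
    degrees = rng.integers(1, 4, size=n_texts)      # each text covers 1-3 skills
    texts = [set(rng.choice(n_skills, size=int(d), replace=False))
             for d in degrees]
    learned = set()
    changed = True
    while changed:          # peel: a text resolves its last unknown skill
        changed = False
        for t in texts:
            unknown = t - learned
            if len(unknown) == 1:
                learned |= unknown
                changed = True
    return len(learned) / n_skills

for r in [0.5, 1.0, 2.0, 4.0]:
    print(f"texts/skills = {r}: learned fraction = {learned_fraction(3000, r):.2f}")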

Updated: 2024-04-10 13:50:46

标题: 一个数学理论,用于通过抽象学习者学习语义语言

摘要: 最近对大型语言模型(LLMs)的研究取得了重大进展,表明当系统参数数量和训练数据规模超过一定阈值时,学到的技能(能力)会出现。这种现象背后的确切机制尚未完全理解,仍是活跃研究领域。受[1]中用于建模语义语言的技能-文本二部图模型的启发,我们开发了一个数学理论来解释学到的技能的出现,考虑了学习(或训练)过程。我们的方法将技能-文本二部图中的技能学习过程建模为低密度奇偶校验(LDPC)码和不规则重复分槽ALOHA(IRSA)中的迭代解码过程。通过密度演化分析,我们展示了当训练文本的规模与技能数量的比值超过一定阈值时,学到的技能的出现。我们的分析还得出了关于与训练文本规模相关的测试错误的规模律。在训练完成后,我们提出了一种语义压缩方法,并讨论了其在语义通信中的应用。

更新时间: 2024-04-10 13:50:46

领域: cs.CL,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2404.07009v1

Multi-Agent Soft Actor-Critic with Global Loss for Autonomous Mobility-on-Demand Fleet Control

We study a sequential decision-making problem for a profit-maximizing operator of an Autonomous Mobility-on-Demand system. Optimizing a central operator's vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic's loss function to appropriately consider global actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.
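
A runnable sketch of the dispatching substrate named above: weighted bipartite matching between vehicles and open requests via the Hungarian algorithm. In the paper the edge weights come from the learned actor-critic; here they are plain travel distances, which is only a placeholder.

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
vehicles = rng.uniform(0, 10, size=(5, 2))   # (x, y) vehicle positions
requests = rng.uniform(0, 10, size=(7, 2))   # pickup locations

cost = np.linalg.norm(vehicles[:, None, :] - requests[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)     # each vehicle -> at most one request
for v, r in zip(rows, cols):
    print(f"vehicle {v} -> request {r} (distance {cost[v, r]:.2f})")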

Updated: 2024-04-10 13:49:20

标题: 多智能体软Actor-Critic与全局损失用于自主移动需求车队控制

摘要: 我们研究了一个针对自主移动需求系统的盈利最大化运营商的顺序决策问题。优化中央运营商的车辆请求调度政策需要高效和有效的车队控制策略。为此,我们采用了多智能体软演员-评论家算法结合加权二部匹配。我们提出了一种新颖的基于车辆的算法架构,并调整了评论家的损失函数以适当考虑全局行为。此外,我们扩展了我们的算法以包含再平衡能力。通过数值实验,我们展示了我们的方法在调度方面超过了现有技术基准高达12.9%,在集成再平衡方面高达38.9%。

更新时间: 2024-04-10 13:49:20

领域: eess.SY,cs.LG,cs.MA,cs.SY

下载: http://arxiv.org/abs/2404.06975v1

Knowledge graphs for empirical concept retrieval

Concept-based explainable AI is promising as a tool to improve the understanding of complex models at the premises of a given user, viz. as a tool for personalized explainability. An important class of concept-based explainability methods is constructed with empirically defined concepts, indirectly defined through a set of positive and negative examples, as in the TCAV approach (Kim et al., 2018). While it is appealing to the user to avoid formal definitions of concepts and their operationalization, it can be challenging to establish relevant concept datasets. Here, we address this challenge using general knowledge graphs (such as, e.g., Wikidata or WordNet) for comprehensive concept definition and present a workflow for user-driven data collection in both text and image domains. The concepts derived from knowledge graphs are defined interactively, providing an opportunity for personalization and ensuring that the concepts reflect the user's intentions. We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs) (Crabbe and van der Schaar, 2022). We show that CAVs and CARs based on these empirical concept datasets provide robust and accurate explanations. Importantly, we also find good alignment between the models' representations of concepts and the structure of knowledge graphs, i.e., human representations. This supports our conclusion that knowledge graph-based concepts are relevant for XAI.
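
A compact sketch of the CAV recipe referenced above: train a linear probe to separate activations of concept-positive versus concept-negative examples; the normal of the decision boundary is the concept activation vector. How the positive and negative sets are assembled from a knowledge graph is the paper's contribution and is not shown; the activations below are synthetic stand-ins.

import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts_pos, acts_neg):
    X = np.vstack([acts_pos, acts_neg])
    y = np.r_[np.ones(len(acts_pos)), np.zeros(len(acts_neg))]
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    v = probe.coef_.ravel()
    return v / np.linalg.norm(v)            # unit-norm CAV

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=(100, 64))  # activations of concept examples
neg = rng.normal(0.0, 1.0, size=(100, 64))  # activations of random examples
cav = concept_activation_vector(pos, neg)
print(cav[:5])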

Updated: 2024-04-10 13:47:22

标题: 知识图谱用于经验概念检索

摘要: 基于概念的可解释人工智能被认为是一种改善复杂模型理解的有前途的工具,即作为个性化可解释性的工具。一个重要类别的基于概念的可解释性方法是通过经验定义的概念构建的,间接通过一组正面和负面示例定义,如TCAV方法(Kim等,2018)。虽然避免概念的正式定义和操作化对用户来说很吸引人,但建立相关的概念数据集可能具有挑战性。在这里,我们利用通用知识图谱(如Wikidata或WordNet)来进行全面的概念定义,并提出了一个用户驱动的数据收集工作流程,涵盖文本和图像领域。从知识图谱中得出的概念是通过交互定义的,为个性化提供了机会,并确保这些概念反映了用户的意图。我们测试了从知识图谱中检索的概念数据集在两种基于概念的可解释性方法上的应用,即概念激活向量(CAVs)和概念激活区域(CARs)(Crabbe和van der Schaar,2022)。我们展示了基于这些经验概念数据集的CAVs和CARs提供了稳健和准确的解释。重要的是,我们还发现模型对概念的表示与知识图谱结构,即人类表示之间存在良好的一致性。这支持我们的结论,即基于知识图谱的概念对可解释人工智能是相关的。

更新时间: 2024-04-10 13:47:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.07008v1

Building-road Collaborative Extraction from Remotely Sensed Images via Cross-Interaction

Buildings are the basic carrier of social production and human life; roads are the links that interconnect social networks. Building and road information has important application value in the frontier fields of regional coordinated development, disaster prevention, auto-driving, etc. Mapping buildings and roads from very high-resolution (VHR) remote sensing images has become a hot research topic. However, the existing methods often ignore the strong spatial correlation between roads and buildings and extract them in isolation. To fully utilize the complementary advantages between buildings and roads, we propose a building-road collaborative extraction method based on multi-task and cross-scale feature interaction to improve the accuracy of both tasks in a complementary way. A multi-task interaction module is proposed to interact information across tasks and preserve the unique information of each task, which tackles the seesaw phenomenon in multi-task learning. By considering the variation in appearance and structure between buildings and roads, a cross-scale interaction module is designed to automatically learn the optimal receptive field for different tasks. Compared with many existing methods that train each task individually, the proposed collaborative extraction method can utilize the complementary advantages between buildings and roads through the proposed inter-task and inter-scale feature interactions, and automatically select the optimal receptive field for different tasks. Experiments on a wide range of urban and rural scenarios show that the proposed algorithm can achieve building-road extraction with outstanding performance and efficiency.

Updated: 2024-04-10 13:43:54

标题: 通过交叉相互作用从遥感图像中构建道路协同提取

摘要: 建筑物是社会生产和人类生活的基本载体;道路是连接社会网络的纽带。建筑和道路信息在区域协调发展、灾害预防、自动驾驶等前沿领域具有重要的应用价值。从超高分辨率(VHR)遥感图像中提取建筑和道路已成为热门研究课题。然而,现有方法往往忽视了道路和建筑之间强烈的空间相关性,而将它们孤立地提取。为充分利用建筑和道路之间的互补优势,我们提出了一种基于多任务和跨尺度特征交互的建筑-道路协同提取方法,以互补的方式提高两个任务的准确性。我们提出了一个多任务交互模块,用于跨任务交互信息并保留每个任务的独特信息,从而解决多任务学习中的跷跷板现象。通过考虑建筑和道路之间外观和结构的差异,我们设计了一个跨尺度交互模块,自动学习不同任务的最佳感受野。与许多单独训练每个任务的现有方法不同,所提出的协同提取方法可以通过任务间和尺度间的特征交互利用建筑和道路之间的互补优势,并自动为不同任务选择最佳感受野。在广泛的城市和农村场景上进行的实验表明,所提出的算法能够以出色的性能和效率实现建筑-道路提取。

更新时间: 2024-04-10 13:43:54

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2307.12256v2

WordDecipher: Enhancing Digital Workspace Communication with Explainable AI for Non-native English Speakers

Non-native English speakers (NNES) face challenges in digital workspace communication (e.g., emails, Slack messages), often inadvertently translating expressions from their native languages, which can lead to awkward or incorrect usage. Current AI-assisted writing tools are equipped with fluency enhancement and rewriting suggestions; however, NNES may struggle to grasp the subtleties among various expressions, making it challenging to choose the one that accurately reflects their intent. Such challenges are exacerbated in high-stake text-based communications, where the absence of non-verbal cues heightens the risk of misinterpretation. By leveraging the latest advancements in large language models (LLM) and word embeddings, we propose WordDecipher, an explainable AI-assisted writing tool to enhance digital workspace communication for NNES. WordDecipher not only identifies the perceived social intentions detected in users' writing, but also generates rewriting suggestions aligned with users' intended messages, either numerically or by inferring from users' writing in their native language. Then, WordDecipher provides an overview of nuances to help NNES make selections. Through a usage scenario, we demonstrate how WordDecipher can significantly enhance an NNES's ability to communicate her request, showcasing its potential to transform workspace communication for NNES.

Updated: 2024-04-10 13:40:29

标题: WordDecipher:利用可解释的人工智能增强非英语为母语者的数字工作空间沟通

摘要: 非英语为母语的人在数字工作空间沟通(例如电子邮件、Slack消息)中面临挑战,通常会无意中将表达从他们的母语中翻译过来,这可能导致尴尬或不正确的用法。当前的AI辅助写作工具配备了流畅性增强和重写建议;然而,非英语为母语的人可能难以把握各种表达之间的微妙差别,使得选择一个准确反映他们意图的表达变得困难。这种挑战在高风险的基于文本的沟通中加剧,其中缺乏非言语线索会增加误解的风险。通过利用大语言模型(LLM)和词嵌入的最新进展,我们提出了WordDecipher,一种可解释的AI辅助写作工具,以增强非英语为母语者的数字工作空间沟通。WordDecipher不仅识别用户写作中检测到的社交意图,还生成与用户意图消息一致的重写建议,可以通过数字方式或从用户母语写作中推断。然后,WordDecipher提供细微差别的概述,帮助非英语为母语者做出选择。通过使用场景,我们展示了WordDecipher如何显著提升非英语为母语者沟通请求的能力,展示其改变非英语为母语者工作空间沟通的潜力。

更新时间: 2024-04-10 13:40:29

领域: cs.HC,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.07005v1

L2MAC: Large Language Model Automatic Computer for Extensive Code Generation

Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, hindering their ability to produce long and coherent outputs. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long output generation tasks since they (1) only focus on reading memory and reduce its evolution to the concatenation of new memories or (2) use very specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, an LLM-based multi-agent system, for long and consistent output generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction in turn is executed by a separate LLM agent, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate extensive outputs, bypassing the constraints of the finite context window while producing outputs that fulfill a complex user-specified task. We empirically demonstrate that L2MAC achieves state-of-the-art performance in generating large codebases for system design tasks, significantly outperforming other coding methods in implementing the detailed user-specified task; we show that L2MAC works for general-purpose extensive text-based tasks, such as writing an entire book; and we provide valuable insights into L2MAC's performance improvement over existing methods.
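
A very rough structural sketch of the stored-program loop described above: an instruction registry produced from the user task, a file store for outputs, and a control unit that feeds one instruction at a time to an LLM agent with read/write access to the store. `call_llm` is a placeholder for any chat-completion API; the context management, error checking, and self-verification of the real system are omitted.

from typing import Callable, Dict, List

def l2mac_like_loop(task: str, call_llm: Callable[[str], str]) -> Dict[str, str]:
    # 1) Bootstrap the instruction registry (the "prompt program").
    plan = call_llm(f"Break this task into numbered instructions:\n{task}")
    instructions: List[str] = [ln for ln in plan.splitlines() if ln.strip()]

    file_store: Dict[str, str] = {}          # final and intermediate outputs
    for instr in instructions:
        # 2) The control unit hands each agent one instruction plus a view of
        #    the current file store, then writes the result back.
        context = "\n".join(f"== {k} ==\n{v}" for k, v in file_store.items())
        out = call_llm(f"{instr}\n\nCurrent files:\n{context}")
        file_store[f"output_{len(file_store)}.txt"] = out
    return file_store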

Updated: 2024-04-10 13:38:30

标题: L2MAC:用于大规模代码生成的大型语言模型自动计算机

摘要: 基于Transformer的大型语言模型(LLMs)受限于底层Transformer架构的固定上下文窗口,这阻碍了它们产生长而连贯输出的能力。记忆增强的LLMs是一个有前途的解决方案,但目前的方法无法处理长输出生成任务,因为它们(1)只关注读取记忆,并将记忆的演化简化为新记忆的串联,或者(2)使用无法适应其他领域的高度专门化的记忆。本文介绍了L2MAC,这是第一个实用的基于LLM的通用存储程序自动计算机(冯·诺依曼架构)框架,一个基于LLM的多智能体系统,用于长期且一致的输出生成。其记忆有两个组成部分:指令注册表,其中填充有用于解决用户给定任务的提示程序;以及文件存储,其中将包含最终和中间输出。每条指令依次由一个单独的LLM智能体执行,其上下文由一个能够精确读写记忆的控制单元管理,以确保与文件存储的有效交互。这些组件使L2MAC能够绕过有限上下文窗口的限制生成大量输出,同时产生满足复杂用户指定任务的输出。我们通过实验证明,L2MAC在为系统设计任务生成大型代码库方面达到了最先进的性能,在实现详细的用户指定任务方面明显优于其他编码方法;我们展示了L2MAC适用于通用的大规模文本任务,例如撰写整本书;并且我们提供了关于L2MAC相对于现有方法性能改进的宝贵见解。

更新时间: 2024-04-10 13:38:30

领域: cs.SE,cs.AI,cs.LG,cs.PL,I.2.7; I.2.6; I.2.5; D.2.2; D.2.3; D.3.4

下载: http://arxiv.org/abs/2310.02003v5

Prediction Horizon Requirements for Automated Driving: Optimizing Safety, Comfort, and Efficiency

Predicting the movement of other road users is beneficial for improving automated vehicle (AV) performance. However, the relationship between the time horizon associated with these predictions and AV performance remains unclear. Despite the existence of numerous trajectory prediction algorithms, no studies have been conducted on how varying prediction lengths affect AV safety and other vehicle performance metrics, resulting in undefined horizon requirements for prediction methods. Our study addresses this gap by examining the effects of different prediction horizons on AV performance, focusing on safety, comfort, and efficiency. Through multiple experiments using a state-of-the-art, risk-based predictive trajectory planner, we simulated predictions with horizons up to 20 seconds. Based on our simulations, we propose a framework for specifying the minimum required and optimal prediction horizons based on specific AV performance criteria and application needs. Our results indicate that a horizon of 1.6 seconds is required to prevent collisions with crossing pedestrians, horizons of 7-8 seconds yield the best efficiency, and horizons up to 15 seconds improve passenger comfort. We conclude that prediction horizon requirements are application-dependent, and recommend aiming for a prediction horizon of 11.8 seconds as a general guideline for applications involving crossing pedestrians.

Updated: 2024-04-10 13:34:24

标题: 自动驾驶的预测视野需求:优化安全性、舒适性和效率

摘要: 预测其他道路使用者的动态有助于改善自动驾驶车辆(AV)的性能。然而,与这些预测相关的时间视野与AV性能之间的关系仍不清楚。尽管存在许多轨迹预测算法,但还没有研究表明不同预测长度如何影响AV的安全性和其他车辆性能指标,导致了预测方法的未定义视野要求。我们的研究通过研究不同预测视野对AV性能的影响,重点关注安全性、舒适性和效率性。通过使用最先进的基于风险的预测轨迹规划器进行多次实验,我们模拟了长达20秒的预测。根据我们的模拟,我们提出了一个框架,根据具体的AV性能标准和应用需求,确定最低要求和最佳预测视野。我们的结果表明,为了防止与过马路的行人相撞,需要1.6秒的视野,7-8秒的视野实现最佳效率,而长达15秒的视野则提高了乘客的舒适度。我们得出结论,预测视野要求取决于具体应用,建议在涉及过马路行人的应用中,以11.8秒的预测视野作为一般指导。

更新时间: 2024-04-10 13:34:24

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2402.03893v2

Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations

The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have utilized offline demonstrations to achieve impressive results in multiple hard tasks. However, these approaches place high demands on demonstration quality, and obtaining expert-like actions is often costly and unrealistic. To tackle these problems, we propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG), which leverages a small set of state-only demonstrations (where only state information is included in demonstrations) to indirectly make approximate and feasible long-term credit assignments and facilitate exploration. Specifically, we first design a trajectory-importance evaluation mechanism to determine the quality of the current trajectory against demonstrations. Then, we introduce a guidance reward computation technology based on trajectory importance to measure the impact of each state-action pair. We theoretically analyze the performance improvement caused by smooth guidance rewards and derive a new worst-case lower bound on the performance improvement. Extensive results demonstrate POSG's significant advantages in control performance and convergence speed in four sparse-reward environments, including the grid-world maze, Hopper-v4, HalfCheetah-v4, and Ant maze. Notably, the specific metrics and quantifiable results are investigated to demonstrate the superiority of POSG.

Updated: 2024-04-10 13:32:06

标题: 从仅有状态演示学习到的平滑引导的政策优化

摘要: 奖励反馈的稀疏性在在线深度强化学习(DRL)中仍然是一个具有挑战性的问题。先前的方法利用离线演示在多个困难任务中取得了令人印象深刻的结果。然而,这些方法对演示质量提出了很高的要求,获得类似专家的行动往往是昂贵且不切实际的。为了解决这些问题,我们提出了一种简单高效的算法,称为带有平滑引导的策略优化(POSG),它利用仅包含状态信息的少量状态演示间接进行近似和可行的长期信用分配,并促进探索。具体来说,我们首先设计了一个轨迹重要性评估机制,以确定当前轨迹与演示之间的质量。然后,我们介绍了一种基于轨迹重要性的引导奖励计算技术,用于衡量每个状态-动作对的影响。我们在理论上分析了平滑引导奖励带来的性能改进,并推导了性能改进的新的最坏情况下界。大量结果表明,在包括网格世界迷宫、Hopper-v4、HalfCheetah-v4和蚂蚁迷宫在内的四个稀疏奖励环境中,POSG在控制性能和收敛速度方面具有显著优势。值得注意的是,具体的指标和可量化的结果被调查,以展示POSG的优越性。

更新时间: 2024-04-10 13:32:06

领域: cs.LG

下载: http://arxiv.org/abs/2401.00162v2

Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models

With the development of legal intelligence, Criminal Court View Generation has attracted much attention as a crucial task of legal intelligence; it aims to generate concise and coherent texts that summarize case facts and provide explanations for verdicts. Existing research explores the key information in case facts to yield the court views. Most approaches employ a coarse-grained strategy that partitions the facts into broad segments (e.g., verdict-related sentences) to make predictions. However, this approach fails to capture the complex details present in the case facts, such as various criminal elements and legal events. To this end, in this paper, we propose an Event Grounded Generation (EGG) method for criminal court view generation with cooperative (Large) Language Models, which introduces fine-grained event information into the generation. Specifically, we first design an LLM-based extraction method that can extract events in case facts without requiring massive annotated events. Then, we incorporate the extracted events into court view generation by merging case facts and events. Besides, considering the computational burden posed by the use of LLMs in the extraction phase of EGG, we propose an LLM-free EGG method that eliminates the need for LLM-based event extraction in the inference phase. Extensive experimental results on a real-world dataset clearly validate the effectiveness of our proposed method.

Updated: 2024-04-10 13:31:07

标题: 使用合作(大型)语言模型生成事件驱动的刑事法院观点

摘要: 随着法律智能的发展,刑事法庭观点生成作为法律智能的关键任务受到了广泛关注,其目标是生成简洁连贯的文本,总结案件事实并解释裁决。现有研究探索案件事实中的关键信息以产生法庭观点。其中大多数采用粗粒度方法,将事实划分为广泛的段落(例如与裁决相关的句子)进行预测。然而,这种方法无法捕捉案件事实中存在的复杂细节,例如各种犯罪元素和法律事件。因此,在本文中,我们提出了一种基于事件的生成(EGG)方法,该方法利用协作(大)语言模型进行刑事法庭观点生成,将细粒度事件信息引入生成过程。具体而言,我们首先设计了一种基于LLMs的提取方法,可以在没有大量注释事件的情况下提取案件事实中的事件。然后,我们通过合并案件事实和事件将提取的事件纳入法庭观点生成中。此外,考虑到在EGG的提取阶段中使用LLMs所带来的计算负担,我们提出了一种无LLMs的EGG方法,可以在推理阶段消除对LLMs进行事件提取的需求。对一个真实数据集的广泛实验结果清楚地验证了我们提出方法的有效性。

更新时间: 2024-04-10 13:31:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.07001v1

Agent-driven Generative Semantic Communication for Remote Surveillance

In the era of 6G, with its compelling visions of intelligent transportation systems and digital twins, remote surveillance is poised to become a ubiquitous practice. The substantial data volume and frequent updates present challenges in wireless networks. To address this, we propose a novel agent-driven generative semantic communication (A-GSC) framework based on reinforcement learning. In contrast to existing research on semantic communication (SemCom), which mainly focuses on semantic compression or semantic sampling, we seamlessly cascade both together by jointly considering the intrinsic attributes of source information and the contextual information regarding the task. Notably, the introduction of generative artificial intelligence (GAI) enables the independent design of semantic encoders and decoders. In this work, we develop an agent-assisted semantic encoder leveraging a knowledge-based soft actor-critic algorithm, which tracks semantic changes, channel conditions, and sampling intervals to perform adaptive semantic sampling. Accordingly, we design a semantic decoder with both predictive and generative capabilities, consisting of two tailored modules. Moreover, the effectiveness of the designed models has been verified on a dataset generated from CDNet2014, and the performance gain of the overall A-GSC framework in both energy saving and reconstruction accuracy has been demonstrated.

Updated: 2024-04-10 13:24:27

标题: 基于代理的生成语义通信在远程监视中的应用

摘要: 在6G时代,智能交通系统、数字孪生、远程监视等引人注目的愿景即将成为一种普遍实践。庞大的数据量和频繁的更新给无线网络带来了挑战。为了解决这个问题,我们提出了一种基于强化学习的新型代理驱动生成语义通信(A-GSC)框架。与现有的语义通信(SemCom)研究主要集中在语义压缩或语义采样不同,我们通过联合考虑源信息的固有属性和与任务相关的上下文信息,将两者无缝地串联在一起。值得注意的是,引入生成人工智能(GAI)使得语义编码器和解码器能够独立设计。在这项工作中,我们开发了一个基于知识的软行为者-评论者算法的代理辅助语义编码器,它可以跟踪语义变化、信道条件和采样间隔,从而进行自适应的语义采样。因此,我们设计了一个具有预测和生成能力的语义解码器,它包括两个定制模块。此外,基于从CDNet2014生成的数据集验证了设计模型的有效性,并演示了整体A-GSC框架在节能和重建精度方面的性能增益。

更新时间: 2024-04-10 13:24:27

领域: cs.NI,cs.LG

下载: http://arxiv.org/abs/2404.06997v1

XNLIeu: a dataset for cross-lingual NLI in Basque

XNLI is a popular Natural Language Inference (NLI) benchmark widely used to evaluate cross-lingual Natural Language Understanding (NLU) capabilities across languages. In this paper, we expand XNLI to include Basque, a low-resource language that can greatly benefit from transfer-learning approaches. The new dataset, dubbed XNLIeu, has been developed by first machine-translating the English XNLI corpus into Basque, followed by a manual post-editing step. We have conducted a series of experiments using mono- and multilingual LLMs to assess a) the effect of professional post-editing on the MT output; b) the best cross-lingual strategy for NLI in Basque; and c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation. The results show that post-editing is necessary and that the translate-train cross-lingual strategy obtains better results overall, although the gain is lower when tested on a dataset that has been built natively from scratch. Our code and datasets are publicly available under open licenses.

Updated: 2024-04-10 13:19:56

标题: XNLIeu:巴斯克语跨语言自然语言推理数据集

摘要: XNLI是一个流行的自然语言推理(NLI)基准,广泛用于评估跨语言自然语言理解(NLU)能力。在本文中,我们将XNLI扩展到包括巴斯克语,这是一种资源稀缺的语言,可以极大地受益于迁移学习方法。新数据集被命名为XNLIeu,首先将英语XNLI语料库机器翻译成巴斯克语,然后进行手动后编辑步骤。我们进行了一系列实验,使用单语和多语言LLMs来评估:a) 专业后编辑对机器翻译系统的影响;b) 在巴斯克语中最佳的跨语言NLI策略;以及c) 最佳跨语言策略的选择是否受到数据集通过翻译构建的影响。结果显示,后编辑是必要的,而translate-train跨语言策略总体上获得更好的结果,尽管在原生构建的数据集上测试时,增益较低。我们的代码和数据集在开放许可下公开。

更新时间: 2024-04-10 13:19:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.06996v1

LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization

Large Language Models (LLMs) have emerged as powerful tools capable of accomplishing a broad spectrum of tasks. Their abilities span numerous areas, and one area where they have made a significant impact is in the domain of code generation. Here, we propose using the coding abilities of LLMs to introduce meaningful variations to code defining neural networks. Meanwhile, Quality-Diversity (QD) algorithms are known to discover diverse and robust solutions. By merging the code-generating abilities of LLMs with the diversity and robustness of QD solutions, we introduce \texttt{LLMatic}, a Neural Architecture Search (NAS) algorithm. While LLMs struggle to conduct NAS directly through prompts, \texttt{LLMatic} uses a procedural approach, leveraging QD for prompts and network architecture to create diverse and high-performing networks. We test \texttt{LLMatic} on the CIFAR-10 and NAS-bench-201 benchmarks, demonstrating that it can produce competitive networks while evaluating just $2,000$ candidates, even without prior knowledge of the benchmark domain or exposure to any previous top-performing models for the benchmark. The open-sourced code is available in \url{https://github.com/umair-nasir14/LLMatic}.
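
The procedural loop the abstract describes can be pictured with a short, hedged sketch: a MAP-Elites-style archive keeps the best network per behavior cell, and an LLM supplies the mutations. `llm_mutate` and `evaluate` below are placeholders for the prompt-based variation and the training/scoring of a candidate, not LLMatic's actual implementation.

```python
# Hedged sketch of a QD-plus-LLM architecture search loop.
import random

def llm_mutate(network_code: str) -> str:
    # Placeholder: a real system would prompt an LLM to propose a
    # meaningful variation of the network-defining code.
    return network_code + f"  # variation {random.randint(0, 9)}"

def evaluate(network_code: str):
    fitness = random.random()              # stand-in for validation accuracy
    descriptor = (len(network_code) % 5,)  # stand-in behavior descriptor
    return fitness, descriptor

archive = {}                               # descriptor -> (fitness, code)
seed = "def net(x): return x"
for _ in range(2000):                      # evaluation budget from the abstract
    parent = random.choice(list(archive.values()))[1] if archive else seed
    child = llm_mutate(parent)
    fit, desc = evaluate(child)
    if desc not in archive or fit > archive[desc][0]:
        archive[desc] = (fit, child)       # keep the elite per cell
print(f"{len(archive)} cells filled; best fitness "
      f"{max(f for f, _ in archive.values()):.3f}")
```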

Updated: 2024-04-10 13:18:37

标题: LLMatic:通过大型语言模型和质量多样性优化进行神经架构搜索

摘要: 大型语言模型(LLMs)已经成为能够完成广泛任务的强大工具。它们的能力涵盖了许多领域,其中一个领域它们产生了显著影响的是代码生成领域。在这里,我们提议利用LLMs的编码能力对定义神经网络的代码引入有意义的变化。同时,质量多样性(QD)算法被认为能够发现多样化和稳健的解决方案。通过将LLMs的代码生成能力与QD解决方案的多样性和稳健性相结合,我们引入了LLMatic,一个神经架构搜索(NAS)算法。虽然LLMs直接通过提示进行NAS存在困难,但LLMatic采用了一种程序化方法,利用QD进行提示和网络架构,创建多样化和高性能网络。我们在CIFAR-10和NAS-bench-201基准上测试了LLMatic,表明它可以在评估仅2000个候选人的情况下产生具有竞争力的网络,甚至没有关于基准领域的先前知识或接触到任何先前表现优秀的模型。开源代码可在https://github.com/umair-nasir14/LLMatic找到。

更新时间: 2024-04-10 13:18:37

领域: cs.NE,cs.AI,cs.CL

下载: http://arxiv.org/abs/2306.01102v7

Expediting Building Footprint Extraction from High-resolution Remote Sensing Images via progressive lenient supervision

Building footprint segmentation from remotely sensed images has been hindered by limited model transferability. Many existing building segmentation methods were developed upon the encoder-decoder architecture of U-Net, in which the encoder is finetuned from newly developed backbone networks pre-trained on ImageNet. However, the heavy computational burden of existing decoder designs hampers the successful transfer of these modern encoder networks to remote sensing tasks. Even the widely adopted deep supervision strategy fails to mitigate these challenges, because its loss is invalid in hybrid regions where foreground and background pixels are intermixed. In this paper, we conduct a comprehensive evaluation of existing decoder network designs for building footprint segmentation and propose an efficient framework, denoted BFSeg, to enhance learning efficiency and effectiveness. Specifically, we propose a densely connected coarse-to-fine feature fusion decoder network that facilitates easy and fast feature fusion across scales. Moreover, considering the invalidity of hybrid regions in the down-sampled ground truth during the deep supervision process, we present a lenient deep supervision and distillation strategy that enables the network to learn proper knowledge from deep supervision. Building upon these advancements, we have developed a new family of building segmentation networks that consistently surpasses prior works, with outstanding performance and efficiency across a wide range of newly developed encoder networks.
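
One plausible reading of the lenient deep-supervision idea, sketched in PyTorch: auxiliary losses at a down-sampled scale simply ignore hybrid cells, i.e. ground-truth cells that mix foreground and background pixels. The purity test and the 0.5 threshold are assumptions for illustration, not the authors' exact formulation.

```python
# Hedged sketch: mask hybrid cells out of a deep-supervision loss.
import torch
import torch.nn.functional as F

def lenient_deep_supervision_loss(aux_logits, gt_mask):
    """aux_logits: (B,1,h,w) decoder side-output; gt_mask: (B,1,H,W) binary."""
    scale = gt_mask.shape[-1] // aux_logits.shape[-1]
    frac = F.avg_pool2d(gt_mask.float(), kernel_size=scale)  # fg fraction per cell
    target = (frac > 0.5).float()
    valid = ((frac == 0) | (frac == 1)).float()              # keep pure cells only
    loss = F.binary_cross_entropy_with_logits(aux_logits, target, reduction="none")
    return (loss * valid).sum() / valid.sum().clamp(min=1.0)

logits = torch.randn(2, 1, 64, 64)
gt = torch.zeros(2, 1, 256, 256)
gt[..., 130:] = 1.0        # boundary cuts through some 4x4 cells -> hybrid
print(lenient_deep_supervision_loss(logits, gt))
```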

Updated: 2024-04-10 13:15:41

标题: 通过逐步宽松监督加速从高分辨率遥感图像中提取建筑物占地面积

摘要: 建筑物轮廓分割的效果受到模型转移效果的影响。许多现有的建筑分割方法都是基于U-Net的编码器-解码器架构开发的,其中编码器是根据在ImageNet上预训练的新开发的骨干网络进行微调的。然而,现有解码器设计的巨大计算负担阻碍了这些现代编码器网络成功转移到遥感任务。即使被广泛采用的深度监督策略也无法缓解由于前景和背景像素混合的混合区域中的无效损失而带来的挑战。在本文中,我们对现有的解码器网络设计进行了全面评估,提出了一种名为BFSeg的高效框架,以增强学习效率和效果。具体来说,提出了一种密集连接的粗到细特征融合解码器网络,有助于跨尺度轻松快速地进行特征融合。此外,考虑到在深度监督过程中降采样的地面实况的混合区域的无效性,我们提出了一种宽松的深度监督和蒸馏策略,使网络能够从深度监督中学习正确的知识。基于这些进展,我们开发了一系列新的建筑物分割网络,这些网络在一系列新开发的编码器网络上始终表现出色,性能和效率优越于以前的工作。

更新时间: 2024-04-10 13:15:41

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2307.12220v2

Quiver Laplacians and Feature Selection

The challenge of selecting the most relevant features of a given dataset arises ubiquitously in data analysis and dimensionality reduction. However, features found to be of high importance for the entire dataset may not be relevant to subsets of interest, and vice versa. Given a feature selector and a fixed decomposition of the data into subsets, we describe a method for identifying selected features which are compatible with the decomposition into subsets. We achieve this by re-framing the problem of finding compatible features to one of finding sections of a suitable quiver representation. In order to approximate such sections, we then introduce a Laplacian operator for quiver representations valued in Hilbert spaces. We provide explicit bounds on how the spectrum of a quiver Laplacian changes when the representation and the underlying quiver are modified in certain natural ways. Finally, we apply this machinery to the study of peak-calling algorithms which measure chromatin accessibility in single-cell data. We demonstrate that eigenvectors of the associated quiver Laplacian yield locally and globally compatible features.
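
For orientation, one standard way to set up such an operator (a sketch from the general theory; the paper's normalization may differ): a quiver representation assigns a Hilbert space $V_i$ to each vertex $i$ and a linear map $f_a \colon V_{s(a)} \to V_{t(a)}$ to each arrow $a$, where $s(a)$ and $t(a)$ denote the source and target of $a$. A coboundary operator and its Laplacian are then

$$(d v)_a = f_a\big(v_{s(a)}\big) - v_{t(a)}, \qquad \Delta = d^{\ast} d,$$

and the sections mentioned above (globally compatible features) are exactly the elements of $\ker \Delta$, which the lowest eigenvectors of $\Delta$ approximate.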

Updated: 2024-04-10 13:12:07

标题: 箭筒拉普拉斯算子和特征选择

摘要: 在数据分析和降维过程中,选择给定数据集中最相关特征的挑战普遍存在。然而,被发现对整个数据集非常重要的特征可能与感兴趣的子集无关,反之亦然。鉴于特征选择器和数据的固定分解为子集,我们描述了一种识别与子集分解兼容的选定特征的方法。我们通过重新构建找到兼容特征的问题,将其转化为找到适当准则表示的部分。为了近似此类部分,我们引入了一个在希尔伯特空间中取值的准则表示的拉普拉斯算子。我们对于当准则和潜在准则以某些自然方式修改时准则拉普拉斯的谱如何变化提供了明确的界限。最后,我们将此方法应用于测量单细胞数据中染色质可及性的峰值调用算法的研究中。我们证明相关准则拉普拉斯的特征向量产生了局部和全局兼容的特征。

更新时间: 2024-04-10 13:12:07

领域: stat.ML,cs.LG,math.CO,math.RT,math.ST,q-bio.QM,stat.TH,16G20, 05C50, 62P05, 62H25

下载: http://arxiv.org/abs/2404.06993v1

On Fixing the Right Problems in Predictive Analytics: AUC Is Not the Problem

Recently, ACM FAccT published an article by Kwegyir-Aggrey and colleagues (2023), critiquing the use of AUC ROC in predictive analytics in several domains. In this article, we offer a critique of that article. Specifically, we highlight technical inaccuracies in that paper's comparison of metrics, its mis-specification of the interpretation and goals of AUC ROC, its use of the accuracy metric as a gold standard for comparison to AUC ROC, and its application of critiques solely to AUC ROC that would apply to the use of any metric. We then re-frame the very valid concerns raised in that article, and discuss how the use of AUC ROC can remain a valid and appropriate practice within a well-informed predictive analytics approach that takes those concerns into account. Finally, we discuss the combined use of multiple metrics, including machine learning bias metrics, and AUC ROC's place in such an approach. Like broccoli, AUC ROC is healthy, but also like broccoli, researchers and practitioners in our field shouldn't eat a diet of only AUC ROC.

Updated: 2024-04-10 13:08:07

标题: 解决预测分析中的正确问题:AUC不是问题

摘要: 最近,ACM FAccT发表了一篇由Kwegyir-Aggrey和同事撰写的文章(2023年),批评在几个领域中使用AUC ROC进行预测分析。在本文中,我们对该文章进行了评论。具体而言,我们强调了该论文中对指标进行比较的技术错误,对AUC ROC的解释和目标的错误规范,文章将准确度指标作为与AUC ROC比较的金标准,以及仅将批评应用于AUC ROC的文章关注任何指标使用的担忧。我们总结了在该文章中提出的非常有效的关注点,并讨论了如何在考虑这些问题的情况下,AUC ROC的使用可以保持有效和适当的做法。我们最后讨论了多种指标的综合使用,包括机器学习偏差指标,以及AUC ROC在这种方法中的地位。就像西兰花一样,AUC ROC是健康的,但是像西兰花一样,在我们领域的研究人员和从业者不应该只吃AUC ROC的饮食。

更新时间: 2024-04-10 13:08:07

领域: cs.LG

下载: http://arxiv.org/abs/2404.06989v1

ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues

Aspect Sentiment Understanding (ASU) in interactive scenarios (e.g., Question-Answering and Dialogue) has attracted ever-more interest in recent years and has made important progress. However, existing studies on interactive ASU largely ignore the coreference issue for opinion targets (i.e., aspects), even though this phenomenon is ubiquitous in interactive scenarios, especially dialogues, and limits ASU performance. Recently, large language models (LLMs) have shown a powerful ability to integrate various NLP tasks into the chat paradigm. Accordingly, this paper proposes a new Chat-based Aspect Sentiment Understanding (ChatASU) task, aiming to explore LLMs' ability to understand aspect sentiments in dialogue scenarios. In particular, this ChatASU task introduces a sub-task, the Aspect Chain Reasoning (ACR) task, to address the aspect coreference issue. On this basis, we propose a Trusted Self-reflexion Approach (TSA) with ChatGLM as the backbone for ChatASU. Specifically, TSA treats the ACR task as an auxiliary task to boost the performance of the primary ASU task, and further integrates trusted learning into its reflexion mechanisms to alleviate the LLM-intrinsic factual hallucination problem. Furthermore, a high-quality ChatASU dataset is annotated to evaluate TSA, and extensive experiments show that our proposed TSA significantly outperforms several state-of-the-art baselines, justifying the effectiveness of TSA for ChatASU and the importance of considering the coreference and hallucination issues in ChatASU.

Updated: 2024-04-10 13:08:07

标题: ChatASU:唤起LLM的反思,真正理解对话中的方面情感

摘要: 最近几年,交互式场景(例如问答和对话)中的方面情感理解(ASU)引起了越来越多的关注,并取得了重要进展。然而,现有的关于交互式ASU的研究很大程度上忽略了观点目标(即方面)的共指问题,而这种现象在交互式场景特别是对话中普遍存在,限制了ASU的性能。最近,大型语言模型(LLMs)显示出将各种自然语言处理任务与聊天范式集成的强大能力。因此,本文提出了一个新的基于聊天的方面情感理解(ChatASU)任务,旨在探索LLMs在对话场景中理解方面情感的能力。特别是,这个ChatASU任务引入了一个子任务,即方面链推理(ACR)任务,来解决方面共指问题。在此基础上,我们提出了一种以ChatGLM为骨干的可信自反思方法(TSA)用于ChatASU。具体而言,这种TSA将ACR任务视为辅助任务,以提高主要ASU任务的性能,并进一步将可信学习整合到反思机制中,以缓解TSA中LLMs特有的事实幻觉问题。此外,我们还标注了一个高质量的ChatASU数据集来评估TSA,并进行了大量实验表明,我们提出的TSA能够显著优于几种最先进的基线模型,证明了TSA对ChatASU的有效性以及在ChatASU中考虑共指和幻觉问题的重要性。

更新时间: 2024-04-10 13:08:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.05326v4

DREAM: Visual Decoding from Reversing Human Visual System

In this work we present DREAM, an fMRI-to-image method for reconstructing viewed images from brain activities, grounded in fundamental knowledge of the human visual system. We craft reverse pathways that emulate the hierarchical and parallel nature of how humans perceive the visual world. These tailored pathways are specialized to decipher semantics, color, and depth cues from fMRI data, mirroring the forward pathways from visual stimuli to fMRI recordings. Two components mimic the inverse processes within the human visual system: the Reverse Visual Association Cortex (R-VAC), which reverses the pathways of this brain region and extracts semantics from fMRI data; and the Reverse Parallel PKM (R-PKM), which simultaneously predicts color and depth from fMRI signals. The experiments indicate that our method outperforms the current state-of-the-art models in terms of the consistency of appearance, structure, and semantics. Code will be made publicly available to facilitate further research in this field.

Updated: 2024-04-10 12:54:12

标题: DREAM:从逆向人类视觉系统解码视觉

摘要: 在这项工作中,我们提出了一种名为DREAM的fMRI到图像的方法,用于从大脑活动中重建被观看的图像,基于对人类视觉系统基本知识的理解。我们设计了模拟人类感知视觉世界的分层和并行本质的逆向路径。这些定制的路径专门用于解读fMRI数据中的语义、颜色和深度线索,反映了从视觉刺激到fMRI记录的正向路径。为此,两个组件模拟了人类视觉系统内部的逆向过程:逆向视觉联合皮层(R-VAC)逆转了这一脑区的路径,从fMRI数据中提取语义信息;逆向并行PKM(R-PKM)组件同时从fMRI信号中预测颜色和深度。实验证明,我们的方法在外观、结构和语义的一致性方面优于当前的最先进模型。代码将公开提供,以促进这一领域的进一步研究。

更新时间: 2024-04-10 12:54:12

领域: cs.CV,cs.LG,eess.IV,q-bio.NC

下载: http://arxiv.org/abs/2310.02265v2

The CAST package for training and assessment of spatial prediction models in R

One key task in environmental science is to map environmental variables continuously in space, or even in space and time. Machine learning algorithms are frequently used to learn from local field observations and make spatial predictions by estimating the value of the variable of interest in places where it has not been measured. However, applying machine learning strategies to spatial mapping involves additional challenges compared to "non-spatial" prediction tasks, which often originate from spatial autocorrelation and from training data that are not independent and identically distributed. In the past few years, we developed a number of methods to support the application of machine learning to spatial data, including suitable cross-validation strategies for performance assessment and model selection, spatial feature selection, and methods to assess the area of applicability of trained models. The intention of the CAST package is to support the application of machine learning strategies for predictive mapping by implementing such methods and making them easy to integrate into modelling workflows. Here we introduce the CAST package and its core functionalities. Using a case study on mapping plant species richness, we go through the different steps of the modelling workflow and show how CAST can be used to support more reliable spatial predictions.

Updated: 2024-04-10 12:48:10

标题: R中用于空间预测模型训练和评估的CAST包

摘要: 环境科学中的一个关键任务是连续地在空间甚至时空中映射环境变量。机器学习算法经常被用来从本地野外观测中学习,通过估计感兴趣变量的值来做空间预测,即在尚未进行测量的地方。然而,与“非空间”预测任务相比,机器学习策略在空间映射方面的应用涉及额外的挑战,这些挑战通常源于空间自相关性以及训练数据的不独立和不同分布。 在过去几年中,我们开发了一些方法来支持机器学习在空间数据中的应用,这些方法包括开发适用于性能评估和模型选择的适当交叉验证策略、空间特征选择以及评估训练模型适用性的方法。CAST软件包的目的是支持机器学习策略用于预测映射,通过实现这些方法并提供易于集成到建模工作流程中。 在这里,我们介绍CAST软件包及其核心功能。在映射植物物种丰富度的案例研究中,我们将介绍建模工作流程的不同步骤,并展示CAST如何用于支持更可靠的空间预测。

更新时间: 2024-04-10 12:48:10

领域: stat.ML,cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2404.06978v1

Gemma: Open Models Based on Gemini Research and Technology

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

Updated: 2024-04-10 12:32:33

标题: Gemma:基于 Gemini 研究与技术的开放模型

摘要: 这项工作介绍了 Gemma,这是一系列轻量级、最新技术的开放模型,是基于用于创建 Gemini 模型的研究和技术构建的。Gemma 模型在语言理解、推理和安全方面展现出强大的性能,适用于学术基准测试。我们发布了两种规模的模型(20亿和70亿个参数),提供预训练和微调检查点。Gemma 在18个基于文本的任务中超越了规模相似的开放模型的11个,并且我们对模型的安全性和责任方面进行了全面评估,同时详细描述了模型的开发过程。我们认为负责任地发布大型语言模型对于提高前沿模型的安全性以及推动下一波大型语言模型创新至关重要。

更新时间: 2024-04-10 12:32:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.08295v2

Toward industrial use of continual learning : new metrics proposal for class incremental learning

In this paper, we investigate the performance metrics used in class-incremental strategies for continual learning (CL), using several high-performing methods. We focus especially on mean task accuracy. First, through simple experiments, we show that it lacks the expressiveness to capture performance faithfully: monitoring average task performance is overly optimistic and can lead to misleading conclusions for future real-life industrial use. We then propose a simple metric, Minimal Incremental Class Accuracy (MICA), which gives a fairer and more useful evaluation of different continual learning methods. Moreover, to provide a simple way to compare the performance of different methods in continual learning, we derive another single scalar metric that takes into account the variation in learning performance as well as our newly introduced metric.
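
Since the abstract does not spell MICA out, the toy computation below only illustrates the failure mode it targets, under the guessed reading that MICA tracks the minimum per-class accuracy across incremental steps; the paper's precise definition may differ.

```python
# Hedged illustration of why mean task accuracy can mislead.
import numpy as np

# acc[t, c] = accuracy on class c after incremental step t (NaN = unseen).
acc = np.array([
    [0.95, np.nan, np.nan],
    [0.90, 0.80,  np.nan],
    [0.40, 0.50,  0.95],   # old classes collapsed, new class looks great
])
mean_task_acc = np.nanmean(acc[-1])   # ~0.617 -- looks acceptable
mica_like = np.nanmin(acc)            # 0.40 -- exposes the forgetting
print(f"mean task accuracy: {mean_task_acc:.3f}, "
      f"MICA-style minimum: {mica_like:.3f}")
```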

Updated: 2024-04-10 12:32:18

标题: 朝着持续学习的工业应用:面向类增量学习的新度量提议

摘要: 在这篇论文中,我们使用一些高性能方法,研究了用于持续学习(CL)的类增量学习策略中使用的持续学习性能指标。我们特别研究了平均任务准确性。首先,我们通过一些简单的实验表明,这种方法在捕捉性能方面缺乏表现力。我们展示了监测平均任务性能是过于乐观的,并且可能导致对未来实际工业应用的误导性结论。然后,我们首先提出了一个简单的指标,最小增量类准确性(MICA),它提供了对不同持续学习方法的公平且更有用的评估。此外,为了提供一种简单的比较不同方法在持续学习中的性能的方法,我们推导出另一个单一标量指标,考虑了学习性能变化以及我们新引入的指标。

更新时间: 2024-04-10 12:32:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.06972v1

TrajPRed: Trajectory Prediction with Region-based Relation Learning

Forecasting human trajectories in traffic scenes is critical for safety within mixed or fully autonomous systems. Human future trajectories are driven by two major stimuli: social interactions and stochastic goals. Thus, reliable forecasting needs to capture both. Edge-based relation modeling represents social interactions using pairwise correlations from precise individual states. Nevertheless, edge-based relations can be vulnerable under perturbations. To alleviate these issues, we propose a region-based relation learning paradigm that models social interactions via region-wise dynamics of joint states, i.e., the changes in the density of crowds. In particular, region-wise agent joint information is encoded within convolutional feature grids. Social relations are modeled by relating the temporal changes of local joint information from a global perspective. We show that region-based relations are less susceptible to perturbations. To account for the stochastic individual goals, we exploit a conditional variational autoencoder to realize multi-goal estimation and diverse future prediction. Specifically, we perform variational inference via the latent distribution, which is conditioned on the correlation between input states and associated target goals. Sampling from the latent distribution enables the framework to reliably capture the stochastic behavior in test data. We integrate multi-goal estimation and region-based relation learning to model these two stimuli in a single prediction framework. We evaluate our framework on the ETH-UCY dataset and the Stanford Drone Dataset (SDD). We show that the diverse predictions better fit the ground truth when incorporating the relation module. Our framework outperforms the state-of-the-art models on SDD by $27.61\%$/$18.20\%$ on the ADE/FDE metrics.

Updated: 2024-04-10 12:31:43

标题: TrajPRed: 使用基于区域关系学习的轨迹预测

摘要: 在交通场景中预测人类轨迹对于混合或完全自主系统的安全至关重要。人类未来的轨迹由两个主要刺激驱动,社交互动和随机目标。因此,可靠的预测需要捕捉这两个刺激。基于边缘的关系建模使用精确的个体状态之间的成对相关性来表示社交互动。然而,基于边缘的关系在受干扰时可能会脆弱。为了缓解这些问题,我们提出了一种基于区域的关系学习范式,通过模拟联合状态的区域动态,即人群密度的变化来建模社交互动。特别地,区域化的代理联合信息被编码在卷积特征网格中。社交关系通过从全局视角关联本地联合信息的时间变化来建模。我们展示了基于区域的关系对干扰的敏感性较小。为了考虑随机个体目标,我们利用条件变分自动编码器实现多目标估计和多样的未来预测。具体来说,我们通过潜在分布进行变分推断,该分布取决于输入状态和相关目标目标之间的相关性。从潜在分布中进行采样使框架能够可靠地捕捉测试数据中的随机行为。我们将多目标估计和基于区域的关系学习集成到预测框架中以建模两个刺激,社交互动和随机目标。我们在ETH-UCY数据集和斯坦福无人机数据集(SDD)上评估了我们的框架。我们展示了当结合关系模块时,多样化的预测更好地与实际情况吻合。我们的框架在SDD上的ADE/FDE指标上超越了最先进的模型$27.61\%$/$18.20\%$。

更新时间: 2024-04-10 12:31:43

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.06971v1

FiP: a Fixed-Point Approach for Causal Generative Modeling

Modeling true world data-generating processes lies at the heart of empirical science. Structural Causal Models (SCMs) and their associated Directed Acyclic Graphs (DAGs) provide an increasingly popular answer to such problems by defining the causal generative process that transforms random noise into observations. However, learning them from observational data poses an ill-posed and NP-hard inverse problem in general. In this work, we propose a new and equivalent formalism that does not require DAGs to describe SCMs, viewing them instead as fixed-point problems on the causally ordered variables, and we show three important cases where they can be uniquely recovered given the topological ordering (TO). To the best of our knowledge, we obtain the most general recovery results when the TO is known. Based on our theoretical findings, we design a two-stage causal generative model that first infers the causal order from observations in a zero-shot manner, thus bypassing the search, and then learns the generative fixed-point SCM on the ordered variables. To infer TOs from observations, we propose to amortize the learning of TOs on generated datasets by sequentially predicting the leaves of graphs seen during training. To learn fixed-point SCMs, we design a transformer-based architecture that exploits a new attention mechanism enabling the modeling of causal structures, and show that this parameterization is consistent with our formalism. Finally, we conduct an extensive evaluation of each method individually, and show that when combined, our model outperforms various baselines on generated out-of-distribution problems.
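
To make the fixed-point view concrete (our notation; the paper's exact formulation may differ): after relabeling the variables by a topological ordering, an SCM can be written as

$$\mathbf{x} = F(\mathbf{x}, \mathbf{n}), \qquad x_i = f_i\big(x_1, \dots, x_{i-1}, n_i\big),$$

where $\mathbf{n}$ is the exogenous noise. Since each $f_i$ reads only earlier variables, the Jacobian of $F$ with respect to $\mathbf{x}$ is strictly lower triangular, so iterating $F$ at most $d$ times from any initialization reaches the unique fixed point, with no DAG needed to describe the model.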

Updated: 2024-04-10 12:29:05

标题: FiP:因果生成建模的固定点方法

摘要: 建立真实世界数据生成过程的模型是经验科学的核心。结构因果模型(SCMs)及其相关的有向无环图(DAGs)通过定义将随机噪声转化为观测结果的因果生成过程,提供了越来越受欢迎的解决方案。然而,从观测数据中学习它们通常会导致一个难以解决且NP难的逆问题。在这项工作中,我们提出了一个新的等效形式主义,不需要DAGs来描述它们,将其视为在因果有序变量上的固定点问题,并展示了三种重要情况,在这些情况下,通过已知的拓扑排序(TO)可以唯一恢复它们。据我们所知,当已知TO时,我们获得了最普遍的恢复结果。基于我们的理论发现,我们设计了一个两阶段的因果生成模型,首先通过零照方式从观测中推断出因果顺序,从而绕过搜索,然后学习有序变量上的生成固定点SCM。为了从观测中推断TOs,我们建议通过在生成的数据集上摊销TOs的学习,通过在训练期间顺序预测所见图形的叶子节点。为了学习固定点SCMs,我们设计了一个基于变压器的架构,利用一种新的注意机制,使得模拟因果结构成为可能,并展示了这种参数化与我们的形式主义一致。最后,我们对每种方法进行了广泛的评估,并展示了当结合使用时,我们的模型在生成的超出分布问题上优于各种基线模型。

更新时间: 2024-04-10 12:29:05

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.06969v1

Are EEG Sequences Time Series? EEG Classification with Time Series Models and Joint Subject Training

As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to deal with such data as with any other time series data. For EEG classification, many models have been developed with layer types and architectures we typically do not see in time series classification. Furthermore, separate models are typically learned for each individual subject, rather than one model for all of them. In this paper, we systematically study the differences between EEG classification models and generic time series classification models. We describe three different model setups to deal with EEG data from different subjects: subject-specific models (most EEG literature), subject-agnostic models, and subject-conditional models. In experiments on three datasets, we demonstrate that off-the-shelf time series classification models trained per subject perform close to EEG classification models, but do not quite reach the performance of domain-specific modeling. Additionally, we combine time-series models with subject embeddings to train one joint subject-conditional classifier on all subjects. The resulting models are competitive with dedicated EEG models on 2 out of 3 datasets, even outperforming all EEG methods on one of them.
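
A minimal sketch of the joint subject-conditional setup (an assumed architecture for illustration, not the paper's exact model): a learned subject embedding is concatenated to pooled sequence features before the classification head, so one model serves all subjects.

```python
# Hedged sketch of a subject-conditional time series classifier.
import torch
import torch.nn as nn

class SubjectConditionalClassifier(nn.Module):
    def __init__(self, n_channels, n_subjects, n_classes, emb_dim=16, hidden=64):
        super().__init__()
        self.encoder = nn.Conv1d(n_channels, hidden, kernel_size=7, padding=3)
        self.subject_emb = nn.Embedding(n_subjects, emb_dim)
        self.head = nn.Linear(hidden + emb_dim, n_classes)

    def forward(self, x, subject_id):
        # x: (batch, channels, time); subject_id: (batch,)
        h = torch.relu(self.encoder(x)).mean(dim=-1)   # pool over time
        z = self.subject_emb(subject_id)               # subject conditioning
        return self.head(torch.cat([h, z], dim=-1))

model = SubjectConditionalClassifier(n_channels=32, n_subjects=10, n_classes=4)
logits = model(torch.randn(8, 32, 256), torch.randint(0, 10, (8,)))
print(logits.shape)  # torch.Size([8, 4])
```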

Updated: 2024-04-10 12:24:05

标题: 脑电图序列是时间序列吗?利用时间序列模型和联合主题训练进行脑电图分类

摘要: 与大多数其他数据领域一样,脑电图(EEG)数据分析依赖于丰富的领域特定预处理。除了这种预处理之外,机器学习者希望像处理任何其他时间序列数据一样处理这些数据。对于EEG分类,已经开发了许多模型,这些模型具有我们通常在时间序列分类中看不到的层类型和架构。此外,通常会为每个个体主体学习单独的模型,而不是为所有个体学习一个模型。在本文中,我们系统地研究了EEG分类模型与通用时间序列分类模型之间的差异。我们描述了三种不同的模型设置来处理来自不同个体的EEG数据,即特定于个体的模型(大多数EEG文献)、不特定于个体的模型和特定于个体的模型。在三个数据集的实验中,我们证明了针对每个个体训练的现成时间序列分类模型表现接近于EEG分类模型,但并没有达到领域特定建模的性能。此外,我们将时间序列模型与个体嵌入结合起来,以训练一个联合的特定于个体的分类器,可以处理所有个体。结果表明,在3个数据集中,这些模型在2个数据集中与专用的EEG模型竞争,甚至在其中一个数据集上表现优于所有EEG方法。

更新时间: 2024-04-10 12:24:05

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2404.06966v1

Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study

Forecasting the short-term spread of an ongoing disease outbreak is a formidable challenge due to the complexity of contributing factors, some of which can be characterized through interlinked, multi-modality variables such as epidemiological time series data, viral biology, population demographics, and the intersection of public policy and human behavior. Existing forecasting model frameworks struggle with the multifaceted nature of relevant data and robust results translation, which hinders their performance and the provision of actionable insights for public health decision-makers. Our work introduces PandemicLLM, a novel framework with multi-modal Large Language Models (LLMs) that reformulates real-time forecasting of disease spread as a text reasoning problem, with the ability to incorporate real-time, complex, non-numerical information that was previously unattainable in traditional forecasting models. This approach, through a unique AI-human cooperative prompt design and time series representation learning, encodes multi-modal data for LLMs. The model is applied to the COVID-19 pandemic, trained to utilize textual public health policies, genomic surveillance, spatial, and epidemiological time series data, and subsequently tested across all 50 states of the U.S. Empirically, PandemicLLM is shown to be a high-performing pandemic forecasting framework that effectively captures the impact of emerging variants and can provide timely and accurate predictions. The proposed PandemicLLM opens avenues for incorporating various pandemic-related data in heterogeneous formats and exhibits performance benefits over existing models. This study illuminates the potential of adapting LLMs and representation learning to enhance pandemic forecasting, illustrating how AI innovations can strengthen pandemic responses and crisis management in the future.

Updated: 2024-04-10 12:22:03

标题: 利用大型语言模型推进实时流行病预测:COVID-19案例研究

摘要: 预测正在进行中的疾病爆发的短期传播是一项艰巨的挑战,因为涉及因素的复杂性,其中一些可以通过相互关联的多模态变量进行表征,例如流行病学时间序列数据、病毒生物学、人口统计学以及公共政策和人类行为的交汇。现有的预测模型框架在相关数据的多方面性和结果转化的鲁棒性方面存在困难,这限制了它们的性能以及为公共卫生决策者提供可操作见解的能力。我们的工作介绍了PandemicLLM,这是一个新颖的框架,具有多模态的大型语言模型(LLMs),将疾病传播的实时预测重新构建为一个文本推理问题,具有整合实时、复杂、非数值信息的能力,这在传统预测模型中以前是无法获得的。通过独特的人工智能-人类合作提示设计和时间序列表示学习,这种方法为LLMs编码多模态数据。该模型应用于COVID-19大流行,并经过训练,利用文本公共卫生政策、基因组监测、空间和流行病学时间序列数据,并在美国50个州进行测试。实证上,PandemicLLM被证明是一个高性能的大流行病预测框架,有效捕捉新出现变种的影响,并能够提供及时准确的预测。提出的PandemicLLM为整合各种以异质格式呈现的与大流行相关数据打开了途径,并展示了相对于现有模型的性能优势。这项研究阐明了将LLMs和表示学习应用于增强大流行病预测的潜力,展示了人工智能创新如何可以加强未来大流行病应对和危机管理。

更新时间: 2024-04-10 12:22:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.06962v1

Adversarial purification for no-reference image-quality metrics: applicability study and new methods

Recently, the area of adversarial attacks on image quality metrics has begun to be explored, whereas the area of defences remains under-researched. In this study, we aim to close that gap and check the transferability of adversarial purification defences from image classifiers to IQA methods. We apply several widespread attacks to IQA models and examine how well the defences hold up against them. The purification methodologies cover different preprocessing techniques, including geometrical transformations, compression, denoising, and modern neural-network-based methods. We also address the challenge of assessing the efficacy of a defensive methodology by proposing ways to estimate output visual quality and the success of neutralizing attacks. Defences were tested against attacks on three IQA metrics -- Linearity, MetaIQA and SPAQ. The code for attacks and defences is available at: (link is hidden for a blind review).
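
As a flavor of the preprocessing-style defences compared in the study, here is a small sketch combining two classical purification steps, JPEG re-compression and Gaussian blur; the quality and radius values are illustrative, and the IQA call is a hypothetical placeholder.

```python
# Purification by preprocessing before the IQA model sees the image.
import io
from PIL import Image, ImageFilter

def purify(img: Image.Image, jpeg_quality=75, blur_radius=1.0) -> Image.Image:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=jpeg_quality)  # crush high-freq. noise
    buf.seek(0)
    img = Image.open(buf).convert("RGB")
    return img.filter(ImageFilter.GaussianBlur(blur_radius))

img = Image.new("RGB", (64, 64), (200, 30, 30))  # stand-in for an attacked image
clean = purify(img)
# score = iqa_model(clean)  # hypothetical no-reference IQA call
print(clean.size)
```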

Updated: 2024-04-10 12:17:25

标题: 对无参考图像质量评价的对抗净化:适用性研究和新方法

摘要: 最近,对图像质量度量的对抗攻击领域开始受到探索,而防御领域仍未得到充分研究。在这项研究中,我们旨在覆盖这一情况,并检查从图像分类器到IQA方法的对抗净化防御的可转移性。在本文中,我们对IQA模型应用了几种常见的攻击,并检查了防御措施的成功与否。净化方法包括不同的预处理技术,包括几何变换、压缩、去噪和现代基于神经网络的方法。此外,我们提出了评估防御方法有效性的挑战,并提出了估计输出视觉质量和中和攻击成功的方法。防御措施针对三种IQA指标进行了测试--线性度、MetaIQA和SPAQ。攻击和防御的代码可在以下链接找到:(链接已隐藏以进行盲审)。

更新时间: 2024-04-10 12:17:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.06957v1

Untangling Critical Interaction with AI in Students Written Assessment

Artificial Intelligence (AI) has become a ubiquitous part of society, but a key challenge lies in ensuring that humans are equipped with the critical thinking and AI literacy skills required to interact with machines effectively, by understanding their capabilities and limitations. These skills are particularly important for learners to develop in the age of generative AI, where AI tools can demonstrate complex knowledge and abilities previously thought to be uniquely human. To activate effective human-AI partnerships in writing, this paper provides a first step toward conceptualizing the notion of critical learner interaction with AI. Using both theoretical models and empirical data, our preliminary findings suggest a general lack of deep interaction with AI during the writing process. We believe that these outcomes can inform better task and tool design in the future, helping learners develop deep, critical thinking when interacting with AI.

Updated: 2024-04-10 12:12:50

标题: 解开学生书面评估中与人工智能的关键互动

摘要: 人工智能(AI)已经成为社会中无处不在的一部分,然而一个关键挑战在于确保人类具备必要的批判性思维和AI素养技能,以有效地与机器互动,理解它们的能力和局限性。在生成AI时代,这些技能对学习者的发展尤为重要,AI工具可以展示先前认为是人类独有的复杂知识和能力。为了在写作中激活有效的人工智能合作,本文提供了一个关于批判性学习者与AI互动概念的第一步。通过理论模型和实证数据,我们的初步发现表明在写作过程中存在一般缺乏与AI的深度互动。我们相信这些结果可以为未来的任务和工具设计提供更好的方向,使学习者在与AI互动时发展深入的批判性思维。

更新时间: 2024-04-10 12:12:50

领域: cs.HC,cs.AI,I.2; K.3.1

下载: http://arxiv.org/abs/2404.06955v1

A Shift In Artistic Practices through Artificial Intelligence

The explosion of content generated by artificial intelligence (AI) models has initiated a cultural shift in arts, music, and media, whereby roles are changing, values are shifting, and conventions are challenged. The vast, readily available dataset of the Internet has created an environment for AI models to be trained on any content on the Web. With AI models shared openly and used by many globally, how does this new paradigm shift challenge the status quo in artistic practices? What kind of changes will AI technology bring to music, arts, and new media?

Updated: 2024-04-10 12:08:54

标题: 通过人工智能技术引发的艺术实践转变

摘要: 人工智能(AI)模型生成的内容爆炸引发了艺术、音乐和媒体领域的文化转变,角色发生了变化,价值观发生了转变,传统被挑战。互联网上庞大且随时可获得的数据集为AI模型的训练创造了一个环境。随着AI模型在全球范围内公开共享和使用,这种新的范式转变如何挑战艺术实践的现状?AI技术将给音乐、艺术和新媒体带来怎样的变化?

更新时间: 2024-04-10 12:08:54

领域: cs.CY,cs.AI,cs.MM

下载: http://arxiv.org/abs/2306.10054v3

Model-based deep reinforcement learning for accelerated learning from flow simulations

In recent years, deep reinforcement learning has emerged as a technique to solve closed-loop flow control problems. Employing simulation-based environments in reinforcement learning enables a priori end-to-end optimization of the control system, provides a virtual testbed for safety-critical control applications, and allows one to gain a deep understanding of the control mechanisms. While reinforcement learning has been applied successfully in a number of rather simple flow control benchmarks, a major bottleneck toward real-world applications is the high computational cost and turnaround time of flow simulations. In this contribution, we demonstrate the benefits of model-based reinforcement learning for flow control applications. Specifically, we optimize the policy by alternating between trajectories sampled from flow simulations and trajectories sampled from an ensemble of environment models. The model-based learning reduces the overall training time by up to $85\%$ for the fluidic pinball test case. Even larger savings are expected for more demanding flow simulations.

Updated: 2024-04-10 12:01:43

标题: 基于模型的深度强化学习用于加速从流体模拟中学习

摘要: 近年来,深度强化学习已经成为解决闭环流控制问题的一种技术。在强化学习中使用基于模拟的环境使得控制系统能够事先进行端对端优化,为安全关键控制应用提供虚拟测试平台,并能够深入了解控制机制。虽然强化学习已成功应用于一些相当简单的流控制基准测试中,但实现面向实际应用的一个主要障碍是流动模拟的高计算成本和周转时间。在本文中,我们展示了基于模型的强化学习在流控制应用中的好处。具体来说,我们通过在从流动模拟中采样的轨迹和从环境模型合集中采样的轨迹之间交替优化策略。基于模型的学习可将液体弹球测试案例的总训练时间减少高达85%。对于更具挑战性的流动模拟,预计可以实现更大的节约。

更新时间: 2024-04-10 12:01:43

领域: physics.flu-dyn,cs.CE,cs.LG

下载: http://arxiv.org/abs/2402.16543v2

Enhancing Efficiency in Multidevice Federated Learning through Data Selection

Federated learning (FL) in multidevice environments creates new opportunities to learn from a vast and diverse amount of private data. Although personal devices capture valuable data, their memory, computing, connectivity, and battery resources are often limited. Since deep neural networks (DNNs) are the typical machine learning models employed in FL, there is a demand for integrating ubiquitous but constrained devices into the training process of DNNs. In this paper, we develop an FL framework that incorporates on-device data selection on such constrained devices, which allows partition-based training of a DNN through collaboration between constrained devices and resourceful devices of the same client. Evaluations on five benchmark DNNs and six benchmark datasets across different modalities show that, on average, our framework achieves ~19% higher accuracy and ~58% lower latency compared to the baseline FL without our implemented strategies. We demonstrate the effectiveness of our FL framework when dealing with imbalanced data, client participation heterogeneity, and various mobility patterns. As a benchmark for the community, our code is available at https://github.com/dr-bell/data-centric-federated-learning

Updated: 2024-04-10 12:01:20

标题: 通过数据选择提高多设备联邦学习的效率

摘要: 在多设备环境中进行联邦学习(FL)为从大量和多样化的私人数据中学习创造了新的机会。尽管个人设备捕获了有价值的数据,但它们的内存、计算、连接和电池资源通常是有限的。由于深度神经网络(DNNs)是FL中典型的机器学习模型,因此有需求将普遍受限的设备整合到DNNs的训练过程中。在本文中,我们开发了一个FL框架,在这些受限设备上整合了设备端数据选择,通过受限设备和同一客户端的资源丰富设备之间的协作,实现了基于分区的DNN训练。对五个基准DNN和六个不同模态的基准数据集进行评估表明,我们的框架平均实现了约19%更高的准确性和约58%更低的延迟;与没有我们实施策略的基准FL相比。我们展示了我们的FL框架在处理不平衡数据、客户参与异质性和各种移动模式时的有效性。作为社区的基准,我们的代码可在https://github.com/dr-bell/data-centric-federated-learning 上获得。

更新时间: 2024-04-10 12:01:20

领域: cs.LG

下载: http://arxiv.org/abs/2211.04175v4

MetaCheckGPT -- A Multi-task Hallucination Detection Using LLM Uncertainty and Meta-models

This paper presents our winning solution for the SemEval-2024 Task 6 competition. We propose a meta-regressor framework of large language models (LLMs) for model evaluation and integration that achieves the highest scores on the leader board. Our approach leverages uncertainty signals present in a diverse basket of LLMs to detect hallucinations more robustly.
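
The meta-model idea can be sketched in a few lines: per-example uncertainty signals from several LLMs become features for a simple classifier that predicts a hallucination score. The feature names and synthetic data below are illustrative assumptions, not the authors' pipeline.

```python
# Hedged sketch of a meta-model over LLM uncertainty signals.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Columns: e.g. mean token entropy, self-consistency disagreement, and NLI
# contradiction rate -- one triple per LLM in the "basket", per example.
X = rng.random((200, 9))
y = (X[:, 0] + X[:, 3] + X[:, 6] > 1.6).astype(int)  # synthetic labels

meta_model = LogisticRegression().fit(X, y)
print("hallucination prob.:", meta_model.predict_proba(X[:2])[:, 1])
```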

Updated: 2024-04-10 11:56:01

标题: MetaCheckGPT -- 使用LLM不确定性和元模型的多任务幻觉检测

摘要: 这篇论文介绍了我们在SemEval-2024 Task 6比赛中获胜的解决方案。我们提出了一个大型语言模型(LLMs)的元回归框架,用于模型评估和集成,实现了在排行榜上最高的分数。我们的方法利用多样化的LLMs中存在的不确定信号,更加稳健地检测幻觉。

更新时间: 2024-04-10 11:56:01

领域: cs.CL,cs.AI,68T07, 68T50,I.2.7

下载: http://arxiv.org/abs/2404.06948v1

A Survey on the Integration of Generative AI for Critical Thinking in Mobile Networks

In the near future, mobile networks are expected to broaden their services and coverage to accommodate a larger user base and diverse user needs. Thus, they will increasingly rely on artificial intelligence (AI) to manage network operation and control costs, undertaking complex decision-making roles. This shift will necessitate the application of techniques that incorporate critical thinking abilities, including reasoning and planning. Symbolic AI techniques already facilitate critical thinking based on existing knowledge. Yet, their use in telecommunications is hindered by the high cost of mostly manual curation of this knowledge and high computational complexity of reasoning tasks. At the same time, there is a spurt of innovations in industries such as telecommunications due to Generative AI (GenAI) technologies, operating independently of human-curated knowledge. However, their capacity for critical thinking remains uncertain. This paper aims to address this gap by examining the current status of GenAI algorithms with critical thinking capabilities and investigating their potential applications in telecom networks. Specifically, the aim of this study is to offer an introduction to the potential utilization of GenAI for critical thinking techniques in mobile networks, while also establishing a foundation for future research.

Updated: 2024-04-10 11:55:33

标题: 一个关于在移动网络中集成生成式人工智能促进批判性思维的调查

摘要: 在不久的将来,移动网络预计会拓展其服务和覆盖范围,以满足更大的用户群体和多样化的用户需求。因此,它们将越来越依赖人工智能(AI)来管理网络运营和控制成本,承担复杂的决策角色。这种转变将需要应用那些包含关键思维能力的技术,包括推理和规划。符号AI技术已经基于现有知识促进了关键思维。然而,它们在电信领域的应用受到了主要手动整理知识的高成本以及推理任务的高计算复杂性的阻碍。与此同时,由于生成式人工智能(GenAI)技术的涌现,诸如电信等行业正在迎来创新潮,这些技术独立于人类整理的知识。然而,它们的关键思维能力仍不确定。本文旨在通过审视具有关键思维能力的GenAI算法的当前状态并调查它们在电信网络中的潜在应用来填补这一空白。具体来说,本研究旨在介绍GenAI在移动网络中关键思维技术的潜在利用,并为未来研究奠定基础。

更新时间: 2024-04-10 11:55:33

领域: cs.AI

下载: http://arxiv.org/abs/2404.06946v1

DG-TTA: Out-of-domain medical image segmentation through Domain Generalization and Test-Time Adaptation

Applying pre-trained medical segmentation models to out-of-domain images often yields predictions of insufficient quality. Several strategies have been proposed to maintain model performance, such as finetuning or unsupervised and source-free domain adaptation. These strategies set restrictive requirements for data availability. In this study, we propose to combine domain generalization and test-time adaptation to create a highly effective approach for reusing pre-trained models in unseen target domains. Domain-generalized pre-training on source data is used to obtain the best initial performance in the target domain. We introduce the MIND descriptor, previously used in image registration tasks, as a further technique to achieve generalization, and show superior performance on small-scale datasets compared to existing approaches. At test time, high-quality segmentation for every single unseen scan is ensured by optimizing the model weights for consistency given different image augmentations. That way, our method enables separate use of source and target data and thus removes current data availability barriers. Moreover, the presented method is highly modular, as it does not require specific model architectures or prior knowledge of the involved domains and labels. We demonstrate this by integrating it into the nnUNet, which is currently the most popular and accurate framework for medical image segmentation. We employ multiple datasets covering abdominal, cardiac, and lumbar spine scans and compose several out-of-domain scenarios in this study. We demonstrate that our method, combined with pre-trained whole-body CT models, can effectively segment MR images with high accuracy in all of the aforementioned scenarios. Open-source code can be found here: https://github.com/multimodallearning/DG-TTA
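
A compact sketch of the test-time step described above: per scan, the model weights are tuned so that predictions stay consistent across augmented views. The flip augmentation, MSE consistency objective, and tiny stand-in network are simplifications for illustration, not the paper's exact pipeline.

```python
# Hedged sketch of consistency-driven test-time adaptation.
import torch
import torch.nn.functional as F

def tta_adapt(model, scan, optimizer, n_iters=10):
    for _ in range(n_iters):
        aug = torch.flip(scan, dims=[-1])                       # augmented view
        p1 = torch.softmax(model(scan), dim=1)
        p2 = torch.flip(torch.softmax(model(aug), dim=1), dims=[-1])
        loss = F.mse_loss(p1, p2)                               # consistency
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model

model = torch.nn.Conv2d(1, 2, 3, padding=1)  # stand-in for a pre-trained net
scan = torch.randn(1, 1, 32, 32)             # stand-in for an unseen scan
tta_adapt(model, scan, torch.optim.Adam(model.parameters(), lr=1e-4))
```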

Updated: 2024-04-10 11:49:05

标题: DG-TTA: 领域泛化和测试阶段适应性实现医学图像跨领域分割

摘要: 将预训练医学分割模型应用于域外图像通常会产生质量不足的预测。已经提出了几种策略来维持模型性能,例如微调或无监督和无源领域适应。这些策略对数据可用性设置了严格要求。在本研究中,我们建议结合域泛化和测试时间适应,以在未知目标领域中重新使用预训练模型的高效方法。在源数据上进行域泛化预训练以在目标领域中获得最佳初始性能。我们引入了先前在图像配准任务中使用的MIND描述符作为进一步实现泛化的技术,并相对于现有方法在小规模数据集上展现出卓越性能。在测试时间,通过优化模型权重以确保在不同图像增强下的一致性,为每个单独的未见扫描实现高质量分割。这样,我们的方法使得可以分别使用源数据和目标数据,从而消除当前数据可用性障碍。此外,所提出的方法高度模块化,因为它不需要特定的模型架构或已涉及领域和标签的先验知识。我们通过将其集成到nnUNet中进行了示范,该框架目前是医学图像分割最流行和准确的框架。我们在本研究中使用多个数据集,涵盖腹部、心脏和腰椎扫描,并组成了几个域外场景。我们展示了我们的方法,结合预训练的全身CT模型,在所有上述场景中都能够高精度地分割MR图像。开源代码可在此处找到:https://github.com/multimodallearning/DG-TTA

更新时间: 2024-04-10 11:49:05

领域: cs.CV,cs.LG,92C55 (Primary), 68T07 (Secondary),I.2.6; I.4.6

下载: http://arxiv.org/abs/2312.06275v3

Equivariant Networks for Zero-Shot Coordination

Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Such symmetries commonly arise under partial observability, e.g., waving your right hand vs. your left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that effectively leverages environmental symmetry to improve zero-shot coordination, doing so more effectively than prior methods. Our method also acts as a ``coordination-improvement operator'' for generic, pre-trained policies, and thus may be applied at test time in conjunction with any self-play algorithm. We provide theoretical guarantees for our work and test on the AI benchmark task of Hanabi, where we demonstrate that our method outperforms other symmetry-aware baselines in zero-shot coordination and improves the coordination ability of a variety of pre-trained policies. In particular, we show our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.
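
The defining property such an architecture enforces is the standard equivariance condition: for the environment's symmetry group $G$ (e.g., relabelings of equivalent observations and actions),

$$\pi(g \cdot \tau) = g \cdot \pi(\tau) \qquad \text{for all } g \in G,$$

so the policy $\pi$ treats symmetric histories $\tau$ symmetrically and cannot arbitrarily break ties between equivalent but mutually incompatible conventions.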

Updated: 2024-04-10 11:45:10

标题: 用于零样本协调的等变网络

摘要: 在Dec-POMDPs中成功的协调需要代理采用稳健的策略和可解释的游戏风格来适应他们的合作伙伴。常见的失败模式是对称性破坏,当代理任意地收敛于许多等效但相互不兼容的政策之一时。通常这些例子包括部分可观察性,例如,挥动右手与左手传达隐秘信息。在本文中,我们提出了一种新颖的等变网络架构,用于在Dec-POMDPs中利用环境对称性来有效提高零-shot协调,比先前的方法更有效。我们的方法还充当了一种“协调改进运算符”,用于通用的预先训练政策,并且可能在测试时间与任何自我博弈算法一起使用。我们提供了我们工作的理论保证,并在Hanabi的AI基准任务上进行测试,在那里我们展示了我们的方法在零-shot协调中优于其他对称意识基线,并且能够提高各种预先训练政策的协调能力。特别是,我们展示了我们的方法可以用来改进Hanabi基准测试中零-shot协调的最新技术水平。

更新时间: 2024-04-10 11:45:10

领域: cs.LG

下载: http://arxiv.org/abs/2210.12124v2

Fast System Technology Co-Optimization Framework for Emerging Technology Based on Graph Neural Networks

This paper proposes a fast system technology co-optimization (STCO) framework that optimizes power, performance, and area (PPA) for next-generation IC design, addressing the challenges and opportunities presented by novel materials and device architectures. We focus on accelerating the technology level of STCO using AI techniques, by employing graph neural network (GNN)-based approaches for both TCAD simulation and cell library characterization, which are interconnected through a unified compact model, collectively achieving over a 100X speedup over traditional methods. These advancements enable comprehensive STCO iterations with runtime speedups ranging from 1.9X to 14.1X and support both emerging and traditional technologies.

Updated: 2024-04-10 11:43:26

标题: 基于图神经网络的新兴技术快速系统技术协同优化框架

摘要: 这篇论文提出了一种快速系统技术协同优化(STCO)框架,用于优化下一代集成电路设计的功耗、性能和面积(PPA),应对新材料和器件架构带来的挑战和机遇。我们专注于利用人工智能技术加速STCO的技术水平,通过使用基于图神经网络(GNN)的方法进行TCAD模拟和单元库特性化,这两者通过统一的紧凑模型相互连接,共同实现了比传统方法快100倍以上的加速。这些进展使得全面的STCO迭代获得了从1.9倍到14.1倍不等的运行速度提升,并支持新兴和传统技术。

更新时间: 2024-04-10 11:43:26

领域: cs.ET,cs.AI

下载: http://arxiv.org/abs/2404.06939v1

ExpPoint-MAE: Better interpretability and performance for self-supervised point cloud transformers

In this paper we delve into the properties of transformers, attained through self-supervision, in the point cloud domain. Specifically, we evaluate the effectiveness of Masked Autoencoding as a pretraining scheme, and explore Momentum Contrast as an alternative. In our study we investigate the impact of data quantity on the learned features, and uncover similarities in the transformer's behavior across domains. Through comprehensive visualizations, we observe that the transformer learns to attend to semantically meaningful regions, indicating that pretraining leads to a better understanding of the underlying geometry. Moreover, we examine the finetuning process and its effect on the learned representations. Based on that, we devise an unfreezing strategy that consistently outperforms our baseline without introducing any other modifications to the model or the training pipeline, and achieves state-of-the-art results in the classification task among transformer models.

Updated: 2024-04-10 11:42:22

标题: ExpPoint-MAE:自监督点云变换器的更好可解释性和性能

摘要: 在这篇论文中,我们深入研究了通过自监督学习在点云领域中获得的transformer的特性。具体来说,我们评估了Masked Autoencoding作为预训练方案的有效性,并探索了动量对比作为一种替代方案。在我们的研究中,我们调查了数据数量对学习特征的影响,并发现了transformer在不同领域中行为的相似之处。通过全面的可视化,我们观察到transformer学会了关注语义上有意义的区域,表明预训练有助于更好地理解底层几何结构。此外,我们研究了微调过程及其对学习表示的影响。基于此,我们设计了一种解冻策略,始终优于我们的基准模型,在不引入任何其他修改到模型或训练流程的情况下,在分类任务中取得了与transformer模型中的最新结果。

更新时间: 2024-04-10 11:42:22

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2306.10798v3

Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning

Machine Learning (ML) and Algorithmic Information Theory (AIT) look at complexity from different points of view. We explore the interface between AIT and kernel methods (which are prevalent in ML) by adopting an AIT perspective on the problem of learning kernels from data in kernel ridge regression, through the method of Sparse Kernel Flows. In particular, by looking at the differences and commonalities between Minimal Description Length (MDL) and Regularization in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is the natural approach for learning kernels from data. This approach aligns naturally with the MDL principle, offering a more robust theoretical basis than the existing reliance on cross-validation. The study reveals that deriving Sparse Kernel Flows does not require a statistical approach; instead, one can directly engage with code lengths and complexities, concepts central to AIT. This approach thereby opens the door to reformulating machine learning algorithms using tools from AIT, with the aim of providing them a more solid theoretical foundation.
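
The MDL-regularization parallel the abstract draws can be summarized by the standard two-part-code correspondence (our sketch, not the paper's derivation):

$$\underbrace{L(H) + L(D \mid H)}_{\text{two-part MDL code length}} \;\longleftrightarrow\; \underbrace{\lambda \,\lVert f \rVert_{K}^{2} + \sum_{i} \ell\big(f(x_i), y_i\big)}_{\text{regularized risk in kernel ridge regression}},$$

with the hypothesis code length $L(H)$ playing the role of the complexity penalty and the residual code length $L(D \mid H)$ that of the data-fit term.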

Updated: 2024-04-10 11:35:14

标题: 连接算法信息理论和机器学习:一种新的核学习方法

摘要: 机器学习(ML)和算法信息理论(AIT)从不同的角度看待复杂性。我们通过采用AIT的视角,探索了AIT和核方法(在ML中普遍存在)之间的接口,以解决从数据中学习内核的问题,在核岭回归中,通过稀疏内核流的方法。特别是,通过研究最小描述长度(MDL)和机器学习中的正则化(RML)之间的差异和共同点,我们证明了稀疏内核流方法是从数据中学习内核的自然方法。这种方法与MDL原则自然地一致,提供了比现有依赖交叉验证更牢固的理论基础。研究表明,推导稀疏内核流并不需要统计方法;相反,可以直接涉及代码长度和复杂性,这些概念是AIT的核心。因此,这种方法为使用AIT工具重新制定机器学习算法打开了大门,旨在为它们提供更坚实的理论基础。

更新时间: 2024-04-10 11:35:14

领域: cs.LG,cs.IT,math.IT,stat.ML

下载: http://arxiv.org/abs/2311.12624v3

fairret: a Framework for Differentiable Fairness Regularization Terms

Current fairness toolkits in machine learning only admit a limited range of fairness definitions and have seen little integration with automatic differentiation libraries, despite the central role these libraries play in modern machine learning pipelines. We introduce a framework of fairness regularization terms (fairrets) which quantify bias as modular, flexible objectives that are easily integrated in automatic differentiation pipelines. By employing a general definition of fairness in terms of linear-fractional statistics, a wide class of fairrets can be computed efficiently. Experiments show the behavior of their gradients and their utility in enforcing fairness with minimal loss of predictive power compared to baselines. Our contribution includes a PyTorch implementation of the fairret framework.
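
To convey the flavor of a differentiable fairness term, here is a from-scratch sketch of a demographic-parity-style penalty that plugs into a standard PyTorch loss; the actual fairret library defines its terms via linear-fractional statistics, and its API will differ.

```python
# Hedged sketch of a differentiable fairness regularization term.
import torch

def parity_term(scores: torch.Tensor, group: torch.Tensor) -> torch.Tensor:
    """Gap between group-wise mean scores; scores in (0,1), group in {0,1}."""
    return (scores[group == 0].mean() - scores[group == 1].mean()).abs()

logits = torch.randn(32, requires_grad=True)
group = torch.randint(0, 2, (32,))
task_loss = torch.tensor(0.0)          # stand-in for the main task loss
total = task_loss + 0.5 * parity_term(torch.sigmoid(logits), group)
total.backward()                       # the penalty backpropagates
print(logits.grad.norm())
```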

Updated: 2024-04-10 11:22:51

标题: fairret: 一个可微公平正则化项的框架

摘要: 目前机器学习中的公平工具包仅接受有限范围的公平定义,并且与自动微分库的集成较少,尽管这些库在现代机器学习流程中起着核心作用。我们引入了一个公平正则化项(fairrets)的框架,它量化偏见作为模块化、灵活的目标,可以轻松集成到自动微分流程中。通过采用线性分数统计的一般公平定义,可以高效地计算出一大类fairrets。实验证明了它们的梯度行为以及与基线相比在实施公平性时对预测能力的最小损失。我们的贡献包括一个PyTorch实现的fairret框架。

更新时间: 2024-04-10 11:22:51

领域: cs.LG

下载: http://arxiv.org/abs/2310.17256v2

GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications

Large Language Models (LLMs) are evolving beyond their classical role of providing information within dialogue systems to actively engaging with tools and performing actions on real-world applications and services. Today, humans verify the correctness and appropriateness of LLM-generated outputs (e.g., code, functions, or actions) before putting them into real-world execution. This poses significant challenges, as code comprehension is notoriously difficult. In this paper, we study how humans can efficiently collaborate with, delegate to, and supervise autonomous LLMs in the future. We argue that in many cases, "post-facto validation" - verifying the correctness of a proposed action after seeing the output - is much easier than the aforementioned "pre-facto validation" setting. The core concept behind enabling a post-facto validation system is the integration of an intuitive undo feature and the establishment of damage confinement for LLM-generated actions, two effective strategies for mitigating the associated risks. Using these, a human can either revert the effect of an LLM-generated output or be confident that the potential risk is bounded. We believe this is critical to unlock the potential for LLM agents to interact with applications and services with limited (post-facto) human involvement. We describe the design and implementation of our open-source runtime for executing LLM actions, Gorilla Execution Engine (GoEX), and present open research questions towards realizing the goal of LLMs and applications interacting with each other with minimal human supervision. We release GoEX at https://github.com/ShishirPatil/gorilla/.
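
The undo idea can be pictured with a toy runtime (our illustration, not GoEX's actual design): every LLM-proposed action is executed together with a registered undo, so a human can inspect the result and revert it post facto.

```python
# Toy sketch of post-facto validation via a registered undo.
import os

class UndoableRuntime:
    def __init__(self):
        self._undo_stack = []

    def execute(self, action, undo):
        result = action()
        self._undo_stack.append(undo)   # damage stays revertible
        return result

    def revert_last(self):
        if self._undo_stack:
            self._undo_stack.pop()()

rt = UndoableRuntime()
rt.execute(lambda: open("note.txt", "w").write("llm output"),
           undo=lambda: os.remove("note.txt"))
# Human inspects note.txt; if the action was wrong:
rt.revert_last()
```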

Updated: 2024-04-10 11:17:33

标题: GoEX:面向自主LLM应用程序运行时的观点和设计

摘要: 大型语言模型(LLMs)正在超越其传统角色,不仅在对话系统中提供信息,还积极参与工具使用并在现实世界的应用和服务中执行操作。今天,在将LLM生成的输出(例如代码、函数或操作)投入现实世界执行之前,人类会验证其正确性和适当性。这带来了重大挑战,因为代码理解众所周知地十分困难。在本文中,我们研究了人类如何可以有效地与、委托和监督未来的自主LLMs合作。我们认为,在许多情况下,“事后验证” - 在看到输出后验证提出的行动的正确性 - 比前述的“事前验证”设置要容易得多。实现事后验证系统的核心概念是集成一个直观的撤消功能,并建立一个损害约束,作为减轻相关风险的有效策略。借助这一点,一个人现在可以撤销LLM生成的输出的效果,或者确信潜在风险是有限的。我们相信这对于释放LLM代理与应用和服务进行有限(事后)人类参与的潜力至关重要。我们描述了我们的开源运行时执行LLM操作的Gorilla Execution Engine(GoEX)的设计和实施,并提出了实现LLMs和应用之间相互交互的目标的开放性研究问题,这些交互需要最少的人类监督。我们在https://github.com/ShishirPatil/gorilla/发布了GoEX。

更新时间: 2024-04-10 11:17:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.06921v1

Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation

Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with respect to sequence length, making it expensive for deployment in some real-world text processing applications, such as retrieval-augmented generation (RAG). Additionally, LLMs also exhibit the "distraction phenomenon," where irrelevant context in the prompt degrades output quality. To address these drawbacks, we propose a novel RAG prompting methodology, superposition prompting, which can be directly applied to pre-trained transformer-based LLMs without the need for fine-tuning. At a high level, superposition prompting allows the LLM to process input documents in parallel prompt paths, discarding paths once they are deemed irrelevant. We demonstrate the capability of our method to simultaneously enhance time efficiency across a variety of question-answering benchmarks using multiple pre-trained LLMs. Furthermore, our technique significantly improves accuracy when the retrieved context is large relative to the context the model was trained on. For example, our approach facilitates a 93x reduction in compute time while improving accuracy by 43\% on the NaturalQuestions-Open dataset with the MPT-7B instruction-tuned model over naive RAG.
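
Schematically (a control-flow sketch only; scoring and generation are placeholders, not the paper's attention-level mechanism): each retrieved document is processed on its own prompt path, irrelevant paths are discarded, and only the survivors are combined with the query.

```python
# Toy control flow for parallel prompt paths with pruning.
def superposition_answer(preamble, docs, query, score_path, generate, k=2):
    scored = [(score_path(preamble, d, query), d) for d in docs]  # parallelizable
    survivors = [d for _, d in sorted(scored, reverse=True)[:k]]  # prune paths
    return generate(preamble, survivors, query)

answer = superposition_answer(
    "You answer questions from documents.",
    ["doc about cats", "doc about GPUs", "doc about tea"],
    "What is a GPU?",
    score_path=lambda p, d, q: sum(w in d.lower()
                                   for w in q.lower().rstrip("?").split()),
    generate=lambda p, docs, q: f"answer derived from {docs}",
)
print(answer)
```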

Updated: 2024-04-10 11:03:17

标题: 叠加提示:改进和加速检索增强生成

摘要: 尽管大型语言模型(LLMs)取得了成功,但它们存在显著的缺点,特别是在处理长上下文时。它们的推理成本随序列长度的增加呈二次方增长,使得在一些真实世界的文本处理应用中部署变得昂贵,例如检索增强生成(RAG)。此外,LLMs还表现出“分心现象”,即提示中的无关上下文会降低输出质量。为了解决这些缺点,我们提出了一种新颖的RAG提示方法,即叠加提示方法,可以直接应用于经过预训练的基于Transformer的LLMs,无需进行微调。在高层次上,叠加提示允许LLM以并行提示路径处理输入文档,一旦被认为无关的路径即被丢弃。我们展示了我们的方法在多个预训练LLMs上通过多个问答基准测试同时提高时间效率的能力。此外,我们的技术在检索到的上下文相对于模型训练的上下文较大时显著提高了准确性。例如,我们的方法在使用MPT-7B指令调整模型在NaturalQuestions-Open数据集上相对于朴素的RAG,实现了93倍的计算时间减少,并将准确性提高了43%。

更新时间: 2024-04-10 11:03:17

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.06910v1

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{\circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{\circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary "flat" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360$^{\circ}$ perspective, providing an enhanced immersive experience over existing techniques. Project website at: http://dreamscene360.github.io/

Updated: 2024-04-10 10:46:59

标题: DreamScene360:使用全景高斯喷洒进行无约束的文本到3D场景生成

摘要: 虚拟现实应用的需求不断增加,突显了打造沉浸式3D资产的重要性。我们提出了一种文本到3D 360$^{\circ}$场景生成管线,可以在几分钟内为野外环境创建全面的360$^{\circ}$场景。我们的方法利用2D扩散模型的生成能力和提示自我完善(prompt self-refinement),创建高质量且全局一致的全景图像。该图像充当初步的"平面"(2D)场景表示。随后,通过喷溅(splatting)技术将其提升为3D高斯,以实现实时探索。为了产生一致的3D几何结构,我们的管线通过将2D单目深度对齐为全局优化的点云来构建空间一致结构。这个点云作为3D高斯质心的初始状态。为了解决单视图输入固有的不可见问题,我们对合成和输入相机视图都施加语义和几何约束作为正则化。这些约束指导高斯的优化,有助于重建看不见的区域。总之,我们的方法在360$^{\circ}$视角内提供了一个全局一致的3D场景,比现有技术提供了增强的沉浸式体验。项目网站:http://dreamscene360.github.io/

更新时间: 2024-04-10 10:46:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.06903v1

Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models

3D reconstruction of dynamic scenes is a long-standing problem in computer graphics and increasingly difficult the less information is available. Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry from RGB images or video sequences, often leveraging just a single monocular camera without depth information, such as regular smartphone recordings. Unfortunately, existing reconstruction methods are either unphysical and noisy or slow in optimization. To solve this problem, we propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model that is fast to evaluate, stable, and produces smooth reconstructions due to a regularizing physics simulation. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence that can be used for a gradient-based optimization procedure to extract not only shape information but also physical parameters such as stretching, shearing, or bending stiffness of the cloth. This allows us to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400-500 compared to $\phi$-SfT, a state-of-the-art physics-based SfT approach.
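A hedged sketch of the described gradient loop, assuming `surrogate(mesh, params)` is the pre-trained neural cloth simulator and `render(mesh)` a differentiable renderer; both are placeholders, as is the three-parameter material vector.

```python
import torch

def fit_physics(surrogate, render, template_mesh, target_video, steps=500):
    params = torch.nn.Parameter(torch.ones(3))   # stretch, shear, bend stiffness
    opt = torch.optim.Adam([params], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        mesh, loss = template_mesh, 0.0
        for frame in target_video:               # roll the simulation forward
            mesh = surrogate(mesh, params)
            loss = loss + ((render(mesh) - frame) ** 2).mean()
        loss.backward()                          # pixel error drives material params
        opt.step()
    return params.detach()
```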

Updated: 2024-04-10 10:37:22

标题: 物理引导的形状模板:通过神经替代模型实现单目视频感知

摘要: 动态场景的三维重建是计算机图形学中长期存在的问题,可用信息越少,这一问题就越困难。基于模板的形状重建(Shape-from-Template, SfT)方法旨在从RGB图像或视频序列中重建基于模板的几何形状,通常仅利用单目摄像头而不需要深度信息,例如常规智能手机录制的视频。不幸的是,现有的重建方法要么不符合物理规律且噪声较大,要么在优化过程中速度较慢。为了解决这个问题,我们提出了一种新颖的基于模板的布料重建算法,使用预训练的神经替代模型,评估速度快、稳定,并且由于正则化的物理模拟而产生平滑的重建结果。模拟网格的可微渲染使得可以在重建和目标视频序列之间进行像素级比较,这可以用于基于梯度的优化过程,从而提取出形状信息以及布料的拉伸、剪切或弯曲刚度等物理参数。这样可以在保持精确、稳定和平滑的重建几何形状的同时,将运行时间相较于最先进的基于物理的SfT方法$\phi$-SfT减少400-500倍。

更新时间: 2024-04-10 10:37:22

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2311.12796v2

Research on Detection of Floating Objects in River and Lake Based on AI Intelligent Image Recognition

With the rapid advancement of artificial intelligence technology, AI-enabled image recognition has emerged as a potent tool for addressing challenges in traditional environmental monitoring. This study focuses on the detection of floating objects in river and lake environments, exploring an innovative approach based on deep learning. By intricately analyzing the technical pathways for detecting static and dynamic features and considering the characteristics of river and lake debris, a comprehensive image acquisition and processing workflow has been developed. The study highlights the application and performance comparison of three mainstream deep learning models (SSD, Faster-RCNN, and YOLOv5) in debris identification. Additionally, a detection system for floating objects has been designed and implemented, encompassing both hardware platform construction and software framework development. Through rigorous experimental validation, the proposed system has demonstrated its ability to significantly enhance the accuracy and efficiency of debris detection, thus offering a new technological avenue for water quality monitoring in rivers and lakes.

Updated: 2024-04-10 10:13:37

标题: 基于人工智能智能图像识别技术的河湖漂浮物检测研究

摘要: 随着人工智能技术的快速发展,AI技术的图像识别已经成为解决传统环境监测挑战的有力工具。本研究侧重于河流和湖泊环境中漂浮物的检测,探讨基于深度学习的创新方法。通过精密分析检测静态和动态特征的技术路径,并考虑河流和湖泊杂物的特点,开发了一套全面的图像获取和处理工作流程。该研究突出了三种主流深度学习模型(SSD,Faster-RCNN和YOLOv5)在杂物识别中的应用和性能比较。此外,设计和实施了一个漂浮物检测系统,包括硬件平台构建和软件框架开发。通过严格的实验验证,提出的系统已证明其能够显著提高杂物检测的准确性和效率,从而为河流和湖泊水质监测提供了一条新的技术途径。

更新时间: 2024-04-10 10:13:37

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.06883v1

SleepPPG-Net2: Deep learning generalization for sleep staging from photoplethysmography

Background: Sleep staging is a fundamental component in the diagnosis of sleep disorders and the management of sleep health. Traditionally, this analysis is conducted in clinical settings and involves a time-consuming scoring procedure. Recent data-driven algorithms for sleep staging, using the photoplethysmogram (PPG) time series, have shown high performance on local test sets but lower performance on external datasets due to data drift. Methods: This study aimed to develop a generalizable deep learning model for the task of four class (wake, light, deep, and rapid eye movement (REM)) sleep staging from raw PPG physiological time-series. Six sleep datasets, totaling 2,574 patient recordings, were used. In order to create a more generalizable representation, we developed and evaluated a deep learning model called SleepPPG-Net2, which employs a multi-source domain training approach. SleepPPG-Net2 was benchmarked against two state-of-the-art models. Results: SleepPPG-Net2 showed consistently higher performance over benchmark approaches, with generalization performance (Cohen's kappa) improving by up to 19%. Performance disparities were observed in relation to age, sex, and sleep apnea severity. Conclusion: SleepPPG-Net2 sets a new standard for staging sleep from raw PPG time-series.

Updated: 2024-04-10 09:47:34

标题: SleepPPG-Net2:基于光电容积脉搏图的睡眠分期深度学习泛化

摘要: 背景:睡眠分期是睡眠障碍诊断和睡眠健康管理的基本组成部分。传统上,这种分析是在临床环境中进行的,涉及耗时的评分程序。最近,利用光电容积脉搏图(PPG)时间序列的数据驱动算法显示在本地测试集上表现出较高性能,但在外部数据集上由于数据漂移而表现较低。方法:本研究旨在开发一个通用的深度学习模型,用于从原始PPG生理时间序列进行四类(清醒、浅睡眠、深睡眠和快速动眼运动(REM))睡眠分期任务。使用了六个睡眠数据集,总共2,574个患者记录。为了创建一个更具通用性的表示,我们开发并评估了一个称为SleepPPG-Net2的深度学习模型,该模型采用多源域训练方法。SleepPPG-Net2与两种最先进的模型进行了基准测试。结果:SleepPPG-Net2在基准方法上表现出一致较高的性能,广义性能(Cohen's kappa)提高了高达19%。观察到与年龄、性别和睡眠呼吸暂停严重程度相关的性能差异。结论:SleepPPG-Net2为从原始PPG时间序列中分期睡眠设定了一个新的标准。

更新时间: 2024-04-10 09:47:34

领域: cs.LG,cs.AI,eess.SP

下载: http://arxiv.org/abs/2404.06869v1

A tutorial on learning from preferences and choices with Gaussian Processes

Preference modelling lies at the intersection of economics, decision theory, machine learning and statistics. By understanding individuals' preferences and how they make choices, we can build products that closely match their expectations, paving the way for more efficient and personalised applications across a wide range of domains. The objective of this tutorial is to present a cohesive and comprehensive framework for preference learning with Gaussian Processes (GPs), demonstrating how to seamlessly incorporate rationality principles (from economics and decision theory) into the learning process. By suitably tailoring the likelihood function, this framework enables the construction of preference learning models that encompass random utility models, limits of discernment, and scenarios with multiple conflicting utilities for both object- and label-preference. This tutorial builds upon established research while simultaneously introducing some novel GP-based models to address specific gaps in the existing literature.

Updated: 2024-04-10 09:44:31

标题: 使用高斯过程学习偏好和选择的教程

摘要: 偏好建模位于经济学、决策理论、机器学习和统计学的交叉点。通过了解个体的偏好和他们的选择方式,我们可以构建与他们期望密切匹配的产品,为跨越各种领域的更高效和个性化应用铺平道路。本教程的目标是提供一个连贯和全面的偏好学习框架,使用高斯过程(GPs),演示如何无缝地将理性原则(来自经济学和决策理论)纳入学习过程中。通过适当定制似然函数,这个框架使得可以构建包含随机效用模型、辨识度限制以及在物体和标签偏好方面具有多个冲突效用的偏好学习模型。本教程基于已建立的研究,同时引入了一些新颖的基于GP的模型来解决现有文献中的特定空白。

更新时间: 2024-04-10 09:44:31

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2403.11782v3

Multi-Label Continual Learning for the Medical Domain: A Novel Benchmark

Multi-label image classification in dynamic environments is a problem that poses significant challenges. Previous studies have primarily focused on scenarios such as Domain Incremental Learning and Class Incremental Learning, which do not fully capture the complexity of real-world applications. In this paper, we study the problem of classification of medical imaging in the scenario termed New Instances & New Classes, which combines the challenges of both new class arrivals and domain shifts in a single framework. Unlike traditional scenarios, it reflects the realistic nature of CL in domains such as medical imaging, where updates may introduce both new classes and changes in domain characteristics. To address the unique challenges posed by this complex scenario, we introduce a novel approach called Pseudo-Label Replay. This method aims to mitigate forgetting while adapting to new classes and domain shifts by combining the advantages of the Replay and Pseudo-Label methods and solving their limitations in the proposed scenario. We evaluate our proposed approach on a challenging benchmark consisting of two datasets, seven tasks, and nineteen classes, modeling a realistic Continual Learning scenario. Our experimental findings demonstrate the effectiveness of Pseudo-Label Replay in addressing the challenges posed by the complex scenario proposed. Our method surpasses existing approaches, exhibiting superior performance while showing minimal forgetting.
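One minimal reading of the pseudo-label-replay idea, under the assumption that the previous model's sigmoid predictions supply labels for classes absent from the new task's annotations (multi-label setting; all names illustrative):

```python
import torch

def training_targets(old_model, x, new_labels, old_class_mask, thresh=0.5):
    """Combine observed labels (new classes) with pseudo-labels (old classes)."""
    with torch.no_grad():
        pseudo = (torch.sigmoid(old_model(x)) > thresh).float()
    targets = new_labels.clone()
    # keep ground truth for the new classes, pseudo-labels for the old ones
    targets[:, old_class_mask] = pseudo[:, old_class_mask]
    return targets
```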

Updated: 2024-04-10 09:35:36

标题: 医疗领域的多标签持续学习:一个新颖的基准Benchmark

摘要: 在动态环境中进行多标签图像分类是一个具有重大挑战的问题。先前的研究主要集中在域增量学习和类增量学习等场景上,这些场景并没有完全捕捉到真实应用的复杂性。在本文中,我们研究了医学成像分类的问题,采用了被称为新实例和新类别的场景,将新类别到来和领域转移的挑战结合到一个框架中。与传统场景不同,它反映了医学成像等领域中持续学习(CL)的现实特性,即更新可能同时引入新类别和领域特征的变化。为了解决这种复杂场景带来的独特挑战,我们引入了一种新方法,称为伪标签重放。该方法旨在通过结合重放和伪标签方法的优势以及解决提出场景中的限制,从而减轻遗忘并适应新类别和领域转移。我们在一个具有两个数据集、七个任务和十九个类别的具有挑战性的基准上评估了我们提出的方法,建模了一个真实的持续学习场景。我们的实验结果表明,伪标签重放在解决所提出的复杂场景带来的挑战方面非常有效。我们的方法超越了现有方法,表现出优越的性能,同时显示出最小的遗忘。

更新时间: 2024-04-10 09:35:36

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.06859v1

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

With the urgent demand for generalized deep models, many pre-trained big models are proposed, such as BERT, ViT, GPT, etc. Inspired by the success of these models in single domains (like computer vision and natural language processing), the multi-modal pre-trained big models have also drawn more and more attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper provides new insights and helps new researchers track the most cutting-edge works. Specifically, we first introduce the background of multi-modal pre-training by reviewing conventional deep learning and pre-training works in natural language processing, computer vision, and speech. Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network architectures, and knowledge enhanced pre-training. After that, we introduce the downstream tasks used for the validation of large-scale MM-PTMs, including generative, classification, and regression tasks. We also give visualization and analysis of the model parameters and results on representative downstream tasks. Finally, we point out possible research directions for this topic that may benefit future works. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models: https://github.com/wangxiao5791509/MultiModal_BigModels_Survey. This paper has been published by the journal Machine Intelligence Research (MIR), https://link.springer.com/article/10.1007/s11633-022-1410-8, DOI: 10.1007/s11633-022-1410-8, vol. 20, no. 4, pp. 447-482, 2023.

Updated: 2024-04-10 09:34:03

标题: 大规模多模态预训练模型:全面调查

摘要: 随着对泛化深度模型的迫切需求,许多预训练的大型模型被提出,如BERT、ViT、GPT等。受到这些模型在单一领域(如计算机视觉和自然语言处理)取得成功的启发,近年来,多模态预训练大型模型也越来越受到关注。在这项工作中,我们对这些模型进行了全面的调查,希望本文能提供新的见解,帮助新研究者追踪最前沿的工作。具体而言,我们首先通过回顾传统的深度学习、自然语言处理、计算机视觉和语音预训练工作,介绍了多模态预训练的背景。然后,我们介绍了多模态预训练模型(MM-PTMs)的任务定义、关键挑战和优势,并重点讨论了数据、目标、网络架构和知识增强预训练方面的内容。之后,我们介绍了用于验证大规模MM-PTMs的下游任务,包括生成、分类和回归任务。我们还对代表性的下游任务的模型参数和结果进行了可视化和分析。最后,我们指出了可能有益于未来工作的此主题的可能研究方向。此外,我们维护着一个持续更新的大规模预训练多模态大型模型的论文列表:https://github.com/wangxiao5791509/MultiModal_BigModels_Survey。本文已发表在《机器智能研究》(MIR)杂志上,https://link.springer.com/article/10.1007/s11633-022-1410-8,DOI: 10.1007/s11633-022-1410-8,第20卷,第4期,2023年,页码447-482。

更新时间: 2024-04-10 09:34:03

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2302.10035v3

Beyond Random Inputs: A Novel ML-Based Hardware Fuzzing

Modern computing systems heavily rely on hardware as the root of trust. However, their increasing complexity has given rise to security-critical vulnerabilities that cross-layer attacks can exploit. Traditional hardware vulnerability detection methods, such as random regression and formal verification, have limitations. Random regression, while scalable, is slow in exploring hardware, and formal verification techniques are often concerned with manual effort and state explosions. Hardware fuzzing has emerged as an effective approach to exploring and detecting security vulnerabilities in large-scale designs like modern processors. Fuzzers outperform traditional methods regarding coverage, scalability, and efficiency. However, state-of-the-art fuzzers struggle to achieve comprehensive coverage of intricate hardware designs within a practical timeframe, often falling short of a 70% coverage threshold. We propose a novel ML-based hardware fuzzer, ChatFuzz, to address this challenge. Our approach leverages LLMs like ChatGPT to understand processor language, focusing on machine codes and generating assembly code sequences. RL is integrated to guide the input generation process by rewarding the inputs using code coverage metrics. We use the open-source RISCV-based RocketCore processor as our testbed. ChatFuzz achieves a condition coverage rate of 75% in just 52 minutes compared to a state-of-the-art fuzzer, which requires a lengthy 30-hour window to reach a similar condition coverage. Furthermore, our fuzzer can attain 80% coverage when provided with a limited pool of 10 simulation instances/licenses within a 130-hour window. During this time, it conducted a total of 199K test cases, of which 6K produced discrepancies with the processor's golden model. Our analysis identified more than 10 unique mismatches, including two new bugs in the RocketCore and discrepancies from the RISC-V ISA Simulator.

Updated: 2024-04-10 09:28:54

标题: 超越随机输入:一种新颖的基于机器学习的硬件模糊测试

摘要: 现代计算系统高度依赖硬件作为信任根基。然而,其日益增长的复杂性催生了可被跨层攻击利用的安全关键漏洞。传统的硬件漏洞检测方法,如随机回归和形式验证,存在局限性。随机回归虽然可扩展,但在探索硬件时速度较慢,而形式验证技术常常涉及手动工作和状态爆炸。硬件模糊测试已经成为一种探索和检测现代处理器等大规模设计中安全漏洞的有效方法。它们在覆盖范围、可扩展性和效率方面优于传统方法。然而,尖端的模糊测试工具往往难以在实际时间范围内实现对复杂硬件设计的全面覆盖,往往未达到70%的覆盖率阈值。我们提出了一种新颖的基于机器学习的硬件模糊测试工具ChatFuzz,以应对这一挑战。我们的方法利用像ChatGPT这样的LLM来了解处理器语言,专注于机器代码并生成汇编代码序列。RL被整合进来,通过使用代码覆盖度指标奖励输入生成过程。我们使用开源的基于RISCV的RocketCore处理器作为我们的测试平台。与一种尖端的模糊测试工具相比,ChatFuzz仅用52分钟就实现了75%的条件覆盖率,而后者需要长达30小时的时间窗口才能达到类似的条件覆盖率。此外,我们的模糊测试工具在提供有限的10个仿真实例/许可证池的情况下,在130小时的时间窗口内能够达到80%的覆盖率。在此期间,它进行了共计199K个测试案例,其中6K个与处理器的黄金模型产生了差异。我们的分析发现了超过10个独特的不匹配,包括RocketCore中的两个新错误和与RISC-V ISA模拟器的不一致之处。

更新时间: 2024-04-10 09:28:54

领域: cs.SE,cs.AR,cs.CR,cs.LG

下载: http://arxiv.org/abs/2404.06856v1

The Topos of Transformer Networks

The transformer neural network has significantly outshone all other neural network architectures as the engine behind large language models. We provide a theoretical analysis of the expressivity of the transformer architecture through the lens of topos theory. From this viewpoint, we show that many common neural network architectures, such as the convolutional, recurrent and graph convolutional networks, can be embedded in a pretopos of piecewise-linear functions, but that the transformer necessarily lives in its topos completion. In particular, this suggests that the two network families instantiate different fragments of logic: the former are first order, whereas transformers are higher-order reasoners. Furthermore, we draw parallels with architecture search and gradient descent, integrating our analysis in the framework of cybernetic agents.

Updated: 2024-04-10 09:24:16

标题: Transformer 网络的拓扑结构

摘要: Transformer神经网络作为大型语言模型背后的引擎,明显优于所有其他神经网络架构。我们通过拓扑斯理论的视角提供了对Transformer架构表达能力的理论分析。从这个视角来看,我们展示了许多常见的神经网络架构,如卷积、循环和图卷积网络,可以嵌入到分段线性函数的pretopos中,但Transformer必然存在于其拓扑斯完备化中。特别是,这表明这两个网络家族实例化了不同的逻辑片段:前者是一阶逻辑,而Transformer是高阶推理器。此外,我们将架构搜索和梯度下降与之类比,将我们的分析整合到控制论智能体(cybernetic agents)的框架中。

更新时间: 2024-04-10 09:24:16

领域: cs.LG,math.CT

下载: http://arxiv.org/abs/2403.18415v2

Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation

Bringing high-level machine learning models to efficient and well-suited machine implementations often invokes a chain of tools, e.g., code generators, compilers, and optimizers. Along such tool chains, abstractions have to be applied, which leads to suboptimally used CPU registers. This is a shortcoming, especially in resource-constrained embedded setups. In this work, we present a code generation approach for decision tree ensembles, which produces machine assembly code within a single conversion step directly from the high-level model representation. Specifically, we develop various approaches to effectively allocate registers for the inference of decision tree ensembles. Extensive evaluations of the proposed method are conducted in comparison to the basic realization of C code from the high-level machine learning model and succeeding compilation. The results show that the performance of decision tree ensemble inference can be significantly improved (by up to $\approx1.6\times$), if the methods are applied carefully to the appropriate scenario.
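The first conversion step, flattening a trained tree into straight-line source code, can be illustrated with scikit-learn. The paper emits assembly with explicit register allocation; this sketch emits C purely to show the flattening idea.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def tree_to_c(tree, node=0, indent=1):
    """Recursively turn a fitted sklearn tree into nested C if/else branches."""
    t, pad = tree.tree_, "    " * indent
    if t.children_left[node] == -1:                      # leaf: emit class label
        return f"{pad}return {t.value[node].argmax()};\n"
    f, thr = t.feature[node], t.threshold[node]
    return (f"{pad}if (x[{f}] <= {thr:.6f}f) {{\n"
            + tree_to_c(tree, t.children_left[node], indent + 1)
            + f"{pad}}} else {{\n"
            + tree_to_c(tree, t.children_right[node], indent + 1)
            + f"{pad}}}\n")

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print("int predict(const float *x) {\n" + tree_to_c(clf) + "}\n")
```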

Updated: 2024-04-10 09:17:22

标题: 登记您的森林:通过显式CPU寄存器分配优化决策树集成

摘要: 将高级机器学习模型引入高效且适合的机器实现通常需要调用一系列工具,例如代码生成器、编译器和优化器。在这样的工具链中,必须应用抽象化。这导致CPU寄存器未能得到最佳利用,这在资源受限的嵌入式环境中尤为不足。本文提出了一种决策树集成的代码生成方法,通过单一转换步骤直接从高级模型表示生成机器汇编代码。具体而言,我们开发了各种方法来有效地为决策树集成的推理分配寄存器。与从高级机器学习模型生成基本C代码并进行后续编译的基本实现进行了广泛评估。结果显示,如果谨慎应用这些方法到适当的场景中,决策树集成推理的性能可以显著提高(最多可提高约1.6倍)。

更新时间: 2024-04-10 09:17:22

领域: cs.LG

下载: http://arxiv.org/abs/2404.06846v1

An experimental evaluation of Deep Reinforcement Learning algorithms for HVAC control

Heating, Ventilation, and Air Conditioning (HVAC) systems are a major driver of energy consumption in commercial and residential buildings. Recent studies have shown that Deep Reinforcement Learning (DRL) algorithms can outperform traditional reactive controllers. However, DRL-based solutions are generally designed for ad hoc setups and lack standardization for comparison. To fill this gap, this paper provides a critical and reproducible evaluation, in terms of comfort and energy consumption, of several state-of-the-art DRL algorithms for HVAC control. The study examines the controllers' robustness, adaptability, and trade-off between optimization goals by using the Sinergym framework. The results obtained confirm the potential of DRL algorithms, such as SAC and TD3, in complex scenarios and reveal several challenges related to generalization and incremental learning.

Updated: 2024-04-10 09:06:41

标题: 一个HVAC控制的深度强化学习算法的实验评估

摘要: 加热、通风和空调(HVAC)系统是商业和住宅建筑中能源消耗的重要驱动因素。最近的研究表明,深度强化学习(DRL)算法可以胜过传统的反应控制器。然而,基于DRL的解决方案通常设计为特定设置,并缺乏用于比较的标准化。为了填补这一空白,本文提供了对几种最先进的HVAC控制DRL算法在舒适度和能源消耗方面的关键且可重复的评估。该研究通过使用Sinergym框架考察控制器的健壮性、适应性和优化目标之间的权衡。所获得的结果确认了DRL算法(如SAC和TD3)在复杂场景中的潜力,并揭示了与泛化和增量学习相关的几个挑战。

更新时间: 2024-04-10 09:06:41

领域: cs.LG,cs.SY,eess.SY,I.2.8; J.2

下载: http://arxiv.org/abs/2401.05737v2

Universal Prompt Tuning for Graph Neural Networks

In recent years, prompt tuning has sparked a research surge in adapting pre-trained models. Unlike the unified pre-training strategy employed in the language field, the graph field exhibits diverse pre-training strategies, posing challenges in designing appropriate prompt-based tuning methods for graph neural networks. While some pioneering work has devised specialized prompting functions for models that employ edge prediction as their pre-training tasks, these methods are limited to specific pre-trained GNN models and lack broader applicability. In this paper, we introduce a universal prompt-based tuning method called Graph Prompt Feature (GPF) for pre-trained GNN models under any pre-training strategy. GPF operates on the input graph's feature space and can theoretically achieve an equivalent effect to any form of prompting function. Consequently, we no longer need to illustrate the prompting function corresponding to each pre-training strategy explicitly. Instead, we employ GPF to obtain the prompted graph for the downstream task in an adaptive manner. We provide rigorous derivations to demonstrate the universality of GPF and make guarantee of its effectiveness. The experimental results under various pre-training strategies indicate that our method performs better than fine-tuning, with an average improvement of about 1.4% in full-shot scenarios and about 3.2% in few-shot scenarios. Moreover, our method significantly outperforms existing specialized prompt-based tuning methods when applied to models utilizing the pre-training strategy they specialize in. These numerous advantages position our method as a compelling alternative to fine-tuning for downstream adaptations.
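Since GPF operates on the input feature space, a minimal sketch is a single learnable prompt vector added to every node feature before the frozen pre-trained GNN; the task head and training loop are omitted here.

```python
import torch

class GPF(torch.nn.Module):
    """Graph Prompt Feature: one learnable vector shared by all nodes."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.p = torch.nn.Parameter(torch.zeros(feat_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.p   # prompted node features, shape preserved

# usage sketch: only gpf.p (and a task head) are trained downstream, e.g.
#   logits = head(frozen_gnn(gpf(node_features), edge_index))
```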

Updated: 2024-04-10 09:04:26

标题: 图神经网络的通用提示调整

摘要: 近年来,提示调整(prompt tuning)引发了适配预训练模型的研究热潮。与语言领域采用的统一预训练策略不同,图领域展现出多样的预训练策略,为设计适用于图神经网络的基于提示的调整方法带来了挑战。虽然一些开创性工作已经为采用边缘预测作为其预训练任务的模型设计了专门的提示函数,但这些方法仅适用于特定的预训练 GNN 模型,缺乏更广泛的适用性。本文介绍了一种称为图提示特征(GPF)的通用基于提示的调整方法,适用于任何预训练策略下的预训练 GNN 模型。GPF 在输入图的特征空间上操作,理论上可以实现与任何形式的提示函数等效的效果。因此,我们不再需要明确说明与每种预训练策略对应的提示函数。相反,我们使用 GPF 以自适应的方式获得下游任务的提示图。我们提供严格的推导来证明 GPF 的普适性并保证其有效性。在各种预训练策略下的实验结果表明,我们的方法比微调表现更好,在全样本场景中平均提高约 1.4%,在少样本场景中约提高 3.2%。此外,我们的方法在应用于利用其专门化的预训练策略的模型时,明显优于现有的专门的基于提示的调整方法。这些众多优势使我们的方法成为下游适应的微调的一个引人注目的替代方案。

更新时间: 2024-04-10 09:04:26

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2209.15240v5

MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers

Adversarial robustness often comes at the cost of degraded accuracy, impeding the real-life application of robust classification models. Training-based solutions for better trade-offs are limited by incompatibilities with already-trained high-performance large models, necessitating the exploration of training-free ensemble approaches. Observing that robust models are more confident in correct predictions than in incorrect ones on clean and adversarial data alike, we speculate amplifying this "benign confidence property" can reconcile accuracy and robustness in an ensemble setting. To achieve so, we propose "MixedNUTS", a training-free method where the output logits of a robust classifier and a standard non-robust classifier are processed by nonlinear transformations with only three parameters, which are optimized through an efficient algorithm. MixedNUTS then converts the transformed logits into probabilities and mixes them as the overall output. On CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness -- it boosts CIFAR-100 clean accuracy by 7.86 points, sacrificing merely 0.87 points in robust accuracy.
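A sketch of the mixing step, using a hypothetical three-parameter monotone transform (scale, exponent, bias) on the robust model's logits; the actual transform and its optimized parameters come from the paper's algorithm, not from this snippet.

```python
import torch

def mixed_probs(logits_std, logits_rob, s=1.0, p=1.0, c=0.0, alpha=0.5):
    """Nonlinearly transform robust logits, then mix class probabilities."""
    transformed = s * torch.relu(logits_rob + c) ** p   # amplify confident logits
    p_std = torch.softmax(logits_std, dim=-1)
    p_rob = torch.softmax(transformed, dim=-1)
    return alpha * p_std + (1 - alpha) * p_rob          # overall output
```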

Updated: 2024-04-10 09:00:44

标题: MixedNUTS:通过非线性混合分类器实现无需训练的准确性-稳健性平衡

摘要: 对抗性鲁棒性通常会以降低的准确性为代价,阻碍了鲁棒分类模型在现实生活中的应用。基于训练的解决方案在与已经训练好的高性能大模型不兼容的情况下,很难进行更好的权衡,因此需要探索无需训练的集成方法。观察到,在干净数据和对抗数据上,鲁棒模型对于正确预测比对于错误预测更有信心,我们推测放大这种“良性信心属性”可以在集成设置中调和准确性和鲁棒性。为了实现这一目标,我们提出了“MixedNUTS”方法,这是一种无需训练的方法,其中鲁棒分类器和标准非鲁棒分类器的输出logit通过仅有三个参数的非线性转换进行处理,这些参数通过高效的算法进行优化。MixedNUTS然后将转换后的logit转换为概率,并将它们混合作为整体输出。在CIFAR-10、CIFAR-100和ImageNet数据集上,通过定制强适应性攻击的实验结果表明MixedNUTS的准确性大幅提高,接近SOTA的鲁棒性 -- 它将CIFAR-100的干净准确率提高了7.86个百分点,仅牺牲了0.87个百分点的鲁棒准确率。

更新时间: 2024-04-10 09:00:44

领域: cs.LG,cs.AI,cs.CV,68T07

下载: http://arxiv.org/abs/2402.02263v2

Multi-role Consensus through LLMs Discussions for Vulnerability Detection

Recent advancements in large language models (LLMs) have highlighted the potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies have been limited to the perspective of a single role, usually testers, lacking diverse viewpoints from different roles in a typical software development life-cycle, including both developers and testers. To this end, this paper introduces a multi-role approach to employ LLMs to act as different roles to simulate real-life code review process, engaging in discussions towards a consensus on the existence and classification of vulnerabilities in the code. Preliminary evaluation of the proposed approach indicates a 4.73% increase in the precision rate, 58.9% increase in the recall rate, and a 28.1% increase in the F1 score.

Updated: 2024-04-10 08:53:13

标题: 通过LLMs讨论实现漏洞检测的多角色共识

摘要: 最近关于大型语言模型(LLMs)的进展突显了漏洞检测的潜力,这是软件质量保证的关键组成部分。尽管取得了进展,但大多数研究仅限于单一角色的视角,通常是测试人员,缺乏来自典型软件开发生命周期中不同角色的多样观点,包括开发人员和测试人员。为此,本文介绍了一种多角色方法,利用LLMs扮演不同角色,模拟实际代码审查过程,进行讨论以达成对代码中漏洞的存在和分类的共识。拟议方法的初步评估显示精度率提高了4.73%,召回率提高了58.9%,F1分数提高了28.1%。

更新时间: 2024-04-10 08:53:13

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2403.14274v2

Solving Parametric PDEs with Radial Basis Functions and Deep Neural Networks

We propose the POD-DNN, a novel algorithm leveraging deep neural networks (DNNs) along with radial basis functions (RBFs) in the context of the proper orthogonal decomposition (POD) reduced basis method (RBM), aimed at approximating the parametric mapping of parametric partial differential equations on irregular domains. The POD-DNN algorithm capitalizes on the low-dimensional characteristics of the solution manifold for parametric equations, alongside the inherent offline-online computational strategy of RBM and DNNs. In numerical experiments, POD-DNN demonstrates significantly accelerated computation speeds during the online phase. Compared to other algorithms that utilize RBF without integrating DNNs, POD-DNN substantially improves the computational speed in the online inference process. Furthermore, under reasonable assumptions, we have rigorously derived upper bounds on the complexity of approximating parametric mappings with POD-DNN, thereby providing a theoretical analysis of the algorithm's empirical performance.
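A compact sketch of the offline/online split with illustrative shapes: the POD basis comes from an SVD of solution snapshots, and a small network maps PDE parameters to reduced coefficients, so the online phase is a single forward pass.

```python
import numpy as np
import torch

# offline: POD basis from snapshots (1000 dofs, 200 sample solutions)
snapshots = np.random.rand(1000, 200)
U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
basis = torch.tensor(U[:, :20], dtype=torch.float32)    # r = 20 POD modes

net = torch.nn.Sequential(                # parameter -> reduced coefficients
    torch.nn.Linear(3, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 20),
)

def solve(mu: torch.Tensor) -> torch.Tensor:
    """Online phase: lift predicted coefficients back to the full space."""
    return basis @ net(mu)

approx = solve(torch.tensor([0.1, 0.5, 0.9]))           # shape (1000,)
```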

Updated: 2024-04-10 08:52:12

标题: 用径向基函数和深度神经网络解决参数化偏微分方程

摘要: 我们提出了POD-DNN,这是一种新颖的算法,利用深度神经网络(DNN)和径向基函数(RBF),在本征正交分解(POD)约简基方法(RBM)的背景下,旨在近似不规则域上参数化偏微分方程的参数映射。POD-DNN算法利用参数方程的解流形的低维特性,以及RBM和DNN的固有离线-在线计算策略。在数值实验中,POD-DNN在在线阶段表现出明显加速的计算速度。与仅利用RBF而不集成DNN的其他算法相比,POD-DNN在在线推断过程中显著提高了计算速度。此外,在合理假设下,我们严格推导了用POD-DNN近似参数映射的复杂度的上界,从而对算法的经验性能进行了理论分析。

更新时间: 2024-04-10 08:52:12

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2404.06834v1

SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection

Detecting anomalies in images has become a well-explored problem in both academia and industry. State-of-the-art algorithms are able to detect defects in increasingly difficult settings and data modalities. However, most current methods are not suited to address 3D objects captured from differing poses. While solutions using Neural Radiance Fields (NeRFs) have been proposed, they suffer from excessive computation requirements, which hinder real-world usability. For this reason, we propose the novel 3D Gaussian splatting-based framework SplatPose which, given multi-view images of a 3D object, accurately estimates the pose of unseen views in a differentiable manner, and detects anomalies in them. We achieve state-of-the-art results in both training and inference speed, and detection performance, even when using less training data than competing methods. We thoroughly evaluate our framework using the recently proposed Pose-agnostic Anomaly Detection benchmark and its multi-pose anomaly detection (MAD) data set.

Updated: 2024-04-10 08:48:09

标题: SplatPose & Detect:与姿势无关的3D异常检测

摘要: 在学术界和工业界,检测图像中的异常已经成为一个被广泛探讨的问题。最先进的算法能够在越来越困难的环境和数据模态中检测缺陷。然而,大多数当前的方法并不适合处理从不同姿态捕获的3D对象。虽然已经提出了使用神经辐射场(NeRFs)的解决方案,但它们受到了过高的计算需求的限制,这妨碍了实际的可用性。因此,我们提出了基于3D高斯喷溅的新颖框架SplatPose,通过给定3D对象的多视图图像,以可微分的方式准确估计未见视图的姿态,并检测其中的异常。我们在训练速度、推理速度和检测性能方面取得了最先进的结果,即使使用的训练数据比竞争方法少。我们通过最近提出的Pose-agnostic Anomaly Detection基准和其多姿态异常检测(MAD)数据集对我们的框架进行了彻底评估。

更新时间: 2024-04-10 08:48:09

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.06832v1

Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits

We study the generalized linear contextual bandit problem within the requirements of limited adaptivity. In this paper, we present two algorithms, \texttt{B-GLinCB} and \texttt{RS-GLinCB}, that address, respectively, two prevalent limited adaptivity models: batch learning with stochastic contexts and rare policy switches with adversarial contexts. For both these models, we establish essentially tight regret bounds. Notably, in the obtained bounds, we manage to eliminate a dependence on a key parameter $\kappa$, which captures the non-linearity of the underlying reward model. For our batch learning algorithm \texttt{B-GLinCB}, with $\Omega\left( \log{\log T} \right)$ batches, the regret scales as $\tilde{O}(\sqrt{T})$. Further, we establish that our rarely switching algorithm \texttt{RS-GLinCB} updates its policy at most $\tilde{O}(\log^2 T)$ times and achieves a regret of $\tilde{O}(\sqrt{T})$. Our approach for removing the dependence on $\kappa$ for generalized linear contextual bandits might be of independent interest.

Updated: 2024-04-10 08:47:57

标题: 有限适应性下广义线性情境赌博机的最优遗憾

摘要: 我们研究了在有限适应性要求下的广义线性情境赌博问题。在本文中,我们提出了两种算法\texttt{B-GLinCB}和\texttt{RS-GLinCB},分别解决了两种普遍的有限适应性模型:具有随机情境的批次学习和具有对抗性情境的稀少策略切换。对于这两种模型,我们建立了基本上紧密的遗憾界。值得注意的是,在获得的界中,我们设法消除了对关键参数$\kappa$的依赖,该参数捕捉了基础奖励模型的非线性性。对于我们的批次学习算法\texttt{B-GLinCB},通过$\Omega\left( \log{\log T} \right)$批次,遗憾随时间的增长为$\tilde{O}(\sqrt{T})$。此外,我们确定我们的稀少切换算法\texttt{RS-GLinCB}最多更新其策略$\tilde{O}(\log^2 T)$次,并实现了$\tilde{O}(\sqrt{T})$的遗憾。我们消除广义线性情境赌博中对$\kappa$的依赖的方法可能具有独立的兴趣。

更新时间: 2024-04-10 08:47:57

领域: cs.LG

下载: http://arxiv.org/abs/2404.06831v1

Large Language Models for Software Engineering: A Systematic Literature Review

Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a systematic literature review (SLR) on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We select and analyze 395 research papers from January 2017 to January 2024 to answer four key research questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, preprocessing, and application, highlighting the role of well-curated datasets for successful LLM for SE implementation. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and flagging promising areas for future study. Our artifacts are publicly available at https://github.com/xinyi-hou/LLM4SE_SLR.

Updated: 2024-04-10 08:41:22

标题: 大型语言模型在软件工程中的应用:一项系统文献综述

摘要: 大型语言模型(LLMs)显著影响了许多领域,包括软件工程(SE)。许多最近的出版物探讨了LLMs应用于各种SE任务。然而,对LLMs在SE上的应用、影响和可能的限制的全面理解仍处于早期阶段。为了弥合这一差距,我们进行了一项系统文献综述(SLR)关于LLM4SE,特别关注了理解LLMs如何被利用来优化过程和结果。我们选择并分析了从2017年1月到2024年1月的395篇研究论文,以回答四个关键研究问题(RQs)。在RQ1中,我们对在SE任务中使用的不同LLMs进行分类,描述它们的独特特征和用途。在RQ2中,我们分析了数据收集、预处理和应用中使用的方法,强调了为成功的LLM for SE实施而精心策划的数据集的作用。RQ3调查了用于优化和评估LLMs在SE中性能的策略。最后,RQ4考察了LLMs迄今在SE任务中显示成功的具体情况,说明了它们对该领域的实际贡献。通过这些RQ的答案,我们讨论了当前的最新技术和趋势,确定了现有研究中的差距,并标记了未来研究的有希望的领域。我们的文献资料可在https://github.com/xinyi-hou/LLM4SE_SLR 上公开获取。

更新时间: 2024-04-10 08:41:22

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2308.10620v6

Proposed modified computational model for the amoeba-inspired combinatorial optimization machine

A single-celled amoeba can solve the traveling salesman problem through its shape-changing dynamics. In this paper, we examine the roles of several elements in a previously proposed computational model of the solution-search process of the amoeba, together with three modifications aimed at enhancing the solution-search performance. We find that appropriate modifications can indeed significantly improve the quality of solutions. We also find that a condition associated with volume conservation can be modified, contrary to the naive belief that it is indispensable for the solution-search ability of the amoeba. The proposed modified model shows much better performance.

Updated: 2024-04-10 08:32:29

标题: 拟议的修正计算模型用于基于变形虫的组合优化机器

摘要: 一个单细胞的变形虫可以通过其形态变化动态来解决旅行推销员问题。在本文中,我们研究了之前提出的一个计算模型中几个元素在变形虫解决方案搜索过程中的作用,以及为增强解决方案搜索性能而进行的三项修改。我们发现适当的修改确实可以显著改善解决方案的质量。同时,我们也发现与体积守恒相关的条件可以被修改,与人们过去认为的对变形虫的解决方案搜索能力是不可或缺的观点相反。一个提出的修改模型表现出更好的性能。

更新时间: 2024-04-10 08:32:29

领域: cs.NE,cond-mat.dis-nn,cs.AI,nlin.CD,stat.CO

下载: http://arxiv.org/abs/2404.06828v1

Hysteresis Compensation of Flexible Continuum Manipulator using RGBD Sensing and Temporal Convolutional Network

Flexible continuum manipulators are valued for minimally invasive surgery, offering access to confined spaces through nonlinear paths. However, cable-driven manipulators face control difficulties due to hysteresis from cabling effects such as friction, elongation, and coupling. These effects are difficult to model due to nonlinearity, and the difficulties become even more evident when dealing with long, coupled, multi-segmented manipulators. This paper proposes a data-driven approach based on Deep Neural Networks (DNN) to capture these nonlinear and history-dependent characteristics of cable actuation. We collect physical joint configurations according to command joint configurations using RGBD sensing and 7 fiducial markers to model the hysteresis of the proposed manipulator. Results of a study comparing the estimation performance of four DNN models show that the Temporal Convolution Network (TCN) demonstrates the highest predictive capability. Leveraging trained TCNs, we build a control algorithm to compensate for hysteresis. Tracking tests in task space using unseen trajectories show that the proposed control algorithm reduces the average position and orientation error by 61.39% (from 13.7 mm to 5.29 mm) and 64.04% (from 31.17$^{\circ}$ to 11.21$^{\circ}$), respectively. This result implies that the proposed calibrated controller effectively reaches the desired configurations by estimating the hysteresis of the manipulator. Applying this method in real surgical scenarios has the potential to enhance control precision and improve surgical performance.

Updated: 2024-04-10 08:31:08

标题: 使用RGBD感知和时间卷积网络对柔性连续操作器的滞后补偿

摘要: 柔性连续操纵器在微创手术中备受重视,通过非线性路径进入受限空间。然而,由于缆绳效应如摩擦、伸长和耦合引起的滞后,缆绳驱动操纵器面临控制困难。这些效应难以建模,因为非线性,而且当处理长且耦合的多段式操纵器时,困难变得更加明显。本文提出了一种基于深度神经网络(DNN)的数据驱动方法,以捕捉缆绳驱动的非线性和先前状态相关特性。我们使用RGBD感测和7个基准标记收集物理关节配置,以模拟所提出操纵器的滞后。在一项研究中,比较了四种DNN模型的估计性能,结果显示时间卷积网络(TCN)表现出最高的预测能力。利用训练好的TCNs,我们建立了一个控制算法来补偿滞后。在任务空间中使用未见过的轨迹进行跟踪测试,结果显示所提出的控制算法将平均位置和方向误差分别减少了61.39%(从13.7mm到5.29mm)和64.04%(从31.17°到11.21°)。这一结果表明,所提出的校准控制器通过估计操纵器的滞后有效地达到了期望的配置。将这种方法应用于实际外科手术场景有可能增强控制精度并改善手术性能。

更新时间: 2024-04-10 08:31:08

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2402.11319v2

IA2: Leveraging Instance-Aware Index Advisor with Reinforcement Learning for Diverse Workloads

This study introduces the Instance-Aware Index Advisor (IA2), a novel deep reinforcement learning (DRL)-based approach for optimizing index selection in databases facing large action spaces of potential candidates. IA2 introduces the Twin Delayed Deep Deterministic Policy Gradient - Temporal Difference State-Wise Action Refinery (TD3-TD-SWAR) model, enabling efficient index selection by understanding workload-index dependencies and employing adaptive action masking. This method includes a comprehensive workload model, enhancing its ability to adapt to unseen workloads and ensuring robust performance across diverse database environments. Evaluation on benchmarks such as TPC-H reveals IA2's suggested indexes' performance in enhancing runtime, securing a 40% reduction in runtime for complex TPC-H workloads compared to scenarios without indexes, and delivering a 20% improvement over existing state-of-the-art DRL-based index advisors.

Updated: 2024-04-10 08:23:48

标题: IA2:利用强化学习的实例感知索引顾问来处理多样化的工作负载

摘要: 本研究介绍了Instance-Aware Index Advisor(IA2),这是一种新颖的基于深度强化学习(DRL)的方法,用于优化面临大量潜在候选索引的数据库中的索引选择。IA2引入了Twin Delayed Deep Deterministic Policy Gradient - Temporal Difference State-Wise Action Refinery(TD3-TD-SWAR)模型,通过理解工作量-索引依赖关系并采用自适应动作屏蔽,实现了高效的索引选择。该方法包括一个全面的工作负载模型,增强了其适应未知工作负载的能力,并确保在不同的数据库环境中稳健的性能。对诸如TPC-H等基准测试的评估显示,IA2建议的索引在提高运行时间方面的性能,相比于没有索引的情景,可以减少40%的运行时间,比现有最先进的基于DRL的索引顾问提高20%。

更新时间: 2024-04-10 08:23:48

领域: cs.DB,cs.AI

下载: http://arxiv.org/abs/2404.05777v2

Error Mitigation for TDoA UWB Indoor Localization using Unsupervised Machine Learning

Indoor positioning systems based on Ultra-wideband (UWB) technology are gaining recognition for their ability to provide cm-level localization accuracy. However, these systems often encounter challenges caused by dense multi-path fading, leading to positioning errors. To address this issue, in this letter, we propose a novel methodology for unsupervised anchor node selection using deep embedded clustering (DEC). Our approach uses an Auto Encoder (AE) before clustering, thereby better separating UWB features into separable clusters of UWB input signals. We furthermore investigate how to rank these clusters based on their cluster quality, allowing us to remove untrustworthy signals. Experimental results show the efficiency of our proposed method, demonstrating a significant 23.1% reduction in mean absolute error (MAE) compared to without anchor exclusion. Especially in the dense multi-path area, our algorithm achieves even more significant enhancements, reducing the MAE by 26.6% and the 95th percentile error by 49.3% compared to without anchor exclusion.
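A simplified stand-in for the pipeline (deep embedded clustering proper refines the clusters jointly with the encoder; an auto-encoder followed by k-means conveys the idea, with illustrative feature dimensions):

```python
import torch
from sklearn.cluster import KMeans

class UWBAutoEncoder(torch.nn.Module):
    """Embeds per-measurement UWB features into a small latent space."""
    def __init__(self, d_in: int = 50, d_z: int = 8):
        super().__init__()
        self.enc = torch.nn.Sequential(torch.nn.Linear(d_in, 32), torch.nn.ReLU(),
                                       torch.nn.Linear(32, d_z))
        self.dec = torch.nn.Sequential(torch.nn.Linear(d_z, 32), torch.nn.ReLU(),
                                       torch.nn.Linear(32, d_in))
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def cluster_measurements(ae, feats, n_clusters=3):
    # After training the AE on reconstruction loss: cluster latent codes, then
    # rank clusters by quality and drop untrustworthy ones before positioning.
    with torch.no_grad():
        _, z = ae(feats)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(z.numpy())
```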

Updated: 2024-04-10 08:23:05

标题: 使用无监督机器学习对TDoA超宽带室内定位的误差缓解

摘要: 基于超宽带(UWB)技术的室内定位系统因其提供厘米级定位精度而受到认可。然而,这些系统通常会遇到由密集多径衰落引起的挑战,导致定位误差。为了解决这个问题,在这封信中,我们提出了一种使用深度嵌入式聚类(DEC)的无监督锚节点选择的新方法。我们的方法在聚类之前使用自动编码器(AE),从而更好地将UWB特征分离成可分离的UWB输入信号簇。我们进一步研究如何基于它们的簇质量对这些簇进行排序,从而可以去除不可靠的信号。实验结果显示了我们提出的方法的效率,相比不排除锚点,平均绝对误差(MAE)显著减少了23.1%。特别是在密集多径区域,我们的算法实现了更显著的提升,将MAE减少了26.6%,将第95百分位误差减少了49.3%,相比不排除锚点。

更新时间: 2024-04-10 08:23:05

领域: cs.LG,I.2.1

下载: http://arxiv.org/abs/2404.06824v1

Enc2DB: A Hybrid and Adaptive Encrypted Query Processing Framework

As cloud computing gains traction, data owners are outsourcing their data to cloud service providers (CSPs) for Database Service (DBaaS), bringing in a deviation of data ownership and usage, and intensifying privacy concerns, especially with potential breaches by hackers or CSP insiders. To address that, encrypted database services propose encrypting every tuple and query statement before submitting to the CSP, ensuring data confidentiality when the CSP is honest-but-curious, or even compromised. Existing solutions either employ property preserving cryptography schemes, which can perform certain operations over ciphertext without decrypting the data over the CSP, or utilize trusted execution environment (TEE) to safeguard data and computations from the CSP. Based on these efforts, we introduce Enc2DB, a novel secure database system, following a hybrid strategy on PostgreSQL and openGauss. We present a micro-benchmarking test and self-adaptive mode switch strategy that can dynamically choose the best execution path (cryptography or TEE) to answer a given query. Besides, we also design and implement a ciphertext index compatible with native cost model and query optimizers to accelerate query processing. Empirical study over TPC-C test justifies that Enc2DB outperforms pure TEE and cryptography solutions, and our ciphertext index implementation also outperforms the state-of-the-art cryptographic-based system.

Updated: 2024-04-10 08:11:12

标题: Enc2DB:一种混合和自适应的加密查询处理框架

摘要: 随着云计算的普及,数据所有者正在将其数据外包给云服务提供商(CSPs)进行数据库服务(DBaaS),这带来了数据所有权和使用的偏离,并加剧了隐私担忧,尤其是在遭受黑客或CSP内部人员潜在侵犯的情况下。为了解决这个问题,加密数据库服务提出在提交给CSP之前加密每个元组和查询语句,确保在CSP是诚实但好奇,甚至被入侵时保护数据的机密性。现有的解决方案要么采用保持属性的加密方案,可以在不解密数据的情况下对密文执行某些操作,要么利用受信任的执行环境(TEE)保护数据和计算免受CSP的侵害。基于这些努力,我们介绍了Enc2DB,这是一个新颖的安全数据库系统,采用了基于PostgreSQL和openGauss的混合策略。我们提出了一个微基准测试和自适应模式切换策略,可以动态选择最佳执行路径(加密或TEE)来回答特定查询。此外,我们还设计并实现了一个与本地成本模型和查询优化器兼容的密文索引,以加速查询处理。经过TPC-C测试的实证研究证明,Enc2DB优于纯TEE和加密解决方案,并且我们的密文索引实现也优于现有的基于加密的系统。

更新时间: 2024-04-10 08:11:12

领域: cs.CR,cs.DB

下载: http://arxiv.org/abs/2404.06819v1

Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models

In recent years, advancements in neural network designs and the availability of large-scale labeled datasets have led to significant improvements in the accuracy of piano transcription models. However, most previous work focused on high-performance offline transcription, neglecting deliberate consideration of model size. The goal of this work is to implement real-time inference for piano transcription while ensuring both high performance and lightweight. To this end, we propose novel architectures for convolutional recurrent neural networks, redesigning an existing autoregressive piano transcription model. First, we extend the acoustic module by adding a frequency-conditioned FiLM layer to the CNN module to adapt the convolutional filters on the frequency axis. Second, we improve note-state sequence modeling by using a pitchwise LSTM that focuses on note-state transitions within a note. In addition, we augment the autoregressive connection with an enhanced recursive context. Using these components, we propose two types of models; one for high performance and the other for high compactness. Through extensive experiments, we show that the proposed models are comparable to state-of-the-art models in terms of note accuracy on the MAESTRO dataset. We also investigate the effective model size and real-time inference latency by gradually streamlining the architecture. Finally, we conduct cross-data evaluation on unseen piano datasets and in-depth analysis to elucidate the effect of the proposed components in the view of note length and pitch range.
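A sketch of a frequency-conditioned FiLM layer as described: per-frequency-bin scale and shift applied to CNN feature maps of shape (batch, channels, freq, time). Conditioning on the normalized bin index is an assumption of this sketch, not necessarily the paper's exact parameterization.

```python
import torch

class FreqFiLM(torch.nn.Module):
    """FiLM modulation whose gamma/beta depend on the frequency bin."""
    def __init__(self, channels: int, n_freq_bins: int):
        super().__init__()
        self.film = torch.nn.Linear(1, 2 * channels)   # bin index -> gamma, beta
        self.register_buffer("freq_idx",
                             torch.linspace(0, 1, n_freq_bins).unsqueeze(-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.film(self.freq_idx).chunk(2, dim=-1)   # (F, C) each
        gamma = gamma.t().unsqueeze(0).unsqueeze(-1)              # (1, C, F, 1)
        beta = beta.t().unsqueeze(0).unsqueeze(-1)
        return gamma * x + beta   # broadcast over batch and time
```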

Updated: 2024-04-10 08:06:15

标题: 朝着高效和实时的钢琴转录:使用神经自回归模型

摘要: 近年来,神经网络设计的进步和大规模标记数据集的可用性导致钢琴转录模型的准确性显著提高。然而,大多数先前的工作都集中在高性能的离线转录上,忽视了对模型大小的刻意考虑。本文的目标是在确保高性能和轻量级的情况下实现钢琴转录的实时推断。为此,我们提出了用于卷积递归神经网络的新颖架构,重新设计了现有的自回归钢琴转录模型。首先,我们通过在CNN模块中添加一个频率条件的FiLM层来扩展声学模块,以使卷积滤波器在频率轴上适应。其次,我们通过使用一个基于音高的LSTM来改进音符状态序列建模,该LSTM侧重于音符内的音符状态转换。此外,我们通过增强的递归上下文来增强自回归连接。利用这些组件,我们提出了两种模型:一种用于高性能,另一种用于高紧凑性。通过大量实验,我们展示了提出的模型在MAESTRO数据集的音符准确性方面与最先进的模型相当。我们还通过逐步简化架构来研究有效的模型大小和实时推断延迟。最后,我们对未见过的钢琴数据集进行跨数据评估,并进行深入分析,以阐明所提出的组件在音符长度和音高范围的视角下的影响。

更新时间: 2024-04-10 08:06:15

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2404.06818v1

Security Assessment of the LG Cryptosystem

The LG cryptosystem is a public-key encryption scheme in the rank metric using the recent family of $\lambda_v$-Gabidulin codes and introduced in 2019 by Lau and Tan. In this paper, we present a cryptanalysis showing that the security of several parameters of the scheme has been overestimated. We also show the existence of some weak keys allowing an attacker to find in polynomial time an alternative private key.

Updated: 2024-04-10 08:02:18

标题: LG密码系统的安全评估

摘要: LG密码系统是一种在秩度量下、使用最近提出的$\lambda_v$-Gabidulin码族的公钥加密方案,由Lau和Tan于2019年引入。在本文中,我们提出了一种密码分析,表明该方案若干参数的安全性被高估了。我们还展示了存在一些弱密钥,使攻击者能够在多项式时间内找到替代私钥。

更新时间: 2024-04-10 08:02:18

领域: cs.CR

下载: http://arxiv.org/abs/2404.06815v1

GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos. Despite its success in language understanding, it is critical to evaluate the performance of downstream tasks for better human-centric applications. This paper assesses the application of MLLMs with 5 crucial abilities for affective computing, spanning from visual affective tasks and reasoning tasks. The results show that GPT-4V has high accuracy in facial action unit recognition and micro-expression detection while its general facial expression recognition performance is not accurate. We also highlight the challenges of achieving fine-grained micro-expression recognition and the potential for further study and demonstrate the versatility and potential of GPT-4V for handling advanced tasks in emotion recognition and related fields by integrating with task-related agents for more complex tasks, such as heart rate estimation through signal processing. In conclusion, this paper provides valuable insights into the potential applications and challenges of MLLMs in human-centric computing. Our interesting examples are at https://github.com/EnVision-Research/GPT4Affectivity.

Updated: 2024-04-10 07:58:44

标题: GPT作为心理学家?关于GPT-4V在视觉情感计算上的初步评估

摘要: 多模态大型语言模型(MLLMs)旨在处理和整合来自多个来源的信息,如文本、语音、图像和视频。尽管在语言理解方面取得成功,但对于更好的以人为中心的应用,评估下游任务的性能至关重要。本文评估了MLLMs在情感计算中的应用,涵盖了视觉情感任务和推理任务等5个关键能力。结果显示,GPT-4V在面部动作单元识别和微表情检测方面具有高准确性,而其一般面部表情识别性能不准确。我们还强调了实现细粒度微表情识别的挑战,以及进一步研究的潜力,并通过与任务相关代理的集成展示了GPT-4V在处理情绪识别和相关领域的高级任务方面的多功能性和潜力,例如通过信号处理进行心率估计。总之,本文提供了有关MLLMs在人为中心计算中潜在应用和挑战的宝贵见解。我们的有趣示例可在https://github.com/EnVision-Research/GPT4Affectivity找到。

更新时间: 2024-04-10 07:58:44

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.05916v2

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussian to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3$\times$ lower GPU memory usage and 5$\times$ faster fitting time not only rivals INRs (e.g., WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 1000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. Code will be available at https://github.com/Xinjie-Q/GaussianImage.
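Rendering by accumulated summation can be sketched directly: each 2D Gaussian contributes a position, covariance, and color (the 8 parameters mentioned), and the image is a clipped sum of their footprints. A naive NumPy version, far from the paper's 1000+ FPS implementation:

```python
import numpy as np

def render(gaussians, H, W):
    """Accumulate each Gaussian's weighted color over all pixels."""
    ys, xs = np.mgrid[0:H, 0:W]
    img = np.zeros((H, W, 3))
    for mu, cov, color in gaussians:
        d = np.stack([xs - mu[0], ys - mu[1]], axis=-1)   # (H, W, 2) offsets
        inv = np.linalg.inv(cov)
        expo = np.einsum("hwi,ij,hwj->hw", d, inv, d)
        img += np.exp(-0.5 * expo)[..., None] * color     # accumulated summation
    return np.clip(img, 0, 1)

g = [(np.array([32, 32]), np.eye(2) * 40, np.array([1.0, 0.2, 0.2]))]
image = render(g, 64, 64)
```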

Updated: 2024-04-10 07:58:04

标题: 高斯图像:通过2D高斯喷洒实现每秒1000帧图像表示和压缩

摘要: 最近,隐式神经表示(INRs)在图像表示和压缩方面取得了巨大成功,在有足够GPU资源可用的前提下,提供了高视觉质量和每秒10-1000帧的快速渲染速度。然而,这种要求通常会阻碍它们在内存有限的低端设备上的使用。作为回应,我们提出了一种通过2D高斯喷溅进行图像表示和压缩的开创性范式,称为GaussianImage。我们首先引入2D高斯来表示图像,其中每个高斯具有包括位置、协方差和颜色在内的8个参数。随后,我们揭示了一种基于累积求和的新型渲染算法。值得注意的是,我们的方法将GPU内存使用至少降低3倍、拟合时间加快5倍,不仅在表示性能上与INRs(例如WIRE、I-NGP)相媲美,还提供了不受参数数量影响的1500-2000帧的更快渲染速度。此外,我们整合了现有的矢量量化技术来构建图像编解码器。实验结果表明,我们的编解码器在率失真性能方面与基于压缩的INRs(如COIN和COIN++)相媲美,同时可以实现约1000帧的解码速度。此外,初步的概念验证显示,我们的编解码器在使用部分比特回退编码时超过了COIN和COIN++的性能。代码将在https://github.com/Xinjie-Q/GaussianImage 上提供。

更新时间: 2024-04-10 07:58:04

领域: eess.IV,cs.AI,cs.CV,cs.MM

下载: http://arxiv.org/abs/2403.08551v3

Formation-Controlled Dimensionality Reduction

Dimensionality reduction represents the process of generating a low dimensional representation of high dimensional data. Motivated by the formation control of mobile agents, we propose a nonlinear dynamical system for dimensionality reduction. The system consists of two parts; the control of neighbor points, addressing local structures, and the control of remote points, accounting for global structures. We also include a brief mathematical observation of the model and its numerical procedure. Numerical experiments are performed on both synthetic and real datasets and comparisons with existing models demonstrate the soundness and effectiveness of the proposed model.
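A small numeric sketch of the two-force idea: neighbor points (nearest in the input space) attract the embedding toward their original distances, while remote points repel weakly. Constants and step sizes are illustrative.

```python
import numpy as np

def step(Y, X, k=10, dt=0.05, repel=0.1):
    """One update of the embedding Y given high-dimensional data X."""
    n = len(Y)
    D_hi = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    nbrs = np.argsort(D_hi, axis=1)[:, 1:k + 1]    # k nearest in input space
    F = np.zeros_like(Y)
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            diff = Y[j] - Y[i]
            dist = np.linalg.norm(diff) + 1e-9
            if j in nbrs[i]:                        # local term: match distance
                F[i] += (dist - D_hi[i, j]) * diff / dist
            else:                                   # global term: weak repulsion
                F[i] -= repel * diff / dist**2
    return Y + dt * F
```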

Updated: 2024-04-10 07:55:10

标题: 基于编队控制的降维

摘要: 降维是生成高维数据的低维表示的过程。受移动智能体编队控制的启发,我们提出了一个用于降维的非线性动态系统。该系统由两部分组成:控制邻近点以处理局部结构,控制远程点以考虑全局结构。我们还对模型及其数值程序进行了简要的数学观察。在合成和真实数据集上进行了数值实验,并与现有模型进行了比较,结果表明所提出的模型是合理且有效的。

更新时间: 2024-04-10 07:55:10

领域: cs.LG

下载: http://arxiv.org/abs/2404.06808v1

Generative Resident Separation and Multi-label Classification for Multi-person Activity Recognition

This paper presents two models to address the problem of multi-person activity recognition using ambient sensors in a home. The first model, Seq2Res, uses a sequence generation approach to separate sensor events from different residents. The second model, BiGRU+Q2L, uses a Query2Label multi-label classifier to predict multiple activities simultaneously. Performances of these models are compared to a state-of-the-art model in different experimental scenarios, using a state-of-the-art dataset of two residents in a home instrumented with ambient sensors. These results lead to a discussion on the advantages and drawbacks of resident separation and multi-label classification for multi-person activity recognition.

Updated: 2024-04-10 07:46:30

标题: 生成式居民分离和多标签分类用于多人活动识别

摘要: 本文提出了两种模型来解决在家中使用环境传感器进行多人活动识别的问题。第一个模型Seq2Res使用序列生成方法来区分来自不同居民的传感器事件。第二个模型BiGRU+Q2L使用Query2Label多标签分类器同时预测多个活动。这些模型在不同实验场景下与现有技术模型进行了比较,使用了一个家庭中安装环境传感器的两个居民的现有数据集。这些结果导致对居民分离和多标签分类对多人活动识别的优势和劣势进行讨论。

更新时间: 2024-04-10 07:46:30

领域: cs.LG,cs.AI,eess.SP

下载: http://arxiv.org/abs/2404.07245v1

YAGO 4.5: A Large and Clean Knowledge Base with a Rich Taxonomy

Knowledge Bases (KBs) find applications in many knowledge-intensive tasks and, most notably, in information retrieval. Wikidata is one of the largest public general-purpose KBs. Yet, its collaborative nature has led to a convoluted schema and taxonomy. The YAGO 4 KB cleaned up the taxonomy by incorporating the ontology of Schema.org, resulting in a cleaner structure amenable to automated reasoning. However, it also cut away large parts of the Wikidata taxonomy, which is essential for information retrieval. In this paper, we extend YAGO 4 with a large part of the Wikidata taxonomy - while respecting logical constraints and the distinction between classes and instances. This yields YAGO 4.5, a new, logically consistent version of YAGO that adds a rich layer of informative classes. An intrinsic and an extrinsic evaluation show the value of the new resource.

Updated: 2024-04-10 07:45:22

标题: YAGO 4.5:一个拥有丰富分类法的大型且干净的知识库

摘要: 知识库(KBs)在许多知识密集型任务中找到应用,尤其是在信息检索中。Wikidata是最大的公共通用KB之一。然而,它的协作性质导致了一个复杂的模式和分类法。YAGO 4 KB通过整合Schema.org的本体论来清理分类法,从而得到一个更清晰的结构,便于自动推理。然而,它也削减了大部分Wikidata分类法,这对信息检索至关重要。在本文中,我们扩展了YAGO 4,引入了大部分Wikidata分类法 - 同时尊重逻辑约束和类与实例之间的区别。这产生了YAGO 4.5,一个新的、逻辑一致的YAGO版本,增加了丰富的信息类别。内在和外在评估显示了新资源的价值。

更新时间: 2024-04-10 07:45:22

领域: cs.AI,cs.IR

下载: http://arxiv.org/abs/2308.11884v2

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks

Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents. As various long-text techniques and model architectures emerge, the precise and detailed evaluation of models' long-text capabilities has become increasingly important. Existing long-text evaluation benchmarks, such as L-Eval and LongBench, construct long-text test sets based on open-source datasets, focusing mainly on QA and summarization tasks. These datasets include test samples of varying lengths (from 2k to 32k+) entangled together, making it challenging to assess model capabilities across different length ranges. Moreover, they do not cover the ultralong settings (100k+ tokens) that the latest LLMs claim to achieve. In this paper, we introduce Ada-LEval, a length-adaptable benchmark for evaluating the long-context understanding of LLMs. Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities. These benchmarks support intricate manipulation of the length of test cases, and can easily produce text samples up to 128k tokens. We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval. The evaluation results demonstrate the limitations of current LLMs, especially in ultra-long-context settings. Our code is available at https://github.com/open-compass/Ada-LEval.

Updated: 2024-04-10 07:40:56

标题: Ada-LEval:使用长度可调节的基准评估长上下文LLMs

摘要: 最近,大型语言模型(LLM)社区显示出越来越多的兴趣,以增强LLMs处理极长文档的能力。随着各种长文本技术和模型架构的涌现,对模型长文本能力的精确和详细评估变得越来越重要。现有的长文本评估基准,如L-Eval和LongBench,基于开源数据集构建长文本测试集,主要关注问答和摘要任务。这些数据集包括长度不同的测试样本(从2k到32k+),交织在一起,使得跨不同长度范围评估模型能力变得具有挑战性。此外,它们不涵盖最新LLMs声称可以实现的超长设置(100k+标记)。在本文中,我们介绍了Ada-LEval,一个用于评估LLMs长上下文理解能力的长度可调整的基准。Ada-LEval包括两个具有挑战性的子集,TSort和BestAnswer,可以更可靠地评估LLMs的长上下文能力。这些基准支持对测试用例长度的复杂操作,并且可以轻松生成高达128k标记的文本样本。我们使用Ada-LEval评估了4个最新的闭源API模型和6个开源模型。评估结果表明当前LLMs的局限性,特别是在超长上下文设置中。我们的代码可在https://github.com/open-compass/Ada-LEval上找到。

更新时间: 2024-04-10 07:40:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.06480v2

Extracting Clean and Balanced Subset for Noisy Long-tailed Classification

Real-world datasets usually are class-imbalanced and corrupted by label noise. To solve the joint issue of long-tailed distribution and label noise, most previous works usually aim to design a noise detector to distinguish the noisy and clean samples. Despite their effectiveness, they may be limited in handling the joint issue effectively in a unified way. In this work, we develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching, which can be solved with optimal transport (OT). By setting a manually-specific probability measure and using a learned transport plan to pseudo-label the training samples, the proposed method can reduce the side-effects of noisy and long-tailed data simultaneously. Then we introduce a simple yet effective filter criteria by combining the observed labels and pseudo labels to obtain a more balanced and less noisy subset for a robust model training. Extensive experiments demonstrate that our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
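A sketch of the distribution-matching step using entropic OT (Sinkhorn iterations) between sample features and class prototypes, with a manually specified class marginal; the paper's exact OT formulation may differ, and the cosine cost is an assumption.

```python
import numpy as np

def sinkhorn_pseudo_labels(feats, prototypes, class_marginal, eps=0.05, iters=200):
    """Transport samples to class prototypes; argmax of the plan = pseudo-label."""
    cost = 1 - feats @ prototypes.T / (
        np.linalg.norm(feats, axis=1, keepdims=True)
        * np.linalg.norm(prototypes, axis=1)[None] + 1e-12)   # cosine cost
    K = np.exp(-cost / eps)
    a = np.full(len(feats), 1.0 / len(feats))                 # sample marginal
    u = np.ones_like(a)
    for _ in range(iters):                                    # Sinkhorn updates
        v = class_marginal / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return plan.argmax(axis=1)   # less biased toward head classes by construction
```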

Updated: 2024-04-10 07:34:37

标题: 提取干净且平衡的子集,用于嘈杂的长尾分类

摘要: 真实世界的数据集通常存在类别不平衡,并且受标签噪声影响。为了解决长尾分布和标签噪声的共同问题,大多数先前的工作通常旨在设计一个噪声检测器来区分嘈杂和干净的样本。尽管它们有效,但它们在统一处理共同问题方面可能存在局限性。在这项工作中,我们从分布匹配的角度开发了一种使用类别原型的新颖伪标记方法,可以通过最优传输(OT)来解决。通过设置一个手动特定的概率测度,并使用学习到的传输计划对训练样本进行伪标记,所提出的方法可以同时减少嘈杂和长尾数据的副作用。然后,我们引入了一个简单但有效的过滤标准,通过结合观察到的标签和伪标签来获得一个更平衡且更少嘈杂的子集,用于稳健的模型训练。大量实验证明,我们的方法可以提取出具有干净标签的类别平衡子集,从而为带有标签噪声的长尾分类带来有效的性能增益。

更新时间: 2024-04-10 07:34:37

领域: cs.LG

下载: http://arxiv.org/abs/2404.06795v1

A General Theory for Kernel Packets: from state space model to compactly supported basis

It is well known that the state space (SS) model formulation of a Gaussian process (GP) can lower its training and prediction time both to $\mathcal{O}(n)$ for $n$ data points. We prove that an $m$-dimensional SS model formulation of GP is equivalent to a concept we introduce as the general right Kernel Packet (KP): a transformation for the GP covariance $K$ such that $\sum_{i=0}^{m}a_iD_t^{(j)}K(t,t_i)=0$ holds for any $t \leq t_1$, $0 \leq j \leq m-1$, and $m+1$ consecutive points $t_i$, where ${D}_t^{(j)}f(t)$ denotes the $j$-th derivative acting on $t$. We extend this idea to the backward SS model formulation, leading to the left KP for the next $m$ consecutive points: $\sum_{i=0}^{m}b_i{D}_t^{(j)}K(t,t_{m+i})=0$ for any $t\geq t_{2m}$. By combining both left and right KPs, we can prove that a suitable linear combination of these covariance functions yields $m$ KP functions compactly supported on $(t_0,t_{2m})$. KPs improve GP prediction time to $\mathcal{O}(\log n)$ or $\mathcal{O}(1)$, enable broader applications including GP's derivatives and kernel multiplications, and can be generalized to multi-dimensional additive and product kernels for scattered data.

Updated: 2024-04-10 07:24:59

标题: 核包的一般理论:从状态空间模型到紧支撑基

摘要: 众所周知,高斯过程(GP)的状态空间(SS)模型表述可以将其训练和预测时间都降低到$\mathcal{O}(n)$($n$为数据点数量)。我们证明了GP的$m$维SS模型表述等价于我们引入的一般右核包(KP)概念:一种对GP协方差$K$的变换,使得对于任意$t\leq t_1$、$0\leq j\leq m-1$以及$m+1$个连续点$t_i$,都有$\sum_{i=0}^{m}a_iD_t^{(j)}K(t,t_i)=0$成立,其中${D}_t^{(j)}f(t)$表示对$t$作用的$j$阶导数。我们将这一思想推广到后向SS模型表述,得到针对接下来$m$个连续点的左KP:对于任意$t\geq t_{2m}$,有$\sum_{i=0}^{m}b_i{D}_t^{(j)}K(t,t_{m+i})=0$。通过结合左右KP,我们证明这些协方差函数的适当线性组合产生$m$个紧支撑在$(t_0,t_{2m})$上的KP函数。KP将GP的预测时间改进到$\mathcal{O}(\log n)$或$\mathcal{O}(1)$,支持包括GP导数和核乘法在内的更广泛应用,并可推广到散乱数据的多维加法核与乘积核。

更新时间: 2024-04-10 07:24:59

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.04022v4

Neural Architecture Search via Two Constant Shared Weights Initialisations

In recent years, zero-cost metrics have been gaining ground in neural architecture search (NAS). These metrics allow finding the optimal neural network for a given task faster and with a smaller computational load than conventional NAS methods. Equally important, they also shed some light on the internal workings of neural architectures. This paper presents a zero-cost metric that correlates strongly with the train set accuracy across the NAS-Bench-101, NAS-Bench-201 and NAS-Bench-NLP benchmark datasets. We evaluate a neural architecture's potential based on the statistics of its outputs after two constant shared weights initialisations. For this, we only use an unlabelled mini-batch of data. We observe that the dispersion of the outputs between the two initialisations positively correlates with trained accuracy. The correlation further improves when we normalise dispersion by average output magnitude. The resulting metric, epsilon, does not require gradient computation and unbinds the NAS procedure from training hyperparameters, loss metrics and human-labelled data. Our method is easy to integrate within existing NAS algorithms and takes a fraction of a second to evaluate a single network. The code supporting this study can be found on GitHub at https://github.com/egracheva/epsinas.
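A minimal sketch of this statistic follows, assuming two arbitrary constants and a mean-absolute-difference dispersion; the exact constants and the definition of epsilon are in the repository above.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def epsilon_score(model_fn, batch, w1=1e-3, w2=1e-1):
    """Score an architecture from its outputs under two constant shared-weight inits.

    model_fn() builds a fresh instance of the candidate architecture; the
    constants w1/w2 and this normalisation are illustrative choices.
    """
    outputs = []
    for w in (w1, w2):
        model = model_fn()
        for p in model.parameters():
            p.fill_(w)                      # constant shared weights
        outputs.append(model(batch).flatten(1))
    o1, o2 = outputs
    dispersion = (o1 - o2).abs().mean()     # spread between the two initialisations
    magnitude = 0.5 * (o1.abs().mean() + o2.abs().mean())
    return (dispersion / (magnitude + 1e-12)).item()

# Usage with a toy candidate network and an unlabelled mini-batch:
make_net = lambda: nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                                 nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
score = epsilon_score(make_net, torch.randn(16, 3, 32, 32))
```

Because no labels, gradients, or training steps are involved, scoring a candidate costs only a pair of forward passes.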

Updated: 2024-04-10 07:12:31

标题: 通过两个常数共享权重初始化的神经架构搜索

摘要: 最近几年,零成本指标在神经架构搜索(NAS)中越来越受到重视。这些指标可以比传统的NAS方法更快地找到给定任务的最佳神经网络,并且计算负载更小。同样重要的是,它们还为神经架构的内部工作提供了一些启示。本文提出了一种与NAS-Bench-101、NAS-Bench-201和NAS-Bench-NLP基准数据集上的训练集准确性高度相关的零成本指标。我们基于两次常数共享权重初始化后的输出统计来评估神经架构的潜力。为此,我们只使用一个无标签的小批量数据。我们观察到,两次初始化之间输出的分散程度与训练后的准确性呈正相关。当我们用平均输出幅度对分散程度进行归一化时,相关性进一步提高。由此产生的度量标准epsilon不需要计算梯度,并将NAS过程与训练超参数、损失指标和人工标记数据解绑。我们的方法易于集成到现有的NAS算法中,评估单个网络仅需几分之一秒。支持这项研究的代码可在GitHub上找到:https://github.com/egracheva/epsinas。

更新时间: 2024-04-10 07:12:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2302.04406v2

Private Wasserstein Distance with Random Noises

Wasserstein distance is a principled measure of data divergence from a distributional standpoint. However, its application becomes challenging in the context of data privacy, where sharing raw data is restricted. Prior attempts have employed techniques like Differential Privacy or Federated optimization to approximate Wasserstein distance. Nevertheless, these approaches often lack accuracy and robustness against potential attacks. In this study, we investigate the underlying triangular properties within the Wasserstein space, leading to a straightforward solution named TriangleWad. This approach enables the computation of Wasserstein distance between datasets stored across different entities. Notably, TriangleWad is 20 times faster, makes raw data information truly invisible, enhances resilience against attacks, and does not sacrifice estimation accuracy. Through comprehensive experimentation across various tasks involving both image and text data, we demonstrate its superior performance and generalization.

Updated: 2024-04-10 06:58:58

标题: 带有随机噪声的私有Wasserstein距离

摘要: Wasserstein距离是从分布角度衡量数据差异的一个重要指标。然而,在数据隐私的背景下,由于原始数据的共享受到限制,其应用变得具有挑战性。先前的尝试使用差分隐私或联邦优化等技术来近似Wasserstein距离。然而,这些方法通常缺乏准确性,并且对潜在攻击缺乏鲁棒性。在本研究中,我们研究了Wasserstein空间中潜在的三角形特性,提出了一种简单的解决方案,命名为TriangleWad。这种方法可以计算存储在不同实体中的数据集之间的Wasserstein距离。值得注意的是,TriangleWad速度快20倍,使原始数据信息真正不可见,增强了对攻击的抵抗力,而不会牺牲估计精度。通过在涉及图像和文本数据的各种任务中进行全面实验,我们展示了其优越的性能和泛化能力。

更新时间: 2024-04-10 06:58:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.06787v1

Logit Calibration and Feature Contrast for Robust Federated Learning on Non-IID Data

Federated learning (FL) is a privacy-preserving distributed framework for collaborative model training on devices in edge networks. However, challenges arise due to vulnerability to adversarial examples (AEs) and the non-independent and identically distributed (non-IID) nature of data distribution among devices, hindering the deployment of adversarially robust and accurate learning models at the edge. While adversarial training (AT) is commonly acknowledged as an effective defense strategy against adversarial attacks in centralized training, we shed light on the adverse effects of directly applying AT in FL that can severely compromise accuracy, especially in non-IID challenges. Given this limitation, this paper proposes FatCC, which incorporates local logit \underline{C}alibration and global feature \underline{C}ontrast into the vanilla federated adversarial training (\underline{FAT}) process from both logit and feature perspectives. This approach can effectively enhance the federated system's robust accuracy (RA) and clean accuracy (CA). First, we propose logit calibration, where the logits are calibrated during local adversarial updates, thereby improving adversarial robustness. Second, FatCC introduces feature contrast, which involves a global alignment term that aligns each local representation with unbiased global features, thus further enhancing robustness and accuracy in federated adversarial environments. Extensive experiments across multiple datasets demonstrate that FatCC achieves comparable or superior performance gains in both CA and RA compared to other baselines.

Updated: 2024-04-10 06:35:25

标题: Logit校准和特征对比:用于非独立同分布数据上稳健联邦学习的方法

摘要: 联邦学习(FL)是一种隐私保护的分布式框架,用于在边缘网络中设备上进行协作模型训练。然而,由于对抗性示例(AEs)的脆弱性和数据在设备间的非独立同分布(non-IID)性质,挑战出现,阻碍了在边缘部署对抗性强和准确的学习模型。虽然对抗性训练(AT)通常被认为是中心化训练中有效的防御策略,但我们揭示了直接将AT应用于FL中可能严重损害准确性的负面影响,尤其是在非IID挑战中。鉴于这一局限性,本文提出了FatCC,将局部logit校准和全局特征对比融入到从logit和特征角度的原始联邦对抗训练(FAT)过程中。这种方法可以有效增强联邦系统的鲁棒准确性(RA)和干净准确性(CA)。首先,我们提出logit校准,通过在局部对抗更新期间对logits进行校准,从而提高对抗性鲁棒性。其次,FatCC引入特征对比,涉及一个全局对齐项,将每个局部表示与无偏全局特征对齐,从而进一步增强在联邦对抗环境中的鲁棒性和准确性。跨多个数据集的大量实验证明,与其他基线相比,FatCC在CA和RA方面实现了可比或更优越的性能提升。

更新时间: 2024-04-10 06:35:25

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2404.06776v1

An inclusive review on deep learning techniques and their scope in handwriting recognition

Deep learning refers to a category of machine learning algorithms that can combine raw inputs into intermediate feature layers. These deep learning algorithms have demonstrated great results in different fields. In particular, deep learning has achieved human-level performance across a number of domains in computer vision and pattern recognition. To achieve state-of-the-art performance in diverse domains, deep learning relies on different architectures, which use activation functions to perform various computations between the hidden and output layers. This paper presents a survey of the existing studies of deep learning in the handwriting recognition field. Even though recent progress indicates that deep learning methods provide valuable means for speeding up handwriting recognition or producing accurate results, the extensive literature survey in the present study finds that deep learning has yet to revolutionize the field and must still resolve many of its most pressing challenges, although promising advances have been made over the prior state of the art. Additionally, the inadequate availability of labelled training data presents problems in this domain. Nevertheless, the present handwriting recognition survey foresees deep learning enabling changes at both bench and bedside, with the potential to transform several domains such as image processing, speech recognition, computer vision, machine translation, robotics and control, medical imaging, medical information processing, bio-informatics, natural language processing, cyber security, and many others.

Updated: 2024-04-10 06:30:33

标题: 一个关于深度学习技术及其在手写识别中的应用范围的综述

摘要: 深度学习是一类机器学习算法,具有将原始输入转化为中间特征层的能力。这些深度学习算法在不同领域展现出了优异的成果。特别是在计算机视觉和模式识别领域,深度学习取得了人类水平表现的巨大成就。为了在不同领域实现最先进的性能,深度学习采用了不同的架构,并利用激活函数在隐藏层和输出层之间执行各种计算。本文对手写识别领域中深度学习的现有研究进行了调查。尽管最近的进展表明,深度学习方法在加速或提供准确的手写识别结果方面提供了有价值的手段,但根据广泛的文献调查,本研究发现深度学习仍需要在这一领域实现更多颠覆性进展,并解决许多最迫切的挑战,但在之前的技术水平上已取得了有希望的进展。此外,在这一领域中缺乏标记数据用于训练也带来了问题。然而,当前的手写识别调查预见深度学习能够在实验室和床边带来变革,有潜力转变多个领域,如图像处理、语音识别、计算机视觉、机器翻译、机器人技术和控制、医学成像、医学信息处理、生物信息学、自然语言处理、网络安全等。

更新时间: 2024-04-10 06:30:33

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.08011v1

Unsupervised Learning for Solving the Travelling Salesman Problem

We propose UTSP, an unsupervised learning (UL) framework for solving the Travelling Salesman Problem (TSP). We train a Graph Neural Network (GNN) using a surrogate loss. The GNN outputs a heat map representing the probability for each edge to be part of the optimal path. We then apply local search to generate our final prediction based on the heat map. Our loss function consists of two parts: one pushes the model to find the shortest path and the other serves as a surrogate for the constraint that the route should form a Hamiltonian Cycle. Experimental results show that UTSP outperforms the existing data-driven TSP heuristics. Our approach is parameter efficient as well as data efficient: the model takes $\sim$ 10\% of the number of parameters and $\sim$ 0.2\% of training samples compared with reinforcement learning or supervised learning methods.
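The two-part structure of the loss can be sketched as follows; the Hamiltonian-cycle surrogate shown here (unit column sums, no self-loops) is an illustrative stand-in for the transformation actually used by UTSP.

```python
import torch

def utsp_style_loss(H, D, lam=10.0):
    """Two-part surrogate on an edge heat map H (n, n) with distances D (n, n).

    H is assumed row-softmaxed, i.e. H[i, j] is the probability that the tour
    leaves city i towards city j. The cycle surrogate is illustrative.
    """
    length_term = (H * D).sum()                      # favour short edges
    col_penalty = ((H.sum(dim=0) - 1.0) ** 2).sum()  # each city entered once
    loop_penalty = torch.diagonal(H).sum()           # discourage self-loops
    return length_term + lam * (col_penalty + loop_penalty)

n = 20
coords = torch.rand(n, 2)
D = torch.cdist(coords, coords)
logits = torch.randn(n, n, requires_grad=True)       # stand-in for the GNN output
loss = utsp_style_loss(torch.softmax(logits, dim=-1), D)
loss.backward()                                      # trains the GNN end to end
```

Local search then decodes a feasible tour from the learned heat map.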

Updated: 2024-04-10 05:59:10

标题: 无监督学习解决旅行商问题

摘要: 我们提出了UTSP,一个用于解决旅行商问题(TSP)的无监督学习(UL)框架。我们使用替代损失训练一个图神经网络(GNN)。GNN输出一个热图,表示每条边成为最优路径的概率。然后我们应用局部搜索基于热图生成最终预测。我们的损失函数包括两部分:一部分推动模型找到最短路径,另一部分作为路径应形成哈密顿回路的约束的替代。实验结果表明UTSP优于现有的数据驱动TSP启发式算法。我们的方法参数效率高且数据效率高:与强化学习或监督学习方法相比,模型的参数数量约为现有方法的10%,训练样本数量约为现有方法的0.2%。

更新时间: 2024-04-10 05:59:10

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2303.10538v2

DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space

In real-life conversations, the content is diverse, and there exists a one-to-many problem that requires diverse generation. Previous studies attempted to introduce discrete or Gaussian-based continuous latent variables to address the one-to-many problem, but the diversity they offer is limited. Recently, diffusion models have made breakthroughs in computer vision, and some attempts have been made in natural language processing. In this paper, we propose DiffusionDialog, a novel approach to enhance the diversity of dialogue generation with the help of a diffusion model. In our approach, we introduce continuous latent variables into the diffusion model. The challenge of using latent variables in the dialog task is how to build both an effective prior over the latent space and an inference process to obtain the proper latent given the context. By combining the encoder and a latent-based diffusion model, we encode the response's latent representation in a continuous space as the prior, instead of a fixed Gaussian distribution or simply discrete ones. We then infer the latent by denoising step by step with the diffusion model. The experimental results show that our model greatly enhances the diversity of dialog responses while maintaining coherence. Furthermore, in further analysis, we find that our diffusion model achieves high inference efficiency, which is the main challenge of applying diffusion models in natural language processing.

Updated: 2024-04-10 05:56:46

标题: DiffusionDialog:一种基于潜空间的多样对话生成扩散模型

摘要: 在现实对话中,内容是多样化的,存在着需要多样化生成的一对多问题。先前的研究尝试引入离散或基于高斯的连续潜在变量来解决一对多问题,但多样性有限。最近,扩散模型在计算机视觉领域取得了突破,一些尝试也在自然语言处理领域进行。在本文中,我们提出了DiffusionDialog,一种利用扩散模型增强对话生成多样性的新方法。在我们的方法中,我们将连续潜在变量引入扩散模型中。在对话任务中使用潜在变量的问题是如何构建潜在空间的有效先验和获取给定上下文的正确潜在的推理过程。通过结合编码器和基于潜在变量的扩散模型,我们在连续空间中编码响应的潜在表示作为先验,而不是固定的高斯分布或简单的离散分布。然后,我们通过扩散模型逐步去噪来推断潜在变量。实验结果显示,我们的模型大大增强了对话回复的多样性,同时保持了连贯性。此外,在进一步分析中,我们发现我们的扩散模型实现了高推理效率,这是将扩散模型应用于自然语言处理的主要挑战。

更新时间: 2024-04-10 05:56:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.06760v1

Language Generation in the Limit

Although current large language models are complex, the most basic specifications of the underlying language generation problem itself are simple to state: given a finite set of training samples from an unknown language, produce valid new strings from the language that don't already appear in the training data. Here we ask what we can conclude about language generation using only this specification, without further assumptions. In particular, suppose that an adversary enumerates the strings of an unknown target language L that is known only to come from one of a possibly infinite list of candidates. A computational agent is trying to learn to generate from this language; we say that the agent generates from L in the limit if after some finite point in the enumeration of L, the agent is able to produce new elements that come exclusively from L and that have not yet been presented by the adversary. Our main result is that there is an agent that is able to generate in the limit for every countable list of candidate languages. This contrasts dramatically with negative results due to Gold and Angluin in a well-studied model of language learning where the goal is to identify an unknown language from samples; the difference between these results suggests that identifying a language is a fundamentally different problem than generating from it.

Updated: 2024-04-10 05:53:25

标题: 极限意义下的语言生成

摘要: 尽管当前的大型语言模型很复杂,但基础的语言生成问题本身的规范非常简单:给定一个未知语言的有限训练样本集,生成该语言中尚未出现在训练数据中的有效新字符串。在这里,我们探讨了在仅有这一规范的情况下,我们可以得出关于语言生成的什么结论,而不需要进一步的假设。特别是,假设一个对手枚举了一个未知目标语言L的字符串,该语言只知道来自可能无限列表中的一个候选项。一个计算代理正在尝试学习从这种语言生成;我们说该代理在极限情况下生成自L,如果在L的枚举的某个有限点之后,代理能够产生来自L的全新元素,并且这些元素还未被对手呈现。我们的主要结果是,有一种代理能够在极限情况下为每一个可数的候选语言列表进行生成。这与Gold和Angluin在一个广泛研究的语言学习模型中所取得的负面结果形成了鲜明对比,在该模型中的目标是从样本中识别出一个未知语言;这些结果之间的差异表明,识别一种语言与从中生成一种语言是根本不同的问题。

更新时间: 2024-04-10 05:53:25

领域: cs.DS,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.06757v1

CrimeAlarm: Towards Intensive Intent Dynamics in Fine-grained Crime Prediction

Granularity and accuracy are two crucial factors for crime event prediction. Within fine-grained event classification, multiple criminal intents may alternately appear in preceding sequential events and progress differently in subsequent ones. Such intensive intent dynamics makes it hard for trained models to capture unobserved intents, and thus leads to sub-optimal generalization performance, especially in the intertwining of numerous potential events. To capture comprehensive criminal intents, this paper proposes a fine-grained sequential crime prediction framework, CrimeAlarm, equipped with a novel mutual distillation strategy inspired by curriculum learning. During the early training phase, spot-shared criminal intents are captured through high-confidence sequence samples. In the later phase, spot-specific intents are gradually learned by increasing the contribution of low-confidence sequences. Meanwhile, the output probability distributions are reciprocally learned between prediction networks to model unobserved criminal intents. Extensive experiments show that CrimeAlarm outperforms state-of-the-art methods in terms of NDCG@5, with improvements of 4.51% on NYC16 and 7.73% on CHI18.
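A sketch of the curriculum-weighted mutual distillation between two peer networks is given below; the confidence measure and the linear ramp are illustrative assumptions, not the paper's exact schedule.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(logits_a, logits_b, targets, epoch, max_epoch, T=2.0):
    """Curriculum-weighted mutual distillation between two prediction networks."""
    ce = F.cross_entropy(logits_a, targets, reduction="none") \
       + F.cross_entropy(logits_b, targets, reduction="none")

    # High-confidence sequences dominate early; low-confidence ones are phased in.
    conf = 0.5 * (logits_a.softmax(-1) + logits_b.softmax(-1)).max(-1).values.detach()
    ramp = epoch / max_epoch
    weight = conf + ramp * (1.0 - conf)

    # Reciprocal KL between the output distributions models unobserved intents.
    kl_ab = F.kl_div(F.log_softmax(logits_a / T, -1),
                     F.softmax(logits_b.detach() / T, -1), reduction="none").sum(-1)
    kl_ba = F.kl_div(F.log_softmax(logits_b / T, -1),
                     F.softmax(logits_a.detach() / T, -1), reduction="none").sum(-1)

    return (weight * (ce + T * T * (kl_ab + kl_ba))).mean()
```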

Updated: 2024-04-10 05:44:28

标题: 犯罪警报:朝向细粒度犯罪预测中的密集意图动态

摘要: 粒度和准确性是犯罪事件预测的两个关键因素。在细粒度事件分类中,多种犯罪意图可能会在前序事件中交替展现,并在接下来以不同方式发展。这种密集的意图动态使训练模型难以捕捉未观察到的意图,从而导致次优的泛化性能,特别是在众多潜在事件的交织中。为了捕捉全面的犯罪意图,本文提出了一种细粒度序列犯罪预测框架CrimeAlarm,该框架装备有一种受课程学习启发的新颖互相蒸馏策略。在早期训练阶段,通过高置信度序列样本捕捉共享的犯罪意图。在后期阶段,通过增加低置信度序列的贡献逐渐学习特定的意图。同时,预测网络之间的输出概率分布也会相互学习,以建模未观察到的犯罪意图。大量实验表明,在NDCG@5方面,CrimeAlarm在准确度度量上优于最先进的方法,对于NYC16和CHI18分别提高了4.51%和7.73%。

更新时间: 2024-04-10 05:44:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.06756v1

BONES: Near-Optimal Neural-Enhanced Video Streaming

Accessing high-quality video content can be challenging due to insufficient and unstable network bandwidth. Recent advances in neural enhancement have shown promising results in improving the quality of degraded videos through deep learning. Neural-Enhanced Streaming (NES) incorporates this new approach into video streaming, allowing users to download low-quality video segments and then enhance them to obtain high-quality content without violating the playback of the video stream. We introduce BONES, an NES control algorithm that jointly manages the network and computational resources to maximize the quality of experience (QoE) of the user. BONES formulates NES as a Lyapunov optimization problem and solves it in an online manner with near-optimal performance, making it the first NES algorithm to provide a theoretical performance guarantee. Comprehensive experimental results indicate that BONES increases QoE by 5\% to 20\% over state-of-the-art algorithms with minimal overhead. Our code is available at https://github.com/UMass-LIDS/bones.

Updated: 2024-04-10 05:39:23

标题: BONES:近乎最优的神经增强视频流传输

摘要: 访问高质量视频内容可能会面临网络带宽不足和不稳定的挑战。最近神经增强技术的进步通过深度学习显示出改善降质视频质量的成果。神经增强流媒体(NES)将这种新方法融入视频流媒体中,允许用户下载低质量视频片段,然后增强它们以获取高质量内容,而不会破坏视频流的播放。我们介绍了BONES,一个NES控制算法,它共同管理网络和计算资源,以最大化用户的体验质量(QoE)。BONES将NES表述为Lyapunov优化问题,并以接近最优性能的在线方式求解,使其成为第一个提供理论性能保证的NES算法。全面的实验结果表明,BONES相对于最先进的算法可将QoE提高5%至20%,而开销则很小。我们的代码可在https://github.com/UMass-LIDS/bones 上找到。

更新时间: 2024-04-10 05:39:23

领域: eess.SY,cs.LG,cs.NI,cs.SY

下载: http://arxiv.org/abs/2310.09920v2

Zero-Shot Clinical Trial Patient Matching with LLMs

Matching patients to clinical trials is a key unsolved challenge in bringing new drugs to market. Today, identifying patients who meet a trial's eligibility criteria is highly manual, taking up to 1 hour per patient. Automated screening is challenging, however, as it requires understanding unstructured clinical text. Large language models (LLMs) offer a promising solution. In this work, we explore their application to trial matching. First, we design an LLM-based system which, given a patient's medical history as unstructured clinical text, evaluates whether that patient meets a set of inclusion criteria (also specified as free text). Our zero-shot system achieves state-of-the-art scores on the n2c2 2018 cohort selection benchmark. Second, we improve the data and cost efficiency of our method by identifying a prompting strategy which matches patients an order of magnitude faster and more cheaply than the status quo, and develop a two-stage retrieval pipeline that reduces the number of tokens processed by up to a third while retaining high performance. Third, we evaluate the interpretability of our system by having clinicians evaluate the natural language justifications generated by the LLM for each eligibility decision, and show that it can output coherent explanations for 97% of its correct decisions and 75% of its incorrect ones. Our results establish the feasibility of using LLMs to accelerate clinical trial operations.
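The retrieve-then-judge pipeline can be sketched as below; `embed` and `llm` are hypothetical stand-ins for an embedding model and a chat LLM, and the prompt wording is illustrative rather than the paper's.

```python
import numpy as np

def top_k_chunks(note_chunks, criterion, embed, k=5):
    """Stage 1: keep only the note chunks most relevant to one criterion."""
    c = embed(criterion)
    sims = [float(np.dot(embed(ch), c)) for ch in note_chunks]
    keep = np.argsort(sims)[::-1][:k]
    return [note_chunks[i] for i in keep]

def meets_criterion(note_chunks, criterion, embed, llm):
    """Stage 2: zero-shot eligibility judgment with a natural-language rationale."""
    evidence = "\n".join(top_k_chunks(note_chunks, criterion, embed))
    prompt = ("Patient record excerpts:\n" + evidence +
              "\n\nInclusion criterion: " + criterion +
              "\nDoes the patient meet this criterion? "
              "Answer MET or NOT MET, then justify.")
    answer = llm(prompt)
    return answer.strip().upper().startswith("MET"), answer
```

Pruning to the top-k chunks is what cuts the tokens processed while keeping the judgment grounded in the relevant parts of the record.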

Updated: 2024-04-10 05:37:26

标题: 基于LLM的零样本临床试验患者匹配

摘要: 将患者与临床试验匹配是将新药推向市场过程中一个关键的未解决挑战。如今,识别符合试验资格标准的患者高度依赖人工,每名患者最多需要花费1小时。然而,自动化筛选具有挑战性,因为它需要理解非结构化的临床文本。大型语言模型(LLMs)提供了一个有前途的解决方案。在这项工作中,我们探讨了它们在试验匹配中的应用。首先,我们设计了一个基于LLM的系统:给定以非结构化临床文本表示的患者病史,评估该患者是否满足一组纳入标准(同样以自由文本形式给出)。我们的零样本系统在n2c2 2018队列选择基准测试中取得了最先进的成绩。其次,我们通过确定一种比现状快一个数量级且更便宜的患者匹配提示策略,提高了方法的数据和成本效率,并开发了一个两阶段检索管道,在保持高性能的同时将处理的令牌数量最多减少三分之一。第三,我们通过让临床医生评估LLM为每个资格决定生成的自然语言理由来评估系统的可解释性,结果表明它可以为97%的正确决定和75%的错误决定输出连贯的解释。我们的结果确立了使用LLM加速临床试验操作的可行性。

更新时间: 2024-04-10 05:37:26

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.05125v3

Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents

Some have criticised Generative AI Systems for replicating the familiar pathologies of already widely-deployed AI systems. Other critics highlight how they foreshadow vastly more powerful future systems, which might threaten humanity's survival. The first group says there is nothing new here; the other looks through the present to a perhaps distant horizon. In this paper, I instead pay attention to what makes these particular systems distinctive: both their remarkable scientific achievement, and the most likely and consequential ways in which they will change society over the next five to ten years. In particular, I explore the potential societal impacts and normative questions raised by the looming prospect of 'Generative Agents', in which multimodal large language models (LLMs) form the executive centre of complex, tool-using AI systems that can take unsupervised sequences of actions towards some goal.

Updated: 2024-04-10 05:34:07

标题: 前沿人工智能伦理:预测和评估生成型代理的社会影响

摘要: 一些人批评生成式人工智能系统复制了已广泛部署的人工智能系统的熟知病态。其他批评者强调它们预示着更加强大的未来系统,可能会威胁人类的生存。第一组人说这里没有什么新鲜事;另一组人则透过现在看向可能遥远的地平线。在本文中,我反而关注这些特定系统的独特之处:它们卓越的科学成就,以及它们在未来五到十年内将如何改变社会的最可能和最重要的方式。特别地,我探讨了“生成代理”的潜在社会影响和规范问题,其中多模态大型语言模型(LLM)形成复杂的、使用工具的人工智能系统的执行中心,可以未经监督地采取一系列行动朝向某个目标。

更新时间: 2024-04-10 05:34:07

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2404.06750v1

CGNSDE: Conditional Gaussian Neural Stochastic Differential Equation for Modeling Complex Systems and Data Assimilation

A new knowledge-based and machine learning hybrid modeling approach, called conditional Gaussian neural stochastic differential equation (CGNSDE), is developed to facilitate modeling complex dynamical systems and implementing analytic formulae of the associated data assimilation (DA). In contrast to the standard neural network predictive models, the CGNSDE is designed to effectively tackle both forward prediction tasks and inverse state estimation problems. The CGNSDE starts by exploiting a systematic causal inference via information theory to build a simple knowledge-based nonlinear model that nevertheless captures as much explainable physics as possible. Then, neural networks are supplemented to the knowledge-based model in a specific way, which not only characterizes the remaining features that are challenging to model with simple forms but also advances the use of analytic formulae to efficiently compute the nonlinear DA solution. These analytic formulae are used as an additional computationally affordable loss to train the neural networks that directly improve the DA accuracy. This DA loss function promotes the CGNSDE to capture the interactions between state variables and thus advances its modeling skills. With the DA loss, the CGNSDE is more capable of estimating extreme events and quantifying the associated uncertainty. Furthermore, crucial physical properties in many complex systems, such as the translate-invariant local dependence of state variables, can significantly simplify the neural network structures and facilitate the CGNSDE to be applied to high-dimensional systems. Numerical experiments based on chaotic systems with intermittency and strong non-Gaussian features indicate that the CGNSDE outperforms knowledge-based regression models, and the DA loss further enhances the modeling skills of the CGNSDE.
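For context, the analytic DA formulae exploited here come from the classical conditional Gaussian structure (notation follows the general conditional Gaussian filtering literature; in CGNSDE the neural-network terms enter through the coefficient functions, which is an interpretive sketch rather than the paper's exact statement). With observed variables $u_1$ and hidden variables $u_2$ satisfying

$$\mathrm{d}u_1 = \big[A_0(t,u_1) + A_1(t,u_1)\,u_2\big]\,\mathrm{d}t + \Sigma_1\,\mathrm{d}W_1, \qquad \mathrm{d}u_2 = \big[a_0(t,u_1) + a_1(t,u_1)\,u_2\big]\,\mathrm{d}t + \Sigma_2\,\mathrm{d}W_2,$$

the filtering posterior $p\big(u_2 \mid u_1(s \le t)\big) = \mathcal{N}(\mu, R)$ evolves in closed form:

$$\mathrm{d}\mu = (a_0 + a_1\mu)\,\mathrm{d}t + R A_1^{\top}\big(\Sigma_1\Sigma_1^{\top}\big)^{-1}\big(\mathrm{d}u_1 - (A_0 + A_1\mu)\,\mathrm{d}t\big),$$

$$\mathrm{d}R = \Big(a_1 R + R a_1^{\top} + \Sigma_2\Sigma_2^{\top} - R A_1^{\top}\big(\Sigma_1\Sigma_1^{\top}\big)^{-1} A_1 R\Big)\,\mathrm{d}t,$$

which is what makes the DA loss computationally affordable: evaluating the filter needs no sampling or adjoint solves.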

Updated: 2024-04-10 05:32:03

标题: CGNSDE:用于建模复杂系统和数据同化的条件高斯神经随机微分方程

摘要: 一种名为条件高斯神经随机微分方程(CGNSDE)的新型基于知识和机器学习混合建模方法被开发出来,以促进对复杂动态系统的建模,并实现相关数据同化(DA)的解析公式。与标准神经网络预测模型相比,CGNSDE旨在有效地处理前向预测任务和逆状态估计问题。CGNSDE首先利用信息理论进行系统因果推断,建立一个简单的基于知识的非线性模型,尽可能捕捉尽可能多的可解释物理现象。然后,神经网络以一种特定方式补充到基于知识的模型中,不仅表征了难以用简单形式建模的剩余特征,还推进了使用解析公式来高效计算非线性DA解决方案。这些解析公式被用作额外的计算成本可承受的损失,用于训练直接提高DA准确性的神经网络。这种DA损失函数促使CGNSDE捕捉状态变量之间的相互作用,从而提升了其建模技能。通过DA损失,CGNSDE更能够估计极端事件并量化相关不确定性。此外,许多复杂系统中的关键物理特性,如状态变量的平移不变局部依赖性,可以显著简化神经网络结构,并促进CGNSDE应用于高维系统。基于具有间歇性和强非高斯特征的混沌系统的数值实验表明,CGNSDE优于基于知识的回归模型,而DA损失进一步增强了CGNSDE的建模技能。

更新时间: 2024-04-10 05:32:03

领域: cs.LG

下载: http://arxiv.org/abs/2404.06749v1

Disguised Copyright Infringement of Latent Diffusion Model

Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase. The notion of access usually refers to including copyrighted samples directly in the training dataset, which one may inspect to identify an infringement. We argue that such visual auditing largely overlooks a concealed copyright infringement, where one constructs a disguise that looks drastically different from the copyrighted sample yet still induces the effect of training Latent Diffusion Models on it. Such disguises only require indirect access to the copyrighted material and cannot be visually distinguished, thus easily circumventing the current auditing tools. In this paper, we provide a better understanding of such disguised copyright infringement by uncovering the disguises generation algorithm, the revelation of the disguises, and importantly, how to detect them to augment the existing toolbox. Additionally, we introduce a broader notion of acknowledgment for comprehending such indirect access.

Updated: 2024-04-10 04:55:57

标题: 潜在扩散模型的隐蔽版权侵权

摘要: 版权侵权可能发生在一个生成模型生成与在训练阶段访问过的一些受版权保护数据极为相似的样本时。访问的概念通常指的是将受版权保护的样本直接包含在训练数据集中,这样可以检查以识别侵权行为。我们认为,这种视觉审计很大程度上忽视了隐藏的版权侵权行为,即构建一个看起来与受版权保护的样本大相径庭但仍会导致在其上训练潜在扩散模型的效果的伪装。这种伪装只需要间接访问受版权保护的材料,且无法在视觉上加以区分,因此很容易规避当前的审计工具。在本文中,我们通过揭示伪装生成算法、伪装的揭示以及重要的是如何检测它们来更好地理解这种伪装的版权侵权行为,以增强现有工具箱。此外,我们引入了一个更广泛的承认概念,以理解这种间接访问。

更新时间: 2024-04-10 04:55:57

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2404.06737v1

Autonomous Evaluation and Refinement of Digital Agents

We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control. We experiment with multiple evaluation models that trade off between inference cost, modularity of design, and accuracy. We validate the performance of these models in several popular benchmarks for digital agents, finding between 74.4 and 92.9% agreement with oracle evaluation metrics. Finally, we use these evaluators to improve the performance of existing agents via fine-tuning and inference-time guidance. Without any additional supervision, we improve state-of-the-art performance by 29% on the popular benchmark WebArena, and achieve a 75% relative improvement in a challenging domain transfer scenario.

Updated: 2024-04-10 04:55:54

标题: 数字代理的自主评估和改进

摘要: 我们展示了通用领域自动评估器可以显著提高网络导航和设备控制代理的性能。我们尝试了多种评估模型,权衡推理成本、设计模块化和准确性。我们验证了这些模型在数字代理的几个流行基准测试中的性能,在oracle评估指标上达到74.4%至92.9%的一致性。最后,我们利用这些评估器通过微调和推理时间指导来提高现有代理的性能。在没有任何额外监督的情况下,在流行基准测试WebArena上提高了29%的最新性能,并在具有挑战性的领域转移场景中实现了75%的相对改进。

更新时间: 2024-04-10 04:55:54

领域: cs.AI

下载: http://arxiv.org/abs/2404.06474v2

Turning Noises to Fingerprint-Free "Credentials": Secure and Usable Drone Authentication

Drones have been widely used in various services, such as delivery and surveillance. Authentication forms the foundation of the security of these services. However, drones are expensive and may carry important payloads. To avoid being captured by attackers, drones should keep a safe distance from the verifier before authentication succeeds. This makes authentication methods that only work in very close proximity not applicable. Our work leverages drone noises for authentication. While using sounds for authentication is highly usable, how to handle various attacks that manipulate sounds is an \emph{unresolved challenge}. It is also unclear how to ensure robustness under various environmental sounds. Being the first in the literature, we address the two major challenges by exploiting unique characteristics of drone noises. We thereby build an authentication system that does \emph{not} rely on any drone sound fingerprints, keeps resilient to attacks, and is robust under environmental sounds. An extensive evaluation demonstrates its security and usability.

Updated: 2024-04-10 04:54:15

标题: 将噪音转化为无指纹“凭证”:安全且可用的无人机认证

摘要: 无人机已被广泛用于各种服务,如快递和监视。身份验证是这些服务安全性的基础。然而,无人机价格昂贵,可能携带重要的载荷。为了避免被攻击者捕获,无人机在身份验证成功之前应与验证器保持安全距离。这使得仅在非常近距离内工作的身份验证方法不适用。我们的工作利用无人机噪音进行身份验证。虽然使用声音进行身份验证非常实用,但如何处理操纵声音的各种攻击是一个未解决的挑战。在各种环境声音下如何确保鲁棒性也不清楚。作为文献中的第一个,我们利用无人机噪音的独特特征来解决这两个主要挑战。因此,我们建立了一个身份验证系统,不依赖于任何无人机声音指纹,对攻击具有韧性,并在环境声音下具有鲁棒性。广泛的评估证明了其安全性和可用性。

更新时间: 2024-04-10 04:54:15

领域: cs.CR

下载: http://arxiv.org/abs/2302.09197v2

Discovering Closed-Loop Failures of Vision-Based Controllers via Reachability Analysis

Machine learning driven image-based controllers allow robotic systems to take intelligent actions based on the visual feedback from their environment. Understanding when these controllers might lead to system safety violations is important for their integration in safety-critical applications and engineering corrective safety measures for the system. Existing methods leverage simulation-based testing (or falsification) to find the failures of vision-based controllers, i.e., the visual inputs that lead to closed-loop safety violations. However, these techniques do not scale well to the scenarios involving high-dimensional and complex visual inputs, such as RGB images. In this work, we cast the problem of finding closed-loop vision failures as a Hamilton-Jacobi (HJ) reachability problem. Our approach blends simulation-based analysis with HJ reachability methods to compute an approximation of the backward reachable tube (BRT) of the system, i.e., the set of unsafe states for the system under vision-based controllers. Utilizing the BRT, we can tractably and systematically find the system states and corresponding visual inputs that lead to closed-loop failures. These visual inputs can be subsequently analyzed to find the input characteristics that might have caused the failure. Besides its scalability to high-dimensional visual inputs, an explicit computation of BRT allows the proposed approach to capture non-trivial system failures that are difficult to expose via random simulations. We demonstrate our framework on two case studies involving an RGB image-based neural network controller for (a) autonomous indoor navigation, and (b) autonomous aircraft taxiing.

Updated: 2024-04-10 04:51:33

标题: 通过可达性分析发现基于视觉的控制器的闭环故障

摘要: 机器学习驱动的基于图像的控制器使机器人系统能够根据环境的视觉反馈采取智能行动。了解这些控制器可能导致系统安全违规的时间对于将它们整合到安全关键应用程序中并为系统设计纠正安全措施至关重要。现有方法利用基于模拟的测试(或证伪)来发现基于视觉的控制器的故障,即导致闭环安全违规的视觉输入。然而,这些技术在涉及高维度和复杂视觉输入(如RGB图像)的情况下不易扩展。在这项工作中,我们将找到闭环视觉故障的问题形式化为汉密尔顿-雅可比(HJ)可达性问题。我们的方法将基于模拟的分析与HJ可达性方法相结合,计算系统的后向可达管(BRT)的近似,即在基于视觉的控制器下系统的不安全状态集。利用BRT,我们可以可追踪地和系统地找到导致闭环故障的系统状态和相应的视觉输入。随后可以分析这些视觉输入,以找出可能导致故障的输入特征。除了其对高维度视觉输入的可伸缩性外,BRT的显式计算使所提出的方法能够捕捉通过随机模拟难以暴露的非平凡系统故障。我们在两个案例研究中展示了我们的框架,涉及基于RGB图像的神经网络控制器用于(a)自主室内导航和(b)自主飞机滑行。

更新时间: 2024-04-10 04:51:33

领域: cs.RO,cs.AI,cs.CV,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2211.02736v4

A Copula Graphical Model for Multi-Attribute Data using Optimal Transport

Motivated by modern data forms such as images and multi-view data, the multi-attribute graphical model aims to explore the conditional independence structure among vectors. Under the Gaussian assumption, the conditional independence between vectors is characterized by blockwise zeros in the precision matrix. To relax the restrictive Gaussian assumption, in this paper, we introduce a novel semiparametric multi-attribute graphical model based on a new copula named Cyclically Monotone Copula. This new copula treats the distribution of the node vectors as multivariate marginals and transforms them into Gaussian distributions based on the optimal transport theory. Since the model allows the node vectors to have arbitrary continuous distributions, it is more flexible than the classical Gaussian copula method that performs coordinatewise Gaussianization. We establish the concentration inequalities of the estimated covariance matrices and provide sufficient conditions for selection consistency of the group graphical lasso estimator. For the setting with high-dimensional attributes, a Projected Cyclically Monotone Copula model is proposed to address the curse of dimensionality issue that arises from solving high-dimensional optimal transport problems. Numerical results based on synthetic and real data show the efficiency and flexibility of our methods.

Updated: 2024-04-10 04:49:00

标题: 使用最优运输的多属性数据Copula图模型

摘要: 受图像和多视角数据等现代数据形式的启发,多属性图模型旨在探索向量之间的条件独立结构。在高斯假设下,向量之间的条件独立性由精度矩阵中的分块零来表征。为了放宽严格的高斯假设,在本文中,我们引入了一种基于名为循环单调Copula(Cyclically Monotone Copula)的新copula的半参数多属性图模型。这种新的copula将节点向量的分布视为多元边际,并基于最优输运理论将它们转换为高斯分布。由于该模型允许节点向量具有任意连续分布,因此比执行坐标方向高斯化的经典高斯copula方法更灵活。我们建立了估计协方差矩阵的集中不等式,并为群组图lasso估计器的选择一致性提供了充分条件。对于具有高维属性的设置,提出了一种投影循环单调Copula(Projected Cyclically Monotone Copula)模型,以解决求解高维最优输运问题所引起的维数灾难问题。基于合成和真实数据的数值结果显示了我们方法的效率和灵活性。

更新时间: 2024-04-10 04:49:00

领域: stat.ML,cs.LG,math.ST,stat.AP,stat.ME,stat.TH

下载: http://arxiv.org/abs/2404.06735v1

Incremental XAI: Memorable Understanding of AI with Incremental Explanations

Many explainable AI (XAI) techniques strive for interpretability by providing concise salient information, such as sparse linear factors. However, users either only see inaccurate global explanations, or highly-varying local explanations. We propose to provide more detailed explanations by leveraging the human cognitive capacity to accumulate knowledge by incrementally receiving more details. Focusing on linear factor explanations (factors $\times$ values = outcome), we introduce Incremental XAI to automatically partition explanations for general and atypical instances by providing Base + Incremental factors to help users read and remember more faithful explanations. Memorability is improved by reusing base factors and reducing the number of factors shown in atypical cases. In modeling, formative, and summative user studies, we evaluated the faithfulness, memorability and understandability of Incremental XAI against baseline explanation methods. This work contributes towards more usable explanations that users can better internalize, facilitating intuitive engagement with AI.
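A toy sketch of the Base + Incremental idea follows; the factor values and the segment rule are invented for illustration, not taken from the paper's studies.

```python
# Base factors are reused everywhere; atypical segments add a few deltas.
base_factors = {"sqft": 120.0, "bedrooms": 9000.0, "age": -850.0}
incremental = {"waterfront": {"sqft": 35.0, "view": 15000.0}}

def explain(instance, segment=None):
    """Return the linear factors a user reads for this instance (factors x values = outcome)."""
    factors = dict(base_factors)                    # base factors always shown
    for name, delta in incremental.get(segment, {}).items():
        factors[name] = factors.get(name, 0.0) + delta
    outcome = sum(w * instance.get(name, 0.0) for name, w in factors.items())
    return factors, outcome

house = {"sqft": 150, "bedrooms": 3, "age": 12, "view": 1}
print(explain(house))                               # general explanation
print(explain(house, segment="waterfront"))         # base + incremental explanation
```

Reusing the base factors is what makes the atypical explanation memorable: the user only has to learn the few deltas.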

Updated: 2024-04-10 04:38:17

标题: 增量XAI:通过增量解释实现对人工智能的可记忆理解

摘要: 许多可解释人工智能(XAI)技术旨在通过提供简明突出的信息(如稀疏线性因子)来实现可解释性。然而,用户要么只看到不准确的全局解释,要么看到高度变化的局部解释。我们提出通过利用人类认知能力逐渐接收更多细节来提供更详细的解释。专注于线性因子解释(因子×值=结果),我们引入增量XAI,通过提供基础+增量因子自动对一般和非典型实例进行解释分区,帮助用户阅读和记忆更忠实的解释。通过重复使用基础因子并减少在非典型情况下显示的因子数量,提高了记忆性。在建模、形成和总结用户研究中,我们评估了增量XAI相对于基线解释方法的忠实性、记忆性和可理解性。这项工作有助于提供更易用的解释,使用户更好地融入,促进与人工智能的直观互动。

更新时间: 2024-04-10 04:38:17

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2404.06733v1

Enhancing Safety in Mixed Traffic: Learning-Based Modeling and Efficient Control of Autonomous and Human-Driven Vehicles

With the increasing presence of autonomous vehicles (AVs) on public roads, developing robust control strategies to navigate the uncertainty of human-driven vehicles (HVs) is crucial. This paper introduces an advanced method for modeling HV behavior, combining a first-principles model with Gaussian process (GP) learning to enhance velocity prediction accuracy and provide a measurable uncertainty. We validated this innovative HV model using real-world data from field experiments and applied it to develop a GP-enhanced model predictive control (GP-MPC) strategy. This strategy aims to improve safety in mixed vehicle platoons by integrating uncertainty assessment into distance constraints. Comparative simulation studies with a conventional model predictive control (MPC) approach demonstrated that our GP-MPC strategy ensures more reliable safe distancing and fosters efficient vehicular dynamics, achieving notably higher speeds within the platoon. By incorporating a sparse GP technique in HV modeling and adopting a dynamic GP prediction within the MPC framework, we significantly reduced the computation time of GP-MPC, marking it only 4.6% higher than that of the conventional MPC. This represents a substantial improvement, making the process about 100 times faster than our preliminary work without these approximations. Our findings underscore the effectiveness of learning-based HV modeling in enhancing both safety and operational efficiency in mixed-traffic environments, paving the way for more harmonious AV-HV interactions.
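A minimal sketch of the first-principles + GP hybrid for HV velocity prediction, assuming a toy linear car-following term as the nominal model and synthetic field data; the real feature set, kernel, and constants differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def nominal_model(gap, v_hv, v_lead, k1=0.02, k2=0.04):
    """First-principles next-step HV velocity (illustrative linear car-following)."""
    return v_hv + k1 * (gap - 20.0) + k2 * (v_lead - v_hv)

# Synthetic "field data": X = (gap, HV speed, lead speed), y = next HV speed.
rng = np.random.default_rng(0)
X = rng.random((200, 3)) * np.array([40.0, 30.0, 30.0])
y = nominal_model(X[:, 0], X[:, 1], X[:, 2]) + 0.3 * np.sin(X[:, 1] / 3.0) \
    + 0.05 * rng.standard_normal(200)

# The GP learns the residual between data and the first-principles prediction,
# and its predictive std gives the measurable uncertainty used by the MPC.
residual = y - nominal_model(X[:, 0], X[:, 1], X[:, 2])
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(1e-2))
gp.fit(X, residual)

mean_res, std_res = gp.predict(X[:5], return_std=True)
v_pred = nominal_model(X[:5, 0], X[:5, 1], X[:5, 2]) + mean_res
safety_margin = 2.0 * std_res     # e.g., tighten distance constraints by 2-sigma
```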

Updated: 2024-04-10 04:36:24

标题: 提高混合交通的安全性:基于学习的建模和高效控制自动驾驶和人驾驶车辆

摘要: 随着自动驾驶汽车(AVs)在公共道路上的增加,开发稳健的控制策略以应对人驾驶汽车(HVs)的不确定性至关重要。本文介绍了一种先进的方法来建模HV行为,将基本原理模型与高斯过程(GP)学习相结合,以提高速度预测准确性并提供可测量的不确定性。我们利用实地实验的真实数据验证了这一创新的HV模型,并将其应用于开发GP增强模型预测控制(GP-MPC)策略。该策略旨在通过将不确定性评估整合到距离约束中,提高混合车辆编队的安全性。与传统的模型预测控制(MPC)方法进行的比较模拟研究表明,我们的GP-MPC策略确保更可靠的安全距离,并促进车辆动力学的高效性,在编队内获得明显更高的速度。通过在HV建模中采用稀疏GP技术,并在MPC框架内采用动态GP预测,我们显著减少了GP-MPC的计算时间,使其仅比传统MPC高出4.6%。这代表了一个重大改进,使该过程比我们之前的工作快约100倍。我们的发现强调了基于学习的HV建模在提高混合交通环境中的安全性和运营效率方面的有效性,为更加和谐的AV-HV互动铺平了道路。

更新时间: 2024-04-10 04:36:24

领域: cs.RO,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.06732v1

Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination

Objective. Vaccination has engendered a spectrum of public opinions, with social media acting as a crucial platform for health-related discussions. The emergence of artificial intelligence technologies, such as large language models (LLMs), offers a novel opportunity to efficiently investigate public discourses. This research assesses the accuracy of ChatGPT, a widely used and freely available service built upon an LLM, for sentiment analysis to discern different stances toward Human Papillomavirus (HPV) vaccination. Methods. Messages related to HPV vaccination were collected from social media supporting different message formats: Facebook (long format) and Twitter (short format). A selection of 1,000 human-evaluated messages was input into the LLM, which generated multiple response instances containing its classification results. Accuracy was measured for each message as the level of concurrence between human and machine decisions, ranging between 0 and 1. Results. Average accuracy was notably high when 20 response instances were used to determine the machine decision of each message: .882 (SE = .021) and .750 (SE = .029) for anti- and pro-vaccination long-form; .773 (SE = .027) and .723 (SE = .029) for anti- and pro-vaccination short-form, respectively. Using only three or even one instance did not lead to a severe decrease in accuracy. However, for long-form messages, the language model exhibited significantly lower accuracy in categorizing pro-vaccination messages than anti-vaccination ones. Conclusions. ChatGPT shows potential in analyzing public opinions on HPV vaccination using social media content. However, understanding the characteristics and limitations of a language model within specific public health contexts remains imperative.
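A small sketch of the accuracy computation described above; reading the per-message score as the fraction of response instances that concur with the human label is one plausible interpretation, labeled here as an assumption.

```python
# Per-message concurrence between the human label and k LLM response instances.
# Treating the score as the fraction of concurring instances is an assumption.
def message_accuracy(human_label, instance_labels):
    """Concurrence in [0, 1] for one message."""
    return sum(lbl == human_label for lbl in instance_labels) / len(instance_labels)

def mean_accuracy(messages):
    """messages: iterable of (human_label, [k instance labels]) pairs."""
    scores = [message_accuracy(h, inst) for h, inst in messages]
    return sum(scores) / len(scores)

toy = [("anti", ["anti"] * 18 + ["pro"] * 2),   # 0.90
       ("pro",  ["pro"] * 15 + ["anti"] * 5)]   # 0.75
print(mean_accuracy(toy))                       # 0.825
```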

Updated: 2024-04-10 04:35:54

标题: 一个大型语言模型在社交媒体上区分反疫苗和赞成疫苗信息的准确性:以人乳头瘤病毒疫苗为例

摘要: 目标。疫苗接种引发了各种公众意见,社交媒体是健康相关讨论的重要平台。人工智能技术的出现,如大型语言模型(LLMs),为高效调查公众话语提供了新机会。本研究评估了ChatGPT(一个广泛使用且免费的基于LLM构建的服务)在情感分析中的准确性,用于识别对人乳头瘤病毒(HPV)疫苗接种的不同立场。方法。从支持不同消息格式的社交媒体收集了与HPV疫苗接种有关的消息:Facebook(长格式)和Twitter(短格式)。将1000条经人工评估的消息输入LLM,生成包含其分类结果的多个响应实例。对每条消息测量准确性,即人类和机器决策之间的一致程度,范围为0到1。结果。当使用20个响应实例来确定每条消息的机器决策时,平均准确性显著较高:长格式的反疫苗和赞成疫苗消息分别为.882(SE = .021)和.750(SE = .029);短格式的反疫苗和赞成疫苗消息分别为.773(SE = .027)和.723(SE = .029)。仅使用三个甚至一个实例也不会导致准确性严重下降。然而,对于长格式消息,语言模型在分类赞成疫苗消息时的准确性明显低于反疫苗消息。结论。ChatGPT在使用社交媒体内容分析公众对HPV疫苗接种的意见方面显示出潜力。然而,在特定公共卫生背景下理解语言模型的特征和局限性仍然至关重要。

更新时间: 2024-04-10 04:35:54

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2404.06731v1

SoK: Trusting Self-Sovereign Identity

Digital identity is evolving from centralized systems to a decentralized approach known as Self-Sovereign Identity (SSI). SSI empowers individuals to control their digital identities, eliminating reliance on third-party data custodians and reducing the risk of data breaches. However, the concept of trust in SSI remains complex and fragmented. This paper systematically analyzes trust in SSI in light of its components and threats posed by various actors in the system. As a result, we derive three distinct trust models that capture the threats and mitigations identified across SSI literature and implementations. Our work provides a foundational framework for future SSI research and development, including a comprehensive catalogue of SSI components and design requirements for trust, shortcomings in existing SSI systems and areas for further exploration.

Updated: 2024-04-10 04:28:50

标题: SoK:信任自主身份

摘要: 数字身份正在从集中式系统发展到一种去中心化的方法,即自主身份(SSI)。SSI赋予个人控制他们的数字身份的权力,消除了对第三方数据保管者的依赖,并减少了数据泄露的风险。然而,在SSI中信任的概念仍然复杂且分散。本文系统地分析了SSI中的信任,考虑了其组成部分以及系统中各种参与者所构成的威胁。因此,我们得出了三种不同的信任模型,捕捉了SSI文献和实施中识别的威胁和缓解措施。我们的工作为未来SSI研究和开发提供了一个基础框架,包括SSI组件和设计要求的全面目录,现有SSI系统的不足之处以及进一步探索的领域。

更新时间: 2024-04-10 04:28:50

领域: cs.CR

下载: http://arxiv.org/abs/2404.06729v1

Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision

Modern electronic health records (EHRs) hold immense promise in tracking personalized patient health trajectories through sequential deep learning, owing to their extensive breadth, scale, and temporal granularity. Nonetheless, how to effectively leverage multiple modalities from EHRs poses significant challenges, given its complex characteristics such as high dimensionality, multimodality, sparsity, varied recording frequencies, and temporal irregularities. To this end, this paper introduces a novel multimodal contrastive learning framework, specifically focusing on medical time series and clinical notes. To tackle the challenge of sparsity and irregular time intervals in medical time series, the framework integrates temporal cross-attention transformers with a dynamic embedding and tokenization scheme for learning multimodal feature representations. To harness the interconnected relationships between medical time series and clinical notes, the framework equips a global contrastive loss, aligning a patient's multimodal feature representations with the corresponding discharge summaries. Since discharge summaries uniquely pertain to individual patients and represent a holistic view of the patient's hospital stay, machine learning models are led to learn discriminative multimodal features via global contrasting. Extensive experiments with a real-world EHR dataset demonstrated that our framework outperformed state-of-the-art approaches on the exemplar task of predicting the occurrence of nine postoperative complications for more than 120,000 major inpatient surgeries using multimodal data from UF health system split among three hospitals (UF Health Gainesville, UF Health Jacksonville, and UF Health Jacksonville-North).
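The global alignment can be sketched as a CLIP-style symmetric InfoNCE between each patient's fused representation and that patient's discharge-summary embedding; the encoders, temperature, and exact loss form here are assumptions.

```python
import torch
import torch.nn.functional as F

def global_contrastive_loss(patient_emb, summary_emb, temperature=0.07):
    """Align fused (time series + notes) patient embeddings with discharge summaries.

    patient_emb, summary_emb: (B, d) tensors for the same B patients; row i of
    each matrix belongs to patient i, so the diagonal holds the positives.
    """
    p = F.normalize(patient_emb, dim=-1)
    s = F.normalize(summary_emb, dim=-1)
    logits = p @ s.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(p.size(0), device=p.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Because a discharge summary summarizes exactly one stay, the diagonal pairing gives a natural patient-level supervision signal without extra labels.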

Updated: 2024-04-10 04:19:59

标题: 全球对比式训练用于带有语言监督的多模态电子健康记录

摘要: 现代电子健康记录(EHRs)在通过序贯深度学习跟踪个性化患者健康轨迹方面具有巨大潜力,这要归功于其广泛的广度、规模和时间粒度。然而,如何有效地利用EHRs中的多种模态面临着重大挑战,因为其复杂特性,如高维度、多模态、稀疏性、不同的记录频率和时间不规则性。为此,本文介绍了一种新颖的多模态对比学习框架,特别关注医学时间序列和临床记录。为了解决医学时间序列中的稀疏性和不规则时间间隔的挑战,该框架将时间交叉关注变换器与动态嵌入和标记化方案相结合,用于学习多模态特征表示。为了利用医学时间序列和临床记录之间的相互关系,该框架配备了全局对比损失,将患者的多模态特征表示与相应的出院总结进行对齐。由于出院总结独特地与个体患者相关,并代表患者住院期间的整体视图,机器学习模型被引导通过全局对比学习来学习有区别的多模态特征。通过对一个真实世界的EHR数据集进行广泛实验,我们证明了我们的框架在使用来自UF健康系统三家医院(UF健康盖恩斯维尔、UF健康杰克逊维尔和UF健康杰克逊维尔北部)的多模态数据预测超过120,000例重大住院手术中九种术后并发症发生的示范任务上胜过了最先进的方法。

更新时间: 2024-04-10 04:19:59

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2404.06723v1

Poisoning Prevention in Federated Learning and Differential Privacy via Stateful Proofs of Execution

The rise in IoT-driven distributed data analytics, coupled with increasing privacy concerns, has led to a demand for effective privacy-preserving and federated data collection/model training mechanisms. In response, approaches such as Federated Learning (FL) and Local Differential Privacy (LDP) have been proposed and attracted much attention over the past few years. However, they still share the common limitation of being vulnerable to poisoning attacks wherein adversaries compromising edge devices feed forged (a.k.a. poisoned) data to aggregation back-ends, undermining the integrity of FL/LDP results. In this work, we propose a system-level approach to remedy this issue based on a novel security notion of Proofs of Stateful Execution (PoSX) for IoT/embedded devices' software. To realize the PoSX concept, we design SLAPP: a System-Level Approach for Poisoning Prevention. SLAPP leverages commodity security features of embedded devices - in particular ARM TrustZone-M security extensions - to verifiably bind raw sensed data to their correct usage as part of FL/LDP edge device routines. As a consequence, it offers robust security guarantees against poisoning. Our evaluation, based on real-world prototypes featuring multiple cryptographic primitives and data collection schemes, showcases SLAPP's security and low overhead.

Updated: 2024-04-10 04:18:26

标题: 通过有状态的执行证明实现联邦学习和差分隐私中的中毒预防

摘要: 物联网驱动的分布式数据分析的增长,加上日益增长的隐私关注,已经导致对有效的隐私保护和联邦数据收集/模型训练机制的需求。作为回应,近年来提出了联邦学习(FL)和本地差分隐私(LDP)等方法,并引起了广泛关注。然而,它们仍然共享一个常见的限制,即容易受到毒化攻击的影响,即对手损害边缘设备,向聚合后端输入伪造(又名毒化)数据,破坏FL/LDP结果的完整性。 在这项工作中,我们提出了一种基于IoT/嵌入式设备软件的Proofs of Stateful Execution(PoSX)安全概念的系统级方法,以解决这个问题。为了实现PoSX概念,我们设计了SLAPP:一种用于毒化预防的系统级方法。SLAPP利用嵌入式设备的通用安全功能 - 特别是ARM TrustZoneM安全扩展 - 将原始感知数据可靠地绑定到它们作为FL/LDP边缘设备例程的一部分的正确用法。因此,它提供了针对毒化的强大安全保证。我们的评估基于真实世界原型,采用多个加密原语和数据收集方案,展示了SLAPP的安全性和低开销。

更新时间: 2024-04-10 04:18:26

领域: cs.CR

下载: http://arxiv.org/abs/2404.06721v1

Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems

In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $\epsilon>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $\epsilon \geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make at least $1/(d^{0.01\delta }\epsilon^{2\frac{1-\delta}{1+1.01 \delta}-o(1)})$ oracle queries, for any $\delta\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta} \epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/\epsilon)$ but makes $\Omega(1/\epsilon^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/\epsilon$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition since with quadratic $\mathcal O(d^2 \ln1/\epsilon)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/\epsilon)$ queries.

Updated: 2024-04-10 04:15:50

标题: 梯度下降在可行性问题的Oracle复杂性和内存权衡中是帕累托最优的

摘要: 在本文中,我们提供了使用具有访问分离oracle的内存受限算法找到给定集合中的一个点的oracle复杂性下界。我们假设该集合包含在单位$d$维球内,并且包含一个已知半径为$\epsilon>0$的球。这种设置通常被称为可行性问题。我们表明,要解决精度为$\epsilon \geq e^{-d^{o(1)}}$的可行性问题,任何确定性算法要么使用$d^{1+\delta}$比特的内存,要么必须至少进行$1/(d^{0.01\delta }\epsilon^{2\frac{1-\delta}{1+1.01 \delta}-o(1)})$次oracle查询,对于任意$\delta\in[0,1]$。此外,我们表明随机算法要么使用$d^{1+\delta}$内存,要么至少进行$1/(d^{2\delta} \epsilon^{2(1-4\delta)-o(1)})$次查询,对于任意$\delta\in[0,\frac{1}{4}]$。由于梯度下降算法仅使用线性内存$\mathcal O(d\ln 1/\epsilon)$但进行$\Omega(1/\epsilon^2)$次查询,我们的结果意味着它在oracle复杂性/内存权衡中是帕累托最优的。此外,我们的结果表明,如果算法在$d$中具有不到二次的内存,则确定性算法的oracle复杂性始终是$1/\epsilon$的多项式。这揭示了一个明显的相变,因为具有二次$\mathcal O(d^2 \ln1/\epsilon)$内存的割平面方法仅需要$\mathcal O(d\ln 1/\epsilon)$次查询。

更新时间: 2024-04-10 04:15:50

领域: math.OC,cs.CC,cs.DS,cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.06720v1

Ripple Knowledge Graph Convolutional Networks For Recommendation Systems

Using knowledge graphs to assist deep learning models in making recommendation decisions has recently been proven to effectively improve the model's interpretability and accuracy. This paper introduces an end-to-end deep learning model, named RKGCN, which dynamically analyses each user's preferences and makes a recommendation of suitable items. It combines knowledge graphs on both the item side and user side to enrich their representations to maximize the utilization of the abundant information in knowledge graphs. RKGCN is able to offer more personalized and relevant recommendations in three different scenarios. The experimental results show the superior effectiveness of our model over 5 baseline models on three real-world datasets including movies, books, and music.

Updated: 2024-04-10 04:09:44

标题: 涟漪知识图卷积网络用于推荐系统

摘要: 最近已经证明,使用知识图来帮助深度学习模型做出推荐决策可以有效地提高模型的可解释性和准确性。本文介绍了一种名为RKGCN的端到端深度学习模型,该模型动态分析每个用户的偏好并推荐合适的物品。它结合了物品和用户两方的知识图,丰富它们的表示以最大化知识图中丰富信息的利用。在三种不同场景下,RKGCN能够提供更加个性化和相关的推荐。实验结果显示,我们的模型在包括电影、书籍和音乐在内的三个真实世界数据集上比5个基准模型具有更高的有效性。

更新时间: 2024-04-10 04:09:44

领域: cs.IR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2305.01147v2

Racial/Ethnic Categories in AI and Algorithmic Fairness: Why They Matter and What They Represent

Racial diversity has become increasingly discussed within the AI and algorithmic fairness literature, yet little attention is focused on justifying the choices of racial categories and understanding how people are racialized into these chosen racial categories. Even less attention is given to how racial categories shift and how the racialization process changes depending on the context of a dataset or model. An unclear understanding of \textit{who} comprises the racial categories chosen and \textit{how} people are racialized into these categories can lead to varying interpretations of these categories. These varying interpretations can lead to harm when the understanding of racial categories and the racialization process is misaligned from the actual racialization process and racial categories used. Harm can also arise if the racialization process and racial categories used are irrelevant or do not exist in the context they are applied. In this paper, we make two contributions. First, we demonstrate how racial categories with unclear assumptions and little justification can lead to varying datasets that poorly represent groups obfuscated or unrepresented by the given racial categories and models that perform poorly on these groups. Second, we develop a framework, CIRCSheets, for documenting the choices and assumptions in choosing racial categories and the process of racialization into these categories to facilitate transparency in understanding the processes and assumptions made by dataset or model developers when selecting or using these racial categories.

Updated: 2024-04-10 04:04:05

标题: 在人工智能和算法公平性中的种族/族裔分类:为什么重要以及代表了什么

摘要: 种族多样性在人工智能和算法公平性文献中越来越受到关注,然而很少有人关注如何证明种族分类的选择,并理解人们如何被归类到这些选择的种族分类中。甚至更少的注意力被给予种族分类如何转变以及在数据集或模型的背景下种族化过程如何变化。对选择的种族分类由谁组成以及人们如何被归类到这些分类中的不清楚理解可能导致对这些分类的不同解释。这些不同的解释可能会导致伤害,当对种族分类和种族化过程的理解与实际的种族化过程和使用的种族分类不一致时。如果种族化过程和使用的种族分类在其应用的背景中是无关的或不存在,也可能会造成伤害。 在本文中,我们做出了两方面贡献。首先,我们展示了如何没有明确假设和论证的种族分类可能导致数据集的变化,这些数据集对被给定的种族分类模糊或未被代表的群体表现不佳,以及模型在这些群体上表现不佳。其次,我们开发了一个名为CIRCSheets的框架,用于记录选择种族分类的选择和假设以及种族化过程到这些分类中,以促进透明度,了解数据集或模型开发者在选择或使用这些种族分类时所做的过程和假设。

更新时间: 2024-04-10 04:04:05

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2404.06717v1

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer

Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives. This paper addresses the question of whether such optimization actually provides the generator with gradients that make its distribution close to the target distribution. We derive metrizable conditions, sufficient conditions for the discriminator to serve as the distance between the distributions by connecting the GAN formulation with the concept of sliced optimal transport. Furthermore, by leveraging these theoretical results, we propose a novel GAN training scheme, called slicing adversarial network (SAN). With only simple modifications, a broad class of existing GANs can be converted to SANs. Experiments on synthetic and image datasets support our theoretical results and the SAN's effectiveness as compared to usual GANs. Furthermore, we also apply SAN to StyleGAN-XL, which leads to state-of-the-art FID score amongst GANs for class conditional generation on ImageNet 256$\times$256. Our implementation is available on https://ytakida.github.io/san.

Updated: 2024-04-10 04:03:06

标题: SAN:通过判别性归一化线性层诱导GAN的可度量性

摘要: 生成对抗网络(GANs)通过优化生成器和判别器的最小最大目标来学习目标概率分布。本文探讨了这种优化实际上是否提供了生成器梯度,使其分布接近目标分布的问题。我们推导出了可度量条件,充分条件表明判别器可作为分布之间的距离,将GAN公式与切片最优输运概念相联系。此外,通过利用这些理论结果,我们提出了一种新颖的GAN训练方案,称为切片对抗网络(SAN)。通过简单修改,广泛类别的现有GAN可以转换为SAN。对合成和图像数据集的实验支持了我们的理论结果以及SAN相对于通常的GAN的有效性。此外,我们还将SAN应用于StyleGAN-XL,导致在ImageNet 256×256上进行类条件生成时的最先进FID分数。我们的实现可在https://ytakida.github.io/san上找到。

更新时间: 2024-04-10 04:03:06

领域: cs.LG

下载: http://arxiv.org/abs/2301.12811v4

Port Forwarding Services Are Forwarding Security Risks

We conduct the first comprehensive security study on representative port forwarding services (PFS), which have emerged in recent years and make web services deployed in internal networks available on the Internet with better usability and less complexity than traditional techniques (e.g., NAT traversal). Our study is made possible through a set of novel methodologies, which are designed to uncover the technical mechanisms of PFS, experiment with attack scenarios for PFS protocols, automatically discover and snapshot port-forwarded websites (PFWs) at scale, and classify PFWs into well-observed categories. Leveraging these methodologies, we have observed the widespread adoption of PFS, with millions of PFWs distributed across tens of thousands of ISPs worldwide. Furthermore, 32.31% of PFWs have been classified into website categories that serve access to critical data or infrastructure, such as web consoles for industrial control systems, IoT controllers, code repositories, and office automation systems. And 18.57% of PFWs didn't enforce any access control for external visitors. Also identified are two types of attacks inherent in the protocols of Oray (one well-adopted PFS provider), and the notable abuse of PFSes by malicious actors in activities such as malware distribution, botnet operation and phishing.

Updated: 2024-04-10 03:53:46

标题: 端口转发服务正在转发安全风险

摘要: 我们对代表性端口转发服务(PFS)进行了第一次全面的安全研究。这类服务近年来兴起,使部署在内部网络中的Web服务能够在互联网上访问,与传统技术(例如NAT穿透技术)相比更易用、更简单。我们的研究依靠一系列新颖的方法得以实现:揭示PFS的技术机制,针对PFS协议实验攻击场景,大规模自动发现并快照端口转发网站(PFWs),并将PFWs归入常见类别。利用这些方法,我们观察到PFS的广泛应用,数百万PFWs分布在全球数万个ISP中。此外,32.31%的PFWs被归类为提供对关键数据或基础设施访问的网站类别,例如工业控制系统的Web控制台、物联网控制器、代码仓库和办公自动化系统。18.57%的PFWs没有对外部访客实施任何访问控制。我们还发现了Oray(一家广泛使用的PFS提供商)协议中固有的两类攻击,以及恶意行为者在恶意软件分发、僵尸网络运营和网络钓鱼等活动中对PFS的显著滥用。

更新时间: 2024-04-10 03:53:46

领域: cs.CR

下载: http://arxiv.org/abs/2403.16060v2

Compressed Gradient Tracking for Decentralized Optimization Over General Directed Networks

In this paper, we propose two communication efficient decentralized optimization algorithms over a general directed multi-agent network. The first algorithm, termed Compressed Push-Pull (CPP), combines the gradient tracking Push-Pull method with communication compression. We show that CPP is applicable to a general class of unbiased compression operators and achieves linear convergence rate for strongly convex and smooth objective functions. The second algorithm is a broadcast-like version of CPP (B-CPP), and it also achieves linear convergence rate under the same conditions on the objective functions. B-CPP can be applied in an asynchronous broadcast setting and further reduce communication costs compared to CPP. Numerical experiments complement the theoretical analysis and confirm the effectiveness of the proposed methods.
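For concreteness, one member of the unbiased compression class that such analyses cover is stochastic (QSGD-style) quantization; the sketch below is illustrative, and the operator class CPP supports is more general.

```python
import numpy as np

def stochastic_quantize(x, levels=16, rng=np.random.default_rng()):
    """Unbiased stochastic quantization: E[Q(x)] = x."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return x
    scaled = np.abs(x) / norm * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(x.shape) < (scaled - lower))   # randomized rounding
    return np.sign(x) * q * norm / levels

x = np.random.randn(1000)
samples = np.stack([stochastic_quantize(x) for _ in range(2000)])
print(np.abs(samples.mean(axis=0) - x).max())              # close to 0: unbiased
```

Each agent then transmits only the norm, signs, and integer levels of its gradient-tracking messages, which is where the communication savings come from.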

Updated: 2024-04-10 03:50:54

标题: 压缩梯度跟踪在通用有向网络上的分散优化

摘要: 在本文中,我们提出了两种在一般有向多智能体网络上进行通信效率高的分散优化算法。第一种算法被称为压缩推拉(CPP),它将梯度跟踪推拉方法与通信压缩结合起来。我们展示了CPP适用于一般类别的无偏压缩操作符,并且对于强凸和平滑目标函数实现了线性收敛速度。第二种算法是CPP的广播版本(B-CPP),在相同的目标函数条件下,它也实现了线性收敛速度。与CPP相比,B-CPP可以应用于异步广播设置,并进一步减少通信成本。数值实验补充了理论分析,并确认了所提方法的有效性。

更新时间: 2024-04-10 03:50:54

领域: math.OC,cs.DC,cs.LG,cs.MA,eess.SP

下载: http://arxiv.org/abs/2106.07243v4

SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera

One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. The event-RGB approaches have some limitations, such as high training costs and the inability to work effectively in the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By considering texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of the temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It handles foreground objects and their backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research endeavors. We conduct extensive experiments using synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. The code and dataset will be made available for public access.

Updated: 2024-04-10 03:31:32

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.06710v1

Verification of Neural Reachable Tubes via Scenario Optimization and Conformal Prediction

Learning-based approaches for controlling safety-critical systems are rapidly growing in popularity; thus, it is important to assure their performance and safety. Hamilton-Jacobi (HJ) reachability analysis is a popular formal verification tool for providing such guarantees, since it can handle general nonlinear system dynamics, bounded adversarial system disturbances, and state and input constraints. However, its computational and memory complexity scales exponentially with the state dimension, making it intractable for large-scale systems. To overcome this challenge, neural approaches, such as DeepReach, have been used to synthesize reachable tubes and safety controllers for high-dimensional systems. However, verifying these neural reachable tubes remains challenging. In this work, we propose two verification methods, based on robust scenario optimization and conformal prediction, to provide probabilistic safety guarantees for neural reachable tubes. Our methods allow a direct trade-off between resilience to outlier errors in the neural tube, which are inevitable in a learning-based approach, and the strength of the probabilistic safety guarantee. Furthermore, we show that split conformal prediction, a widely used method in the machine learning community for uncertainty quantification, reduces to a scenario-based approach, making the two methods equivalent not only for verification of neural reachable tubes but also more generally. To our knowledge, our proof is the first in the literature to show a strong relationship between conformal prediction and scenario optimization. Finally, we propose an outlier-adjusted verification approach that uses the error distribution in neural reachable tubes to recover greater safe volumes. We demonstrate the efficacy of the proposed approaches for the high-dimensional problems of multi-vehicle collision avoidance and rocket landing with no-go zones.
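
A minimal sketch of the split conformal step, under the assumption that calibration and test errors of the learned value function are exchangeable (variable names are illustrative; the paper's scenario-optimization bound is closely related via its equivalence result):

    import numpy as np

    def split_conformal_bound(errors, alpha):
        """One-sided split conformal bound: with probability >= 1 - alpha,
        a fresh error does not exceed the returned quantile
        (requires n >= (1 - alpha) / alpha calibration samples)."""
        n = len(errors)
        k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
        return np.sort(errors)[k - 1]

    rng = np.random.default_rng(1)
    calib = np.abs(rng.normal(size=1000))        # stand-in for value-function errors
    delta = split_conformal_bound(calib, alpha=0.05)
    test = np.abs(rng.normal(size=100000))
    print((test <= delta).mean())                # roughly >= 0.95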

Updated: 2024-04-10 03:29:32

Domains: cs.RO,cs.AI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2312.08604v2

Differentiable Search for Finding Optimal Quantization Strategy

To accelerate and compress deep neural networks (DNNs), many network quantization algorithms have been proposed. Although the quantization strategy of any state-of-the-art algorithm may outperform others on some network architectures, it is hard to prove that a strategy is always better than the others, and even harder to argue that one strategy is the best choice for every layer in a network. In other words, existing quantization algorithms are suboptimal because they ignore the different characteristics of different layers and quantize all layers with a uniform quantization strategy. To solve this issue, in this paper we propose a differentiable quantization strategy search (DQSS) that assigns an optimal quantization strategy to each individual layer by taking advantage of the benefits of different quantization algorithms. Specifically, we formulate DQSS as a differentiable neural architecture search problem and adopt an efficient convolution to explore mixed quantization strategies from a global perspective via gradient-based optimization. We apply DQSS to post-training quantization to make its performance comparable with that of full-precision models. We also employ DQSS in quantization-aware training to further validate its effectiveness. To circumvent the expensive optimization cost of employing DQSS in quantization-aware training, we update the hyper-parameters and the network parameters in a single forward-backward pass. Besides, we adjust the optimization process to avoid a potential under-fitting problem. Comprehensive experiments on a high-level computer vision task (image classification) and a low-level computer vision task (image super-resolution), with various network architectures, show that DQSS outperforms the state of the art.
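
One way to read the differentiable-search formulation is DARTS-style: each layer holds a softmax over candidate quantizers, learned jointly with the weights. The sketch below is a toy under that assumption; the paper's concrete parameterization (including its efficient convolution) may differ:

    import torch

    def uniform_quant(x, bits):
        """Straight-through uniform quantizer on [-1, 1]."""
        scale = 2 ** (bits - 1) - 1
        q = torch.round(torch.clamp(x, -1, 1) * scale) / scale
        return x + (q - x).detach()  # STE: forward quantized, backward identity

    class DiffQuantSearch(torch.nn.Module):
        """Softmax-weighted mixture of candidate quantizers; the mixture
        weights (alpha) are optimized by gradient descent with the network."""
        def __init__(self, candidate_bits=(2, 4, 8)):
            super().__init__()
            self.bits = candidate_bits
            self.alpha = torch.nn.Parameter(torch.zeros(len(candidate_bits)))
        def forward(self, x):
            w = torch.softmax(self.alpha, dim=0)
            return sum(wi * uniform_quant(x, b) for wi, b in zip(w, self.bits))

    layer_q = DiffQuantSearch()
    x = torch.randn(4, 8, requires_grad=True)
    layer_q(x).sum().backward()                 # gradients reach both x and alpha
    print(torch.softmax(layer_q.alpha, 0))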

Updated: 2024-04-10 03:22:58

Domains: cs.LG,eess.IV

Download: http://arxiv.org/abs/2404.08010v1

Convolution-based Probability Gradient Loss for Semantic Segmentation

In this paper, we introduce a novel Convolution-based Probability Gradient (CPG) loss for semantic segmentation. It employs convolution kernels similar to the Sobel operator, capable of computing the gradient of pixel intensity in an image. This enables the computation of gradients for both ground-truth and predicted category-wise probabilities. It enhances network performance by maximizing the similarity between these two probability gradients. Moreover, to specifically enhance accuracy near the object's boundary, we extract the object boundary based on the ground-truth probability gradient and exclusively apply the CPG loss to pixels belonging to boundaries. CPG loss proves to be highly convenient and effective. It establishes pixel relationships through convolution, calculating errors from a distinct dimension compared to pixel-wise loss functions such as cross-entropy loss. We conduct qualitative and quantitative analyses to evaluate the impact of the CPG loss on three well-established networks (DeepLabv3-Resnet50, HRNetV2-OCR, and LRASPP_MobileNet_V3_Large) across three standard segmentation datasets (Cityscapes, COCO-Stuff, ADE20K). Our extensive experimental results consistently and significantly demonstrate that the CPG loss enhances the mean Intersection over Union.
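
A simplified sketch of the idea, assuming Sobel kernels, an L1 comparison, and a boundary mask derived from the ground-truth probability gradient (the paper's exact kernels, masking, and normalization may differ):

    import torch
    import torch.nn.functional as F

    # Sobel-like kernels applied per class channel (illustrative choice).
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)

    def prob_grad(p):
        """p: (N, C, H, W) category-wise probabilities -> (gx, gy) per class."""
        c = p.shape[1]
        gx = F.conv2d(p, kx.repeat(c, 1, 1, 1), padding=1, groups=c)
        gy = F.conv2d(p, ky.repeat(c, 1, 1, 1), padding=1, groups=c)
        return gx, gy

    def cpg_loss(logits, onehot_gt):
        px, py = prob_grad(torch.softmax(logits, dim=1))
        gx, gy = prob_grad(onehot_gt)
        # Boundary pixels: where the ground-truth probability gradient is nonzero.
        boundary = ((gx.abs() + gy.abs()).sum(1, keepdim=True) > 0).float()
        diff = ((px - gx).abs() + (py - gy).abs()) * boundary
        return diff.sum() / boundary.sum().clamp(min=1)

    logits = torch.randn(2, 3, 16, 16, requires_grad=True)
    gt = F.one_hot(torch.randint(0, 3, (2, 16, 16)), 3).permute(0, 3, 1, 2).float()
    loss = cpg_loss(logits, gt)
    loss.backward()
    print(float(loss))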

Updated: 2024-04-10 03:20:33

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.06704v1

How to Craft Backdoors with Unlabeled Data Alone?

Relying only on unlabeled data, self-supervised learning (SSL) can learn rich features in an economical and scalable way. As the workhorse for building foundation models, SSL has recently received a lot of attention with wide applications, which also raises security concerns, among which backdoor attack is a major type of threat: if the released dataset is maliciously poisoned, backdoored SSL models can behave badly when triggers are injected into test samples. The goal of this work is to investigate this potential risk. We notice that existing backdoors all require a considerable amount of labeled data that may not be available for SSL. To circumvent this limitation, we explore a more restrictive setting called no-label backdoors, where we only have access to the unlabeled data alone, and the key challenge is how to select the proper poison set without using label information. We propose two strategies for poison selection: clustering-based selection using pseudolabels, and contrastive selection derived from the mutual information principle. Experiments on CIFAR-10 and ImageNet-100 show that both no-label backdoors are effective on many SSL methods and outperform random poisoning by a large margin. Code will be available at https://github.com/PKU-ML/nlb.
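
For the clustering-based strategy, here is a minimal sketch of what pseudolabel-driven poison selection might look like; the criterion used below (picking the tightest cluster) is an illustrative assumption, and the paper's actual selection rule lives in its released code:

    import numpy as np
    from sklearn.cluster import KMeans

    def clustering_poison_selection(features, budget, n_clusters=10, seed=0):
        """Cluster SSL features to obtain pseudolabels, then draw the poison
        set from a single consistent pseudo-class (no labels needed)."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
        d = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
        mean_d = np.array([d[km.labels_ == c].mean() for c in range(n_clusters)])
        target = int(np.argmin(mean_d))                 # tightest pseudo-class
        members = np.flatnonzero(km.labels_ == target)
        return members[np.argsort(d[members])[:budget]]

    feats = np.random.default_rng(0).normal(size=(500, 32))
    print(clustering_poison_selection(feats, budget=25)[:5])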

Updated: 2024-04-10 02:54:18

Domains: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2404.06694v1

Latent Chemical Space Searching for Plug-in Multi-objective Molecule Generation

Molecular generation, an essential method for identifying new drug structures, has been supported by advances in machine learning and computational technology. However, challenges remain in multi-objective generation, model adaptability, and practical application in drug discovery. In this study, we developed a versatile 'plug-in' molecular generation model that incorporates multiple objectives related to target affinity, drug-likeness, and synthesizability, facilitating its application in various drug development contexts. We improved Particle Swarm Optimization (PSO) in the context of drug discovery and, through comparative experiments, identified PSO-ENP as the optimal variant for multi-objective molecular generation and optimization. The model also incorporates a novel target-ligand affinity predictor, enhancing the model's utility by supporting three-dimensional information and improving synthetic feasibility. Case studies focused on generating and optimizing large, drug-like marine natural products were performed, underscoring PSO-ENP's effectiveness and demonstrating its considerable potential for practical drug discovery applications.
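
The backbone of such approaches is standard particle swarm optimization over the latent chemical space; the PSO-ENP-specific details are not reproduced here. A minimal PSO loop, with a toy objective standing in for the scalarized multi-objective score:

    import numpy as np

    def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
        """Minimal particle swarm optimizer (minimization)."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(-1, 1, (n_particles, dim))        # particle positions
        v = np.zeros_like(x)
        pbest, pbest_f = x.copy(), np.apply_along_axis(f, 1, x)
        g = pbest[np.argmin(pbest_f)].copy()              # global best
        for _ in range(iters):
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
            x = x + v
            fx = np.apply_along_axis(f, 1, x)
            improved = fx < pbest_f
            pbest[improved], pbest_f[improved] = x[improved], fx[improved]
            g = pbest[np.argmin(pbest_f)].copy()
        return g, pbest_f.min()

    # Toy stand-in for a scalarized multi-objective score over latent space.
    best, score = pso(lambda z: np.sum(z ** 2), dim=8)
    print(score)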

Updated: 2024-04-10 02:37:24

Domains: q-bio.BM,cs.LG,cs.NE

Download: http://arxiv.org/abs/2404.06691v1

Zipformer: A faster and better encoder for automatic speech recognition

The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work, we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame rates; 2) reorganized block structure with more modules, within which we re-use attention weights for efficiency; 3) a modified form of LayerNorm called BiasNorm allows us to retain some length information; 4) new activation functions SwooshR and SwooshL work better than Swish. We also propose a new optimizer, called ScaledAdam, which scales the update by each tensor's current scale to keep the relative change about the same, and also explicitly learns the parameter scale. It achieves faster convergence and better performance than Adam. Extensive experiments on LibriSpeech, Aishell-1, and WenetSpeech datasets demonstrate the effectiveness of our proposed Zipformer over other state-of-the-art ASR models. Our code is publicly available at https://github.com/k2-fsa/icefall.
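
The scaling idea behind ScaledAdam can be sketched in a few lines: scale the Adam direction by the tensor's current RMS so each step changes parameters by roughly the same relative amount. This toy step omits bias correction and the explicit learning of the parameter scale; the full optimizer is in the icefall repository linked above:

    import torch

    @torch.no_grad()
    def scaled_adam_step(p, state, lr=0.045, betas=(0.9, 0.98), eps=1e-8):
        """One heavily simplified ScaledAdam-style step (illustrative only)."""
        g = p.grad
        state["m"] = betas[0] * state.get("m", torch.zeros_like(p)) + (1 - betas[0]) * g
        state["v"] = betas[1] * state.get("v", torch.zeros_like(p)) + (1 - betas[1]) * g * g
        direction = state["m"] / (state["v"].sqrt() + eps)
        # Scale by the tensor's current RMS so the *relative* change stays uniform.
        param_rms = p.detach().pow(2).mean().sqrt().clamp(min=1e-5)
        p -= lr * param_rms * direction

    p = torch.nn.Parameter(torch.randn(10))
    (p ** 2).sum().backward()
    scaled_adam_step(p, state={})
    print(p.norm())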

Updated: 2024-04-10 02:35:38

Domains: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2310.11230v4

A Generic Shared Attention Mechanism for Various Backbone Neural Networks

The self-attention mechanism has emerged as a critical component for improving the performance of various backbone neural networks. However, current mainstream approaches individually incorporate newly designed self-attention modules (SAMs) into each layer of the network without fully exploiting their parameters' potential. This leads to suboptimal performance and increased parameter consumption as the network depth increases. To improve this paradigm, in this paper we first present a counterintuitive but inherent phenomenon: SAMs tend to produce strongly correlated attention maps across different layers, with an average Pearson correlation coefficient of up to 0.85. Inspired by this observation, we propose Dense-and-Implicit Attention (DIA), which directly shares SAMs across layers and employs a long short-term memory module to calibrate and bridge the highly correlated attention maps of different layers, thus improving the parameter utilization efficiency of SAMs. This design of DIA is also consistent with the dynamical-systems perspective on neural networks. Through extensive experiments, we demonstrate that our simple yet effective DIA can consistently enhance various network backbones, including ResNet, Transformer, and UNet, across tasks such as image classification, object detection, and image generation using diffusion models.
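
Structurally, the idea amounts to reusing one attention module at every depth and letting a recurrent cell calibrate its outputs across layers. A toy transformer-style sketch of that wiring (the layer types and the LSTM placement here are illustrative assumptions, not the paper's exact architecture):

    import torch
    import torch.nn as nn

    class DIAStack(nn.Module):
        """Toy Dense-and-Implicit Attention: one shared attention module,
        with an LSTM cell bridging its highly correlated outputs across depth."""
        def __init__(self, dim=64, depth=4, heads=4):
            super().__init__()
            self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
            self.shared_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.calib = nn.LSTMCell(dim, dim)
        def forward(self, x):                      # x: (batch, seq, dim)
            h = c = x.new_zeros(x.shape[0], x.shape[2])
            for layer in self.layers:
                x = torch.relu(layer(x))
                a, _ = self.shared_attn(x, x, x)           # same SAM reused per layer
                h, c = self.calib(a.mean(dim=1), (h, c))   # recurrent calibration
                x = x + a * h.unsqueeze(1)
            return x

    print(DIAStack()(torch.randn(2, 5, 64)).shape)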

Updated: 2024-04-10 02:33:57

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2210.16101v2

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge in the field. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix is capable of first converting dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers. These token streams are then fed into a flow-matching based acoustic model to generate mixed mel-spectrograms. Finally, the speech waveforms are produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple talkers engaging in multiple rounds of conversation. These dialogues, generated within a single channel, are characterized by seamless speech transitions, including overlapping speech, and appropriate paralinguistic behaviors such as laughter. Audio samples are available at https://aka.ms/covomix.

Updated: 2024-04-10 02:32:58

Domains: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

Download: http://arxiv.org/abs/2404.06690v1

Understanding Expressivity of GNN in Rule Learning

Rule learning is critical to improving knowledge graph (KG) reasoning due to its ability to provide logical and interpretable explanations. Recently, Graph Neural Networks (GNNs) with tail entity scoring have achieved state-of-the-art performance on KG reasoning. However, the theoretical understanding of these GNNs is either lacking or focused on single-relational graphs, leaving the kind of rules these GNNs can learn an open problem. We propose to fill this gap in this paper. Specifically, GNNs with tail entity scoring are unified into a common framework. Then, we analyze their expressivity by formally describing the rule structures they can learn and theoretically demonstrating their superiority. These results further inspire us to propose a novel labeling strategy to learn more rules in KG reasoning. Experimental results are consistent with our theoretical findings and verify the effectiveness of our proposed method. The code is publicly available at https://github.com/LARS-research/Rule-learning-expressivity.

Updated: 2024-04-10 02:32:52

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2303.12306v2

MiniLLM: Knowledge Distillation of Large Language Models

Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or training small models to imitate black-box model APIs like ChatGPT. How to effectively distill the knowledge of white-box LLMs into small models is still under-explored, which becomes more important with the prosperity of open-source LLMs. In this work, we propose a KD approach that distills LLMs into smaller language models. We first replace the forward Kullback-Leibler divergence (KLD) objective in the standard KD approaches with reverse KLD, which is more suitable for KD on generative language models, to prevent the student model from overestimating the low-probability regions of the teacher distribution. Then, we derive an effective optimization approach to learn this objective. The student models are named MiniLLM. Extensive experiments in the instruction-following setting show that MiniLLM generates more precise responses with higher overall quality, lower exposure bias, better calibration, and higher long-text generation performance than the baselines. Our method is scalable for different model families with 120M to 13B parameters. Our code, data, and model checkpoints can be found in https://github.com/microsoft/LMOps/tree/main/minillm.
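
The objective swap is easy to state: minimize KL(q_student || p_teacher) instead of KL(p_teacher || q_student), which is mode-seeking and discourages the student from covering the teacher's low-probability regions. A token-level sketch follows; note the paper actually optimizes a sequence-level variant with a policy-gradient-style method, so this shows only the loss direction:

    import torch
    import torch.nn.functional as F

    def reverse_kld(student_logits, teacher_logits):
        """Reverse KL, KL(q_student || p_teacher): penalizes the student for
        placing mass where the teacher assigns low probability."""
        log_q = F.log_softmax(student_logits, dim=-1)
        log_p = F.log_softmax(teacher_logits, dim=-1)
        q = log_q.exp()
        return (q * (log_q - log_p)).sum(-1).mean()

    s = torch.randn(4, 32000, requires_grad=True)   # student next-token logits
    t = torch.randn(4, 32000)                       # teacher next-token logits
    loss = reverse_kld(s, t)
    loss.backward()
    print(float(loss))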

Updated: 2024-04-10 02:30:19

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2306.08543v4

Atlas-X Equity Financing: Unlocking New Methods to Securely Obfuscate Axe Inventory Data Based on Differential Privacy

Banks publish daily a list of available securities/assets (axe list) to selected clients to help them effectively locate Long (buy) or Short (sell) trades at reduced financing rates. This reduces costs for the bank, as the list aggregates the bank's internal firm inventory per asset across all clients, for long as well as short trades. However, this is somewhat problematic: (1) the bank's inventory is revealed; (2) trades of clients who contribute to the aggregated list, particularly those deemed large, are revealed to other clients. Clients conducting sizable trades with the bank and holding more than 50% of an aggregated asset are considered concentrated clients. This could potentially reveal a concentrated client's trading activity to their competitors, thus providing an unfair advantage over the market. Atlas-X Axe Obfuscation, powered by new differentially private methods, enables a bank to obfuscate its published axe list on a daily basis while under continual observation, thus maintaining an acceptable inventory Profit and Loss (P&L) cost for the noisy obfuscated axe list while reducing the leakage of clients' trading activity. Our main differential privacy innovation is a differentially private aggregator for streams (time-series data) of both positive and negative integers under continual observation. For the last two years, the Atlas-X system has been live in production across three major regions (USA, Europe, and Asia) at J.P. Morgan, a major financial institution, facilitating significant profitability. To our knowledge, it is the first differential privacy solution to be deployed in the financial sector. We also report benchmarks of our algorithm based on (anonymized) real and synthetic data to showcase the quality of our obfuscation and its success in production.
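
The abstract does not spell out the aggregator's construction; the textbook building block for differentially private prefix sums under continual observation is the binary-tree mechanism, sketched below for a signed stream with entries assumed bounded by 1 (the bound and all parameters are illustrative, not Atlas-X's actual design):

    import numpy as np

    def dp_prefix_sums(stream, epsilon, rng):
        """Binary-tree mechanism: release every prefix sum of an integer stream
        under continual observation. Each element affects O(log T) tree nodes,
        so each node gets Laplace noise of scale ~ log2(T) / epsilon
        (privacy calibration assumes |x_t| <= 1)."""
        T = len(stream)
        levels = int(np.ceil(np.log2(T))) + 1
        noisy = {}          # dyadic interval (lo, hi) -> noisy partial sum
        out = []
        for t in range(1, T + 1):
            prefix, covered = 0.0, 0
            for i in reversed(range(levels)):
                if (t >> i) & 1:    # dyadic decomposition of [1, t]
                    lo, hi = covered + 1, covered + (1 << i)
                    if (lo, hi) not in noisy:
                        noisy[(lo, hi)] = stream[lo - 1:hi].sum() + rng.laplace(0, levels / epsilon)
                    prefix += noisy[(lo, hi)]
                    covered = hi
            out.append(prefix)
        return out

    rng = np.random.default_rng(0)
    axe_updates = rng.integers(-1, 2, size=64)   # signed daily inventory updates
    sums = dp_prefix_sums(axe_updates, epsilon=1.0, rng=rng)
    print(sums[-1], axe_updates.sum())           # noisy vs. true running total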

Updated: 2024-04-10 02:19:37

Domains: cs.CR

Download: http://arxiv.org/abs/2404.06686v1

Subspace Representations for Soft Set Operations and Sentence Similarities

In the field of natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words. Yet, when it comes to the representations of sets of words, the conventional vector-based approaches often struggle with expressiveness and lack the essential set operations such as union, intersection, and complement. Inspired by quantum logic, we realize the representation of word sets and corresponding set operations within pre-trained word embedding spaces. By grounding our approach in the linear subspaces, we enable efficient computation of various set operations and facilitate the soft computation of membership functions within continuous spaces. Moreover, we allow for the computation of the F-score directly within word vectors, thereby establishing a direct link to the assessment of sentence similarity. In experiments with widely-used pre-trained embeddings and benchmarks, we show that our subspace-based set operations consistently outperform vector-based ones in both sentence similarity and set retrieval tasks.
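
A minimal sketch of the subspace view, assuming a word set is represented by an orthonormal basis of the span of its embeddings and soft membership is the squared norm of a projection (the paper additionally defines union, intersection, and complement on such subspaces):

    import numpy as np

    def subspace_basis(word_vecs, rank=None):
        """Orthonormal basis of the span of a word set's vectors (via SVD)."""
        u, s, _ = np.linalg.svd(np.asarray(word_vecs).T, full_matrices=False)
        r = rank or int(np.sum(s > 1e-10))
        return u[:, :r]

    def soft_membership(v, basis):
        """Soft membership of v in a set-subspace: squared norm of the
        projection of the normalized vector, lying in [0, 1]."""
        v = v / np.linalg.norm(v)
        return float(np.sum((basis.T @ v) ** 2))

    rng = np.random.default_rng(0)
    animals = rng.normal(size=(3, 50))              # stand-ins for embeddings
    A = subspace_basis(animals)
    print(soft_membership(animals[0], A))           # ~1.0 for a member
    print(soft_membership(rng.normal(size=50), A))  # smaller for a random vector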

Updated: 2024-04-10 02:16:55

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2210.13034v4

Text-Based Reasoning About Vector Graphics

While large multimodal models excel in broad vision-language benchmarks, they often struggle with tasks requiring precise perception of low-level visual details, such as comparing line lengths or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics -- images composed purely of 2D objects and shapes. To address this challenge, we propose the Visually Descriptive Language Model (VDLM), which performs text-based reasoning about vector graphics. VDLM leverages Scalable Vector Graphics (SVG) for a more precise visual description and first uses an off-the-shelf raster-to-SVG algorithm for encoding. Since existing language models cannot understand raw SVGs in a zero-shot setting, VDLM then bridges SVG with pretrained language models through a newly introduced intermediate symbolic representation, Primal Visual Description (PVD), comprising primitive attributes (e.g., shape, position, measurement) with their corresponding predicted values. PVD is task-agnostic and represents visual primitives that are universal across all vector graphics. It can be learned with procedurally generated (SVG, PVD) pairs and also enables the direct use of LLMs for generalization to complex reasoning tasks. By casting an image to a text-based representation, we can leverage the power of language models to learn alignment from SVG to visual primitives and generalize to unseen question-answering tasks. Empirical results show that VDLM achieves stronger zero-shot performance compared to state-of-the-art LMMs, such as GPT-4V, in various low-level multimodal perception and reasoning tasks on vector graphics. We additionally present extensive analyses on VDLM's performance, demonstrating that our framework offers better interpretability due to its disentangled perception and reasoning processes. Project page: https://mikewangwzhl.github.io/VDLM/

Updated: 2024-04-10 02:12:27

Domains: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2404.06479v2

Causal Representation Learning from Multiple Distributions: A General Setting

In many problems, the measured variables (e.g., image pixels) are just mathematical functions of the hidden causal variables (e.g., the underlying concepts or objects). For the purpose of making predictions in changing environments or making proper changes to the system, it is helpful to recover the hidden causal variables $Z_i$ and their causal relations represented by graph $\mathcal{G}_Z$. This problem has recently become known as causal representation learning. This paper is concerned with a general, completely nonparametric setting of causal representation learning from multiple distributions (arising from heterogeneous data or nonstationary time series), without assuming hard interventions behind distribution changes. We aim to develop general solutions in this fundamental case; as a by-product, this helps reveal the unique benefit offered by other assumptions such as parametric causal models or hard interventions. We show that under the sparsity constraint on the recovered graph over the latent variables and suitable sufficient change conditions on the causal influences, interestingly, one can recover the moralized graph of the underlying directed acyclic graph, and the recovered latent variables and their relations are related to the underlying causal model in a specific, nontrivial way. In some cases, each latent variable can even be recovered up to component-wise transformations. Experimental results verify our theoretical claims.

Updated: 2024-04-10 02:08:29

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.05052v2

Causal Unit Selection using Tractable Arithmetic Circuits

The unit selection problem aims to find objects, called units, that optimize a causal objective function which describes the objects' behavior in a causal context (e.g., selecting customers who are about to churn but would most likely change their mind if encouraged). While early studies focused mainly on bounding a specific class of counterfactual objective functions using data, more recent work allows one to find optimal units exactly by reducing the causal objective to a classical objective on a meta-model, and then applying a variant of the classical Variable Elimination (VE) algorithm to the meta-model -- assuming a fully specified causal model is available. In practice, however, finding optimal units using this approach can be very expensive because the VE algorithm used is exponential in the constrained treewidth of the meta-model, which is larger and denser than the original model. We address this computational challenge by introducing a new approach for unit selection that is not necessarily limited by the constrained treewidth. This is done by compiling the meta-model into a special class of tractable arithmetic circuits that allows the computation of optimal units in time linear in the circuit size. We finally present empirical results on random causal models that show order-of-magnitude speedups based on the proposed method for solving unit selection.

Updated: 2024-04-10 02:02:34

Domains: cs.AI,cs.LG,stat.ME

Download: http://arxiv.org/abs/2404.06681v1

Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution

A major contributor to the quality of a deep learning model is the selection of the optimizer. We propose a new dual-joint search space in the realm of neural optimizer search (NOS), along with an integrity check, to automate the process of finding deep learning optimizers. Our dual-joint search space simultaneously allows for the optimization of not only the update equation, but also internal decay functions and learning rate schedules for optimizers. We search the space using our proposed mutation-only, particle-based genetic algorithm able to be massively parallelized for our domain-specific problem. We evaluate our candidate optimizers on the CIFAR-10 dataset using a small ConvNet. To assess generalization, the final optimizers were then transferred to large-scale image classification on CIFAR-100 and TinyImageNet, while also being fine-tuned on Flowers102, Cars196, and Caltech101 using EfficientNetV2Small. We found multiple optimizers, learning rate schedules, and Adam variants that outperformed Adam, as well as other standard deep learning optimizers, across the image classification tasks.

Updated: 2024-04-10 02:00:24

Domains: cs.NE,cs.AI

Download: http://arxiv.org/abs/2404.06679v1

Topological Feature Search Method for Multichannel EEG: Application in ADHD classification

In recent years, the preliminary diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) using electroencephalography (EEG) has garnered attention from researchers. EEG, known for its expediency and efficiency, plays a pivotal role in the diagnosis and treatment of ADHD. However, the non-stationarity of EEG signals and inter-subject variability pose challenges to the diagnostic and classification processes. Topological Data Analysis (TDA) offers a novel perspective for ADHD classification, diverging from traditional time-frequency domain features. Yet, conventional TDA models are restricted to single-channel time series and are susceptible to noise, leading to the loss of topological features in persistence diagrams. This paper presents an enhanced TDA approach applicable to multi-channel EEG in ADHD. Initially, optimal input parameters for multi-channel EEG are determined. Subsequently, each channel's EEG undergoes phase space reconstruction (PSR), followed by the use of k-Power Distance to Measure (k-PDTM) to approximate ideal point clouds. Then, multi-dimensional time series are re-embedded, and TDA is applied to obtain topological feature information. Gaussian-function-based Multivariate Kernel Density Estimation (MKDE) is employed on the merged persistence diagram to select the desired topological feature mappings. Finally, the persistence image (PI) method is used to extract topological features, and the influence of various weighting functions on the results is discussed. The effectiveness of our method is evaluated using the IEEE ADHD dataset. Results demonstrate that accuracy, sensitivity, and specificity reach 85.60%, 83.61%, and 88.33%, respectively. Compared with traditional TDA methods, our method is effectively improved and outperforms typical nonlinear descriptors. These findings indicate that our method exhibits higher precision and robustness.
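
The first step of the pipeline, phase space reconstruction by time-delay embedding, is simple to sketch (the embedding dimension and delay below are arbitrary illustrative choices; k-PDTM denoising and persistence computation, e.g., with gudhi or ripser, would follow):

    import numpy as np

    def delay_embed(x, dim, tau):
        """Takens-style phase space reconstruction of a 1-D series:
        point t -> (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
        n = len(x) - (dim - 1) * tau
        return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

    t = np.linspace(0, 20 * np.pi, 2000)
    eeg_like = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
    cloud = delay_embed(eeg_like, dim=3, tau=25)   # point cloud fed to TDA
    print(cloud.shape)                             # (1950, 3)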

Updated: 2024-04-10 01:37:41

Domains: cs.LG,eess.SP,stat.AP

Download: http://arxiv.org/abs/2404.06676v1

Toward Cross-Layer Energy Optimizations in Machine Learning Systems

The enormous energy consumption of machine learning (ML) and generative AI workloads shows no sign of waning, taking a toll on operating costs, power delivery, and environmental sustainability. Despite a long line of research on energy-efficient hardware, we found through two recent works, Zeus and Perseus, that software plays a critical role in ML energy optimization. This is especially true for large language models (LLMs), because their model sizes and, therefore, energy demands are growing faster than hardware efficiency improvements. Therefore, we advocate a cross-layer approach to energy optimization in ML systems, where hardware provides architectural support that pushes energy-efficient software further, while software leverages and abstracts the hardware to develop techniques that bring hardware-agnostic energy-efficiency gains.

Updated: 2024-04-10 01:35:17

Domains: cs.LG,cs.AR,cs.DC

Download: http://arxiv.org/abs/2404.06675v1

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion effect is weak, there is no zero-shot capability for out-of-distribution speakers, or the synthesized outputs exhibit timbre leakage which changes the speaker's perceived identity. Our work proposes solutions for each of these issues in a simple modular framework based on a conditional diffusion backbone model with optional normalizing flow-based and sequence-to-sequence speaker attribute-editing modules, whose components can be combined or removed during inference to meet a wide array of tasks without additional model finetuning. Audio samples are available at https://voiceshopai.github.io

Updated: 2024-04-10 01:33:08

Domains: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2404.06674v1

Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition

The past years have witnessed a proliferation of large language models (LLMs). Yet, automated and unbiased evaluation of LLMs is challenging due to the inaccuracy of standard metrics in reflecting human preferences and the inefficiency in sampling informative and diverse test examples. While human evaluation remains the gold standard, it is expensive and time-consuming, especially when dealing with a large number of testing samples. To address this problem, we propose a sample-efficient human evaluation method based on MAximum Discrepancy (MAD) competition. MAD automatically selects a small set of informative and diverse instructions, each adapted to two LLMs, whose responses are subject to three-alternative forced choice by human subjects. The pairwise comparison results are then aggregated into a global ranking using the Elo rating system. We select eight representative LLMs and compare them in terms of four skills: knowledge understanding, mathematical reasoning, writing, and coding. Experimental results show that the proposed method achieves a reliable and sensible ranking of LLMs' capabilities, identifies their relative strengths and weaknesses, and offers valuable insights for further LLM advancement.
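
The aggregation step is standard Elo: each three-alternative forced choice becomes a win, loss, or tie for a model pair, and ratings are updated toward the observed outcome. A minimal sketch with illustrative match data:

    import numpy as np

    def elo_ratings(matches, n_models, k=32, base=1500.0):
        """Aggregate pairwise human judgments into a global ranking.
        matches: (i, j, s) with s = 1 if model i wins, 0 if j wins, 0.5 tie."""
        r = np.full(n_models, base)
        for i, j, s in matches:
            expected_i = 1.0 / (1.0 + 10 ** ((r[j] - r[i]) / 400.0))
            r[i] += k * (s - expected_i)
            r[j] -= k * (s - expected_i)
        return r

    matches = [(0, 1, 1), (1, 2, 0.5), (0, 2, 1), (2, 1, 0)]  # toy judgments
    print(np.argsort(-elo_ratings(matches, n_models=3)))      # best model first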

Updated: 2024-04-10 01:26:24

Domains: cs.LG,cs.CL,cs.HC

Download: http://arxiv.org/abs/2404.08008v1

Leveraging Diffusion For Strong and High Quality Face Morphing Attacks

Face morphing attacks seek to deceive a Face Recognition (FR) system by presenting a morphed image consisting of the biometric qualities from two different identities with the aim of triggering a false acceptance with one of the two identities, thereby presenting a significant threat to biometric systems. The success of a morphing attack is dependent on the ability of the morphed image to represent the biometric characteristics of both identities that were used to create the image. We present a novel morphing attack that uses a Diffusion-based architecture to improve the visual fidelity of the image and the ability of the morphing attack to represent characteristics from both identities. We demonstrate the effectiveness of the proposed attack by evaluating its visual fidelity via the Frechet Inception Distance (FID). Also, extensive experiments are conducted to measure the vulnerability of FR systems to the proposed attack. The ability of a morphing attack detector to detect the proposed attack is measured and compared against two state-of-the-art GAN-based morphing attacks along with two Landmark-based attacks. Additionally, a novel metric to measure the relative strength between different morphing attacks is introduced and evaluated.

Updated: 2024-04-10 01:11:15

Domains: cs.CV,cs.CR,cs.LG

Download: http://arxiv.org/abs/2301.04218v4

Forecasting the Future with Future Technologies: Advancements in Large Meteorological Models

The field of meteorological forecasting has undergone a significant transformation with the integration of large models, especially those employing deep learning techniques. This paper reviews the advancements and applications of these models in weather prediction, emphasizing their role in transforming traditional forecasting methods. Models like FourCastNet, Pangu-Weather, GraphCast, ClimaX, and FengWu have made notable contributions by providing accurate, high-resolution forecasts, surpassing the capabilities of traditional Numerical Weather Prediction (NWP) models. These models utilize advanced neural network architectures, such as Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), and Transformers, to process diverse meteorological data, enhancing predictive accuracy across various time scales and spatial resolutions. The paper addresses challenges in this domain, including data acquisition and computational demands, and explores future opportunities for model optimization and hardware advancements. It underscores the integration of artificial intelligence with conventional meteorological techniques, promising improved weather prediction accuracy and a significant contribution to addressing climate-related challenges. This synergy positions large models as pivotal in the evolving landscape of meteorological forecasting.

Updated: 2024-04-10 00:52:54

Domains: cs.LG,cs.AI,physics.ao-ph

Download: http://arxiv.org/abs/2404.06668v1

Stabilizing Estimates of Shapley Values with Control Variates

Shapley values are among the most popular tools for explaining predictions of blackbox machine learning models. However, their high computational cost motivates the use of sampling approximations, inducing a considerable degree of uncertainty. To stabilize these model explanations, we propose ControlSHAP, an approach based on the Monte Carlo technique of control variates. Our methodology is applicable to any machine learning model and requires virtually no extra computation or modeling effort. On several high-dimensional datasets, we find it can produce dramatic reductions in the Monte Carlo variability of Shapley estimates.
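
The control-variates mechanism itself is compact: subtract a correlated quantity with known mean, scaled by the variance-optimal coefficient. In ControlSHAP the control variate comes from a model approximation whose Shapley values are known in closed form; the sketch below shows only the generic estimator on synthetic correlated samples:

    import numpy as np

    def control_variate_estimate(y, z, z_mean):
        """Variance-reduced mean of y using control variate z with known mean:
        y_cv = y - c * (z - E[z]), with the optimal c = Cov(y, z) / Var(z)."""
        c = np.cov(y, z)[0, 1] / np.var(z)
        return np.mean(y - c * (z - z_mean)), c

    rng = np.random.default_rng(0)
    z = rng.normal(size=2000)            # e.g., Shapley terms of a linear surrogate,
    z_mean = 0.0                         # whose Shapley values are known exactly
    y = 2 * z + rng.normal(size=2000)    # correlated Monte Carlo Shapley samples
    est, c = control_variate_estimate(y, z, z_mean)
    print(est, np.mean(y))               # est fluctuates far less than the plain mean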

Updated: 2024-04-10 00:35:36

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2310.07672v3

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

Discrete speech tokens have become more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. Notably, we achieved 1st rank on the leaderboard in the TTS track, both with the whole training set and with only 1 h of training data, with the highest UTMOS score and the lowest bitrate among all submissions.

Updated: 2024-04-10 00:33:25

Domains: eess.AS,cs.AI

Download: http://arxiv.org/abs/2404.06079v2

SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models

Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexual scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing improper text embeddings, which can block explicit NSFW-related content (e.g., naked or sexy) but may still be vulnerable to adversarial prompts inputs that appear innocent but are ill-intended. In this paper, we present SafeGen, a framework to mitigate unsafe content generation by text-to-image models in a text-agnostic manner. The key idea is to eliminate unsafe visual representations from the model regardless of the text input. In this way, the text-to-image model is resistant to adversarial prompts since unsafe visual representations are obstructed from within. Extensive experiments conducted on four datasets demonstrate SafeGen's effectiveness in mitigating unsafe content generation while preserving the high-fidelity of benign images. SafeGen outperforms eight state-of-the-art baseline methods and achieves 99.1% sexual content removal performance. Furthermore, our constructed benchmark of adversarial prompts provides a basis for future development and evaluation of anti-NSFW-generation methods.

Updated: 2024-04-10 00:26:08

Domains: cs.CV,cs.AI,cs.CL,cs.CR

Download: http://arxiv.org/abs/2404.06666v1

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

Frontier large language models (LLMs) are developed by researchers and practitioners with skewed cultural backgrounds and on datasets with skewed sources. However, LLMs' (lack of) multicultural knowledge cannot be effectively assessed with current methods for developing benchmarks. Existing multicultural evaluations primarily rely on expensive and restricted human annotations or potentially outdated internet resources. Thus, they struggle to capture the intricacy, dynamics, and diversity of cultural norms. LLM-generated benchmarks are promising, yet risk propagating the same biases they are meant to measure. To synergize the creativity and expert cultural knowledge of human annotators and the scalability and standardizability of LLM-based automation, we introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build a truly challenging evaluation dataset for assessing the multicultural knowledge of LLMs, while improving annotators' capabilities and experiences. Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating, in a gamified manner, cultural questions that modern LLMs fail at. Importantly, the increased level of AI assistance (e.g., LLM-generated revision hints) empowers users to create more difficult questions with enhanced perceived creativity of themselves, shedding light on the promise of involving heavier AI assistance in modern evaluation dataset creation procedures. Through a series of 1-hour workshop sessions, we gather CULTURALBENCH-V0.1, a compact yet high-quality evaluation dataset with users' red-teaming attempts, on which different families of modern LLMs perform with accuracy ranging from 37.7% to 72.2%, revealing a notable gap in LLMs' multicultural proficiency.

Updated: 2024-04-10 00:25:09

Domains: cs.CL,cs.AI,cs.HC

Download: http://arxiv.org/abs/2404.06664v1

A Comprehensive Survey on Uncertainty Quantification for Deep Learning

Deep neural networks (DNNs) have achieved tremendous success in making accurate predictions for computer vision, natural language processing, as well as science and engineering domains. However, it is also well-recognized that DNNs sometimes make unexpected, incorrect, but overconfident predictions. This can cause serious consequences in high-stake applications, such as autonomous driving, medical diagnosis, and disaster response. Uncertainty quantification (UQ) aims to estimate the confidence of DNN predictions beyond prediction accuracy. In recent years, many UQ methods have been developed for DNNs. It is of great practical value to systematically categorize these UQ methods and compare their advantages and disadvantages. However, existing surveys mostly focus on categorizing UQ methodologies from a neural network architecture perspective or a Bayesian perspective and ignore the source of uncertainty that each methodology can incorporate, making it difficult to select an appropriate UQ method in practice. To fill the gap, this paper presents a systematic taxonomy of UQ methods for DNNs based on the types of uncertainty sources (data uncertainty versus model uncertainty). We summarize the advantages and disadvantages of methods in each category. We show how our taxonomy of UQ methodologies can potentially help guide the choice of UQ method in different machine learning problems (e.g., active learning, robustness, and reinforcement learning). We also identify current research gaps and propose several future research directions.
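
The data-versus-model split the survey organizes around has a standard information-theoretic form for ensembles (an illustrative sketch, not from the survey): total predictive entropy decomposes into expected member entropy (aleatoric/data) plus mutual information (epistemic/model):

    import numpy as np

    def uncertainty_decomposition(prob_samples):
        """prob_samples: (n_members, n_classes) class probabilities from an
        ensemble. Returns (total, aleatoric, epistemic) uncertainty, where
        total entropy = expected member entropy + mutual information."""
        p_mean = prob_samples.mean(axis=0)
        total = -np.sum(p_mean * np.log(p_mean + 1e-12))
        aleatoric = -np.mean(np.sum(prob_samples * np.log(prob_samples + 1e-12), axis=1))
        return total, aleatoric, total - aleatoric   # last term: epistemic (MI)

    ens = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])  # disagreeing members
    print(uncertainty_decomposition(ens))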

Updated: 2024-04-10 00:19:54

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2302.13425v4

By Xinhai (Sean) Zou.