Accelerating MRI Uncertainty Estimation with Mask-based Bayesian Neural Network
Accurate and reliable Magnetic Resonance Imaging (MRI) analysis is particularly important for adaptive radiotherapy, a recent medical advance capable of improving cancer diagnosis and treatment. Recent studies have shown that IVIM-NET, a deep neural network (DNN), can achieve high accuracy in MRI analysis, indicating the potential of deep learning to enhance diagnostic capabilities in healthcare. However, IVIM-NET does not provide calibrated uncertainty information needed for reliable and trustworthy predictions in healthcare. Moreover, the expensive computation and memory demands of IVIM-NET reduce hardware performance, hindering widespread adoption in realistic scenarios. To address these challenges, this paper proposes an algorithm-hardware co-optimization flow for high-performance and reliable MRI analysis. At the algorithm level, a transformation design flow is introduced to convert IVIM-NET to a mask-based Bayesian Neural Network (BayesNN), facilitating reliable and efficient uncertainty estimation. At the hardware level, we propose an FPGA-based accelerator with several hardware optimizations, such as mask-zero skipping and operation reordering. Experimental results demonstrate that our co-design approach can satisfy the uncertainty requirements of MRI analysis, while achieving 7.5 times and 32.5 times speedups on a Xilinx VU13P FPGA compared to GPU and CPU implementations, with reduced power consumption.
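The mask-based uncertainty idea can be pictured as repeated stochastic forward passes aggregated into a mean and variance (a minimal sketch, assuming a Monte Carlo-dropout-style design; `MaskedLinear`, the keep probability, and the sample count are illustrative, not IVIM-NET's actual architecture):

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer gated by a sampled Bernoulli mask; the zeroed activations
    are what a mask-zero-skipping accelerator can avoid computing."""
    def __init__(self, d_in, d_out, keep_prob=0.9):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out)
        self.keep_prob = keep_prob

    def forward(self, x):
        h = torch.relu(self.fc(x))
        mask = torch.bernoulli(torch.full_like(h, self.keep_prob))
        return h * mask

def predict_with_uncertainty(model, x, n_samples=32):
    """Monte Carlo estimate: mean prediction and per-output variance."""
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.var(dim=0)
```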
Updated: 2024-07-07 23:57:40
Categories: cs.AR,cs.AI
Spatial-Temporal Large Language Model for Traffic Prediction
Traffic prediction, an essential component for intelligent transportation systems, endeavours to use historical data to foresee future traffic features at specific locations. Although existing traffic prediction models often emphasize developing complex neural network structures, their accuracy has not improved. Recently, large language models have shown outstanding capabilities in time series analysis. Differing from existing models, LLMs progress mainly through parameter expansion and extensive pretraining while maintaining their fundamental structures. Motivated by these developments, we propose a Spatial-Temporal Large Language Model (ST-LLM) for traffic prediction. In the ST-LLM, we define timesteps at each location as tokens and design a spatial-temporal embedding to learn the spatial location and global temporal patterns of these tokens. Additionally, we fuse these embeddings into each token through a fusion convolution to obtain a unified spatial-temporal representation. Furthermore, we introduce a partially frozen attention strategy to adapt the LLM to capture global spatial-temporal dependencies for traffic prediction. Comprehensive experiments on real traffic datasets offer evidence that ST-LLM is a powerful spatial-temporal learner that outperforms state-of-the-art models. Notably, the ST-LLM also exhibits robust performance in both few-shot and zero-shot prediction scenarios. The code is publicly available at https://github.com/ChenxiLiu-HNU/ST-LLM.
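The "partially frozen attention" strategy can be illustrated on a toy backbone (a sketch, assuming freezing is applied to the self-attention of the earliest blocks; the stand-in encoder stack and the split point are assumptions, not ST-LLM's actual pretrained backbone):

```python
import torch.nn as nn

# Toy stand-in for the pretrained backbone; the real ST-LLM adapts a
# pretrained GPT-style model, which this sketch does not load.
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
    for _ in range(6)
)

def partially_freeze_attention(blocks, n_frozen):
    """Freeze self-attention in the first n_frozen blocks; later blocks stay
    trainable so they can adapt to the spatial-temporal token embeddings."""
    for i, blk in enumerate(blocks):
        for p in blk.self_attn.parameters():
            p.requires_grad = i >= n_frozen

partially_freeze_attention(blocks, n_frozen=4)
```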
Updated: 2024-07-07 23:57:29
Categories: cs.LG,cs.CL
A Theory of Machine Learning
We critically review three major theories of machine learning and provide a new theory according to which machines learn a function when the machines successfully compute it. We show that this theory challenges common assumptions in the statistical and the computational learning theories, for it implies that learning true probabilities is equivalent neither to obtaining a correct calculation of the true probabilities nor to obtaining an almost-sure convergence to them. We also briefly discuss some case studies from natural language processing and macroeconomics from the perspective of the new theory.
Updated: 2024-07-07 23:57:10
Categories: cs.LG,stat.ML
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling within a neural network framework. Our model leverages physical properties and fundamental frequencies as inputs, outputting string states across time and space that solve the partial differential equation characterizing the nonlinear string. Empirical evaluations demonstrate that the proposed architecture achieves superior accuracy in string motion simulation compared to existing baseline architectures. The code and demo are available online.
Updated: 2024-07-07 23:36:51
Categories: eess.AS,cs.AI,cs.SD,eess.SP
Learning graph geometry and topology using dynamical systems based message-passing
In this paper we introduce DYMAG: a message passing paradigm for GNNs built on the expressive power of continuous, multiscale graph-dynamics. Standard discrete-time message passing algorithms implicitly make use of simplistic graph dynamics and aggregation schemes which limit their ability to capture fundamental graph topological properties. By contrast, DYMAG makes use of complex graph dynamics based on the heat and wave equation as well as a more complex equation which admits chaotic solutions. The continuous nature of the dynamics is leveraged to generate multiscale (dynamic-time snapshot) representations which we prove are linked to various graph topological and spectral properties. We demonstrate experimentally that DYMAG achieves superior performance in recovering the generating parameters of Erdős-Rényi and stochastic block model random graphs and the persistent homology of synthetic graphs and citation networks. Since the behavior of proteins and biomolecules is sensitive to graph topology and exhibits important structure at multiple scales, we find that DYMAG outperforms other methods at predicting salient features of various biomolecules.
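For the heat-equation component, multiscale snapshots reduce to applying the matrix exponential of the graph Laplacian at several times (a minimal numpy/scipy sketch; the time grid and the combinatorial Laplacian are assumptions, and DYMAG's wave and chaotic dynamics are not shown):

```python
import numpy as np
from scipy.linalg import expm

def heat_snapshots(adj, signal, times=(0.5, 1.0, 2.0, 4.0)):
    """Multiscale node representations from graph heat dynamics:
    x(t) = exp(-t L) x(0), with L the combinatorial Laplacian."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.stack([expm(-t * lap) @ signal for t in times], axis=0)

# Toy 4-node path graph, one-hot signal on node 0.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
x0 = np.array([1.0, 0.0, 0.0, 0.0])
snaps = heat_snapshots(A, x0)   # shape: (4 snapshots, 4 nodes)
```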
Updated: 2024-07-07 23:08:05
Categories: cs.LG,eess.SP,stat.ML
Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
Monte Carlo tree search (MCTS) has been successful in a variety of domains, but faces challenges with long-horizon exploration when compared to sampling-based motion planning algorithms like Rapidly-Exploring Random Trees. To address these limitations of MCTS, we derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call Volume-MCTS. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
Updated: 2024-07-07 22:58:52
Categories: cs.LG,cs.RO
SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution
Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale. Sparsity provides a great opportunity for hardware-efficient AI accelerators. However, current dense photonic accelerators fail to fully exploit the power-saving potential of algorithmic sparsity. It requires sparsity-aware hardware specialization with a fundamental re-design of photonic tensor core topology and cross-layer device-circuit-architecture-algorithm co-optimization aware of hardware non-ideality and power bottleneck. To trim down the redundant power consumption while maximizing robustness to thermal variations, we propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating. A power-optimized, crosstalk-aware dynamic sparse training framework is introduced to explore row-column structured sparsity and ensure marginal accuracy loss and maximum power efficiency. The extensive evaluation shows that our cross-stacked optimized accelerator SCATTER achieves a 511X area reduction and 12.4X power saving with superior crosstalk tolerance that enables unprecedented circuit layout compactness and on-chip power efficiency.
Updated: 2024-07-07 22:57:44
Categories: cs.AR,cs.ET,cs.LG
Space-Time Diffusion Bridge
In this study, we introduce a novel method for generating new synthetic samples that are independent and identically distributed (i.i.d.) from high-dimensional real-valued probability distributions, as defined implicitly by a set of Ground Truth (GT) samples. Central to our method is the integration of space-time mixing strategies that extend across temporal and spatial dimensions. Our methodology is underpinned by three interrelated stochastic processes designed to enable optimal transport from an easily tractable initial probability distribution to the target distribution represented by the GT samples: (a) linear processes incorporating space-time mixing that yield Gaussian conditional probability densities, (b) their diffusion bridge analogs that are conditioned to the initial and final state vectors, and (c) nonlinear stochastic processes refined through score-matching techniques. The crux of our training regime involves fine-tuning the nonlinear model -- and potentially the linear models -- to align closely with the GT data. We validate the efficacy of our space-time diffusion approach with numerical experiments, laying the groundwork for more extensive future theory and experiments to fully authenticate the method, particularly providing a more efficient (possibly simulation-free) inference.
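Ingredient (b), the diffusion bridge conditioned on its endpoints, is easiest to see in scalar Brownian form (a sketch, assuming the standard bridge SDE dx = (x1 - x)/(T - t) dt + sigma dW; the step count and noise scale are illustrative):

```python
import numpy as np

def brownian_bridge(x0, x1, n_steps=100, sigma=1.0, T=1.0, seed=0):
    """Euler-Maruyama simulation of the bridge SDE
    dx = (x1 - x)/(T - t) dt + sigma dW, pinned to x(0)=x0 and x(T)=x1."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = float(x0)
    path = [x]
    for k in range(n_steps - 1):
        t = k * dt
        x += (x1 - x) / (T - t) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        path.append(x)
    path.append(float(x1))  # the bridge hits its endpoint exactly
    return np.array(path)
```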
Updated: 2024-07-07 22:44:32
Categories: stat.ML,cs.LG
Chain-of-Thought Predictive Control
We study generalizable policy learning from demonstrations for complex low-level control (e.g., contact-rich object manipulations). We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos. Firstly, we propose an observation space-agnostic approach that efficiently discovers the multi-step subskill decomposition of the demos in an unsupervised manner. By grouping temporally close and functionally similar actions into subskill-level demo segments, the observations at the segment boundaries constitute a chain of planning steps for the task, which we refer to as the chain-of-thought (CoT). Next, we propose a Transformer-based design that effectively learns to predict the CoT as the subskill-level guidance. We couple action and subskill predictions via learnable prompt tokens and a hybrid masking strategy, which enable dynamically updated guidance at test time and improve feature representation of the trajectory for generalizable policy learning. Our method, Chain-of-Thought Predictive Control (CoTPC), consistently surpasses existing strong baselines on challenging manipulation tasks with sub-optimal demos.
Updated: 2024-07-07 22:06:34
Categories: cs.LG,cs.AI,cs.RO
Towards Improving Unit Commitment Economics: An Add-On Tailor for Renewable Energy and Reserve Predictions
Generally, day-ahead unit commitment (UC) is conducted in a predict-then-optimize process: it starts by predicting the renewable energy source (RES) availability and system reserve requirements; given the predictions, the UC model is then optimized to determine the economic operation plans. In fact, predictions within the process are raw. In other words, if the predictions are further tailored to assist UC in making the economic operation plans against realizations of the RES and reserve requirements, UC economics will benefit significantly. To this end, this paper presents a cost-oriented tailor of RES-and-reserve predictions for UC, deployed as an add-on to the predict-then-optimize process. The RES-and-reserve tailor is trained by solving a bi-level mixed-integer programming model: the upper level trains the tailor based on its induced operating cost; the lower level, given tailored predictions, mimics the system operation process and feeds the induced operating cost back to the upper level; finally, the upper level evaluates the training quality according to the fed-back cost. Through this training, the tailor learns to customize the raw predictions into cost-oriented predictions. Moreover, the tailor can be embedded into the existing predict-then-optimize process as an add-on, improving the UC economics. Lastly, the presented method is compared to traditional, binary-relaxation, neural network-based, stochastic, and robust methods.
Updated: 2024-07-07 22:00:33
Categories: math.OC,cs.LG,cs.SY,eess.SY,stat.AP
Synthetic Participatory Planning of Shared Automated Electric Mobility Systems
Unleashing the synergies among rapidly evolving mobility technologies in a multi-stakeholder setting presents unique challenges and opportunities for addressing urban transportation problems. This paper introduces a novel synthetic participatory method that critically leverages large language models (LLMs) to create digital avatars representing diverse stakeholders to plan shared automated electric mobility systems (SAEMS). These calibratable agents collaboratively identify objectives, envision and evaluate SAEMS alternatives, and strategize implementation under risks and constraints. The results of a Montreal case study indicate that a structured and parameterized workflow provides outputs with higher controllability and comprehensiveness on an SAEMS plan than that generated using a single LLM-enabled expert agent. Consequently, this approach provides a promising avenue for cost-efficiently improving the inclusivity and interpretability of multi-objective transportation planning, suggesting a paradigm shift in how we envision and strategize for sustainable transportation systems.
Updated: 2024-07-07 21:56:23
Categories: cs.CE,cs.AI,cs.CY,cs.HC,cs.MA
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
With Retrieval Augmented Generation (RAG), Large Language Models (LLMs) are playing a pivotal role in information search and are being adopted globally. Although the multilingual capability of LLMs offers new opportunities to bridge the language barrier, do these capabilities translate into real-life scenarios where linguistic divide and knowledge conflicts between multilingual sources are known occurrences? In this paper, we studied LLM's linguistic preference in a RAG-based information search setting. We found that LLMs displayed systemic bias towards information in the same language as the query language in both information retrieval and answer generation. Furthermore, in scenarios where there is little information in the language of the query, LLMs prefer documents in high-resource languages, reinforcing the dominant views. Such bias exists for both factual and opinion-based queries. Our results highlight the linguistic divide within multilingual LLMs in information search systems. The seemingly beneficial multilingual capability of LLMs may backfire on information parity by reinforcing language-specific information cocoons or filter bubbles, further marginalizing low-resource views.
Updated: 2024-07-07 21:26:36
Categories: cs.CL,cs.AI,cs.IR
Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones. In this work, we propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning. By modeling the parameter transitions along the sequential tasks with the weight matrix transformation, we propose to apply the low-rank approximation on the task-adaptive parameters in each layer of the neural networks. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then globally determined according to the marginal increment of the empirical loss estimated by the layer-specific gradient and low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to diminish the parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against some recent state-of-the-art methods to demonstrate the effectiveness and scalability of our proposed method. Empirical results show that our method performs better on different benchmarks, especially in achieving task order robustness and handling the forgetting issue. The source code is at https://github.com/lijiaqi/HALRP.
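The task-adaptive low-rank decomposition can be sketched for a single linear layer (a sketch, assuming a shared weight plus per-task rank-r factors; the Hessian-aware rank selection and the pruning step described in the abstract are not shown):

```python
import torch
import torch.nn as nn

class LowRankAdaptedLinear(nn.Module):
    """Shared weight plus a per-task low-rank perturbation: for task t the
    effective map is W x + B_t (A_t x), so each new task adds only
    rank * (d_in + d_out) parameters."""
    def __init__(self, d_in, d_out, rank, n_tasks):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        self.A = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(rank, d_in)) for _ in range(n_tasks)])
        self.B = nn.ParameterList(
            [nn.Parameter(torch.zeros(d_out, rank)) for _ in range(n_tasks)])

    def forward(self, x, task_id):
        return self.shared(x) + x @ self.A[task_id].T @ self.B[task_id].T
```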
Updated: 2024-07-07 21:11:23
Categories: cs.LG,cs.AI
BugNIST -- a Large Volumetric Dataset for Object Detection under Domain Shift
Domain shift significantly influences the performance of deep learning algorithms, particularly for object detection within volumetric 3D images. Annotated training data is essential for deep learning-based object detection. However, annotating densely packed objects is time-consuming and costly. Instead, we suggest training models on individually scanned objects, causing a domain shift between training and detection data. To address this challenge, we introduce the BugNIST dataset, comprising 9154 micro-CT volumes of 12 bug types and 388 volumes of tightly packed bug mixtures. This dataset is characterized by having objects with the same appearance in the source and target domains, which is uncommon for other benchmark datasets for domain shift. During training, individual bug volumes labeled by class are utilized, while testing employs mixtures with center point annotations and bug type labels. Together with the dataset, we provide a baseline detection analysis, with the aim of advancing the field of 3D object detection methods.
Updated: 2024-07-07 21:10:27
Categories: cs.CV,cs.AI,I.2.10; I.4.6
Prospective Messaging: Learning in Networks with Communication Delays
Inter-neuron communication delays are ubiquitous in physically realized neural networks such as biological neural circuits and neuromorphic hardware. These delays have significant and often disruptive consequences on network dynamics during training and inference. It is therefore essential that communication delays be accounted for, both in computational models of biological neural networks and in large-scale neuromorphic systems. Nonetheless, communication delays have yet to be comprehensively addressed in either domain. In this paper, we first show that delays prevent state-of-the-art continuous-time neural networks called Latent Equilibrium (LE) networks from learning even simple tasks despite significant overparameterization. We then propose to compensate for communication delays by predicting future signals based on currently available ones. This conceptually straightforward approach, which we call prospective messaging (PM), uses only neuron-local information, and is flexible in terms of memory and computation requirements. We demonstrate that incorporating PM into delayed LE networks prevents reaction lags, and facilitates successful learning on Fourier synthesis and autoregressive video prediction tasks.
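The prospective-messaging idea, predicting the future value of a delayed signal from what has already arrived, has a simple first-order instance (a sketch; the paper learns this predictor from neuron-local information, whereas linear extrapolation is just the minimal illustration):

```python
def prospective_message(history, delay):
    """Compensate a known communication delay by extrapolating the most
    recent received samples 'delay' steps into the future."""
    if len(history) < 2:
        return history[-1]
    slope = history[-1] - history[-2]   # first-order local trend
    return history[-1] + delay * slope
```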
Updated: 2024-07-07 20:54:14
Categories: cs.LG,cs.NE
Multi-level Reliability Interface for Semantic Communications over Wireless Networks
Semantic communication, when examined through the lens of joint source-channel coding (JSCC), maps source messages directly into channel input symbols, where the measure of success is defined by end-to-end distortion rather than traditional metrics such as block error rate. Previous studies have shown significant improvements achieved through deep learning (DL)-driven JSCC compared to traditional separate source and channel coding. However, JSCC is impractical in existing communication networks, where application and network providers are typically different entities connected over general-purpose TCP/IP links. In this paper, we propose designing the source and channel mappings separately and sequentially via a novel multi-level reliability interface. This conceptual interface enables semi-JSCC at both the learned source and channel mappers and achieves many of the gains observed in existing DL-based JSCC work (which would require a fully joint design between the application and the network), such as lower end-to-end distortion and graceful degradation of distortion with channel quality. We believe this work represents an important step towards realizing semantic communications in wireless networks.
Updated: 2024-07-07 20:15:10
Categories: cs.IT,cs.LG,eess.IV,eess.SP,math.IT
Learning to Price Homogeneous Data
We study a data pricing problem, where a seller has access to $N$ homogeneous data points (e.g. drawn i.i.d. from some distribution). There are $m$ types of buyers in the market, where buyers of the same type $i$ have the same valuation curve $v_i:[N]\rightarrow [0,1]$, where $v_i(n)$ is the value for having $n$ data points. A priori, the seller is unaware of the distribution of buyers, but can repeat the market for $T$ rounds so as to learn the revenue-optimal pricing curve $p:[N] \rightarrow [0, 1]$. To solve this online learning problem, we first develop novel discretization schemes to approximate any pricing curve. When compared to prior work, the size of our discretization schemes scales gracefully with the approximation parameter, which translates to better regret in online learning. Under assumptions like smoothness and diminishing returns which are satisfied by data, the discretization size can be reduced further. We then turn to the online learning problem, both in the stochastic and adversarial settings. On each round, the seller chooses an anonymous pricing curve $p_t$. A new buyer appears and may choose to purchase some amount of data. She then reveals her type only if she makes a purchase. Our online algorithms build on classical algorithms such as UCB and FTPL, but require novel ideas to account for the asymmetric nature of this feedback and to deal with the vastness of the space of pricing curves. Using the improved discretization schemes previously developed, we are able to achieve $\tilde{O}\left(m\sqrt{T}\right)$ regret in the stochastic setting and $\tilde{O}\left(m^{\frac{3}{2}}\sqrt{T}\right)$ regret in the adversarial setting.
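The per-round market step can be made concrete: a buyer facing curve p buys the quantity maximizing her utility v(n) - p(n) (a numpy sketch, assuming ties break toward the argmax and the buyer abstains when no quantity yields nonnegative utility):

```python
import numpy as np

def round_revenue(p, v):
    """Seller posts an anonymous curve with p[n-1] = price of n points; a
    buyer with valuation curve v buys n* = argmax_n v(n) - p(n), or nothing
    if the best utility is negative. Returns the seller's round revenue."""
    utility = v - p
    n_star = int(np.argmax(utility))
    return float(p[n_star]) if utility[n_star] >= 0 else 0.0
```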
Updated: 2024-07-07 20:02:52
Categories: cs.LG,cs.GT
Just read twice: closing the recall gap for recurrent language models
Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe the order in which information is shown to the LM impacts the selection difficulty. To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether inputted sets are disjoint. We empirically and theoretically show that the recurrent memory required to solve SD changes with set order, i.e., whether the smaller set appears first in-context. Our analysis suggests, to mitigate the reliance on data order, we can put information in the right order in-context or process prompts non-causally. Towards that end, we propose: (1) JRT-Prompt, where context gets repeated multiple times in the prompt, effectively showing the model all data orders. This gives $11.0 \pm 1.3$ points of improvement, averaged across $16$ recurrent LMs and the $6$ ICL tasks, with $11.9\times$ higher throughput than FlashAttention-2 for generation prefill (length $32$k, batch size $16$, NVidia H100). We then propose (2) JRT-RNN, which uses non-causal prefix-linear-attention to process prompts and provides $99\%$ of Transformer quality at $360$M params., $30$B tokens and $96\%$ at $1.3$B params., $50$B tokens on average across the tasks, with $19.2\times$ higher throughput for prefill than FA2.
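JRT-Prompt itself is nearly a one-liner: repeat the context so that, by the second pass, the recurrent model already knows which tokens the question will need (a sketch; the delimiter and repeat count are assumptions, not the paper's exact formatting):

```python
def jrt_prompt(context: str, question: str, n_repeats: int = 2) -> str:
    """Repeat the context before the question so a recurrent LM sees every
    fact both before and after it decides what to keep in its fixed state."""
    return "\n\n".join([context] * n_repeats) + "\n\n" + question
```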
Updated: 2024-07-07 19:55:09
Categories: cs.CL,cs.LG
Discounted Pseudocosts in MILP
In this article, we introduce the concept of discounted pseudocosts, inspired by discounted total reward in reinforcement learning, and explore their application in mixed-integer linear programming (MILP). Traditional pseudocosts estimate changes in the objective function due to variable bound changes during the branch-and-bound process. By integrating reinforcement learning concepts, we propose a novel approach incorporating a forward-looking perspective into pseudocost estimation. We present the motivation behind discounted pseudocosts and discuss how they represent the anticipated reward for branching after one level of exploration in the MILP problem space. Initial experiments on MIPLIB 2017 benchmark instances demonstrate the potential of discounted pseudocosts to enhance branching strategies and accelerate the solution process for challenging MILP problems.
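By analogy with discounted return, a branching candidate's score combines the immediate objective gain with discounted gains observed deeper in the tree (a sketch that generalizes the one-level lookahead described above; gamma and the per-level gain bookkeeping are illustrative):

```python
def discounted_pseudocost(level_gains, gamma=0.9):
    """Discounted pseudocost along one explored branching path: the
    immediate objective gain plus gamma-discounted gains from deeper
    levels, analogous to discounted total reward in RL."""
    total, weight = 0.0, 1.0
    for gain in level_gains:     # gains ordered from this node downwards
        total += weight * gain
        weight *= gamma
    return total
```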
Updated: 2024-07-07 19:41:38
Categories: cs.AI,cs.LG,math.OC,90C11 (Primary), 90C10, 90-08 (Secondary)
Explore until Confident: Efficient Exploration for Embodied Question Answering
We consider the problem of Embodied Question Answering (EQA), which refers to settings where an embodied agent such as a robot needs to actively explore an environment to gather information until it is confident about the answer to a question. In this work, we leverage the strong semantic reasoning capabilities of large vision-language models (VLMs) to efficiently explore and answer such questions. However, there are two main challenges when using VLMs in EQA: they do not have an internal memory for mapping the scene to be able to plan how to explore over time, and their confidence can be miscalibrated and can cause the robot to prematurely stop exploration or over-explore. We propose a method that first builds a semantic map of the scene based on depth information and via visual prompting of a VLM, leveraging its vast knowledge of relevant regions of the scene for exploration. Next, we use conformal prediction to calibrate the VLM's question answering confidence, allowing the robot to know when to stop exploration, leading to a more calibrated and efficient exploration strategy. To test our framework in simulation, we also contribute a new EQA dataset with diverse, realistic human-robot scenarios and scenes built upon the Habitat-Matterport 3D Research Dataset (HM3D). Both simulated and real robot experiments show our proposed approach improves the performance and efficiency over baselines that do not leverage a VLM for exploration or do not calibrate its confidence. Webpage with experiment videos and code: https://explore-eqa.github.io/
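The conformal-calibration step admits a compact sketch (assuming split conformal prediction over a held-out set of questions, with nonconformity score 1 minus the VLM's probability on the true answer; this is one standard recipe, not necessarily the paper's exact construction):

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal prediction: cal_scores[i] = 1 - confidence the VLM
    gave the true answer on held-out question i. The returned threshold
    gives ~(1 - alpha) coverage on exchangeable test questions."""
    n = len(cal_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q, method="higher")

def should_stop(answer_probs, threshold):
    """Stop exploring once exactly one candidate answer survives the test."""
    survivors = [a for a, p in answer_probs.items() if 1.0 - p <= threshold]
    return len(survivors) == 1
```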
Updated: 2024-07-07 19:40:31
Categories: cs.RO,cs.AI,cs.CV,cs.LG
VC Theory for Inventory Policies
Advances in computational power and AI have increased interest in reinforcement learning approaches to inventory management. This paper provides a theoretical foundation for these approaches and investigates the benefits of restricting to policy structures that are well-established by inventory theory. In particular, we prove generalization guarantees for learning several well-known classes of inventory policies, including base-stock and (s, S) policies, by leveraging the celebrated Vapnik-Chervonenkis (VC) theory. We apply the Pseudo-dimension and Fat-shattering dimension from VC theory to determine the generalization error of inventory policies, that is, the difference between an inventory policy's performance on training data and its expected performance on unseen data. We focus on a classical setting without contexts, but allow for an arbitrary distribution over demand sequences and do not make any assumptions such as independence over time. We corroborate our supervised learning results using numerical simulations. Managerially, our theory and simulations translate to the following insights. First, there is a principle of "learning less is more" in inventory management: depending on the amount of data available, it may be beneficial to restrict oneself to a simpler, albeit suboptimal, class of inventory policies to minimize overfitting errors. Second, the number of parameters in a policy class may not be the correct measure of overfitting error: in fact, the class of policies defined by T time-varying base-stock levels exhibits a generalization error an order of magnitude lower than that of the two-parameter (s, S) policy class. Finally, our research suggests situations in which it could be beneficial to incorporate the concepts of base-stock and inventory position into black-box learning machines, instead of having these machines directly learn the order quantity actions.
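The two policy classes named above are tiny functions of the current inventory position, which is what makes their VC-style complexity tractable (base-stock has one parameter, (s, S) has two):

```python
def base_stock_order(inventory_position, S):
    """Base-stock policy: order up to level S every period."""
    return max(0, S - inventory_position)

def s_S_order(inventory_position, s, S):
    """(s, S) policy: if the position falls to s or below, order up to S."""
    return max(0, S - inventory_position) if inventory_position <= s else 0
```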
Updated: 2024-07-07 19:32:17
Categories: stat.ML,cs.LG
Personalized Language Modeling from Personalized Human Feedback
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be ineffective in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, including a user model that maps user information to user representations and can flexibly encode our assumptions on user preferences. We develop new learning objectives to perform personalized Direct Preference Optimization that jointly learns a user model and a personalized language model. We demonstrate the efficacy of our proposed method through (1) a synthetic task where we fine-tune a GPT-J 6B model to align with users with conflicting preferences on generation length; and (2) an instruction following task where we fine-tune a Tulu-7B model to generate responses for users with diverse preferences on the style of responses. In both cases, our learned models can generate personalized responses that are better aligned with the preferences of individual users.
Updated: 2024-07-07 19:31:21
Categories: cs.CL,cs.AI,cs.LG
Compact Proofs of Model Performance via Mechanistic Interpretability
We propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.
Updated: 2024-07-07 19:19:51
Categories: cs.LG,cs.LO
Enhancing Hallucination Detection through Perturbation-Based Synthetic Data Generation in System Responses
Detecting hallucinations in large language model (LLM) outputs is pivotal, yet traditional fine-tuning for this classification task is impeded by the expensive and quickly outdated annotation process, especially across numerous vertical domains and in the face of rapid LLM advancements. In this study, we introduce an approach that automatically generates both faithful and hallucinated outputs by rewriting system responses. Experimental findings demonstrate that a T5-base model, fine-tuned on our generated dataset, surpasses state-of-the-art zero-shot detectors and existing synthetic generation methods in both accuracy and latency, indicating efficacy of our approach.
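One simple member of the perturbation family is a numeric rewrite of an otherwise faithful response (a sketch; the paper's generation procedure presumably spans richer entity- and fact-level rewrites than this single rule):

```python
import random
import re

def perturb_numbers(response: str, seed: int = 0) -> str:
    """Create a 'hallucinated' variant of a faithful response by shifting
    every number it contains; the (original, perturbed) pair then serves as
    (faithful, hallucinated) training data for the detector."""
    rng = random.Random(seed)
    bump = lambda m: str(int(m.group()) + rng.choice([-2, -1, 1, 2]))
    return re.sub(r"\d+", bump, response)
```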
Updated: 2024-07-07 19:19:32
Categories: cs.AI,cs.CL
$\textbf{S}^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting
Recently, there has been a growing interest in leveraging pre-trained large language models (LLMs) for various time series applications. However, the semantic space of LLMs, established through the pre-training, is still underexplored and may help yield more distinctive and informative representations to facilitate time series forecasting. To this end, we propose Semantic Space Informed Prompt learning with LLM ($S^2$IP-LLM) to align the pre-trained semantic space with time series embeddings space and perform time series forecasting based on learned prompts from the joint space. We first design a tokenization module tailored for cross-modality alignment, which explicitly concatenates patches of decomposed time series components to create embeddings that effectively encode the temporal dynamics. Next, we leverage the pre-trained word token embeddings to derive semantic anchors and align selected anchors with time series embeddings by maximizing the cosine similarity in the joint space. This way, $S^2$IP-LLM can retrieve relevant semantic anchors as prompts to provide strong indicators (context) for time series that exhibit different temporal dynamics. With thorough empirical studies on multiple benchmark datasets, we demonstrate that the proposed $S^2$IP-LLM can achieve superior forecasting performance over state-of-the-art baselines. Furthermore, our ablation studies and visualizations verify the necessity of prompt learning informed by semantic space.
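The anchor-retrieval step reduces to cosine similarity between the two embedding spaces (a torch sketch; the variable names and top-k selection are assumptions about the joint space, not the paper's verbatim procedure):

```python
import torch
import torch.nn.functional as F

def retrieve_anchors(ts_embed, word_embed, k=5):
    """For each time-series patch embedding, pick the k pretrained word-token
    embeddings (semantic anchors) with the highest cosine similarity."""
    sims = F.normalize(ts_embed, dim=-1) @ F.normalize(word_embed, dim=-1).T
    top = sims.topk(k, dim=-1)
    return word_embed[top.indices], top.values  # (n, k, d) anchors, (n, k) sims
```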
Updated: 2024-07-07 19:14:34
Categories: cs.LG
Deep Learning for Protein-Ligand Docking: Are We There Yet?
The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of docking methods within the practical context of (1) using predicted (apo) protein structures for docking (e.g., for broad applicability); (2) docking multiple ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for pocket generalization). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for practical protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL docking methods for apo-to-holo protein-ligand docking and protein-ligand structure generation using both single and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that all recent DL docking methods but one fail to generalize to multi-ligand protein targets and also that template-based docking algorithms perform equally well or better for multi-ligand docking as recent single-ligand DL docking methods, suggesting areas of improvement for future work. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.
Updated: 2024-07-07 19:12:04
Categories: cs.LG,cs.AI,q-bio.BM,q-bio.QM,I.2.1; J.3
Beware of diffusion models for synthesizing medical images -- A comparison with GANs in terms of memorizing brain MRI and chest x-ray images
Diffusion models were initially developed for text-to-image generation and are now being utilized to generate high quality synthetic images. Preceded by GANs, diffusion models have shown impressive results using various evaluation metrics. However, commonly used metrics such as FID and IS are not suitable for determining whether diffusion models are simply reproducing the training images. Here we train StyleGAN and a diffusion model, using BRATS20, BRATS21 and a chest x-ray pneumonia dataset, to synthesize brain MRI and chest x-ray images, and measure the correlation between the synthetic images and all training images. Our results show that diffusion models are more likely to memorize the training images, compared to StyleGAN, especially for small datasets and when using 2D slices from 3D volumes. Researchers should be careful when using diffusion models (and to some extent GANs) for medical imaging, if the final goal is to share the synthetic images.
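The memorization test amounts to finding, for each synthetic image, its most correlated training image (a numpy sketch of that measurement; flattening and Pearson correlation are assumptions consistent with the abstract's wording):

```python
import numpy as np

def max_train_correlation(synthetic, training):
    """For each synthetic image, the highest Pearson correlation against any
    training image; values near 1 suggest the generator memorized samples."""
    synth = synthetic.reshape(len(synthetic), -1).astype(float)
    train = training.reshape(len(training), -1).astype(float)
    synth = (synth - synth.mean(1, keepdims=True)) / synth.std(1, keepdims=True)
    train = (train - train.mean(1, keepdims=True)) / train.std(1, keepdims=True)
    corr = synth @ train.T / synth.shape[1]
    return corr.max(axis=1)
```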
Updated: 2024-07-07 19:09:39
Categories: eess.IV,cs.CV,cs.LG
The infrastructure powering IBM's Gen AI model development
AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.
Updated: 2024-07-07 18:39:33
Categories: cs.DC,cs.AI
Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality
Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework's functionality using a programming language different from the framework's default language (usually Python). In this paper, we study the impact of using TensorFlow and PyTorch bindings in C#, Rust, Python and JavaScript on the software quality in terms of correctness (training and test accuracy) and time cost (training and inference time) when training and performing inference on five widely used deep learning models. Our experiments show that a model can be trained in one binding and used for inference in another binding for the same framework without losing accuracy. Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.
Updated: 2024-07-07 18:39:27
Categories: cs.SE,cs.AI
Experiments with truth using Machine Learning: Spectral analysis and explainable classification of synthetic, false, and genuine information
Misinformation is still a major societal problem and the arrival of Large Language Models (LLMs) only added to it. This paper analyzes synthetic, false, and genuine information in the form of text from spectral analysis, visualization, and explainability perspectives to find the answer to why the problem is still unsolved despite multiple years of research and a plethora of solutions in the literature. Various embedding techniques on multiple datasets are used to represent information for the purpose. The diverse spectral and non-spectral methods used on these embeddings include t-distributed Stochastic Neighbor Embedding (t-SNE), Principal Component Analysis (PCA), and Variational Autoencoders (VAEs). Classification is done using multiple machine learning algorithms. Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Integrated Gradients are used for the explanation of the classification. The analysis and the explanations generated show that misinformation is quite closely intertwined with genuine information and the machine learning algorithms are not as effective in separating the two despite the claims in the literature.
Updated: 2024-07-07 18:31:09
Categories: cs.AI,cs.CL
DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages
Language technologies should be judged on their usefulness in real-world use cases. An often overlooked aspect in natural language processing (NLP) research and evaluation is language variation in the form of non-standard dialects or language varieties (hereafter, varieties). Most NLP benchmarks are limited to standard language varieties. To fill this gap, we propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties, which aggregates an extensive set of task-varied variety datasets (10 text-level tasks covering 281 varieties). This allows for a comprehensive evaluation of NLP system performance on different language varieties. We provide substantial evidence of performance disparities between standard and non-standard language varieties, and we also identify language clusters with large performance divergence across tasks. We believe DIALECTBENCH provides a comprehensive view of the current state of NLP for language varieties and one step towards advancing it further. Code/data: https://github.com/ffaisal93/DialectBench
Updated: 2024-07-07 18:21:30
Categories: cs.CL,cs.AI
CAV-AD: A Robust Framework for Detection of Anomalous Data and Malicious Sensors in CAV Networks
The adoption of connected and automated vehicles (CAVs) has sparked considerable interest across diverse industries, including public transportation, underground mining, and agriculture sectors. However, CAVs' reliance on sensor readings makes them vulnerable to significant threats. Manipulating these readings can compromise CAV network security, posing serious risks for malicious activities. Although several anomaly detection (AD) approaches for CAV networks are proposed, they often fail to: i) detect multiple anomalies in specific sensor(s) with high accuracy or F1 score, and ii) identify the specific sensor being attacked. In response, this paper proposes a novel framework tailored to CAV networks, called CAV-AD, for distinguishing abnormal readings amidst multiple anomaly data while identifying malicious sensors. Specifically, CAV-AD comprises two main components: i) A novel CNN model architecture called optimized omni-scale CNN (O-OS-CNN), which optimally selects the time scale by generating all possible kernel sizes for input time series data; ii) An amplification block to increase the values of anomaly readings, enhancing sensitivity for detecting anomalies. Not only that, but CAV-AD integrates the proposed O-OS-CNN with a Kalman filter to instantly identify the malicious sensors. We extensively train CAV-AD using real-world datasets containing both instant and constant attacks, evaluating its performance in detecting intrusions from multiple anomalies, which presents a more challenging scenario. Our results demonstrate that CAV-AD outperforms state-of-the-art methods, achieving an average accuracy of 98% and an average F1 score of 89%, while accurately identifying the malicious sensors.
Updated: 2024-07-07 18:19:03
Categories: cs.AI
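The abstract does not give the exact formulation, but a hedged sketch of two of its ingredients, an amplification step that boosts deviations and a Kalman-filter residual test that flags implausible readings, might look like this on a synthetic sensor stream (all constants and thresholds here are illustrative assumptions):

```python
import numpy as np

def kalman_residual_flags(z, q=1e-3, r=0.1, k_sigma=4.0):
    """Flag readings whose innovation is implausibly large under a
    scalar random-walk Kalman filter (one flag per reading after the first)."""
    x, p, flags = z[0], 1.0, []
    for zk in z[1:]:
        p = p + q                       # predict
        s = p + r                       # innovation variance
        resid = zk - x                  # innovation
        flags.append(abs(resid) > k_sigma * np.sqrt(s))
        k = p / s                       # Kalman gain, then update
        x, p = x + k * resid, (1 - k) * p
    return np.array(flags)

rng = np.random.default_rng(0)
speed = 20.0 + np.cumsum(rng.normal(0, 0.05, 500))      # benign sensor stream
speed[250] += 5.0                                        # injected instant attack
# Amplification step: boost deviations from a short moving average.
amplified = speed + 3.0 * (speed - np.convolve(speed, np.ones(5) / 5, "same"))
print(np.where(kalman_residual_flags(amplified))[0])     # flags cluster near 250
```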
A Survey of Models for Cognitive Diagnosis: New Developments and Future Directions
Cognitive diagnosis has been developed for decades as an effective measurement tool to evaluate human cognitive status such as ability level and knowledge mastery. It has been applied to a wide range of fields including education, sport, psychological diagnosis, etc. By providing better awareness of cognitive status, it can serve as the basis for personalized services such as well-designed medical treatment, teaching strategy and vocational training. This paper aims to provide a survey of current models for cognitive diagnosis, with more attention on new developments using machine learning-based methods. By comparing the model structures, parameter estimation algorithms, model evaluation methods and applications, we provide a relatively comprehensive review of the recent trends in cognitive diagnosis models. Further, we discuss future directions that are worthy of exploration. In addition, we release two Python libraries: EduData for easy access to some relevant public datasets we have collected, and EduCDM that implements popular CDMs to facilitate both applications and research purposes.
Updated: 2024-07-07 18:02:00
Categories: cs.AI
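As background for the model structures the survey compares, a worked example of the classical two-parameter logistic (2PL) IRT model, one of the simplest cognitive diagnosis models, with toy parameter values (the released EduCDM library implements far richer CDMs than this sketch):

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL IRT: probability that a learner with ability `theta` answers an
    item with discrimination `a` and difficulty `b` correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# A learner of average ability on an easy vs. a hard item (toy parameters).
print(p_correct(theta=0.0, a=1.2, b=-1.0))  # ~0.77: easy item, likely correct
print(p_correct(theta=0.0, a=1.2, b=1.5))   # ~0.14: hard item, likely wrong
```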
JAX-SPH: A Differentiable Smoothed Particle Hydrodynamics Framework
Particle-based fluid simulations have emerged as a powerful tool for solving the Navier-Stokes equations, especially in cases that include intricate physics and free surfaces. The recent addition of machine learning methods to the toolbox for solving such problems is pushing the boundary of the quality vs. speed tradeoff of such numerical simulations. In this work, we lead the way to Lagrangian fluid simulators compatible with deep learning frameworks, and propose JAX-SPH - a Smoothed Particle Hydrodynamics (SPH) framework implemented in JAX. JAX-SPH builds on the code for dataset generation from the LagrangeBench project (Toshev et al., 2023) and extends this code in multiple ways: (a) integration of further key SPH algorithms, (b) restructuring the code toward a Python package, (c) verification of the gradients through the solver, and (d) demonstration of the utility of the gradients for solving inverse problems as well as a Solver-in-the-Loop application. Our code is available at https://github.com/tumaer/jax-sph.
Updated: 2024-07-07 17:53:28
Categories: physics.flu-dyn,cs.LG
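For readers new to SPH, a self-contained sketch of its core primitive, density summation with the standard cubic-spline kernel, written with jax.numpy and vmap; this illustrates the scheme JAX-SPH implements but is not the package's actual API:

```python
import jax.numpy as jnp
from jax import vmap

def cubic_spline_w(r, h):
    """Standard 2D cubic-spline SPH kernel (Monaghan)."""
    q = r / h
    sigma = 10.0 / (7.0 * jnp.pi * h**2)
    near = 1.0 - 1.5 * q**2 + 0.75 * q**3
    far = 0.25 * (2.0 - q) ** 3
    return sigma * jnp.where(q < 1.0, near, jnp.where(q < 2.0, far, 0.0))

def density(positions, masses, h):
    """SPH density summation: rho_i = sum_j m_j * W(|x_i - x_j|, h)."""
    def rho_i(xi):
        r = jnp.linalg.norm(positions - xi, axis=1)
        return jnp.sum(masses * cubic_spline_w(r, h))
    return vmap(rho_i)(positions)

dx = 0.1
xs = jnp.stack(jnp.meshgrid(jnp.arange(10) * dx, jnp.arange(10) * dx), -1).reshape(-1, 2)
rho = density(xs, jnp.full(100, 1000.0 * dx * dx), h=1.3 * dx)
print(rho.mean())  # interior values near 1000; boundary particles read low
```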
Neural SPH: Improved Neural Modeling of Lagrangian Fluid Dynamics
Smoothed particle hydrodynamics (SPH) is omnipresent in modern engineering and scientific disciplines. SPH is a class of Lagrangian schemes that discretize fluid dynamics via finite material points that are tracked through the evolving velocity field. Due to the particle-like nature of the simulation, graph neural networks (GNNs) have emerged as appealing and successful surrogates. However, the practical utility of such GNN-based simulators relies on their ability to faithfully model physics, providing accurate and stable predictions over long time horizons - which is a notoriously hard problem. In this work, we identify particle clustering originating from tensile instabilities as one of the primary pitfalls. Based on these insights, we enhance both training and rollout inference of state-of-the-art GNN-based simulators with varying components from standard SPH solvers, including pressure, viscous, and external force components. All Neural SPH-enhanced simulators achieve better performance than the baseline GNNs, often by orders of magnitude in terms of rollout error, allowing for significantly longer rollouts and significantly better physics modeling. Code available at https://github.com/tumaer/neuralsph.
Updated: 2024-07-07 17:44:40
Categories: physics.flu-dyn,cs.LG
Deep Stochastic Mechanics
This paper introduces a novel deep-learning-based approach for numerical simulation of a time-evolving Schrödinger equation inspired by stochastic mechanics and generative diffusion models. Unlike existing approaches, which exhibit computational complexity that scales exponentially in the problem dimension, our method allows us to adapt to the latent low-dimensional structure of the wave function by sampling from the Markovian diffusion. Depending on the latent dimension, our method may have far lower computational complexity in higher dimensions. Moreover, we propose novel equations for stochastic quantum mechanics, resulting in quadratic computational complexity with respect to the number of dimensions. Numerical simulations verify our theoretical findings and show a significant advantage of our method compared to other deep-learning-based approaches used for quantum mechanics.
Updated: 2024-07-07 17:42:40
Categories: cs.LG,quant-ph,stat.ML
Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability
Causal abstraction provides a theoretical foundation for mechanistic interpretability, the field concerned with providing intelligible algorithms that are faithful simplifications of the known, but opaque low-level details of black box AI models. Our contributions are (1) generalizing the theory of causal abstraction from mechanism replacement (i.e., hard and soft interventions) to arbitrary mechanism transformation (i.e., functionals from old mechanisms to new mechanisms), (2) providing a flexible, yet precise formalization for the core concepts of modular features, polysemantic neurons, and graded faithfulness, and (3) unifying a variety of mechanistic interpretability methodologies in the common language of causal abstraction, namely activation and path patching, causal mediation analysis, causal scrubbing, causal tracing, circuit analysis, concept erasure, sparse autoencoders, differential binary masking, distributed alignment search, and activation steering.
Updated: 2024-07-07 17:21:55
Categories: cs.AI
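Among the methodologies the paper unifies, activation patching is perhaps the easiest to demonstrate; a minimal PyTorch sketch of a hard intervention on a toy model (real analyses patch finer-grained components such as individual positions or heads):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x_clean, x_corrupt = torch.randn(1, 4), torch.randn(1, 4)

# 1) Cache the hidden activation from the clean run.
cache = {}
handle = model[1].register_forward_hook(lambda m, i, o: cache.update(act=o.detach()))
clean_out = model(x_clean)
handle.remove()

# 2) Patch the cached activation into the corrupted run (a hard intervention:
#    returning a value from a forward hook replaces the module's output).
handle = model[1].register_forward_hook(lambda m, i, o: cache["act"])
patched_out = model(x_corrupt)
handle.remove()

# Patching the entire layer makes the runs agree exactly; real analyses patch
# finer-grained components to localize which mechanism carries the behavior.
print(torch.allclose(clean_out, patched_out))  # True
```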
Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA
Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations, constituting the predominant computational cost. Therefore, this paper proposes a high-throughput, scalable and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of NNs. We first streamline inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication, to design a fast, efficient, scalable approximate matrix multiplication module termed "Approximate Multiplication Unit (AMU)". The AMU optimizes LUT-based matrix multiplications further through dedicated memory management and access design, decoupling computational overhead from input resolution and boosting FPGA-based NN accelerator efficiency significantly. The experimental results show that using our AMU achieves up to 9x higher throughput and 112x higher energy efficiency over the state-of-the-art solutions for FPGA-based Quantised Neural Network (QNN) accelerators.
Updated: 2024-07-07 17:20:51
Categories: cs.AR,cs.AI,cs.LG
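A hedged sketch of the LUT-based approximate matrix multiplication idea that MADDNESS builds on, here using product quantization with k-means prototypes for clarity (MADDNESS itself replaces the k-means encoding with learned hash trees, and the AMU adds the memory-management optimizations described above):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N, D, M, C, K = 512, 32, 16, 4, 16          # C subspaces, K prototypes each
A, B = rng.normal(size=(N, D)), rng.normal(size=(D, M))
d = D // C

# Offline: cluster each subspace of A and precompute prototype @ B tables.
codes, luts = [], []
for c in range(C):
    km = KMeans(n_clusters=K, n_init=4, random_state=0).fit(A[:, c*d:(c+1)*d])
    codes.append(km.labels_)                              # one code per row
    luts.append(km.cluster_centers_ @ B[c*d:(c+1)*d, :])  # (K, M) lookup table

# Online: the matmul becomes C table lookups + adds per row (no multiplies).
approx = sum(luts[c][codes[c]] for c in range(C))
exact = A @ B
print(np.abs(approx - exact).mean() / np.abs(exact).mean())  # relative error
```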
SmurfCat at PAN 2024 TextDetox: Alignment of Multilingual Transformers for Text Detoxification
This paper presents a solution for the Multilingual Text Detoxification task in the PAN-2024 competition of the SmurfCat team. Using data augmentation through machine translation and a special filtering procedure, we collected an additional multilingual parallel dataset for text detoxification. Using the obtained data, we fine-tuned several multilingual sequence-to-sequence models, such as mT0 and Aya, on a text detoxification task. We applied the ORPO alignment technique to the final model. Our final model has only 3.7 billion parameters and achieves state-of-the-art results for the Ukrainian language and near state-of-the-art results for other languages. In the competition, our team achieved first place in the automated evaluation with a score of 0.52 and second place in the final human evaluation with a score of 0.74.
Updated: 2024-07-07 17:19:34
Categories: cs.CL,cs.AI
Towards Perceived Security, Perceived Privacy, and the Universal Design of E-Payment Applications
With the growth of digital monetary transactions and cashless payments, encouraged by the COVID-19 pandemic, use of e-payment applications is on the rise. It is thus imperative to understand and evaluate the current posture of e-payment applications from three major user-facing angles: security, privacy, and usability. To this end, we created a high-fidelity prototype of an e-payment application that encompassed features that we wanted to test with users. We then conducted a pilot study where we recruited 12 participants who tested our prototype. We find that both security and privacy are important for users of e-payment applications. Additionally, some participants perceive the strength of security and privacy based on the usability of the application. We provide recommendations such as universal design of e-payment applications.
Updated: 2024-07-07 17:15:09
Categories: cs.HC,cs.CR
Language Models Encode Collaborative Signals in Recommendation
Recent studies empirically indicate that language models (LMs) encode rich world knowledge beyond mere semantics, attracting significant attention across various fields. However, in the recommendation domain, it remains uncertain whether LMs implicitly encode user preference information. Contrary to the prevailing understanding that LMs and traditional recommender models learn two distinct representation spaces due to a huge gap in language and behavior modeling objectives, this work rethinks such understanding and explores extracting a recommendation space directly from the language representation space. Surprisingly, our findings demonstrate that item representations, when linearly mapped from advanced LM representations, yield superior recommendation performance. This outcome suggests the homomorphism between the language representation space and an effective recommendation space, implying that collaborative signals may indeed be encoded within advanced LMs. Motivated by these findings, we propose a simple yet effective collaborative filtering (CF) model named AlphaRec, which utilizes language representations of item textual metadata (e.g., titles) instead of traditional ID-based embeddings. Specifically, AlphaRec is comprised of three main components: a multilayer perceptron (MLP), graph convolution, and contrastive learning (CL) loss function, making it extremely easy to implement and train. Our empirical results show that AlphaRec outperforms leading ID-based CF models on multiple datasets, marking the first instance of such a recommender with text embeddings achieving this level of performance. Moreover, AlphaRec introduces a new language-representation-based CF paradigm with several desirable advantages: being easy to implement, lightweight, rapid convergence, superior zero-shot recommendation abilities in new domains, and being aware of user intention.
Updated: 2024-07-07 17:05:24
Categories: cs.IR,cs.AI
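A minimal sketch of the core recipe, mapping frozen LM item embeddings into a recommendation space and training with an in-batch contrastive loss; random tensors stand in for real LM embeddings and interactions, and the graph-convolution component is omitted:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_users, n_items, d_lm, d_rec = 100, 50, 768, 64

lm_item = torch.randn(n_items, d_lm)            # frozen LM embeddings of titles
user_emb = torch.nn.Embedding(n_users, d_rec)   # learned user embeddings
proj = torch.nn.Linear(d_lm, d_rec)             # the learned linear map

opt = torch.optim.Adam(list(user_emb.parameters()) + list(proj.parameters()), lr=1e-3)
users = torch.randint(0, n_users, (256,))
pos_items = torch.randint(0, n_items, (256,))   # observed interactions

u = F.normalize(user_emb(users), dim=-1)
i = F.normalize(proj(lm_item), dim=-1)
logits = u @ i.T / 0.1                          # contrastive logits over items
loss = F.cross_entropy(logits, pos_items)       # InfoNCE-style CL objective
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```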
Explainable AI: Comparative Analysis of Normal and Dilated ResNet Models for Fundus Disease Classification
This paper presents dilated Residual Network (ResNet) models for disease classification from retinal fundus images. Dilated convolution filters are used to replace normal convolution filters in the higher layers of the ResNet model (dilated ResNet) in order to improve the receptive field compared to the normal ResNet model for disease classification. This study introduces computer-assisted diagnostic tools that employ deep learning, enhanced with explainable AI techniques. These techniques aim to make the tool's decision-making process transparent, thereby enabling medical professionals to understand and trust the AI's diagnostic decision. They are particularly relevant in today's healthcare landscape, where there is a growing demand for transparency in AI applications to ensure their reliability and ethical use. The dilated ResNet is used as a replacement for the normal ResNet to enhance the classification accuracy of retinal eye diseases and reduce the required computing time. The dataset used in this work is the Ocular Disease Intelligent Recognition (ODIR) dataset which is a structured ophthalmic database with eight classes covering most of the common retinal eye diseases. The evaluation metrics used in this work include precision, recall, accuracy, and F1 score. In this work, a comparative study has been made between normal ResNet models and dilated ResNet models on five variants namely ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. The dilated ResNet model shows promising results as compared to normal ResNet with an average F1 score of 0.71, 0.70, 0.69, 0.67, and 0.70 respectively for the above respective variants in ODIR multiclass disease classification.
Updated: 2024-07-07 17:03:12
Categories: eess.IV,cs.AI,cs.CV
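A hedged sketch of the architectural change, using torchvision's built-in option for converting ResNet's strided stages into dilated ones (the paper's exact layer choices may differ; the 8 output classes follow the ODIR setup):

```python
import torch
from torchvision.models import resnet50

# Replace the stride-2 downsampling in layer3/layer4 with dilation 2/4, so the
# higher layers see a larger receptive field at full feature resolution.
model = resnet50(weights=None, replace_stride_with_dilation=[False, True, True])
model.fc = torch.nn.Linear(model.fc.in_features, 8)   # 8 ODIR disease classes

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)   # torch.Size([1, 8])
```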
On the Anatomy of Attention
We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models. Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations, and important differences and similarities can be identified at a glance. In this paper, we focus on attention mechanisms: translating folklore into mathematical derivations, and constructing a taxonomy of attention variants in the literature. As a first example of an empirical investigation underpinned by our formalism, we identify recurring anatomical components of attention, which we exhaustively recombine to explore a space of variations on the attention mechanism.
Updated: 2024-07-07 17:03:05
Categories: cs.LG,math.CT,68T01, 18M30,I.2.6
Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos
Accurately localizing two-dimensional (2D) ultrasound (US) fetal brain images in the 3D brain, using minimal computational resources, is an important task for automated US analysis of fetal growth and development. We propose an uncertainty-aware deep learning model for automated 3D plane localization in 2D fetal brain images. Specifically, a multi-head network is trained to jointly regress 3D plane pose from 2D images in terms of different geometric transformations. The model explicitly learns to predict uncertainty to allocate higher weight to inputs with low variances across different transformations to improve performance. Our proposed method, QAERTS, demonstrates pose estimation accuracy superior to the state-of-the-art and most of the uncertainty-based approaches, leading to a 9% improvement on plane angle (PA) for localization accuracy, and 8% on normalized cross-correlation (NCC) for sampled image quality. QAERTS also demonstrates efficiency, containing 5× fewer parameters than the ensemble-based approach, making it advantageous in resource-constrained settings. In addition, QAERTS proves to be more robust to noise effects observed in freehand US scanning by leveraging rotational discontinuities and explicit output uncertainties.
Updated: 2024-07-07 16:54:09
Categories: eess.IV,cs.CV,cs.LG
Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation
Large language models (LLMs) and prompt engineering hold significant potential for advancing computer programming education through personalized instruction. This paper explores this potential by investigating three critical research questions: the systematic categorization of prompt engineering strategies tailored to diverse educational needs, the empowerment of LLMs to solve complex problems beyond their inherent capabilities, and the establishment of a robust framework for evaluating and implementing these strategies. Our methodology involves categorizing programming questions based on educational requirements, applying various prompt engineering strategies, and assessing the effectiveness of LLM-generated responses. Experiments with GPT-4, GPT-4o, Llama3-8b, and Mixtral-8x7b models on datasets such as LeetCode and USACO reveal that GPT-4o consistently outperforms others, particularly with the "multi-step" prompt strategy. The results show that tailored prompt strategies significantly enhance LLM performance, with specific strategies recommended for foundational learning, competition preparation, and advanced problem-solving. This study underscores the crucial role of prompt engineering in maximizing the educational benefits of LLMs. By systematically categorizing and testing these strategies, we provide a comprehensive framework for both educators and students to optimize LLM-based learning experiences. Future research should focus on refining these strategies and addressing current LLM limitations to further enhance educational outcomes in computer programming instruction.
Updated: 2024-07-07 16:41:07
Categories: cs.AI,K.3.2; I.2.7
LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models
Temporal reasoning (TR) is a critical component of artificial intelligence, encompassing understanding and processing temporal information and relationships between events. To discover and study the TR ability in Large Language Models (LLMs), various datasets have been constructed in different ways for evaluating various aspects of TR ability. Our work proposes a novel approach to design and develop a pipeline for constructing datasets to evaluate the TR ability of LLMs by leveraging random directed graph generation, LTL formula, and the NuSMV model checker. Based on the pipeline, we have also constructed a dataset as a benchmark, namely LTLBench, consisting of 2,000 TR challenges and evaluated six LLMs with it. Furthermore, we have conducted additional experiments to discover the impact of increasing the number of events and formula operators on the complexity of TR problems and the performance of LLMs. We have demonstrated that although LLMs exhibit some promise in handling TR challenges, they still struggle with complex TR. We expect this work can offer insights into TR ability in LLMs while also providing a valuable tool for future TR evaluations.
Updated: 2024-07-07 16:37:06
Categories: cs.CL,cs.AI
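A hedged sketch of the kind of pipeline the abstract outlines: sample a random directed graph, draw a random LTL property over its states, and emit a NuSMV model for checking (the formula grammar and model template here are illustrative assumptions, not the paper's):

```python
import random
import networkx as nx

random.seed(0)
g = nx.gnp_random_graph(5, 0.4, seed=0, directed=True)

events = [f"e{i}" for i in g.nodes]
a, b = random.sample(events, 2)
ltl = f"G ({a} -> F {b})"          # e.g. "whenever a holds, b eventually holds"

# Emit a NuSMV model whose state nondeterministically walks the random graph.
lines = ["MODULE main", "VAR s : 0..4;", "ASSIGN init(s) := 0;", "next(s) := case"]
for u in g.nodes:
    succ = list(g.successors(u)) or [u]          # self-loop for sink states
    lines.append(f"  s = {u} : {{{', '.join(map(str, succ))}}};")
lines += ["esac;"] + [f"DEFINE {e} := s = {i};" for i, e in enumerate(events)]
lines.append(f"LTLSPEC {ltl}")
print("\n".join(lines))            # pipe this into NuSMV to get ground truth
```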
InstructIR: High-Quality Image Restoration Following Human Instructions
Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions to guide the image restoration model. Given natural language prompts, our model can recover high-quality images from their degraded counterparts, considering multiple degradation types. Our method, InstructIR, achieves state-of-the-art results on several restoration tasks including image denoising, deraining, deblurring, dehazing, and (low-light) image enhancement. InstructIR improves +1dB over previous all-in-one restoration methods. Moreover, our dataset and results represent a novel benchmark for new research on text-guided image restoration and enhancement. Our code, datasets and models are available at: https://github.com/mv-lab/InstructIR
Updated: 2024-07-07 16:34:45
Categories: cs.CV,cs.LG,eess.IV
Machine Learning, Deep Learning and Data Preprocessing Techniques for Detection, Prediction, and Monitoring of Stress and Stress-related Mental Disorders: A Scoping Review
Background: Mental stress and its consequent mental disorders (MDs) are significant public health issues. With the advent of machine learning (ML), there's potential to harness computational techniques for better understanding and addressing these problems. This review seeks to elucidate the current ML methodologies employed in this domain to enhance the detection, prediction, and analysis of mental stress and MDs. Objective: This review aims to investigate the scope of ML methodologies used in the detection, prediction, and analysis of mental stress and MDs. Methods: Utilizing a rigorous scoping review process with PRISMA-ScR guidelines, this investigation delves into the latest ML algorithms, preprocessing techniques, and data types used in the context of stress and stress-related MDs. Results and Discussion: A total of 98 peer-reviewed publications were examined. The findings highlight that Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) models consistently exhibit superior accuracy and robustness among ML algorithms. Physiological parameters such as heart rate measurements and skin response are prevalently used as stress predictors due to their rich explanatory information and ease of data acquisition. Dimensionality reduction techniques, including mappings, feature selection, filtering, and noise reduction, are frequently observed as crucial steps preceding the training of ML algorithms. Conclusion: This review identifies significant research gaps and outlines future directions for the field. These include model interpretability, model personalization, the incorporation of naturalistic settings, and real-time processing capabilities for the detection and prediction of stress and stress-related MDs. Keywords: Machine Learning; Deep Learning; Data Preprocessing; Stress Detection; Stress Prediction; Stress Monitoring; Mental Disorders
Updated: 2024-07-07 16:31:46
Categories: cs.LG
Prototypical Reward Network for Data-Efficient RLHF
The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). Notably, collecting human feedback for RLHF can be resource-intensive and lead to scalability issues for LLMs and complex tasks. Our proposed framework Proto-RM leverages prototypical networks to enhance reward models under limited human feedback. By enabling stable and reliable structural learning from fewer samples, Proto-RM significantly enhances LLMs' adaptability and accuracy in interpreting human preferences. Extensive experiments on various datasets demonstrate that Proto-RM significantly improves the performance of reward models and LLMs in human feedback tasks, achieving comparable and usually better results than traditional methods while requiring significantly less data in data-limited scenarios. This research offers a promising direction for enhancing the efficiency of reward models and optimizing the fine-tuning of language models under restricted feedback conditions.
Updated: 2024-07-07 16:29:17
Categories: cs.CL,cs.AI
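A hedged sketch of the prototype idea as it might apply to a reward model: features are softly re-expressed through learnable prototypes before scoring, trained with the standard pairwise preference loss (architecture and sizes are illustrative, not the paper's):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_proto, d = 8, 128
prototypes = torch.nn.Parameter(torch.randn(n_proto, d))
encoder = torch.nn.Linear(32, d)                 # stands in for the LM encoder
head = torch.nn.Linear(d, 1)

def reward(x):
    z = encoder(x)
    # Soft assignment by distance: the feature becomes a prototype mixture,
    # which stabilizes learning when preference samples are scarce.
    w = F.softmax(-torch.cdist(z, prototypes), dim=-1)
    return head(w @ prototypes).squeeze(-1)

chosen, rejected = torch.randn(4, 32), torch.randn(4, 32)
loss = -F.logsigmoid(reward(chosen) - reward(rejected)).mean()  # pairwise RM loss
loss.backward()
print(float(loss))
```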
BiRoDiff: Diffusion policies for bipedal robot locomotion on unseen terrains
Locomotion on unknown terrains is essential for bipedal robots to handle novel real-world challenges, thus expanding their utility in disaster response and exploration. In this work, we introduce a lightweight framework that learns a single walking controller that yields locomotion on multiple terrains. We have designed a real-time robot controller based on diffusion models, which not only captures multiple behaviours with different velocities in a single policy but also generalizes well to unseen terrains. Our controller learns from offline data, which is preferable to online learning in aspects such as scalability and simplicity of the training scheme. We have designed and implemented a diffusion model-based policy controller in simulation on our custom-made Bipedal Robot model named Stoch BiRo. We have demonstrated its generalization capability and high-frequency control step generation relative to typical generative models, which require substantial onboard compute.
Updated: 2024-07-07 16:03:33
Categories: cs.RO,cs.AI,cs.SY,eess.SY
EMBANet: A Flexible Efficient Multi-branch Attention Network
This work presents a novel module, namely multi-branch concat (MBC), to process the input tensor and obtain the multi-scale feature map. The proposed MBC module brings new degrees of freedom (DoF) for the design of attention networks by allowing the type of transformation operators and the number of branches to be flexibly adjusted. Two important transformation operators, multiplex and split, are considered in this work, both of which can represent multi-scale features at a more granular level and increase the range of receptive fields. By integrating the MBC and attention module, a multi-branch attention (MBA) module is consequently developed to capture the channel-wise interaction of feature maps for establishing the long-range channel dependency. By substituting the 3x3 convolutions in the bottleneck blocks of the ResNet with the proposed MBA, a novel block namely efficient multi-branch attention (EMBA) is obtained, which can be easily plugged into the state-of-the-art backbone CNN models. Furthermore, a new backbone network called EMBANet is established by stacking the EMBA blocks. The proposed EMBANet is extensively evaluated on representative computer vision tasks including: classification, detection, and segmentation. And it demonstrates consistently superior performance over the popular backbones.
Updated: 2024-07-07 15:50:01
Categories: cs.CV,cs.AI
See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition
The rapid expansion of large foundation models within the pre-training and fine-tuning framework has underscored that larger models often yield better results. However, the scaling up of large foundation models has led to soaring costs in fine-tuning and parameter storage, rendering extensive adaptations impractical. This challenge has sparked the development of parameter-efficient fine-tuning (PEFT), which focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads. While recent years have witnessed a significant success in PEFT, a deep understanding of the fundamental principles behind these methods remains unexplored. To this end, here we take the first step to unify all approaches by dissecting them from a decomposition perspective. We initiate a comprehensive mathematical analysis of these methods, allowing us to delve deeply into their underlying mechanisms, and we explore the reasons behind the variations in performance among different techniques. Furthermore, inspired by our theoretical analysis, we introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications. Our empirical validations, conducted across multiple datasets, demonstrate the efficacy of these methods, showcasing both theoretical validity and practical performance improvements under the guidance of our analytical findings. We believe our work will deepen researchers' understanding of PEFT and other techniques, prompting further contemplation and advancing the research across the whole community.
Updated: 2024-07-07 15:44:42
Categories: cs.LG,cs.AI,cs.CV
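As a concrete instance of the decomposition perspective, LoRA can be read as re-parameterizing the weight update as a low-rank factorization, W' = W + (alpha/r)·B·A, applied as two skinny matmuls:

```python
import torch

torch.manual_seed(0)
d_in, d_out, r, alpha = 64, 64, 8, 16
W = torch.randn(d_out, d_in)        # frozen pre-trained weight
A = torch.randn(r, d_in) * 0.01     # trainable low-rank factors
B = torch.randn(d_out, r) * 0.01    # (LoRA zero-inits B so training starts at W)

def forward(x):
    # The adapted weight is W + (alpha/r) * B @ A, but the update is applied
    # as two skinny matmuls without ever materializing the full matrix.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = torch.randn(2, d_in)
assert torch.allclose(forward(x), x @ (W + (alpha / r) * B @ A).T, atol=1e-5)
print(forward(x).shape)
```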
Information FOMO: The unhealthy fear of missing out on information. A method for removing misleading data for healthier models
Misleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset, while ignoring data that is either misleading or brings unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data leads to worse performance and instabilities of the surrogate model, often termed sample-wise "double descent". We find these instabilities are a result of the complexity of the underlying map and linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data. Data is chosen based on its merits towards improving the selected model, rather than being compared strictly against other data. Second, a natural convergence of the method removes the need for dividing the data into training, testing, and validation sets. Instead, the selection metric inherently assesses testing and validation error through global statistics of the model. This ensures that key information is never wasted in testing or validation. The method is applied using both Gaussian process regression and deep neural network surrogate models.
Updated: 2024-07-07 15:44:26
Categories: cs.LG,physics.data-an,stat.ML
SBoRA: Low-Rank Adaptation with Regional Weight Updates
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models that builds upon the pioneering works of Low-Rank Adaptation (LoRA) and Orthogonal Adaptation. SBoRA further reduces the computational and memory requirements of LoRA while enhancing learning performance. By leveraging orthogonal standard basis vectors to initialize one of the low-rank matrices, either A or B, SBoRA enables regional weight updates and memory-efficient fine-tuning. This approach gives rise to two variants, SBoRA-FA and SBoRA-FB, where only one of the matrices is updated, resulting in a sparse update matrix with a majority of zero rows or columns. Consequently, the majority of the fine-tuned model's weights remain unchanged from the pre-trained weights. This characteristic of SBoRA, wherein regional weight updates occur, is reminiscent of the modular organization of the human brain, which efficiently adapts to new tasks. Our empirical results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning. Furthermore, we evaluate the effectiveness of QSBoRA on quantized LLaMA models of varying scales, highlighting its potential for efficient adaptation to new tasks. Code is available at https://github.com/CityUHK-AI/SBoRA
Updated: 2024-07-07 15:37:13
Categories: cs.AI,cs.CL,cs.LG
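A minimal sketch of the SBoRA-FA idea as described: fixing A's rows to standard basis vectors makes the update matrix B·A nonzero only on the selected input columns (the selection and sizes here are illustrative):

```python
import torch

torch.manual_seed(0)
d_in, d_out, r = 16, 12, 4
W = torch.randn(d_out, d_in)                      # frozen pre-trained weight

# SBoRA-FA flavor: A's rows are fixed standard basis vectors, so A @ x simply
# gathers r input coordinates; only B is trained.
idx = torch.randperm(d_in)[:r]
A = torch.eye(d_in)[idx]                          # (r, d_in), frozen
B = torch.nn.Parameter(torch.zeros(d_out, r))     # trainable, zero-init

update = (B.detach() + 1.0) @ A                   # pretend-trained B, for display
nonzero_cols = int((update.abs().sum(0) > 0).sum())
print(nonzero_cols, "of", d_in, "columns updated")  # r of d_in: a regional update
```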
FM-OSD: Foundation Model-Enabled One-Shot Detection of Anatomical Landmarks
One-shot detection of anatomical landmarks is gaining significant attention for its efficiency in using minimal labeled data to produce promising results. However, the success of current methods heavily relies on the employment of extensive unlabeled data to pre-train an effective feature extractor, which limits their applicability in scenarios where a substantial amount of unlabeled data is unavailable. In this paper, we propose the first foundation model-enabled one-shot landmark detection (FM-OSD) framework for accurate landmark detection in medical images by utilizing solely a single template image without any additional unlabeled data. Specifically, we use the frozen image encoder of visual foundation models as the feature extractor, and introduce dual-branch global and local feature decoders to increase the resolution of extracted features in a coarse to fine manner. The introduced feature decoders are efficiently trained with a distance-aware similarity learning loss to incorporate domain knowledge from the single template image. Moreover, a novel bidirectional matching strategy is developed to improve both robustness and accuracy of landmark detection in the case of scattered similarity map obtained by foundation models. We validate our method on two public anatomical landmark detection datasets. By using solely a single template image, our method demonstrates significant superiority over strong state-of-the-art one-shot landmark detection methods.
Updated: 2024-07-07 15:37:02
Categories: cs.CV,cs.AI
DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation
Constrained decoding approaches aim to control the meaning or style of text generated by a Pre-trained Language Model (PLM) using specific target words during inference. However, these methods often guide plausible continuations by greedily selecting targets, which, while completing the task, may disrupt the natural patterns of human language generation. In this work, we propose a novel decoding framework, DECIDER, which enables us to program rules on how we complete tasks to control a PLM. Differing from previous work, our framework transforms the encouragement of target words into the encouragement of all words that satisfy the rule. Specifically, DECIDER is a dual system where a PLM is equipped with a First-Order Logic (FOL) reasoner to express and evaluate the rules, and a decision function to merge the outputs from both systems to steer the generation. Experiments on CommonGen and PersonaChat demonstrate that DECIDER can effectively follow given rules to achieve generation tasks in a more human-like manner.
Updated: 2024-07-07 15:32:31
Categories: cs.CL,cs.AI,cs.LO
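A toy sketch of the decoding idea: instead of greedily forcing one target word, boost the logits of every word that satisfies the rule and let sampling proceed (the FOL reasoner is abstracted into a simple predicate here):

```python
import torch

vocab = ["the", "dog", "runs", "park", "ball", "eats"]
logits = torch.tensor([2.0, 1.0, 0.5, 0.2, 0.1, 0.4])

# The FOL reasoner is abstracted as a predicate: "word helps satisfy the rule",
# here covering a CommonGen-style concept set {dog, park}.
satisfies_rule = torch.tensor([w in {"dog", "park"} for w in vocab])

# Decision function: softly encourage *all* rule-satisfying words rather than
# greedily forcing a single target continuation.
steered = logits + 2.5 * satisfies_rule.float()
probs = torch.softmax(steered, dim=-1)
print({w: round(float(p), 3) for w, p in zip(vocab, probs)})
```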
Synthetic Test Data Generation Using Recurrent Neural Networks: A Position Paper
Testing in production-like test environments is an essential part of quality assurance processes in many industries. Provisioning of such test environments, for information-intensive services, involves setting up databases that are rich enough to enable simulating a wide variety of user scenarios. While production data is perhaps the gold-standard here, many organizations, particularly within the public sectors, are not allowed to use production data for testing purposes due to privacy concerns. The alternatives are to use anonymized data, or synthetically generated data. In this paper, we elaborate on these alternatives and compare them in an industrial context. Further, we focus on synthetic data generation and investigate the use of recurrent neural networks for this purpose. In our preliminary experiments, we were able to generate representative and highly accurate data using a recurrent neural network. These results open new research questions that we discuss here, and plan to investigate in our future research.
Updated: 2024-07-07 15:28:41
Categories: cs.SE,cs.DB,cs.LG,cs.LO
SplitOut: Out-of-the-Box Training-Hijacking Detection in Split Learning via Outlier Detection
Split learning enables efficient and privacy-aware training of a deep neural network by splitting a neural network so that the clients (data holders) compute the first layers and only share the intermediate output with the central compute-heavy server. This paradigm introduces a new attack medium in which the server has full control over what the client models learn, which has already been exploited to infer the private data of clients and to implement backdoors in the client models. Although previous work has shown that clients can successfully detect such training-hijacking attacks, the proposed methods rely on heuristics, require tuning of many hyperparameters, and do not fully utilize the clients' capabilities. In this work, we show that given modest assumptions regarding the clients' compute capabilities, an out-of-the-box outlier detection method can be used to detect existing training-hijacking attacks with almost-zero false positive rates. We conclude through experiments on different tasks that the simplicity of our approach, which we name SplitOut, makes it a more viable and reliable alternative compared to the earlier detection methods.
Updated: 2024-07-07 15:25:37
Categories: cs.LG,cs.CR
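A hedged sketch of the detection idea with scikit-learn's off-the-shelf LocalOutlierFactor: fit on gradients from locally simulated honest training, then flag server-sent updates that score as outliers (data here is synthetic; the paper's exact features and thresholds may differ):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Gradients the client computes itself on a few local batches (honest behavior).
honest_grads = rng.normal(0.0, 1.0, size=(200, 32))
# Gradients coming back from the server; a hijacking attack shifts their statistics.
incoming = np.vstack([rng.normal(0.0, 1.0, (20, 32)),
                      rng.normal(3.0, 1.0, (5, 32))])

lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(honest_grads)
pred = lof.predict(incoming)         # +1 = inlier, -1 = outlier
print(np.where(pred == -1)[0])       # flags the last five (attacked) updates
```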
Instance Temperature Knowledge Distillation
Knowledge distillation (KD) enhances the performance of a student network by allowing it to learn the knowledge transferred from a teacher network incrementally. Existing methods dynamically adjust the temperature to enable the student network to adapt to the varying learning difficulties at different learning stages of KD. KD is a continuous process, but when adjusting the temperature, these methods consider only the immediate benefits of the operation in the current learning phase and fail to take into account its future returns. To address this issue, we formulate the adjustment of temperature as a sequential decision-making task and propose a method based on reinforcement learning, termed RLKD. Importantly, we design a novel state representation to enable the agent to make more informed actions (i.e. instance temperature adjustments). To handle the problem of delayed rewards in our method due to the KD setting, we explore an instance reward calibration approach. In addition, we devise an efficient exploration strategy that enables the agent to learn a valuable instance temperature adjustment policy more efficiently. Our framework can serve as a plug-and-play technique to be inserted into various KD methods easily, and we validate its effectiveness on both image classification and object detection tasks. Our project is at https://www.zayx.me/ITKD.github.io/.
Updated: 2024-07-07 15:25:05
Categories: cs.LG,cs.AI,I.4.0
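A minimal sketch of the mechanism being optimized: a knowledge-distillation loss with one temperature per instance (in RLKD the agent chooses these temperatures; here they are fixed for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
t_logits, s_logits = torch.randn(4, 10), torch.randn(4, 10)

# One temperature per instance (in RLKD this is the agent's action; fixed here).
tau = torch.tensor([1.0, 2.0, 4.0, 3.0]).unsqueeze(1)

p_teacher = F.softmax(t_logits / tau, dim=-1)
log_p_student = F.log_softmax(s_logits / tau, dim=-1)
# Per-instance KL, rescaled by tau^2 as in standard KD to keep gradients comparable.
kd = F.kl_div(log_p_student, p_teacher, reduction="none").sum(-1) * tau.squeeze(1) ** 2
print(kd)          # one distillation loss per instance
print(kd.mean())   # batch loss
```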
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Recent years have witnessed a trend that large language model (LLM) based text-to-speech (TTS) emerges into the mainstream due to their high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role in LLM-based TTS models. Current speech tokens are learned in an unsupervised manner, which lacks explicit semantic information and alignment to the text. In this paper, we propose to represent speech with supervised semantic tokens, which are derived from a multilingual speech recognition model by inserting vector quantization into the encoder. Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis. Experimental results show that supervised semantic tokens significantly outperform existing unsupervised tokens in terms of content consistency and speaker similarity for zero-shot voice cloning. Moreover, we find that utilizing large-scale data further improves the synthesis performance, indicating the scalable capacity of CosyVoice. To the best of our knowledge, this is the first attempt to involve supervised speech tokens into TTS models.
Updated: 2024-07-07 15:16:19
Categories: cs.SD,cs.AI
iSign: A Benchmark for Indian Sign Language Processing
Indian Sign Language has limited resources for developing machine learning and data-driven approaches for automated language processing. Though text/audio-based language processing techniques have shown colossal research interest and tremendous improvements in the last few years, Sign Languages still need to catch up due to the need for more resources. To bridge this gap, in this work, we propose iSign: a benchmark for Indian Sign Language (ISL) Processing. We make three primary contributions to this work. First, we release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs. To the best of our knowledge, it is the largest sign language dataset available for ISL. Second, we propose multiple NLP-specific tasks (including SignVideo2Text, SignPose2Text, Text2Pose, Word Prediction, and Sign Semantics) and benchmark them with the baseline models for easier access to the research community. Third, we provide detailed insights into the proposed benchmarks with a few linguistic insights into the workings of ISL. We streamline the evaluation of Sign Language processing, addressing the gaps in the NLP research community for Sign Languages. We release the dataset, tasks, and models via the following website: https://exploration-lab.github.io/iSign/
Updated: 2024-07-07 15:07:35
Categories: cs.CL,cs.AI,cs.LG
FinLangNet: A Novel Deep Learning Framework for Credit Risk Prediction Using Linguistic Analogy in Financial Data
Recent industrial applications in risk prediction still heavily rely on extensively manually-tuned, statistical learning methods. Real-world financial data, characterized by its high dimensionality, sparsity, high noise levels, and significant imbalance, poses unique challenges for the effective application of deep neural network models. In this work, we introduce a novel deep learning risk prediction framework, FinLangNet, which conceptualizes credit loan trajectories in a structure that mirrors linguistic constructs. This framework is tailored for credit risk prediction using real-world financial data, drawing on structural similarities to language by adapting natural language processing techniques. It particularly emphasizes analyzing the development and forecastability of mid-term credit histories through multi-head and sequences of detailed financial events. Our research demonstrates that FinLangNet surpasses traditional statistical methods in predicting credit risk and that its integration with these methods enhances credit overdue prediction models, achieving a significant improvement of over 4.24% in the Kolmogorov-Smirnov metric.
Updated: 2024-07-07 14:59:55
Categories: cs.CE,cs.AI
Memory, Consciousness and Large Language Model
With the development in cognitive science and Large Language Models (LLMs), increasing connections have come to light between these two distinct fields. Building upon these connections, we propose a conjecture suggesting the existence of a duality between LLMs and Tulving's theory of memory. We identify a potential correspondence between Tulving's synergistic ecphory model (SEM) of retrieval and the emergent abilities observed in LLMs, serving as supporting evidence for our conjecture. Furthermore, we speculate that consciousness may be considered a form of emergent ability based on this duality. We also discuss how other theories of consciousness intersect with our research.
Updated: 2024-07-07 14:58:22
Categories: q-bio.NC,cs.AI,cs.CL
CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation
Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has attracted considerable interest owing to the robustness and low cost of radar. Thus, this paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar point cloud data. The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements, by predicting a radar confidence map and a preliminary coarse depth map. A novel approach is presented for generating the ground truth for the confidence map, which involves associating each radar point with its corresponding object to identify potential projection surfaces. These maps, together with the initial radar input, are processed by a second encoder. For the final depth estimation, we introduce a confidence-aware gated fusion mechanism to integrate radar and image features effectively, thereby enhancing the reliability of the depth map by filtering out radar noise. Our methodology, evaluated on the nuScenes dataset, demonstrates superior performance, improving upon the current leading model by 3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE). Code: https://github.com/harborsarah/CaFNet
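A confidence-aware gated fusion of this kind can be sketched as a learned gate that arbitrates per pixel between radar and image features. The layer sizes and exact gating form below are our assumptions, not CaFNet's published architecture:

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Confidence-aware gated fusion, per our reading of the abstract; the
    layer sizes and exact gating form are assumptions, not CaFNet's design."""
    def __init__(self, channels):
        super().__init__()
        # Gate predicted from the concatenated image/radar features
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, img_feat, radar_feat):
        g = self.gate(torch.cat([img_feat, radar_feat], dim=1))
        # g -> 1 trusts the radar signal; g -> 0 suppresses noisy returns
        return g * radar_feat + (1 - g) * img_feat

fuse = GatedFusion(channels=64)
out = fuse(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
print(out.shape)   # torch.Size([1, 64, 56, 56])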
Updated: 2024-07-07 14:57:38
Categories: cs.CV,cs.AI,eess.SP
IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning
Legal systems worldwide are inundated with exponential growth in cases and documents. There is an urgent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning. IL-TUR contains monolingual (English, Hindi) and multi-lingual (9 Indian languages) domain-specific tasks that address different aspects of the legal system from the point of view of understanding and reasoning over Indian legal documents. We present baseline models (including LLM-based) for each task, outlining the gap between models and the ground truth. To foster further research in the legal domain, we create a leaderboard (available at: https://exploration-lab.github.io/IL-TUR/) where the research community can upload and compare legal text understanding systems.
Updated: 2024-07-07 14:55:04
Categories: cs.CL,cs.AI,cs.LG
A Fair Post-Processing Method based on the MADD Metric for Predictive Student Models
Predictive student models are increasingly used in learning environments. However, due to the rising social impact of their usage, it is now all the more important for these models to be both sufficiently accurate and fair in their predictions. To evaluate algorithmic fairness, a new metric has been developed in education, namely the Model Absolute Density Distance (MADD). This metric enables us to measure how differently a predictive model behaves with respect to two groups of students, in order to quantify its algorithmic unfairness. In this paper, we thus develop a post-processing method based on this metric that aims at improving fairness while preserving the accuracy of the relevant predictive models' results. We experiment with our approach on the task of predicting student success in an online course, using both simulated and real-world educational data, and obtain successful results. Our source code and data are in open access at https://github.com/melinaverger/MADD .
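Reading the definition above, MADD can be sketched as an L1 distance between the two groups' predicted-probability densities; the binning scheme below is an assumption, so treat this as an illustration rather than the paper's reference implementation:

import numpy as np

def madd(probs_g0, probs_g1, n_bins=100):
    """L1 distance between the two groups' predicted-probability densities;
    the bin count is an assumption. 0 = identical behaviour, 2 = disjoint."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    d0, _ = np.histogram(probs_g0, bins=bins)
    d1, _ = np.histogram(probs_g1, bins=bins)
    d0 = d0 / d0.sum()          # normalise counts to per-bin proportions
    d1 = d1 / d1.sum()
    return np.abs(d0 - d1).sum()

rng = np.random.default_rng(1)
print(madd(rng.beta(2, 5, 1000), rng.beta(2, 5, 1000)))   # fair: near 0
print(madd(rng.beta(2, 5, 1000), rng.beta(5, 2, 1000)))   # unfair: large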
Updated: 2024-07-07 14:53:41
Categories: cs.CY,cs.AI,cs.DM,cs.LG,stat.ML
Quantum Multiplier Based on Exponent Adder
Quantum multiplication is a fundamental operation in quantum computing. It is important to have a quantum multiplier with low complexity. In this paper, we propose the Quantum Multiplier Based on Exponent Adder (QMbead), a new approach that requires just $\log_2(n)$ qubits to multiply two $n$-bit integer numbers, in addition to $O(n)$ ancillary qubits used for quantum state preparation. The QMbead uses a so-called exponent encoding to represent the two multiplicands as two superposition states, which are prepared by a quantum state preparation method; it then employs a quantum adder to obtain the sum of these two superposition states, and subsequently measures the outputs of the quantum adder to calculate the product of the multiplicands. Different quantum adders can be used in the QMbead. The circuit depth and time complexity of the QMbead, using a logarithmic-depth quantum carry lookahead adder (QCLA) as adder, are $O(\log n)$ and $O(n \log n)$, respectively. The gate complexity of the QMbead is $O(n)$. The circuit depth and gate complexity of the QMbead are better than those of existing quantum multipliers such as the quantum Karatsuba multiplier and the QFT-based multiplier. The time complexity of the QMbead is identical to that of the fastest classical multiplication algorithm, the Harvey-van der Hoeven algorithm. Interestingly, the QMbead maintains an advantage over the Harvey-van der Hoeven algorithm, given that the latter is only suitable for extremely large numbers, whereas the QMbead is valid for both small and large numbers. The multiplicands can be either integers or decimal numbers. The QMbead has been implemented on quantum simulators to compute products with a bit length of up to 273 bits using only 17 qubits, excluding the ancillary qubits used for quantum state preparation. This establishes QMbead as an efficient solution for multiplying large integer or decimal numbers with many bits.
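The exponent encoding admits a short classical illustration: writing $a=\sum_i 2^{p_i}$ and $b=\sum_j 2^{q_j}$, the product is $a \cdot b=\sum_{i,j} 2^{p_i+q_j}$, so multiplication reduces to forming all pairwise sums of exponents, which the quantum adder performs in superposition. A purely classical sketch of the identity (not the quantum circuit):

def product_via_exponent_sums(a: int, b: int) -> int:
    """Classical illustration of the exponent encoding: the product is the sum
    of 2**(p+q) over set-bit exponents p of a and q of b."""
    exps_a = [i for i in range(a.bit_length()) if (a >> i) & 1]
    exps_b = [j for j in range(b.bit_length()) if (b >> j) & 1]
    # The quantum adder forms all pairwise sums p + q in superposition;
    # measurement statistics then recover the weighted sum below.
    return sum(2 ** (p + q) for p in exps_a for q in exps_b)

assert product_via_exponent_sums(27, 14) == 27 * 14
print(product_via_exponent_sums(27, 14))   # 378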
Updated: 2024-07-07 14:52:32
Categories: quant-ph,cs.CC,cs.CR,math.QA
Normative Conditional Reasoning as a Fragment of HOL
We report on the mechanization of (preference-based) conditional normative reasoning. Our focus is on Aqvist's system E for conditional obligation, and its extensions. Our mechanization is achieved via a shallow semantical embedding in Isabelle/HOL. We consider two possible uses of the framework. The first one is as a tool for meta-reasoning about the considered logic. We employ it for the automated verification of deontic correspondences (broadly conceived) and related matters, analogous to what has been previously achieved for the modal logic cube. The equivalence is automatically verified in one direction, leading from the property to the axiom. The second use is as a tool for assessing ethical arguments. We provide a computer encoding of a well-known paradox (or impossibility theorem) in population ethics, Parfit's repugnant conclusion. While some have proposed overcoming the impossibility theorem by abandoning the presupposed transitivity of "better than", our formalisation unveils a less extreme approach, suggesting among other things the option of weakening transitivity suitably rather than discarding it entirely. Whether the presented encoding increases or decreases the attractiveness and persuasiveness of the repugnant conclusion is a question we would like to pass on to philosophy and ethics.
Updated: 2024-07-07 14:51:25
Categories: cs.LO,cs.AI,cs.SC,03B60, 03B15, 68T27, 68T30, 68T15, I.2.3; I.2.4; I.2.0; F.4
Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense
Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition. However, DNN models are vulnerable to backdoor attacks. A backdoor in a DNN model can be activated by a poisoned input carrying a trigger, leading to wrong predictions and serious security issues in applications. It is challenging for current defenses to eliminate the backdoor effectively with limited computing resources, especially when the sizes and numbers of the triggers are variable, as in the physical world. We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair. In the first phase of our method, the CAM-focus Evolutionary Trigger Filter (CETF) is proposed for trigger detection. CETF is an effective sample-preprocessing-based method built on an evolutionary algorithm, and our experimental results show that CETF not only accurately distinguishes triggered images from clean images, but is also simple and stable enough for wide practical use across different backdoor attack situations. In the second phase of our method, we leverage several lightweight unlearning methods with the trigger detected by CETF for model repair, which also constructively demonstrates the underlying correlation between the backdoor and Batch Normalization layers. Source code will be published after acceptance.
Updated: 2024-07-07 14:50:59
Categories: cs.CR,cs.AI
Strategy-Proof Auctions through Conformal Prediction
Auctions are key for maximizing sellers' revenue and ensuring truthful bidding among buyers. Recently, an approach known as differentiable economics, based on deep learning, has shown promise in learning optimal auction mechanisms for multiple items and participants. However, this approach offers no guarantee of strategy-proofness at test time. Strategy-proofness is crucial as it ensures that buyers are incentivized to bid their true valuations, leading to optimal and fair auction outcomes without the risk of manipulation. Building on conformal prediction, we introduce a novel approach to achieve strategy-proofness with rigorous statistical guarantees. The key novelties of our method are: (i) the formulation of a regret prediction model, used to quantify at test time violations of strategy-proofness; and (ii) an auction acceptance rule that leverages the predicted regret to ensure that for a new auction, the data-driven mechanism meets the strategy-proofness requirement with high probability (e.g., 99%). Numerical experiments demonstrate the necessity for rigorous guarantees, the validity of our theoretical results, and the applicability of our proposed method.
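The acceptance rule can be sketched with split conformal prediction: calibrate the regret model's under-prediction on held-out auctions, then accept a new auction only if the resulting upper bound on its regret stays below a tolerance. Variable names and the nonconformity score are our assumptions:

import numpy as np

def conformal_accept(pred_regret_new, pred_cal, true_cal, eps=1e-3, alpha=0.01):
    """Accept a new auction only if an upper bound on its regret, valid with
    probability 1 - alpha, stays below tolerance eps (split-conformal sketch;
    names and the nonconformity score are illustrative)."""
    scores = np.maximum(true_cal - pred_cal, 0.0)   # under-prediction residuals
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return pred_regret_new + q <= eps

rng = np.random.default_rng(2)
pred_cal = rng.uniform(0, 1e-3, 500)
true_cal = pred_cal + rng.uniform(0, 2e-4, 500)    # model slightly optimistic
print(conformal_accept(5e-4, pred_cal, true_cal))  # True: bound below eps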
Updated: 2024-07-07 14:48:38
Categories: cs.GT,cs.LG
Active Learning and Bayesian Optimization: a Unified Perspective to Learn with a Goal
Science and engineering applications are typically associated with expensive optimization problems to identify optimal design solutions and states of the system of interest. Bayesian optimization and active learning compute surrogate models through efficient adaptive sampling schemes to assist and accelerate this search task toward a given optimization goal. Both methodologies are driven by specific infill/learning criteria that quantify the utility, with respect to the set goal, of evaluating the objective function at unknown combinations of the optimization variables. While the two fields have seen exponential growth in popularity in the past decades, their dualism and synergy have received relatively little attention to date. This paper discusses and formalizes the synergy between Bayesian optimization and active learning as symbiotic adaptive sampling methodologies driven by common principles. In particular, we demonstrate this unified perspective by formalizing the analogy between the Bayesian infill criteria and active learning criteria as the driving principles of both goal-driven procedures. To support our original perspective, we propose a general classification of adaptive sampling techniques to highlight similarities and differences between the vast families of adaptive sampling, active learning, and Bayesian optimization. Accordingly, the synergy is demonstrated by mapping the Bayesian infill criteria onto the active learning criteria, and is formalized for searches informed by both a single information source and multiple levels of fidelity. In addition, we provide guidelines for applying those learning criteria, investigating the performance of different Bayesian schemes for a variety of benchmark problems to highlight benefits and limitations with respect to the mathematical properties that characterize real-world applications.
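As a concrete instance of such an infill criterion, the expected improvement of a Gaussian surrogate (minimization form) is a standard choice; the sketch below is generic Bayesian-optimization machinery, not code from the paper:

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_f):
    """Expected improvement (minimisation form) for a Gaussian surrogate with
    posterior mean mu and standard deviation sigma at candidate points."""
    sigma = np.maximum(sigma, 1e-12)            # guard against zero variance
    z = (best_f - mu) / sigma
    return (best_f - mu) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([0.2, 0.5, -0.1])
sigma = np.array([0.3, 0.01, 0.2])
print(np.argmax(expected_improvement(mu, sigma, best_f=0.0)))  # next sample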
Updated: 2024-07-07 14:38:37
Categories: cs.LG,stat.ML
Image-Conditional Diffusion Transformer for Underwater Image Enhancement
Underwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operations and marine engineering. Motivated by recent advances in generative models, we propose a novel UIE method based on an image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and converts it into a latent space where ICDT is applied. ICDT replaces the conventional U-Net backbone in a denoising diffusion probabilistic model (DDPM) with a transformer, and thus inherits favorable properties such as scalability from transformers. Furthermore, we train ICDT with a hybrid loss function involving variances to achieve better log-likelihoods, which also significantly accelerates the sampling process. We experimentally assess the scalability of ICDTs and compare with prior UIE work on the Underwater ImageNet dataset. Besides good scaling properties, our largest model, ICDT-XL/2, outperforms all comparison methods, achieving state-of-the-art (SOTA) quality of image enhancement.
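For readers unfamiliar with the phrase, a hybrid loss "involving variances" matches the improved-DDPM recipe; under that reading (an assumption on our part, since the abstract does not spell it out) it takes the form $L_{\mathrm{hybrid}} = L_{\mathrm{simple}} + \lambda L_{\mathrm{vlb}}$, where $L_{\mathrm{simple}} = \mathbb{E}_{t, x_0, \epsilon}\,\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2$ is the usual noise-prediction term, $L_{\mathrm{vlb}}$ is the variational-bound term that trains the learned per-step variances, and $\lambda$ is a small weight.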
Updated: 2024-07-07 14:34:31
Categories: cs.CV,cs.AI
Shadows of quantum machine learning
Quantum machine learning is often highlighted as one of the most promising practical applications for which quantum computers could provide a computational advantage. However, a major obstacle to the widespread use of quantum machine learning models in practice is that these models, even once trained, still require access to a quantum computer in order to be evaluated on new data. To solve this issue, we introduce a new class of quantum models where quantum resources are only required during training, while the deployment of the trained model is classical. Specifically, the training phase of our models ends with the generation of a 'shadow model' from which the classical deployment becomes possible. We prove that: i) this class of models is universal for classically-deployed quantum machine learning; ii) it does have restricted learning capacities compared to 'fully quantum' models, but nonetheless iii) it achieves a provable learning advantage over fully classical learners, contingent on widely-believed assumptions in complexity theory. These results provide compelling evidence that quantum machine learning can confer learning advantages across a substantially broader range of scenarios, where quantum computers are exclusively employed during the training phase. By enabling classical deployment, our approach facilitates the implementation of quantum machine learning models in various practical contexts.
Updated: 2024-07-07 14:33:43
Categories: quant-ph,cs.AI,cs.LG,stat.ML
Mind the Model, Not the Agent: The Primacy Bias in Model-based RL
The primacy bias in model-free reinforcement learning (MFRL), which refers to the agent's tendency to overfit early data and lose the ability to learn from new data, can significantly decrease the performance of MFRL algorithms. Previous studies have shown that employing simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias in MFRL. However, the primacy bias in model-based reinforcement learning (MBRL) remains unexplored. In this work, we focus on investigating the primacy bias in MBRL. We begin by observing that resetting the agent's parameters harms its performance in the context of MBRL. We further find that the primacy bias in MBRL is more closely related to the primacy bias of the world model instead of the primacy bias of the agent. Based on this finding, we propose world model resetting, a simple yet effective technique to alleviate the primacy bias in MBRL. We apply our method to two different MBRL algorithms, MBPO and DreamerV2. We validate the effectiveness of our method on multiple continuous control tasks on MuJoCo and DeepMind Control Suite, as well as discrete control tasks on the Atari 100k benchmark. The experimental results show that world model resetting can significantly alleviate the primacy bias in the model-based setting and improve the algorithm's performance. We also give a guide on how to perform world model resetting effectively.
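The core intervention can be sketched in a few lines: re-initialize the world model's parameters on a schedule while leaving the agent's networks and the replay buffer untouched. The reset schedule below is an assumption, not the paper's prescription:

import torch.nn as nn

def reset_world_model(world_model: nn.Module) -> None:
    """Re-initialise every submodule that defines its own reset, leaving the
    agent's networks and the replay buffer untouched."""
    for layer in world_model.modules():
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()

# Hypothetical placement in a training loop (the schedule is an assumption):
# if step % reset_interval == 0:
#     reset_world_model(agent.world_model)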
Updated: 2024-07-07 14:32:02
Categories: cs.LG,cs.AI
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the different learned features of the models; however, it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters, reduces these costs but doesn't work as well in practice. Indeed, neural network loss landscapes are high-dimensional and non-convex, and the minima found through learning are typically separated by high loss barriers. Numerous recent works have focused on finding permutations matching one network's features to the features of a second one, lowering the loss barrier on the linear path between them in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our alignment method leads to better performance than past methods when averaging models trained on the same, or differing, data splits. We also extend this analysis to the harder setting where more than 2 models are merged, and we find that CCA Merge works significantly better than past methods. Our code is publicly available at https://github.com/shoroi/align-n-merge
Updated: 2024-07-07 14:21:04
Categories: cs.LG,cs.AI,cs.CV,stat.ML
YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention
The accurate prediction of drug molecule solubility is essential for determining their therapeutic effectiveness and safety, influencing the drug's ADME processes. Traditional solubility prediction techniques often fail to capture the complex nature of molecular structures, leading to notable deviations between predictions and actual results. For example, in a discussion of advanced drug-like compound structures, Lusci highlighted issues in capturing crucial cyclic structural information in molecules with ring structures. To overcome this issue, our research introduces a novel deep learning framework combining attention-based transformers, Long Short-Term Memory (LSTM) networks, and Graph Convolutional Networks (GCN), aimed at enhancing the precision of solubility predictions. Utilizing a training set of 9,943 compounds and testing on an anticancer compound dataset, our method achieved a coefficient of determination ($R^2$) of 0.59 and a Root Mean Square Error (RMSE) of 0.57, which outperforms the benchmark models' scores of 0.52 ($R^2$) and 0.61 (RMSE). Importantly, in an additional independent test, our model significantly outperformed the baseline with an RMSE of 1.05 compared to 1.28, a relative accuracy improvement of 45.9%. This research not only demonstrates the vast potential of deep learning for improving solubility prediction accuracy but also offers novel insights for drug design and selection in the future. Continued efforts will be directed towards optimizing the model architecture and extending its application to better support the drug development process, underscoring the pivotal role of deep learning in drug discovery.
Updated: 2024-07-07 14:10:38
Categories: cs.LG,cs.AI
AiGAS-dEVL: An Adaptive Incremental Neural Gas Model for Drifting Data Streams under Extreme Verification Latency
The ever-growing speed at which data are generated nowadays, together with the substantial cost of labeling processes, causes Machine Learning models to face scenarios in which data are partially labeled. The extreme case where such supervision is indefinitely unavailable is referred to as extreme verification latency. On the other hand, in streaming setups data flows are affected by exogenous factors that yield non-stationarities in the patterns (concept drift), compelling models learned incrementally from the data streams to adapt their modeled knowledge to the concepts within the stream. In this work we address the case in which these two conditions occur together, whereby adaptation mechanisms that accommodate drifts within the stream are challenged by the lack of supervision, requiring further mechanisms to track the evolution of concepts in the absence of verification. To this end we propose a novel approach, AiGAS-dEVL (Adaptive Incremental neural GAS model for drifting Streams under Extreme Verification Latency), which relies on growing neural gas to characterize the distributions of all concepts detected within the stream over time. Our approach shows that the online analysis of the behavior of these prototypical points over time facilitates the characterization of the evolution of concepts in the feature space, the detection of changes in their behavior, and the design of adaptation policies to mitigate the effect of such changes on the model. We assess the performance of AiGAS-dEVL over several synthetic datasets, comparing it to that of state-of-the-art approaches proposed in the recent past to tackle this stream learning setup. Our results reveal that AiGAS-dEVL performs competitively with respect to the rest of the baselines, exhibiting superior adaptability over several datasets in the benchmark while ensuring a simple and interpretable instance-based adaptation strategy.
Updated: 2024-07-07 14:04:57
Categories: cs.LG,cs.AI,cs.NE,68T05,I.2.6
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers
Modular neural architectures are gaining attention for their powerful generalization and efficient adaptation to new domains. However, training these models poses challenges due to optimization difficulties arising from intrinsic sparse connectivity. Leveraging knowledge from monolithic models through techniques like knowledge distillation can facilitate training and enable integration of diverse knowledge. Nevertheless, conventional knowledge distillation approaches are not tailored to modular models and struggle with unique architectures and enormous parameter counts. Motivated by these challenges, we propose module-to-module knowledge distillation (m2mKD) for transferring knowledge between modules. m2mKD combines teacher modules of a pretrained monolithic model and student modules of a modular model with a shared meta model respectively to encourage the student module to mimic the behaviour of the teacher module. We evaluate m2mKD on two modular neural architectures: Neural Attentive Circuits (NACs) and Vision Mixture-of-Experts (V-MoE). Applying m2mKD to NACs yields significant improvements in IID accuracy on Tiny-ImageNet (up to 5.6%) and OOD robustness on Tiny-ImageNet-R (up to 4.2%). Additionally, the V-MoE-Base model trained with m2mKD achieves 3.5% higher accuracy than end-to-end training on ImageNet-1k. Code is available at https://github.com/kamanphoebe/m2mKD.
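A minimal sketch of a module-to-module distillation step, in which a shared meta model wraps both the teacher and student modules so that their outputs live in a comparable space; the MSE objective and the Linear stand-ins are our assumptions, not the paper's losses:

import torch
import torch.nn.functional as F

def m2m_distill_loss(student_module, teacher_module, meta_in, meta_out, x):
    """One module-to-module distillation step: a shared meta model (meta_in,
    meta_out) wraps both modules so their outputs are comparable."""
    h = meta_in(x)
    with torch.no_grad():                      # teacher path gives the target
        target = meta_out(teacher_module(h))
    pred = meta_out(student_module(h))
    return F.mse_loss(pred, target)

meta_in, meta_out = torch.nn.Linear(16, 32), torch.nn.Linear(32, 8)
teacher, student = torch.nn.Linear(32, 32), torch.nn.Linear(32, 32)
loss = m2m_distill_loss(student, teacher, meta_in, meta_out, torch.randn(4, 16))
loss.backward()   # gradients flow into the student (and here the meta model)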
Updated: 2024-07-07 14:03:04
Categories: cs.LG,cs.CV
Collective Innovation in Groups of Large Language Models
Human culture relies on collective innovation: our ability to continuously explore how existing elements in our environment can be combined to create new ones. Language is hypothesized to play a key role in human culture, driving individual cognitive capacities and shaping communication. Yet the majority of models of collective innovation assign no cognitive capacities or language abilities to agents. Here, we contribute a computational study of collective innovation where agents are Large Language Models (LLMs) that play Little Alchemy 2, a creative video game originally developed for humans that, as we argue, captures useful aspects of innovation landscapes not present in previous test-beds. We first study an LLM in isolation and discover that it exhibits both useful skills and crucial limitations. We then study groups of LLMs that share information related to their behaviour and focus on the effect of social connectivity on collective performance. In agreement with previous human and computational studies, we observe that groups with dynamic connectivity out-compete fully-connected groups. Our work reveals opportunities and challenges for future studies of collective innovation, which are becoming increasingly relevant as Generative Artificial Intelligence algorithms and humans innovate alongside each other.
Updated: 2024-07-07 13:59:46
Categories: cs.AI
Online Drift Detection with Maximum Concept Discrepancy
Continuous learning from an immense volume of data streams becomes exceptionally critical in the internet era. However, data streams often do not conform to the same distribution over time, leading to a phenomenon called concept drift. Since a fixed static model is unreliable for inferring concept-drifted data streams, establishing an adaptive mechanism for detecting concept drift is crucial. Current methods for concept drift detection primarily assume that the labels or error rates of downstream models are given and/or underlying statistical properties exist in data streams. These approaches, however, struggle to address high-dimensional data streams with intricate irregular distribution shifts, which are more prevalent in real-world scenarios. In this paper, we propose MCD-DD, a novel concept drift detection method based on maximum concept discrepancy, inspired by the maximum mean discrepancy. Our method can adaptively identify varying forms of concept drift by contrastive learning of concept embeddings without relying on labels or statistical properties. With thorough experiments under synthetic and real-world scenarios, we demonstrate that the proposed method outperforms existing baselines in identifying concept drifts and enables qualitative analysis with high explainability.
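Since the method builds on maximum mean discrepancy, a minimal RBF-kernel MMD between a reference window and the current window illustrates the underlying two-sample test (MCD-DD itself first learns concept embeddings contrastively and compares those instead of raw features):

import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Biased squared-MMD estimate with an RBF kernel between two windows."""
    def k(a, b):
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(3)
ref = rng.normal(0.0, 1.0, (200, 5))    # reference window of the stream
cur = rng.normal(0.8, 1.0, (200, 5))    # current window after a mean shift
print(mmd_rbf(ref, ref[:100]), mmd_rbf(ref, cur))   # small vs. large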
Updated: 2024-07-07 13:57:50
Categories: cs.LG,cs.AI
Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning
Semi-supervised learning (SSL) algorithms struggle to perform well when exposed to imbalanced training data. In this scenario, the generated pseudo-labels can exhibit a bias towards the majority class, and models that employ these pseudo-labels can further amplify this bias. Here we investigate pseudo-labeling strategies for imbalanced SSL, including pseudo-label refinement and threshold adjustment, through the lens of statistical analysis. We find that existing SSL algorithms which generate pseudo-labels using heuristic strategies or uncalibrated model confidence are unreliable when imbalanced class distributions bias the pseudo-labels. To address this, we introduce SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL) to enhance the quality of pseudo-labelling for imbalanced SSL. We propose to learn refinement and thresholding parameters from a partition of the training dataset in a class-balanced way. SEVAL adapts to specific tasks with improved pseudo-label accuracy and ensures pseudo-label correctness on a per-class basis. Our experiments show that SEVAL surpasses state-of-the-art SSL methods, delivering more accurate and effective pseudo-labels in various imbalanced SSL situations. SEVAL, with its simplicity and flexibility, can enhance various SSL techniques effectively. The code is publicly available at https://github.com/ZerojumpLine/SEVAL.
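The threshold-adjustment half of the idea can be sketched as choosing, per class, the confidence cutoff that maximizes pseudo-label precision on a held-out, class-balanced validation split. The grid search and selection rule below are our simplifications of what SEVAL learns jointly with refinement:

import numpy as np

def per_class_thresholds(val_probs, val_labels, grid=np.linspace(0.5, 0.99, 50)):
    """For each class, pick the confidence cutoff maximising pseudo-label
    precision on held-out validation data (selection rule is our assumption)."""
    preds, conf = val_probs.argmax(1), val_probs.max(1)
    thresholds = np.full(val_probs.shape[1], 0.95)
    for c in range(val_probs.shape[1]):
        best_t, best_prec = 0.95, 0.0
        for t in grid:
            kept = (preds == c) & (conf >= t)
            if kept.any():
                prec = (val_labels[kept] == c).mean()
                if prec > best_prec:
                    best_t, best_prec = t, prec
        thresholds[c] = best_t
    return thresholds

rng = np.random.default_rng(4)
probs = rng.dirichlet(np.ones(3), size=300)     # stand-in model outputs
labels = rng.integers(0, 3, size=300)
print(per_class_thresholds(probs, labels))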
Updated: 2024-07-07 13:46:22
Categories: cs.LG
Music Era Recognition Using Supervised Contrastive Learning and Artist Information
Does popular music from the 60s sound different than that of the 90s? Prior studies have shown that patterns and regularities related to instrumentation changes and growing loudness vary across multi-decadal trends. This indicates that perceiving the era of a song from musical features such as audio and artist information is possible. Music era information can be an important feature for playlist generation and recommendation. However, the release year of a song can be inaccessible in many circumstances. This paper addresses the novel task of music era recognition. We formulate the task as a music classification problem and propose solutions based on supervised contrastive learning. An audio-based model is developed to predict the era from audio. For the case where the artist information is available, we extend the audio-based model to take multimodal inputs and develop a framework, called MultiModal Contrastive (MMC) learning, to enhance the training. Experimental results on the Million Song Dataset demonstrate that the audio-based model achieves 54% in accuracy with a tolerance of a 3-year range; incorporating the artist information with the MMC framework for training leads to a further 9% improvement.
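The supervised contrastive objective referenced above is typically the SupCon loss of Khosla et al.; a sketch follows (its application to era labels here is our illustration, not the paper's exact formulation):

import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss: pulls together clips sharing an
    era label and pushes apart different eras."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature
    n = z.size(0)
    mask_self = torch.eye(n, dtype=torch.bool)
    sim = sim.masked_fill(mask_self, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(mask_self, 0.0)   # avoid -inf * 0 below
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    counts = pos.sum(1).clamp(min=1)
    per_anchor = -(log_prob * pos.float()).sum(1) / counts
    return per_anchor[pos.sum(1) > 0].mean()

z = torch.randn(16, 128, requires_grad=True)
y = torch.randint(0, 6, (16,))   # six hypothetical era classes
supcon_loss(z, y).backward()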
Updated: 2024-07-07 13:43:55
Categories: cs.SD,cs.AI
Learning Closed-form Equations for Subgrid-scale Closures from High-fidelity Data: Promises and Challenges
There is growing interest in discovering interpretable, closed-form equations for subgrid-scale (SGS) closures/parameterizations of complex processes in Earth systems. Here, we apply a common equation-discovery technique with expansive libraries to learn closures from filtered direct numerical simulations of 2D turbulence and Rayleigh-Bénard convection (RBC). Across common filters (e.g., Gaussian, box), we robustly discover closures of the same form for momentum and heat fluxes. These closures depend on nonlinear combinations of gradients of filtered variables, with constants that are independent of the fluid/flow properties and only depend on filter type/size. We show that these closures are the nonlinear gradient model (NGM), which is derivable analytically using Taylor-series. Indeed, we suggest that with common (physics-free) equation-discovery algorithms, for many common systems/physics, discovered closures are consistent with the leading term of the Taylor-series (except when cutoff filters are used). Like previous studies, we find that large-eddy simulations with NGM closures are unstable, despite significant similarities between the true and NGM-predicted fluxes (correlations $> 0.95$). We identify two shortcomings as reasons for these instabilities: in 2D, NGM produces zero kinetic energy transfer between resolved and subgrid scales, lacking both diffusion and backscattering. In RBC, potential energy backscattering is poorly predicted. Moreover, we show that SGS fluxes diagnosed from data, presumed the "truth" for discovery, depend on filtering procedures and are not unique. Accordingly, to learn accurate, stable closures in future work, we propose several ideas around using physics-informed libraries, loss functions, and metrics. These findings are relevant to closure modeling of any multi-scale system.
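For reference, the nonlinear gradient model (the Clark model) mentioned above is the leading Taylor term of the filtered product: for a filter of width $\Delta$ with second moment $\Delta^2/12$ (box or Gaussian), $\tau_{ij} = \overline{u_i u_j} - \bar{u}_i \bar{u}_j \approx \frac{\Delta^2}{12} \frac{\partial \bar{u}_i}{\partial x_k} \frac{\partial \bar{u}_j}{\partial x_k}$, with the analogous form $\frac{\Delta^2}{12} \frac{\partial \bar{u}_i}{\partial x_k} \frac{\partial \bar{\theta}}{\partial x_k}$ for the SGS heat flux. This matches the abstract's statement that the discovered constants depend only on filter type and size.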
Updated: 2024-07-07 13:40:20
Categories: physics.flu-dyn,cs.LG,physics.ao-ph,76F65 (Primary) 86A08, 68T01, 76F05, 76F35 (Secondary), J.2; I.2.0; G.1.8
ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models
In response to the urgent demand for grid stability and the complex challenges posed by renewable energy integration and electricity market dynamics, the power sector increasingly seeks innovative technological solutions. In this context, large language models (LLMs) have become a key technology to improve efficiency and promote intelligent progress in the power sector with their excellent natural language processing, logical reasoning, and generalization capabilities. Despite their potential, the absence of a performance evaluation benchmark for LLM in the power sector has limited the effective application of these technologies. Addressing this gap, our study introduces "ElecBench", an evaluation benchmark of LLMs within the power sector. ElecBench aims to overcome the shortcomings of existing evaluation benchmarks by providing comprehensive coverage of sector-specific scenarios, deepening the testing of professional knowledge, and enhancing decision-making precision. The framework categorizes scenarios into general knowledge and professional business, further divided into six core performance metrics: factuality, logicality, stability, security, fairness, and expressiveness, and is subdivided into 24 sub-metrics, offering profound insights into the capabilities and limitations of LLM applications in the power sector. To ensure transparency, we have made the complete test set public, evaluating the performance of eight LLMs across various scenarios and metrics. ElecBench aspires to serve as the standard benchmark for LLM applications in the power sector, supporting continuous updates of scenarios, metrics, and models to drive technological progress and application.
Updated: 2024-07-07 13:38:05
Categories: cs.AI
Medical Unlearnable Examples: Securing Medical Data from Unauthorized Training via Sparsity-Aware Local Masking
The rapid expansion of AI in healthcare has led to a surge in medical data generation and storage, boosting medical AI development. However, fears of unauthorized use, such as training commercial AI models, hinder researchers from sharing their valuable datasets. To encourage data sharing, one promising solution is to introduce imperceptible noise into the data. This method aims to safeguard the data against unauthorized training by degrading the generalization ability of the trained model. However, such methods are neither effective nor efficient when applied to medical data, mainly because they ignore the sparse nature of medical images. To address this problem, we propose the Sparsity-Aware Local Masking (SALM) method, a novel approach that selectively perturbs significant pixel regions rather than the entire image, as done previously. This simple yet effective approach, by focusing on local areas, significantly narrows down the search space for disturbances and fully leverages the characteristics of sparsity. Our extensive experiments across various datasets and model architectures demonstrate that SALM effectively prevents unauthorized training of different models and outperforms previous SoTA data protection methods.
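The sparsity-aware idea can be sketched as confining the protective perturbation to the small fraction of salient pixels instead of the whole frame; the magnitude-based selection below is our simplification of SALM's region selection:

import numpy as np

def sparse_local_mask(image, density=0.05):
    """Confine the perturbation to the top `density` fraction of pixels by
    magnitude (our simplification of SALM's region selection)."""
    flat = np.abs(image).ravel()
    k = max(1, int(density * flat.size))
    thresh = np.partition(flat, -k)[-k]          # k-th largest magnitude
    return (np.abs(image) >= thresh).astype(image.dtype)

rng = np.random.default_rng(5)
img = rng.random((64, 64)) * (rng.random((64, 64)) > 0.9)  # sparse "scan"
mask = sparse_local_mask(img)
protected = img + mask * (8 / 255) * rng.standard_normal(img.shape)
print(mask.mean())    # roughly 0.05: only salient pixels are perturbed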
Updated: 2024-07-07 13:36:22
Categories: eess.IV,cs.CR,cs.CV,cs.LG
PTaRL: Prototype-based Tabular Representation Learning via Space Calibration
Tabular data play an important role in diverse real-world fields, such as healthcare, engineering, finance, etc. With the recent success of deep learning, many tabular machine learning (ML) methods based on deep networks (e.g., Transformer, ResNet) have achieved competitive performance on tabular benchmarks. However, existing deep tabular ML methods suffer from representation entanglement and localization, which largely hinders their prediction performance and leads to performance inconsistency on tabular tasks. To overcome these problems, we explore a novel direction of applying prototype learning for tabular ML and propose a prototype-based tabular representation learning framework, PTaRL, for tabular prediction tasks. The core idea of PTaRL is to construct a prototype-based projection space (P-Space) and learn the disentangled representation around global data prototypes. Specifically, PTaRL mainly involves two stages: (i) Prototype Generation, which constructs global prototypes as the basis vectors of P-Space for representation, and (ii) Prototype Projection, which projects the data samples into P-Space and keeps the core global data information via Optimal Transport. Then, to further acquire disentangled representations, we constrain PTaRL with two strategies: (i) to diversify the coordinates towards the global prototypes of different representations within P-Space, we introduce a diversification constraint for representation calibration; (ii) to avoid prototype entanglement in P-Space, we introduce a matrix orthogonalization constraint to ensure the independence of global prototypes. Finally, we conduct extensive experiments with PTaRL coupled with state-of-the-art deep tabular ML models on various tabular benchmarks, and the results demonstrate consistent superiority.
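The matrix-orthogonalization constraint can be sketched as a penalty on the prototypes' Gram matrix departing from the identity; the exact form used by PTaRL may differ:

import torch

def orthogonality_penalty(prototypes: torch.Tensor) -> torch.Tensor:
    """||P P^T - I||_F^2 on row-normalised prototypes, pushing the global
    prototypes of P-Space toward mutual independence (form assumed)."""
    p = torch.nn.functional.normalize(prototypes, dim=1)
    gram = p @ p.t()
    eye = torch.eye(p.size(0), device=p.device)
    return ((gram - eye) ** 2).sum()

protos = torch.randn(10, 32, requires_grad=True)
orthogonality_penalty(protos).backward()   # added to the main training loss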
Updated: 2024-07-07 13:32:03
Categories: cs.LG
Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks
Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.
Updated: 2024-07-07 13:16:32
Categories: cs.CY,cs.AI,cs.HC
Auditing of AI: Legal, Ethical and Technical Approaches
AI auditing is a rapidly growing field of research and practice. This review article, which doubles as an editorial for Digital Society's topical collection on Auditing of AI, provides an overview of previous work in the field. Three key points emerge from the review. First, contemporary attempts to audit AI systems have much to learn from how audits have historically been structured and conducted in areas like financial accounting, safety engineering and the social sciences. Second, both policymakers and technology providers have an interest in promoting auditing as an AI governance mechanism. Academic researchers can thus fill an important role by studying the feasibility and effectiveness of different AI auditing procedures. Third, AI auditing is an inherently multidisciplinary undertaking, to which substantial contributions have been made by computer scientists and engineers as well as social scientists, philosophers, legal scholars and industry practitioners. Reflecting this diversity of perspectives, different approaches to AI auditing have different affordances and constraints. Specifically, a distinction can be made between technology-oriented audits, which focus on the properties and capabilities of AI systems, and process-oriented audits, which focus on technology providers' governance structures and quality management systems. The next step in the evolution of auditing as an AI governance mechanism, this article concludes, should be the interlinking of these available (and complementary) approaches into structured and holistic procedures to audit not only how AI systems are designed and used but also how they impact users, societies and the natural environment in applied settings over time.
Updated: 2024-07-07 12:49:58
Categories: cs.CY,cs.AI
Open Ad Hoc Teamwork with Cooperative Game Theory
Ad hoc teamwork poses a challenging problem, requiring the design of an agent that collaborates with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising practical solution to this problem is leveraging the generalizability of graph neural networks to handle an unrestricted number of agents with various agent types, known as graph-based policy learning (GPL). However, its joint Q-value representation over a coordination graph lacks convincing explanations. In this paper, we establish a new theory to understand the representation of the joint Q-value for OAHT and its learning paradigm, through the lens of cooperative game theory. Building on our theory, we propose a novel algorithm named CIAO, based on GPL's framework, with additional provable implementation tricks that can facilitate learning. Demos of the experimental results are available at https://sites.google.com/view/ciao2024, and the code of the experiments is published at https://github.com/hsvgbkhgbv/CIAO.
Updated: 2024-07-07 12:43:35
Categories: cs.MA,cs.LG
Detecting new obfuscated malware variants: A lightweight and interpretable machine learning approach
Machine learning has been successfully applied in developing malware detection systems, with a primary focus on accuracy, and increasing attention to reducing computational overhead and improving model interpretability. However, an important question remains underexplored: How well can machine learning-based models detect entirely new forms of malware not present in the training data? In this study, we present a machine learning-based system for detecting obfuscated malware that is not only highly accurate, lightweight and interpretable, but also capable of successfully adapting to new types of malware attacks. Our system is capable of detecting 15 malware subtypes despite being exclusively trained on one malware subtype, namely the Transponder from the Spyware family. This system was built after training 15 distinct random forest-based models, each on a different malware subtype from the CIC-MalMem-2022 dataset. These models were evaluated against the entire range of malware subtypes, including all unseen malware subtypes. To maintain the system's streamlined nature, training was confined to the top five most important features, which also enhanced interpretability. The Transponder-focused model exhibited high accuracy, exceeding 99.8%, with an average processing speed of 5.7 microseconds per file. We also illustrate how the Shapley additive explanations technique can facilitate the interpretation of the model predictions. Our research contributes to advancing malware detection methodologies, pioneering the feasibility of detecting obfuscated malware by exclusively training a model on a single or a few carefully selected malware subtypes and applying it to detect unseen subtypes.
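The pipeline described above (rank features with a forest, retrain on the top five, then test on unseen subtypes) can be sketched with standard scikit-learn calls; the data below are synthetic stand-ins for CIC-MalMem-2022:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for CIC-MalMem-2022: rows are memory-forensics feature
# vectors, y is benign (0) vs. one malware subtype (1). Names are hypothetical.
rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 50))
y = (X[:, :3].sum(1) + 0.5 * rng.standard_normal(2000) > 0).astype(int)

# 1) Fit once on all features to rank importances
ranker = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top5 = np.argsort(ranker.feature_importances_)[-5:]

# 2) Retrain the lightweight model on the top-5 features only
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, top5], y)
print(model.score(X[:, top5], y))
# In the paper's protocol, this model is then evaluated on the 14 other
# (unseen) malware subtypes.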
Updated: 2024-07-07 12:41:40
Categories: cs.CR,cs.AI,cs.LG
The US Algorithmic Accountability Act of 2022 vs. The EU Artificial Intelligence Act: What can they learn from each other?
On the whole, the U.S. Algorithmic Accountability Act of 2022 (US AAA) is a pragmatic approach to balancing the benefits and risks of automated decision systems. Yet there is still room for improvement. This commentary highlights how the US AAA can both inform and learn from the European Artificial Intelligence Act (EU AIA).
Updated: 2024-07-07 12:31:13
Domains: cs.CY,cs.AI
AI and Social Theory
In this paper, we sketch a programme for AI driven social theory. We begin by defining what we mean by artificial intelligence (AI) in this context. We then lay out our model for how AI based models can draw on the growing availability of digital data to help test the validity of different social theories based on their predictive power. In doing so, we use the work of Randall Collins and his state breakdown model to exemplify that, already today, AI based models can help synthesize knowledge from a variety of sources, reason about the world, and apply what is known across a wide range of problems in a systematic way. However, we also find that AI driven social theory remains subject to a range of practical, technical, and epistemological limitations. Most critically, existing AI systems lack three essential capabilities needed to advance social theory in ways that are cumulative, holistic, open-ended, and purposeful. These are (1) semanticization, i.e., the ability to develop and operationalize verbal concepts to represent machine-manipulable knowledge, (2) transferability, i.e., the ability to transfer what has been learned in one context to another, and (3) generativity, i.e., the ability to independently create and improve on concepts and models. We argue that if the gaps identified here are addressed by further research, there is no reason why, in the future, the most advanced programme in social theory should not be led by AI-driven cumulative advances.
Updated: 2024-07-07 12:26:16
Domains: cs.CY,cs.AI
The Switch, the Ladder, and the Matrix: Models for Classifying AI Systems
Organisations that design and deploy artificial intelligence (AI) systems increasingly commit themselves to high-level, ethical principles. However, there still exists a gap between principles and practices in AI ethics. One major obstacle organisations face when attempting to operationalise AI Ethics is the lack of a well-defined material scope. Put differently, the question of which systems and processes AI ethics principles ought to apply to remains unanswered. Of course, there exists no universally accepted definition of AI, and different systems pose different ethical challenges. Nevertheless, pragmatic problem-solving demands that things should be sorted so that their grouping will promote successful actions for some specific end. In this article, we review and compare previous attempts to classify AI systems for the purpose of implementing AI governance in practice. We find that attempts to classify AI systems found in previous literature use one of three mental models. The Switch, i.e., a binary approach according to which systems either are or are not considered AI systems depending on their characteristics. The Ladder, i.e., a risk-based approach that classifies systems according to the ethical risks they pose. And the Matrix, i.e., a multi-dimensional classification of systems that takes various aspects into account, such as context, data input, and decision-model. Each of these models for classifying AI systems comes with its own set of strengths and weaknesses. By conceptualising different ways of classifying AI systems into simple mental models, we hope to provide organisations that design, deploy, or regulate AI systems with the conceptual tools needed to operationalise AI governance in practice.
Updated: 2024-07-07 12:16:01
Domains: cs.CY,cs.AI
Interpreting the Residual Stream of ResNet18
A mechanistic understanding of the computations learned by deep neural networks (DNNs) is far from complete. In the domain of visual object recognition, prior research has illuminated inner workings of InceptionV1, but DNNs with different architectures have remained largely unexplored. This work investigates ResNet18 with a particular focus on its residual stream, an architectural mechanism which InceptionV1 lacks. We observe that for a given block, channel features of the stream are updated along a spectrum: either the input feature skips to the output, the block feature overwrites the output, or the output is some mixture between the input and block features. Furthermore, we show that many residual stream channels compute scale invariant representations through a mixture of the input's smaller-scale feature with the block's larger-scale feature. This not only adds to the evidence for the universality of scale equivariance, but also shows how the residual stream further implements scale invariance. Collectively, our results begin an interpretation of the residual stream in visual object recognition, finding it to be a flexible feature manager and a medium for building scale invariant representations.
Updated: 2024-07-07 12:13:03
Domains: cs.CV,cs.LG
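The spectrum of stream updates described above can be probed directly with forward hooks. The sketch below is an illustrative probe, not the authors' code: for each torchvision ResNet18 basic block with an identity skip, it compares the magnitude of the incoming stream against the block branch taken from bn2 (the feature added back onto the stream).

# Illustrative probe: skip (input) vs. block (bn2) magnitudes per residual block.
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()
stats = {}

def attach(name, block):
    def block_hook(mod, inp, out):
        stats.setdefault(name, {})["stream_in"] = inp[0].detach()
    def bn2_hook(mod, inp, out):
        stats.setdefault(name, {})["block_feat"] = out.detach()
    block.register_forward_hook(block_hook)
    block.bn2.register_forward_hook(bn2_hook)

for lname in ["layer1", "layer2", "layer3", "layer4"]:
    for i, block in enumerate(getattr(model, lname)):
        if block.downsample is None:              # identity skip connections only
            attach(f"{lname}.{i}", block)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

for name, s in stats.items():
    n_in = s["stream_in"].norm()
    n_blk = s["block_feat"].norm()
    share = (n_blk / (n_in + n_blk)).item()       # ~0 skip, ~1 overwrite, else mixture
    print(f"{name}: block share = {share:.2f}")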
Can Separators Improve Chain-of-Thought Prompting?
Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large Language Models (LLMs). The basic idea of CoT is to let LLMs break down their thought processes step-by-step by putting exemplars in the input prompt. However, the densely structured prompt exemplars of CoT may cause cognitive overload in LLMs. Inspired by human cognition, we introduce COT-SEP, a method that strategically employs separators at the end of each exemplar in CoT prompting. These separators are designed to help the LLMs understand their thought processes better while reasoning. Interestingly, it turns out that COT-SEP significantly improves the LLMs' performance on complex reasoning tasks (e.g., GSM8K, AQuA, CSQA) compared with vanilla CoT, which does not use separators. We also study the effects of separator type and location across multiple LLMs, including GPT-3.5-Turbo, GPT-4, and LLaMA-2 7B.
Updated: 2024-07-07 12:03:48
Domains: cs.CL,cs.AI
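Mechanically, the method amounts to joining CoT exemplars with an explicit delimiter rather than a bare newline. A minimal sketch follows; the separator string and exemplars are illustrative, not the paper's exact prompt.

# Build a CoT prompt with an explicit separator after each exemplar.
SEP = "\n\n###\n\n"   # one candidate separator; the paper studies type and placement

exemplars = [
    "Q: A farm has 3 pens with 4 sheep each. How many sheep?\n"
    "A: 3 pens times 4 sheep is 12. The answer is 12.",
    "Q: Tom had 8 apples and ate 3. How many are left?\n"
    "A: 8 minus 3 is 5. The answer is 5.",
]

def build_prompt(question: str) -> str:
    return SEP.join(exemplars) + SEP + f"Q: {question}\nA:"

print(build_prompt("A train travels 60 km/h for 2 hours. How far does it go?"))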
Challenges and Best Practices in Corporate AI Governance: Lessons from the Biopharmaceutical Industry
While the use of artificial intelligence (AI) systems promises to bring significant economic and social benefits, it is also coupled with ethical, legal, and technical challenges. Business leaders thus face the question of how to best reap the benefits of automation whilst managing the associated risks. As a first step, many companies have committed themselves to various sets of ethics principles aimed at guiding the design and use of AI systems. So far so good. But how can well-intentioned ethical principles be translated into effective practice? And what challenges await companies that attempt to operationalize AI governance? In this article, we address these questions by drawing on our first-hand experience of shaping and driving the roll-out of AI governance within AstraZeneca, a biopharmaceutical company. The examples we discuss highlight challenges that any organization attempting to operationalize AI governance will have to face. These include questions concerning how to define the material scope of AI governance, how to harmonize standards across decentralized organizations, and how to measure the impact of specific AI governance initiatives. By showcasing how AstraZeneca managed these operational questions, we hope to provide project managers, CIOs, AI practitioners, and data privacy officers responsible for designing and implementing AI governance frameworks within other organizations with generalizable best practices. In essence, companies seeking to operationalize AI governance are encouraged to build on existing policies and governance structures, use pragmatic and action-oriented terminology, focus on risk management in development and procurement, and empower employees through continuous education and change management.
Updated: 2024-07-07 12:01:42
Domains: cs.CY,cs.AI
A Blueprint for Auditing Generative AI
The widespread use of generative AI systems is coupled with significant ethical and social challenges. As a result, policymakers, academic researchers, and social advocacy groups have all called for such systems to be audited. However, existing auditing procedures fail to address the governance challenges posed by generative AI systems, which display emergent capabilities and are adaptable to a wide range of downstream tasks. In this chapter, we address that gap by outlining a novel blueprint for how to audit such systems. Specifically, we propose a three-layered approach, whereby governance audits (of technology providers that design and disseminate generative AI systems), model audits (of generative AI systems after pre-training but prior to their release), and application audits (of applications based on top of generative AI systems) complement and inform each other. We show how audits on these three levels, when conducted in a structured and coordinated manner, can be a feasible and effective mechanism for identifying and managing some of the ethical and social risks posed by generative AI systems. That said, it is important to remain realistic about what auditing can reasonably be expected to achieve. For this reason, the chapter also discusses the limitations not only of our three-layered approach but also of the prospect of auditing generative AI systems at all. Ultimately, this chapter seeks to expand the methodological toolkit available to technology providers and policymakers who wish to analyse and evaluate generative AI systems from technical, ethical, and legal perspectives.
Updated: 2024-07-07 11:56:54
Domains: cs.CY,cs.AI
Artificial intelligence, rationalization, and the limits of control in the public sector: the case of tax policy optimization
The use of artificial intelligence (AI) in the public sector is best understood as a continuation and intensification of long-standing rationalization and bureaucratization processes. Drawing on Weber, we take the core of these processes to be the replacement of traditions with instrumental rationality, i.e., the most calculable and efficient way of achieving any given policy objective. In this article, we demonstrate how many of the criticisms, both among the public and in scholarship, directed towards AI systems spring from well-known tensions at the heart of Weberian rationalization. To illustrate this point, we introduce a thought experiment whereby AI systems are used to optimize tax policy to advance a specific normative end, reducing economic inequality. Our analysis shows that building a machine-like tax system that promotes social and economic equality is possible. However, it also highlights that AI-driven policy optimization (i) comes at the exclusion of other competing political values, (ii) overrides citizens' sense of their noninstrumental obligations to each other, and (iii) undermines the notion of humans as self-determining beings. Contemporary scholarship and advocacy directed towards ensuring that AI systems are legal, ethical, and safe build on and reinforce central assumptions that underpin the process of rationalization, including the modern idea that science can sweep away oppressive systems and replace them with a rule of reason that would rescue humans from moral injustices. That is overly optimistic. Science can only provide the means; it cannot dictate the ends. Nonetheless, the use of AI in the public sector can also benefit the institutions and processes of liberal democracies. Most importantly, AI-driven policy optimization demands that normative ends are made explicit and formalized, thereby subjecting them to public scrutiny and debate.
Updated: 2024-07-07 11:54:14
Domains: cs.AI,cs.CY
D2-LRR: A Dual-Decomposed MDLatLRR Approach for Medical Image Fusion
In image fusion tasks, an ideal image decomposition method can bring better performance. MDLatLRR has done a great job in this aspect, but there is still room for improvement. MDLatLRR focuses solely on the detail parts (salient features) extracted from input images via latent low-rank representation (LatLRR), so the base parts (principal features) extracted by LatLRR are not fully utilized. Therefore, we introduce an enhanced multi-level decomposition method named dual-decomposed MDLatLRR (D2-LRR), which effectively analyzes and utilizes all image features extracted through LatLRR. Specifically, color images are converted into the YUV color space and grayscale images, and the Y-channel and grayscale images are fed into the trained LatLRR parameters to obtain detail parts from four rounds of decomposition together with the base parts. The base parts are then fused using an average strategy, while the detail parts are fused using a nuclear-norm operation. The fused result is finally transformed back into an RGB image, yielding the final fusion output. We apply D2-LRR to medical image fusion tasks, and comparative analyses against existing methods show that our approach attains cutting-edge fusion performance in both objective and subjective assessments.
Updated: 2024-07-07 11:39:46
Domains: eess.IV,cs.CV,cs.LG
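One plausible reading of the fusion rules is to average the two base parts and to weight the detail parts by their nuclear norms (sums of singular values). The numpy sketch below implements that reading; the LatLRR decomposition itself and the exact D2-LRR operators are assumed to be given.

# Hedged sketch of the fusion step: base parts averaged, detail parts
# combined with nuclear-norm (sum of singular values) weights.
import numpy as np

def nuclear_norm(m: np.ndarray) -> float:
    return np.linalg.svd(m, compute_uv=False).sum()

def fuse(base_a, base_b, detail_a, detail_b):
    fused_base = 0.5 * (base_a + base_b)          # average strategy
    wa, wb = nuclear_norm(detail_a), nuclear_norm(detail_b)
    fused_detail = (wa * detail_a + wb * detail_b) / (wa + wb + 1e-12)
    return fused_base + fused_detail              # recompose the image

a_base, b_base = np.random.rand(64, 64), np.random.rand(64, 64)
a_det, b_det = np.random.rand(64, 64), np.random.rand(64, 64)
print(fuse(a_base, b_base, a_det, b_det).shape)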
Physics-Informed Neural Networks for Dynamic Process Operations with Limited Physical Knowledge and Data
In chemical engineering, process data are expensive to acquire, and complex phenomena are difficult to fully model. We explore the use of physics-informed neural networks (PINNs) for dynamic processes with incomplete mechanistic semi-explicit differential-algebraic equation systems and scarce process data. In particular, we focus on estimating states for which neither direct observational data nor constitutive equations are available. We propose an easy-to-apply heuristic to assess whether estimation of such states may be possible. As numerical examples, we consider a continuously stirred tank reactor and a liquid-liquid separator. We find that PINNs can infer unmeasured states with reasonable accuracy, and they generalize better in low-data scenarios than purely data-driven models. We thus show that PINNs are capable of modeling processes when relatively few experimental data and only partially known mechanistic descriptions are available, and conclude that they constitute a promising avenue that warrants further investigation.
Updated: 2024-07-07 11:30:50
Domains: cs.LG
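The core recipe can be condensed as: fit a network to the few measured states while penalizing the residual of the known balance equations. The sketch below does this for a toy CSTR concentration c(t) obeying dc/dt = (q/V)(c_in - c) - k*c; all constants and data are made up for illustration and do not come from the paper.

# Minimal PINN sketch for a toy CSTR: data loss + physics-residual loss.
import torch

q_V, c_in, k = 0.5, 1.0, 0.3                        # assumed rate constants
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_data = torch.rand(10, 1)                          # scarce measurements (synthetic)
c_data = 0.8 * torch.exp(-0.8 * t_data)             # stand-in observations
t_col = torch.rand(200, 1, requires_grad=True)      # collocation points

for step in range(2000):
    opt.zero_grad()
    c_col = net(t_col)
    dc_dt = torch.autograd.grad(c_col, t_col, torch.ones_like(c_col),
                                create_graph=True)[0]
    residual = dc_dt - (q_V * (c_in - c_col) - k * c_col)
    loss = ((net(t_data) - c_data) ** 2).mean() + (residual ** 2).mean()
    loss.backward()
    opt.step()
print("final loss:", float(loss))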
Generating multi-scale NMC particles with radial grain architectures using spatial stochastics and GANs
Understanding structure-property relationships of Li-ion battery cathodes is crucial for optimizing rate-performance and cycle-life resilience. However, correlating the morphology of cathode particles, such as in NMC811, and their inner grain architecture with electrode performance is challenging, particularly due to the significant length-scale difference between grain and particle sizes. Experimentally, it is currently not feasible to image such a high number of particles with full granular detail to achieve representativeness. A second challenge is that sufficiently high-resolution 3D imaging techniques remain expensive and are sparsely available at research institutions. To address these challenges, a stereological generative adversarial network (GAN)-based model fitting approach is presented that can generate representative 3D information from 2D data, enabling characterization of materials in 3D using cost-effective 2D data. Once calibrated, this multi-scale model is able to rapidly generate virtual cathode particles that are statistically similar to experimental data, and thus is suitable for virtual characterization and materials testing through numerical simulations. A large dataset of simulated particles with inner grain architecture has been made publicly available.
Updated: 2024-07-07 11:23:17
Domains: physics.app-ph,cs.AI
A Survey on 3D Gaussian Splatting
3D Gaussian splatting (GS) has recently emerged as a transformative technique in the realm of explicit radiance field and computer graphics. This innovative approach, characterized by the utilization of millions of learnable 3D Gaussians, represents a significant departure from mainstream neural radiance field approaches, which predominantly use implicit, coordinate-based models to map spatial coordinates to pixel values. 3D GS, with its explicit scene representation and differentiable rendering algorithm, not only promises real-time rendering capability but also introduces unprecedented levels of editability. This positions 3D GS as a potential game-changer for the next generation of 3D reconstruction and representation. In the present paper, we provide the first systematic overview of the recent developments and critical contributions in the domain of 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the emergence of 3D GS, laying the groundwork for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS. By enabling unprecedented rendering speed, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain. Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representation.
Updated: 2024-07-07 11:18:33
Domains: cs.CV,cs.AI,cs.GR,cs.MM
Fast Proxy Experiment Design for Causal Effect Identification
Identifying causal effects is a key problem of interest across many disciplines. The two long-standing approaches to estimate causal effects are observational and experimental (randomized) studies. Observational studies can suffer from unmeasured confounding, which may render the causal effects unidentifiable. On the other hand, direct experiments on the target variable may be too costly or even infeasible to conduct. A middle ground between these two approaches is to estimate the causal effect of interest through proxy experiments, which are conducted on variables with a lower cost to intervene on compared to the main target. Akbari et al. [2022] studied this setting and demonstrated that the problem of designing the optimal (minimum-cost) experiment for causal effect identification is NP-complete and provided a naive algorithm that may require solving exponentially many NP-hard problems as a sub-routine in the worst case. In this work, we provide a few reformulations of the problem that allow for designing significantly more efficient algorithms to solve it as witnessed by our extensive simulations. Additionally, we study the closely-related problem of designing experiments that enable us to identify a given effect through valid adjustments sets.
Updated: 2024-07-07 11:09:38
Domains: cs.LG,cs.AI,stat.ME
Res2NetFuse: A Novel Res2Net-based Fusion Method for Infrared and Visible Images
The fusion of visible light and infrared images has garnered significant attention in imaging due to its pivotal role in applications including surveillance, remote sensing, and medical imaging. This paper introduces a novel fusion framework based on the Res2Net architecture, which captures features across diverse receptive fields and scales for effective extraction of global and local features. Our methodology is structured into three components: a Res2Net-based encoder, a fusion layer, and a decoder. The Res2Net-based encoder extracts multi-scale features from the input image, and we introduce a novel training strategy for it that requires only a single image as input. We further enhance the fusion process with a new attention-based strategy, ensuring precise reconstruction of the fused image by the decoder. Experimental results show that our method surpasses existing techniques in fusion performance, as evidenced by rigorous subjective and objective evaluations.
Updated: 2024-07-07 10:54:11
Domains: cs.CV,cs.AI,cs.GR
KAE: A Property-based Method for Knowledge Graph Alignment and Extension
A common solution to the semantic heterogeneity problem is to perform knowledge graph (KG) extension exploiting the information encoded in one or more candidate KGs, where the alignment between the reference KG and candidate KGs is considered the critical procedure. However, existing KG alignment methods mainly rely on entity type (etype) label matching as a prerequisite, which performs poorly in practice or is not applicable in some cases. In this paper, we design a machine learning-based framework for KG extension, including an alternative novel property-based alignment approach that allows aligning etypes on the basis of the properties used to define them. The main intuition is that properties are what intentionally define an etype, and this definition is independent of both the specific label used to name the etype and the specific hierarchical schema of the KGs. Compared with the state-of-the-art, the experimental results show the validity of the KG alignment approach and the superiority of the proposed KG extension framework, both quantitatively and qualitatively.
Updated: 2024-07-07 10:17:03
Domains: cs.AI
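The property-based intuition can be illustrated with a simple overlap score: two etypes align when the property sets defining them overlap, regardless of their labels. The toy below uses a fixed Jaccard score, whereas the paper's framework learns the alignment with machine learning; the etype and property names are invented.

# Toy property-overlap alignment: etypes match by defining properties, not labels.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

reference = {"Person": {"name", "birthDate", "nationality"}}
candidate = {"Human": {"name", "birthDate", "spouse"},
             "Settlement": {"name", "population", "area"}}

for ref_t, ref_props in reference.items():
    best = max(candidate, key=lambda c: jaccard(ref_props, candidate[c]))
    print(ref_t, "->", best, f"(score={jaccard(ref_props, candidate[best]):.2f})")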
Vulnerability-Hunter: An Adaptive Feature Perception Attention Network for Smart Contract Vulnerabilities
Smart Contract Vulnerability Detection (SCVD) is crucial to guarantee the quality of blockchain-based systems. Graph neural networks have been shown to be effective in learning semantic representations of smart contract code and are commonly adopted by existing deep learning-based SCVD methods. However, current methods still have limitations in their use of graph sampling or subgraph pooling based on predefined rules for extracting crucial components from the structure graphs of smart contract code. These predefined rule-based strategies, typically designed using static rules or heuristics, show limited ability to adapt their extraction strategies to the structure and content of graphs in the heterogeneous topologies of smart contract code. Consequently, these strategies may not be universally applicable to all smart contracts, potentially leading to false positives or omissions. To address these problems, we propose AFPNet, a novel vulnerability detection model equipped with a feature perception module that has dynamic weights for comprehensive scanning of the entire smart contract code and automatic extraction of crucial code snippets (the $P$ snippets with the largest weights). Subsequently, the relationship perception attention module employs an attention mechanism to learn dependencies among these code snippets and detect smart contract vulnerabilities. These designs enable AFPNet to consistently capture crucial code snippets and enhance SCVD performance. We evaluate AFPNet on several large-scale datasets with vulnerability labels. The experimental results show that AFPNet significantly outperforms the state-of-the-art approach by 6.38%-14.02% in terms of F1-score, demonstrating its effectiveness in dynamically extracting valuable information and detecting vulnerabilities.
Updated: 2024-07-07 10:13:41
Domains: cs.CR,cs.SE
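The two modules can be read as (i) a learned scorer that keeps the $P$ highest-weighted snippet embeddings and (ii) self-attention over the kept snippets. The code below is a schematic rendering of that description with assumed dimensions, not the released AFPNet implementation.

# Schematic feature-perception (top-P selection) + relationship-perception attention.
import torch
import torch.nn as nn

class AFPSketch(nn.Module):
    def __init__(self, dim=128, p=8):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)                 # dynamic snippet weights
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 1)                   # vulnerable / not
        self.p = p

    def forward(self, snippets):                        # (batch, n_snippets, dim)
        w = self.scorer(snippets).squeeze(-1)           # (batch, n_snippets)
        idx = w.topk(self.p, dim=1).indices             # keep the P largest weights
        sel = torch.gather(snippets, 1,
                           idx.unsqueeze(-1).expand(-1, -1, snippets.size(-1)))
        ctx, _ = self.attn(sel, sel, sel)               # dependencies among snippets
        return self.head(ctx.mean(dim=1))               # one logit per contract

logits = AFPSketch()(torch.randn(2, 50, 128))
print(logits.shape)                                     # torch.Size([2, 1])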
Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Deep learning methods have achieved a lot of success in various applications involving converting wearable sensor data to actionable health insights. A common application area is activity recognition, where deep-learning methods still suffer from limitations such as sensitivity to signal quality, sensor characteristic variations, and variability between subjects. To mitigate these issues, robust features obtained by topological data analysis (TDA) have been suggested as a potential solution. However, there are two significant obstacles to using topological features in deep learning: (1) large computational load to extract topological features using TDA, and (2) different signal representations obtained from deep learning and TDA which makes fusion difficult. In this paper, to enable integration of the strengths of topological methods in deep-learning for time-series data, we propose to use two teacher networks, one trained on the raw time-series data, and another trained on persistence images generated by TDA methods. The distilled student model utilizes only the raw time-series data at test-time. This approach addresses both issues. The use of KD with multiple teachers utilizes complementary information, and results in a compact model with strong supervisory features and an integrated richer representation. To assimilate desirable information from different modalities, we design new constraints, including orthogonality imposed on feature correlation maps for improving feature expressiveness and allowing the student to easily learn from the teacher. Also, we apply an annealing strategy in KD for fast saturation and better accommodation from different features, while the knowledge gap between the teachers and student is reduced. Finally, a robust student model is distilled, which uses only the time-series data as an input, while implicitly preserving topological features.
Updated: 2024-07-07 10:08:34
Domains: eess.SP,cs.LG,math.AT
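One way to read the training objective: the student sees only raw time series but is distilled from both teachers, with an extra orthogonality penalty on the student's feature correlations and an annealed distillation weight. The sketch below encodes that reading; the temperatures, weights, and annealing schedule are placeholders rather than the paper's settings.

# Hedged sketch of a two-teacher KD loss with an orthogonality penalty.
import torch
import torch.nn.functional as F

def kd_term(student_logits, teacher_logits, T=4.0):
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T

def ortho_penalty(feats):                        # (batch, dim) student features
    f = F.normalize(feats, dim=1)
    corr = f @ f.t()                             # feature correlation map
    off = corr - torch.eye(corr.size(0))
    return (off ** 2).mean()                     # push correlations toward identity

def total_loss(s_logits, s_feats, t_ts_logits, t_tda_logits, labels, step, steps):
    anneal = min(1.0, step / (0.3 * steps))      # assumed annealing schedule
    return (F.cross_entropy(s_logits, labels)
            + anneal * (kd_term(s_logits, t_ts_logits)
                        + kd_term(s_logits, t_tda_logits))
            + 0.1 * ortho_penalty(s_feats))

s_logits, s_feats = torch.randn(8, 6), torch.randn(8, 32)
t1, t2 = torch.randn(8, 6), torch.randn(8, 6)
y = torch.randint(0, 6, (8,))
print(total_loss(s_logits, s_feats, t1, t2, y, step=100, steps=1000))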
SMART: Submodular Data Mixture Strategy for Instruction Tuning
Instruction Tuning involves finetuning a language model on a collection of instruction-formatted datasets in order to enhance the generalizability of the model to unseen tasks. Studies have shown the importance of balancing different task proportions during finetuning, but finding the right balance remains challenging. Unfortunately, there's currently no systematic method beyond manual tuning or relying on practitioners' intuition. In this paper, we introduce SMART (Submodular data Mixture strAtegy for instRuction Tuning) - a novel data mixture strategy which makes use of a submodular function to assign importance scores to tasks which are then used to determine the mixture weights. Given a fine-tuning budget, SMART redistributes the budget among tasks and selects non-redundant samples from each task. Experimental results demonstrate that SMART significantly outperforms traditional methods such as examples proportional mixing and equal mixing. Furthermore, SMART facilitates the creation of data mixtures based on a few representative subsets of tasks alone and through task pruning analysis, we reveal that in a limited budget setting, allocating budget among a subset of representative tasks yields superior performance compared to distributing the budget among all tasks. The code for reproducing our results is open-sourced at https://github.com/kowndinya-renduchintala/SMART.
Updated: 2024-07-07 09:58:08
Domains: cs.CL,cs.AI,cs.LG
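The submodular machinery at SMART's core can be illustrated with a facility-location function and its standard greedy maximizer, which repeatedly adds the sample with the largest marginal coverage gain. The sketch below is generic, with a random similarity matrix standing in for the paper's instantiation.

# Greedy maximization of a facility-location submodular function.
import numpy as np

def greedy_facility_location(sim: np.ndarray, budget: int):
    n = sim.shape[0]
    covered = np.zeros(n)                  # best similarity to any selected sample
    chosen = []
    for _ in range(budget):
        gains = np.maximum(sim, covered[:, None]).sum(axis=0) - covered.sum()
        gains[chosen] = -np.inf            # never re-pick a selected sample
        j = int(np.argmax(gains))          # largest marginal gain
        chosen.append(j)
        covered = np.maximum(covered, sim[:, j])
    return chosen

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))
sim = emb @ emb.T                          # stand-in similarity kernel
print(greedy_facility_location(sim, budget=5))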
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Reinforcement Learning with Human Feedback (RLHF) has achieved great success in aligning large language models (LLMs) with human preferences. Prevalent RLHF approaches are reward-based, following the Bradley-Terry (BT) model assumption, which may not fully capture the complexity of human preferences. In this paper, we explore RLHF under a general preference framework and approach it from a game-theoretic perspective. Specifically, we formulate the problem as a two-player game and propose a novel algorithm, iterative Nash policy optimization (INPO). The key idea is to let the policy play against itself via no-regret learning, thereby approximating the Nash policy. Unlike previous methods, INPO bypasses the need for estimating the expected win rate for individual responses, which typically incurs high computational or annotation costs. Instead, we introduce a new loss objective that is directly minimized over a preference dataset. We provide theoretical analysis for our approach and demonstrate its effectiveness through experiments on various representative benchmarks. With an LLaMA-3-8B-based SFT model, INPO achieves a 41.5% length-controlled win rate on AlpacaEval 2.0 and a 38.3% win rate on Arena-Hard, showing substantial improvement over the state-of-the-art iterative algorithm [Dong et al., 2024] under the BT model assumption. Additionally, our ablation study highlights the benefits of incorporating KL regularization for response length control.
Updated: 2024-07-07 09:51:26
Domains: cs.LG,cs.AI,cs.CL,cs.GT
SPO: Sequential Monte Carlo Policy Optimisation
Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents. Recent works have successfully combined tree-based search methods and self-play learning mechanisms to this end. However, these methods typically face scaling challenges due to the sequential nature of their search. While practical engineering solutions can partly overcome this, they often result in a negative impact on performance. In this paper, we introduce SPO: Sequential Monte Carlo Policy Optimisation, a model-based reinforcement learning algorithm grounded within the Expectation Maximisation (EM) framework. We show that SPO provides robust policy improvement and efficient scaling properties. The sample-based search makes it directly applicable to both discrete and continuous action spaces without modifications. We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines across both continuous and discrete environments. Furthermore, the parallel nature of SPO's search enables effective utilisation of hardware accelerators, yielding favourable scaling laws.
Updated: 2024-07-07 09:48:13
Domains: cs.AI,cs.LG
Predicting Word Similarity in Context with Referential Translation Machines
We identify the similarity between two words in English by casting the task as machine translation performance prediction (MTPP) between the words given the context and the distance between their similarities. We use referential translation machines (RTMs), which allow a common representation for training and test sets and stacked machine learning models. RTMs can achieve the top results in the Graded Word Similarity in Context (GWSC) task.
Updated: 2024-07-07 09:36:41
Domains: cs.CL,cs.AI,I.2.7
Ternary Spike-based Neuromorphic Signal Processing System
Deep Neural Networks (DNNs) have been successfully implemented across various signal processing fields, resulting in significant enhancements in performance. However, DNNs generally require substantial computational resources, leading to significant economic costs and posing challenges for their deployment on resource-constrained edge devices. In this study, we take advantage of spiking neural networks (SNNs) and quantization technologies to develop an energy-efficient and lightweight neuromorphic signal processing system. Our system is characterized by two principal innovations: a threshold-adaptive encoding (TAE) method and a quantized ternary SNN (QT-SNN). The TAE method can efficiently encode time-varying analog signals into sparse ternary spike trains, thereby reducing energy and memory demands for signal processing. QT-SNN, compatible with ternary spike trains from the TAE method, quantizes both membrane potentials and synaptic weights to reduce memory requirements while maintaining performance. Extensive experiments are conducted on two typical signal-processing tasks: speech and electroencephalogram recognition. The results demonstrate that our neuromorphic signal processing system achieves state-of-the-art (SOTA) performance with a 94% reduced memory requirement. Furthermore, through theoretical energy consumption analysis, our system shows 7.5x energy saving compared to other SNN works. The efficiency and efficacy of the proposed system highlight its potential as a promising avenue for energy-efficient signal processing.
Updated: 2024-07-07 09:32:19
Domains: eess.SP,cs.AI,cs.NE,cs.SD
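The encoding side can be pictured as thresholding a signal into {-1, 0, +1} spikes with a threshold that tracks the signal's recent magnitude. The rule below is only a guess at the spirit of TAE, not the published definition.

# Hedged sketch: ternary spike encoding with an adaptive threshold.
import numpy as np

def ternary_encode(x: np.ndarray, alpha: float = 0.9):
    theta, spikes = abs(x[0]) + 1e-6, []
    for v in x:
        theta = alpha * theta + (1 - alpha) * abs(v)   # running magnitude estimate
        spikes.append(1 if v > theta else (-1 if v < -theta else 0))
    return np.array(spikes)

t = np.linspace(0, 1, 200)
signal = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(200)
s = ternary_encode(signal)
print("nonzero spike fraction:", np.mean(s != 0))      # sparsity of the spike train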
A Flexible, Equivariant Framework for Subgraph GNNs via Graph Products and Graph Coarsening
Subgraph Graph Neural Networks (Subgraph GNNs) enhance the expressivity of message-passing GNNs by representing graphs as sets of subgraphs. They have shown impressive performance on several tasks, but their complexity limits applications to larger graphs. Previous approaches suggested processing only subsets of subgraphs, selected either randomly or via learnable sampling. However, they make suboptimal subgraph selections or can only cope with very small subset sizes, inevitably incurring performance degradation. This paper introduces a new Subgraph GNNs framework to address these issues. We employ a graph coarsening function to cluster nodes into super-nodes with induced connectivity. The product between the coarsened and the original graph reveals an implicit structure whereby subgraphs are associated with specific sets of nodes. By running generalized message-passing on such graph product, our method effectively implements an efficient, yet powerful Subgraph GNN. Controlling the coarsening function enables meaningful selection of any number of subgraphs while, contrary to previous methods, being fully compatible with standard training techniques. Notably, we discover that the resulting node feature tensor exhibits new, unexplored permutation symmetries. We leverage this structure, characterize the associated linear equivariant layers and incorporate them into the layers of our Subgraph GNN architecture. Extensive experiments on multiple graph learning benchmarks demonstrate that our method is significantly more flexible than previous approaches, as it can seamlessly handle any number of subgraphs, while consistently outperforming baseline approaches.
Updated: 2024-07-07 09:32:11
Domains: cs.LG
MINDECHO: Role-Playing Language Agents for Key Opinion Leaders
Large language models (LLMs) have demonstrated impressive performance in various applications, among which role-playing language agents (RPLAs) have engaged a broad user base. Now, there is a growing demand for RPLAs that represent Key Opinion Leaders (KOLs), i.e., Internet celebrities who shape the trends and opinions in their domains. However, research in this line remains underexplored. In this paper, we hence introduce MINDECHO, a comprehensive framework for the development and evaluation of KOL RPLAs. MINDECHO collects KOL data from Internet video transcripts in various professional fields, and synthesizes their conversations leveraging GPT-4. Then, the conversations and the transcripts are used for individualized model training and inference-time retrieval, respectively. Our evaluation covers both general dimensions (i.e., knowledge and tones) and fan-centric dimensions for KOLs. Extensive experiments validate the effectiveness of MINDECHO in developing and evaluating KOL RPLAs.
Updated: 2024-07-07 09:08:33
Domains: cs.AI
Mamba Hawkes Process
Irregular and asynchronous event sequences are prevalent in many domains, such as social media, finance, and healthcare. Traditional temporal point processes (TPPs), like Hawkes processes, often struggle to model mutual inhibition and nonlinearity effectively. While recent neural network models, including RNNs and Transformers, address some of these issues, they still face challenges with long-term dependencies and computational efficiency. In this paper, we introduce the Mamba Hawkes Process (MHP), which leverages the Mamba state space architecture to capture long-range dependencies and dynamic event interactions. Our results show that MHP outperforms existing models across various datasets. Additionally, we propose the Mamba Hawkes Process Extension (MHP-E), which combines Mamba and Transformer models to enhance predictive capabilities. We present the novel application of the Mamba architecture to Hawkes processes, a flexible and extensible model structure, and a theoretical analysis of the synergy between state space models and Hawkes processes. Experimental results demonstrate the superior performance of both MHP and MHP-E, advancing the field of temporal point process modeling.
Updated: 2024-07-07 08:37:43
Domains: cs.LG,stat.ML
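For context, the classical self-exciting intensity that Hawkes-type models build on is lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i)); neural variants such as MHP replace the fixed kernel with a learned history encoder. A direct computation of the classical form:

# Classical exponential-kernel Hawkes intensity (the baseline MHP generalizes).
import numpy as np

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    past = np.asarray([ti for ti in history if ti < t])
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = [0.5, 1.1, 1.3, 2.7]
for t in [1.0, 2.0, 3.0]:
    print(f"lambda({t}) = {hawkes_intensity(t, events):.3f}")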
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though recent LLMs seem capable of planning and reasoning given user instructions, their effectiveness in applying these capabilities for autonomous task solving remains underexplored. This is especially true in enterprise settings, where automated agents hold the promise of a high impact. To fill this gap, we propose WorkArena++, a novel benchmark consisting of 682 tasks corresponding to realistic workflows routinely performed by knowledge workers. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents. Our empirical studies across state-of-the-art LLMs and vision-language models (VLMs), as well as human workers, reveal several challenges for such models to serve as useful assistants in the workplace. In addition to the benchmark, we provide a mechanism to effortlessly generate thousands of ground-truth observation/action traces, which can be used for fine-tuning existing models. Overall, we expect this work to serve as a useful resource to help the community progress toward capable autonomous agents. The benchmark can be found at https://github.com/ServiceNow/WorkArena/tree/workarena-plus-plus.
Updated: 2024-07-07 07:15:49
Domains: cs.AI
Lack of Systematic Approach to Security of IoT Context Sharing Platforms
IoT context-sharing platforms are an essential component of today's interconnected IoT deployments, and their security affects the entire deployment as well as the critical infrastructure adopting IoT. We report on the lack of a systematic approach to the security of IoT context-sharing platforms and argue for a methodological and systematic alternative to evaluate existing solutions and develop 'secure-by-design' solutions. We have identified the key components of a generic IoT context-sharing platform and propose using MITRE ATT&CK for threat modelling of such platforms.
Updated: 2024-07-07 07:11:15
Domains: cs.CR
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
Estimating heterogeneous treatment effects (HTEs) over time is crucial in many disciplines such as personalized medicine. For example, electronic health records are commonly collected over several time periods and then used to personalize treatment decisions. Existing works for this task have mostly focused on model-based learners (i.e., learners that adapt specific machine-learning models). In contrast, model-agnostic learners -- so-called meta-learners -- are largely unexplored. In our paper, we propose several meta-learners that are model-agnostic and thus can be used in combination with arbitrary machine learning models (e.g., transformers) to estimate HTEs over time. Here, our focus is on learners that can be obtained via weighted pseudo-outcome regressions, which allows for efficient estimation by targeting the treatment effect directly. We then provide a comprehensive theoretical analysis that characterizes the different learners and that allows us to offer insights into when specific learners are preferable. Finally, we confirm our theoretical insights through numerical experiments. In sum, while meta-learners are already state-of-the-art for the static setting, we are the first to propose a comprehensive set of meta-learners for estimating HTEs in the time-varying setting.
Updated: 2024-07-07 07:07:48
Domains: cs.LG,stat.ML
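For the static one-level case, a weighted pseudo-outcome regression such as the DR-learner illustrates the recipe the paper extends over time: form a pseudo-outcome from nuisance estimates, then regress it on covariates with any model. The sketch below works under those static-setting assumptions (and omits the cross-fitting a careful analysis would use).

# DR-learner sketch: pseudo-outcome regression for heterogeneous effects (static case).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
A = rng.binomial(1, 0.5, size=n)                       # randomized treatment
tau = 1.0 + X[:, 0]                                    # true heterogeneous effect
Y = X[:, 1] + A * tau + rng.normal(size=n)

e = RandomForestClassifier().fit(X, A).predict_proba(X)[:, 1]   # propensity
m1 = RandomForestRegressor().fit(X[A == 1], Y[A == 1]).predict(X)
m0 = RandomForestRegressor().fit(X[A == 0], Y[A == 0]).predict(X)

# Doubly robust pseudo-outcome; regressing it on X targets tau(x) directly.
m_A = np.where(A == 1, m1, m0)
psi = (A - e) / (e * (1 - e) + 1e-6) * (Y - m_A) + m1 - m0
cate = RandomForestRegressor().fit(X, psi)
print("corr with true effect:",
      round(float(np.corrcoef(cate.predict(X), tau)[0, 1]), 2))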
Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations
STOchastic Recursive Momentum (STORM)-based algorithms have been widely developed to solve one to $K$-level ($K \geq 3$) stochastic optimization problems. Specifically, they use estimators to mitigate the biased gradient issue and achieve near-optimal convergence results. However, there is relatively little work on understanding their generalization performance, particularly in the transition from one-level to $K$-level optimization contexts. This paper provides a comprehensive generalization analysis of three representative STORM-based algorithms: STORM, COVER, and SVMR, for one, two, and $K$-level stochastic optimizations under both convex and strongly convex settings based on algorithmic stability. Firstly, we define stability for $K$-level optimizations and link it to generalization. Then, we detail the stability results for three prominent STORM-based algorithms. Finally, we derive their excess risk bounds by balancing stability results with optimization errors. Our theoretical results provide strong evidence that completes the picture for STORM-based algorithms: (1) Each estimator may decrease stability due to the variance with respect to its estimation target. (2) Every additional level might escalate the generalization error, influenced by the stability and the variance between its cumulative stochastic gradient and the true gradient. (3) Increasing the batch size for the initial computation of estimators presents a favorable trade-off, enhancing the generalization performance.
Updated: 2024-07-07 07:07:04
Domains: cs.LG,math.OC
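For reference, the recursive momentum estimator at the heart of these methods is d_t = grad f(x_t; xi_t) + (1 - a) * (d_{t-1} - grad f(x_{t-1}; xi_t)), with both gradients evaluated on the same fresh sample. A one-level toy run on a stochastic quadratic:

# One-level STORM sketch on a stochastic quadratic objective.
import numpy as np

def grad(x, xi):                       # noisy gradient of 0.5 * ||x||^2
    return x + xi

rng = np.random.default_rng(0)
x, d, a, lr = np.ones(10), None, 0.5, 0.05
x_prev = x.copy()

for t in range(200):
    xi = 0.1 * rng.normal(size=10)     # the SAME sample for both evaluations
    g_now, g_prev = grad(x, xi), grad(x_prev, xi)
    d = g_now if d is None else g_now + (1 - a) * (d - g_prev)
    x_prev = x.copy()
    x = x - lr * d                     # momentum-corrected update
print("final ||x|| =", round(float(np.linalg.norm(x)), 4))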
Gradient Diffusion: A Perturbation-Resilient Gradient Leakage Attack
Recent years have witnessed the vulnerability of Federated Learning (FL) against gradient leakage attacks, where the private training data can be recovered from the exchanged gradients, making gradient protection a critical issue for the FL training process. Existing solutions often resort to perturbation-based mechanisms, such as differential privacy, where each participating client injects a specific amount of noise into local gradients before aggregating to the server, and the global distribution variation finally conceals the gradient privacy. However, perturbation is not always the panacea for gradient protection since the robustness heavily relies on the injected noise. This intuition raises an interesting question: is it possible to deactivate existing protection mechanisms by removing the perturbation inside the gradients? In this paper, we present the answer: yes, and propose the Perturbation-resilient Gradient Leakage Attack (PGLA), the first attempt to recover the perturbed gradients, without additional access to the original model structure or third-party data. Specifically, we leverage the inherent diffusion property of gradient perturbation protection and construct a novel diffusion-based denoising model to implement PGLA. Our insight is that capturing the disturbance level of perturbation during the diffusion reverse process can release the gradient denoising capability, which promotes the diffusion model to generate approximate gradients as the original clean version through adaptive sampling steps. Extensive experiments demonstrate that PGLA effectively recovers the protected gradients and exposes the FL training process to the threat of gradient leakage, achieving the best quality in gradient denoising and data recovery compared to existing models. We hope to arouse public attention on PGLA and its defense.
Updated: 2024-07-07 07:06:49
领域: cs.LG,cs.AI,cs.CR
V-IRL: Grounding Virtual Intelligence in Real Life
There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created. To develop AI agents that can sense, think, and act as flexibly as humans in real-world settings, it is imperative to bridge the realism gap between the digital and physical worlds. How can we embody agents in an environment as rich and diverse as the one we inhabit, without the constraints imposed by real hardware and control? Towards this end, we introduce V-IRL: a platform that enables agents to scalably interact with the real world in a virtual yet realistic environment. Our platform serves as a playground for developing agents that can accomplish various practical tasks and as a vast testbed for measuring progress in capabilities spanning perception, decision-making, and interaction with real-world data across the entire globe.
Updated: 2024-07-07 07:00:42
标题: V-IRL: 将虚拟智能植根于现实生活中
摘要: 人类居住的地球与现代人工智能代理所创建的数字领域之间存在感知鸿沟。为了开发能够在现实世界环境中像人类一样灵活感知、思考和行动的人工智能代理,必须弥合数字和物理世界之间的现实差距。我们如何能够在一个像我们居住的环境中体现代理,而不受真实硬件和控制所施加的限制?为此,我们引入了V-IRL:一个平台,使代理能够在一个虚拟但逼真的环境中与现实世界进行可扩展交互。我们的平台既是一个开发代理能够完成各种实际任务的游乐场,也是一个广阔的测试基地,用于衡量在感知、决策和与全球范围内的真实数据进行互动等能力方面取得的进展。
更新时间: 2024-07-07 07:00:42
领域: cs.AI,cs.CV
Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning
In this paper, we focus on single-demonstration imitation learning (IL), a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible and the ground truth reward function is not available. In contrast to typical IL settings with multiple demonstrations, single-demonstration IL involves an agent having access to only one expert trajectory. We highlight the issue of sparse reward signals in this setting and propose to mitigate this issue through our proposed Transition Discriminator-based IL (TDIL) method. TDIL is an IRL method designed to address reward sparsity by introducing a denser surrogate reward function that considers environmental dynamics. This surrogate reward function encourages the agent to navigate towards states that are proximal to expert states. In practice, TDIL trains a transition discriminator to differentiate between valid and non-valid transitions in a given environment to compute the surrogate rewards. The experiments demonstrate that TDIL outperforms existing IL approaches and achieves expert-level performance in the single-demonstration IL setting across five widely adopted MuJoCo benchmarks as well as the "Adroit Door" robotic environment.
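As a rough illustration of the idea, the sketch below shows a transition discriminator and one plausible surrogate-reward shaping in PyTorch. The architecture, the max-over-expert-states shaping, and the training recipe in the comments are assumptions of this sketch, not the exact formulation in the paper.

import torch
import torch.nn as nn

class TransitionDiscriminator(nn.Module):
    """Scores whether (s, s') is a plausible environment transition."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

def surrogate_reward(disc, s, expert_states):
    # Assumed shaping: reward states from which some expert state looks
    # reachable, i.e. the max transition score over all expert states.
    s_rep = s.unsqueeze(0).expand(expert_states.shape[0], -1)
    logits = disc(s_rep, expert_states).squeeze(-1)
    return torch.sigmoid(logits).max().item()

disc = TransitionDiscriminator(state_dim=4)
expert = torch.randn(10, 4)                 # states from the single expert trajectory
print(surrogate_reward(disc, torch.randn(4), expert))
# Training would use observed (s_t, s_{t+1}) pairs as positives and shuffled,
# non-adjacent pairs as negatives, with a binary cross-entropy loss.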
Updated: 2024-07-07 06:51:03
标题: 专家接近作为单次演示模仿学习的替代奖励
摘要: 在本文中,我们专注于单次演示模仿学习(IL),这是一种实际的方法,适用于在现实世界应用中获取多个专家演示成本高昂或不可行,并且地面真实奖励函数不可用的情况。与具有多个演示的典型IL设置相比,单次演示IL涉及代理仅访问一个专家轨迹。我们强调在这种情况下稀疏奖励信号的问题,并提议通过我们提出的基于转换鉴别器的IL(TDIL)方法来缓解这个问题。TDIL是一种旨在通过引入考虑环境动态的更密集替代奖励函数来解决奖励稀疏性的IRL方法。这个替代奖励函数鼓励代理向接近专家状态的状态导航。在实践中,TDIL训练一个转换鉴别器来区分给定环境中的有效和非有效转换,以计算替代奖励。实验证明,TDIL优于现有的IL方法,并在五个广泛采用的MuJoCo基准测试以及“Adroit Door”机器人环境中实现了单次演示IL设置中的专家级性能。
更新时间: 2024-07-07 06:51:03
领域: cs.LG
Mini-Giants: "Small" Language Models and Open Source Win-Win
ChatGPT is phenomenal. However, it is prohibitively expensive to train and refine such giant models. Fortunately, small language models are flourishing and becoming more and more competent. We call them "mini-giants". We argue that open-source communities like Kaggle and mini-giants can achieve a win-win in many ways: technically, ethically, and socially. In this article, we present a brief yet rich background, discuss how to attain small language models, present a comparative study of small language models together with a brief discussion of evaluation methods, discuss the real-world application scenarios where small language models are most needed, and conclude with discussion and outlook.
Updated: 2024-07-07 06:42:31
标题: Mini-Giants: "Small" 语言模型和开源双赢
摘要: ChatGPT是非常出色的。然而,训练和改进这样庞大的模型成本过高。幸运的是,小型语言模型正在蓬勃发展,并变得越来越有竞争力。我们称它们为“迷你巨人”。我们认为,像Kaggle和迷你巨人这样的开源社区在技术、伦理和社会方面将实现多赢局面。在本文中,我们简要介绍了丰富的背景,讨论了如何获得小型语言模型,提出了小型语言模型的比较研究和评估方法的简要讨论,讨论了现实世界中最需要小型语言模型的应用场景,并以讨论和展望作为结论。
更新时间: 2024-07-07 06:42:31
领域: cs.CL,cs.AI,cs.LG
Ricci flow-guided autoencoders in learning time-dependent dynamics
We present a manifold-based autoencoder method for learning dynamics in time, notably partial differential equations (PDEs), in which the manifold latent space evolves according to Ricci flow. This can be accomplished by simulating Ricci flow in a physics-informed setting, and manifold quantities can be matched so that Ricci flow is empirically achieved. With our method, the manifold is discerned through the training procedure, while the latent evolution due to Ricci flow induces a more accommodating representation over static methods. We present our method on a range of experiments consisting of PDE data that encompasses desirable characteristics such as periodicity and randomness. By incorporating latent dynamics, we sustain a manifold latent representation for all values in the ambient PDE time interval. Furthermore, the dynamical manifold latent space facilitates qualities such as learning for out-of-distribution data, and robustness. We showcase our method by demonstrating these features.
Updated: 2024-07-07 06:10:41
标题: 里奇流引导的自编码器在学习时间依赖动力学中的应用
摘要: 我们提出了一种基于流形的自编码器方法,用于学习时间动态,特别是偏微分方程(PDE),其中流形潜在空间根据里奇流演化。这可以通过在物理信息设置中模拟里奇流来实现,并且可以匹配流形量以实现里奇流。通过我们的方法,在训练过程中可以识别流形,而由于里奇流引起的潜在演化会比静态方法更加适应。我们在一系列实验中展示了我们的方法,这些实验涉及包括周期性和随机性在内的PDE数据的理想特性。通过融入潜在动态,我们为环境PDE时间间隔中的所有值维持了一个流形潜在表示。此外,动态流形潜在空间促进了诸如学习超出分布数据和鲁棒性等特性。我们通过展示这些特性来展示我们的方法。
更新时间: 2024-07-07 06:10:41
领域: cs.LG,stat.ML
Strategically-Robust Learning Algorithms for Bidding in First-Price Auctions
Learning to bid in repeated first-price auctions is a fundamental problem at the interface of game theory and machine learning, which has seen a recent surge in interest due to the transition of display advertising to first-price auctions. In this work, we propose a novel concave formulation for pure-strategy bidding in first-price auctions, and use it to analyze natural Gradient-Ascent-based algorithms for this problem. Importantly, our analysis goes beyond regret, which was the typical focus of past work, and also accounts for the strategic backdrop of online-advertising markets where bidding algorithms are deployed -- we provide the first guarantees of strategic-robustness and incentive-compatibility for Gradient Ascent. Concretely, we show that our algorithms achieve $O(\sqrt{T})$ regret when the highest competing bids are generated adversarially, and show that no online algorithm can do better. We further prove that the regret reduces to $O(\log T)$ when the competition is stationary and stochastic, which drastically improves upon the previous best of $O(\sqrt{T})$. Moving beyond regret, we show that a strategic seller cannot exploit our algorithms to extract more revenue on average than is possible under the optimal mechanism. Finally, we prove that our algorithm is also incentive compatible -- it is a (nearly) dominant strategy for the buyer to report her values truthfully to the algorithm as a whole. Altogether, these guarantees make our algorithms the first to simultaneously achieve both optimal regret and strategic-robustness.
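To make the learning dynamics concrete, here is a plain projected-gradient-ascent sketch for repeated first-price bidding against stationary stochastic competition. The paper's concave reparameterization is not reproduced; the kernel-smoothed utility gradient, the Beta-distributed competition, and the step size are all assumptions of this sketch.

import numpy as np

rng = np.random.default_rng(1)
v = 1.0                      # buyer's per-round value (assumed known)
b = 0.5                      # current bid
eta = 0.05                   # assumed learning rate
history = []                 # observed highest competing bids m_t

for t in range(1, 2001):
    m = rng.beta(2, 3)       # assumed stationary stochastic competition
    history.append(m)
    ms = np.array(history)
    # Expected utility U(b) = (v - b) * P(m <= b); its gradient is
    # U'(b) = (v - b) * f(b) - F(b), estimated empirically below.
    F = (ms <= b).mean()                              # empirical CDF at b
    h = max(0.05, t ** -0.2)                          # shrinking bandwidth
    f = (np.abs(ms - b) <= h).mean() / (2 * h)        # density estimate at b
    grad = (v - b) * f - F
    b = float(np.clip(b + eta * grad, 0.0, v))        # projected ascent step
print("learned bid:", b)

Truthful reporting of v to the algorithm is exactly the incentive-compatibility property the paper establishes for its (different, concave) formulation.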
Updated: 2024-07-07 06:07:00
标题: 在第一价格拍卖中出价的策略稳健的学习算法
摘要: 学习在重复的一价拍卖中投标是博弈论和机器学习交界处的一个基本问题,由于展示广告向一价拍卖的过渡,这个问题最近引起了人们的极大兴趣。在这项工作中,我们提出了一种新颖的凹面形式,用于在一价拍卖中进行纯策略投标,并使用它来分析基于自然梯度上升的算法。重要的是,我们的分析超越了过去工作的典型焦点——遗憾,还考虑了在线广告市场的战略背景,其中投标算法被部署——我们为梯度上升提供了首个战略稳健性和激励兼容性的保证。 具体地,我们证明我们的算法在最高竞争出价被对抗性生成时实现了O(√T)的遗憾,并且证明没有在线算法可以做得更好。我们进一步证明,在竞争是稳定和随机的情况下,遗憾降低到O(log T),这极大地改进了先前最好的O(√T)。在超越遗憾的基础上,我们展示了一个策略性卖方不能利用我们的算法来平均提取比在最优机制下可能的收入更多。最后,我们证明我们的算法也是激励兼容的——对于买家来说,向整个算法真实报告她的价值是一个(几乎)主导策略。总的来说,这些保证使我们的算法成为首个同时实现最优遗憾和战略稳健性的算法。
更新时间: 2024-07-07 06:07:00
领域: cs.GT,cs.LG
A Deep Dive into the Factors Influencing Financial Success: A Machine Learning Approach
This paper explores the socioeconomic factors that contribute to individual financial success using machine learning algorithms. Financial success, a critical aspect of every individual's well-being, is a complex concept influenced by many factors. This study aims to understand its determinants. It examines survey data from the National Longitudinal Survey of Youth 1997 by the Bureau of Labor Statistics (1), consisting of longitudinal data on a sample of 8,984 individuals over the years. The dataset comprises income variables and a large set of socioeconomic variables. An in-depth analysis shows the effectiveness of machine learning algorithms in financial success research, highlights the potential of leveraging longitudinal data to enhance prediction accuracy, and provides valuable insights into how various socioeconomic factors influence financial success. The findings identify highest education degree, occupation, and gender as the top three determinants of individual income among the socioeconomic factors examined. Yearly working hours, age, and work tenure follow as secondary influencing factors, and all other factors, including parental household income, industry, and parents' highest grade, are identified as tertiary factors. These insights allow researchers to better understand the complex nature of financial success, and they are also crucial for fostering financial success among individuals and advancing broader societal well-being by informing policymakers during the decision-making process.
Updated: 2024-07-07 06:01:45
标题: 深入探讨影响财务成功的因素:一种机器学习方法
摘要: 本文探讨了各种社会经济因素对个人财务成功的贡献,采用机器学习算法和方法。财务成功是所有个人幸福的一个关键方面,是一个受多种因素影响的复杂概念。本研究旨在了解财务成功的决定因素。它审查了劳工统计局1997年《全国青年纵向调查》的调查数据,包括8,984名个体在多年间的纵向数据样本。数据集包括个体的收入变量和大量的社会经济变量。深入分析显示了机器学习算法在财务成功研究中的有效性,突显了利用纵向数据提高预测准确性的潜力,并为了解各种社会经济因素如何影响财务成功提供了宝贵见解。研究结果强调了最高教育程度、职业和性别作为所研究的社会经济因素中个人收入的前三个决定因素的显著影响。年工作小时、年龄和工作年限作为三个次要影响因素,所有其他因素包括父母家庭收入、行业、父母的最高学历等被确定为第三级因素。这些见解使研究人员更好地了解财务成功的复杂性,并对促进个人财务成功和提升更广泛社会幸福至关重要,为政策制定者在决策过程中提供见解。
更新时间: 2024-07-07 06:01:45
领域: cs.LG
Federated Knowledge Transfer Fine-tuning Large Server Model with Resource-Constrained IoT Clients
The training of large models, involving fine-tuning, faces a scarcity of high-quality data. Compared to solutions based on centralized data centers, updating large models in the Internet of Things (IoT) faces the challenge of coordinating knowledge from distributed clients using their private and heterogeneous data. To tackle this challenge, we propose KOALA (Federated Knowledge Transfer Fine-tuning Large Server Model with Resource-Constrained IoT Clients) to advance the training of large models in IoT. Since the resources available to IoT clients are limited, it is infeasible to execute large models locally or to update them in a privacy-preserving manner. Therefore, we leverage federated learning and knowledge distillation to update large models through collaboration with their small models, which run locally at IoT clients to process private data separately and enable large-small model knowledge transfer through iterative learning between the server and clients. Moreover, to support clients with similar or different computing capacities, KOALA is designed with two kinds of large-small model joint learning modes, namely homogeneous and heterogeneous. Experimental results demonstrate that, compared to the conventional approach, our method not only achieves similar training performance but also significantly reduces the need for local storage and computing power resources.
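The sketch below illustrates the distillation primitive such a protocol rests on, with toy stand-ins for the large server model and a small client model. The distillation direction and loss are standard choices assumed here; the actual KOALA protocol and its homogeneous/heterogeneous modes differ in detail.

import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_epoch(student, teacher, batches, opt, T=2.0):
    """One pass of knowledge distillation: the student mimics teacher logits."""
    teacher.eval()
    for x in batches:
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = student(x)
        loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                        F.softmax(t_logits / T, dim=-1),
                        reduction="batchmean") * T * T
        opt.zero_grad(); loss.backward(); opt.step()

# Toy stand-ins: a "large" server model and a "small" client model
server = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 10))
client = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
batches = [torch.randn(8, 16) for _ in range(5)]   # unlabeled proxy inputs
distill_epoch(client, server, batches, torch.optim.Adam(client.parameters()))
# A full round would then train `client` on private data and transfer the
# resulting knowledge back to the server, so raw data never leaves the device.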
Updated: 2024-07-07 05:46:01
标题: 联邦知识转移:资源受限的物联网客户端对大型服务器模型进行微调
摘要: 大型模型的训练,涉及微调,面临高质量数据的匮乏。与基于集中式数据中心的解决方案相比,在物联网(IoT)中更新大型模型面临着协调来自分布式客户端的知识的挑战,这些客户端使用其私有和异构数据。为了解决这一挑战,我们提出了KOALA(资源受限的物联网客户端对大型服务器模型进行联合知识转移微调),以推动物联网中大型模型的训练。由于物联网客户端获得的资源有限且受限,因此在保护隐私的情况下在本地执行大型模型并对其进行更新是不可行的。因此,我们利用联合学习和知识蒸馏通过与其小模型的合作来更新大型模型,这些小模型可以在物联网客户端本地运行,分别处理其私有数据并通过服务器和客户端之间的迭代学习实现大型小型模型的知识转移。此外,为了支持具有相似或不同计算能力的客户端,KOALA设计有两种大型小型模型联合学习模式,即同质或异质。实验结果表明,与传统方法相比,我们的方法不仅可以实现类似的训练性能,还可以显著减少对本地存储和计算资源的需求。
更新时间: 2024-07-07 05:46:01
领域: cs.LG,cs.AI,cs.CV
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
We present CLAMP-ViT, a data-free post-training quantization method for vision transformers (ViTs). We identify the limitations of recent techniques, notably their inability to leverage meaningful inter-patch relationships, leading to the generation of simplistic and semantically vague data, impacting quantization accuracy. CLAMP-ViT employs a two-stage approach, cyclically adapting between data generation and model quantization. Specifically, we incorporate a patch-level contrastive learning scheme to generate richer, semantically meaningful data. Furthermore, we leverage contrastive learning in layer-wise evolutionary search for fixed- and mixed-precision quantization to identify optimal quantization parameters while mitigating the effects of a non-smooth loss landscape. Extensive evaluations across various vision tasks demonstrate the superiority of CLAMP-ViT, with performance improvements of up to 3% in top-1 accuracy for classification, 0.6 mAP for object detection, and 1.5 mIoU for segmentation at similar or better compression ratio over existing alternatives. Code is available at https://github.com/georgia-tech-synergy-lab/CLAMP-ViT.git
Updated: 2024-07-07 05:39:25
标题: CLAMP-ViT:用于ViTs自适应后训练量化的对比无数据学习
摘要: 我们提出了CLAMP-ViT,一种用于视觉变换器(ViTs)的无数据后训练量化方法。我们确定了最近技术的局限性,特别是它们无法利用有意义的区块间关系,导致生成简单和语义模糊的数据,影响量化准确性。CLAMP-ViT采用两阶段方法,循环地在数据生成和模型量化之间进行调整。具体来说,我们引入了一个区块级对比学习方案,以生成更丰富、语义有意义的数据。此外,我们还利用对比学习在固定和混合精度量化的分层演化搜索中,以确定最佳的量化参数,同时减轻非平滑损失景观的影响。通过对各种视觉任务的广泛评估,我们展示了CLAMP-ViT的优越性,分类准确率提升高达3%,目标检测的mAP提高0.6,分割的mIoU提高1.5,在相似或更好的压缩比下超过现有替代方案。代码可在https://github.com/georgia-tech-synergy-lab/CLAMP-ViT.git获取。
更新时间: 2024-07-07 05:39:25
领域: cs.CV,cs.AI,eess.IV
FastSpiker: Enabling Fast Training for Spiking Neural Networks on Event-based Data through Learning Rate Enhancements for Autonomous Embedded Systems
Autonomous embedded systems (e.g., robots) typically necessitate intelligent computation with low power/energy processing for completing their tasks. Such requirements can be fulfilled by embodied neuromorphic intelligence with spiking neural networks (SNNs) because of their high learning quality (e.g., accuracy) and sparse computation. Here, the employment of event-based data is preferred to ensure seamless connectivity between input and processing parts. However, state-of-the-art SNNs still face a long training time to achieve high accuracy, thereby incurring high energy consumption and producing a high rate of carbon emission. Toward this, we propose FastSpiker, a novel methodology that enables fast SNN training on event-based data through learning rate enhancements targeting autonomous embedded systems. In FastSpiker, we first investigate the impact of different learning rate policies and their values, then select the ones that quickly offer high accuracy. Afterward, we explore different settings for the selected learning rate policies to find the appropriate policies through a statistical-based decision. Experimental results show that our FastSpiker offers up to 10.5x faster training time and up to 88.39% lower carbon emission to achieve higher or comparable accuracy to the state-of-the-art on the event-based automotive dataset (i.e., NCARS). In this manner, our FastSpiker methodology paves the way for green and sustainable computing in realizing embodied neuromorphic intelligence for autonomous embedded systems.
Updated: 2024-07-07 05:17:17
标题: FastSpiker:通过学习率增强,在事件数据上实现脉冲神经网络的快速训练,为自主嵌入式系统提供支持
摘要: 自主嵌入式系统(例如,机器人)通常需要以低功耗/低能量处理完成任务的智能计算。这种要求可以通过具有脉冲神经网络(SNNs)的具身神经形态智能来实现,因为它们具有高学习质量(例如准确性)和稀疏计算。在这里,优先使用基于事件的数据,以确保输入和处理部分之间的无缝连接。然而,最先进的SNN仍然需要较长的训练时间才能达到高准确率,从而产生高能耗和高碳排放。为此,我们提出了FastSpiker,一种新颖的方法,通过针对自主嵌入式系统的学习率增强,实现对基于事件数据的快速SNN训练。在FastSpiker中,我们首先研究不同学习率策略及其取值的影响,然后选择能够快速提供高准确率的策略。随后,我们通过基于统计的决策,探索所选学习率策略的不同设置,以找到合适的策略。实验结果显示,我们的FastSpiker提供高达10.5倍的更快训练时间,以及高达88.39%的更低碳排放,在基于事件的汽车数据集(即NCARS)上实现与最先进技术相当或更高的准确率。通过这种方式,我们的FastSpiker方法为实现自主嵌入式系统的具身神经形态智能铺平了绿色和可持续计算的道路。
更新时间: 2024-07-07 05:17:17
领域: cs.NE,cs.AI,cs.LG,cs.RO
Disciplined Geodesically Convex Programming
Convex programming plays a fundamental role in machine learning, data science, and engineering. Testing convexity structure in nonlinear programs relies on verifying the convexity of objectives and constraints. Grant et al. (2006) introduced a framework, Disciplined Convex Programming (DCP), for automating this verification task for a wide range of convex functions that can be decomposed into basic convex functions (atoms) using convexity-preserving compositions and transformations (rules). However, the restriction to Euclidean convexity concepts can limit the applicability of the framework. For instance, many notable instances of statistical estimators and matrix-valued (sub)routines in machine learning applications are Euclidean non-convex, but exhibit geodesic convexity through a more general Riemannian lens. In this work, we extend disciplined programming to this setting by introducing Disciplined Geodesically Convex Programming (DGCP). We determine convexity-preserving compositions and transformations for geodesically convex functions on general Cartan-Hadamard manifolds, as well as for the special case of symmetric positive definite matrices, a common setting in matrix-valued optimization. For the latter, we also define a basic set of atoms. Our paper is accompanied by a Julia package, SymbolicAnalysis.jl, which provides functionality for testing and certifying DGCP-compliant expressions. Our library interfaces with manifold optimization software, which allows for directly solving verified geodesically convex programs.
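DGCP certifies geodesic convexity symbolically (the accompanying SymbolicAnalysis.jl package is in Julia). Purely as a numerical illustration of the property being certified, the Python sketch below checks the defining inequality along affine-invariant geodesics on the SPD manifold for two classic examples; the grid-based test is an assumption of this sketch, not how DGCP works.

import numpy as np

def spd_power(M, t):
    w, V = np.linalg.eigh(M)           # SPD: real eigendecomposition
    return (V * w ** t) @ V.T

def spd_geodesic(A, B, t):
    """Affine-invariant geodesic from A (t=0) to B (t=1) on the SPD manifold."""
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    return Ah @ spd_power(Aih @ B @ Aih, t) @ Ah

def check_gconvex(f, A, B, ts=np.linspace(0, 1, 21)):
    """Numerically test f(gamma(t)) <= (1-t) f(A) + t f(B) along the geodesic."""
    vals = np.array([f(spd_geodesic(A, B, t)) for t in ts])
    chords = (1 - ts) * f(A) + ts * f(B)
    return bool(np.all(vals <= chords + 1e-9))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4)); A = X @ X.T + np.eye(4)
Y = rng.normal(size=(4, 4)); B = Y @ Y.T + np.eye(4)
print(check_gconvex(np.trace, A, B))                             # trace is geodesically convex: True
print(check_gconvex(lambda M: -np.log(np.linalg.det(M)), A, B))  # -logdet is geodesically affine: True

Note that -logdet is Euclidean convex but logdet is not, while along SPD geodesics logdet is affine, which is exactly the kind of distinction the Riemannian lens makes tractable.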
Updated: 2024-07-07 05:13:51
标题: 纪律测地凸规划
摘要: 凸规划在机器学习、数据科学和工程领域中起着基础性作用。在非线性程序中测试凸性结构依赖于验证目标和约束的凸性。Grant等人引入了一个框架,称为纪律凸规划(DCP),用于自动化验证广泛范围内的可以通过基本凸函数(原子)分解的凸函数的凸性,并使用保持凸性的组合和变换(规则)。然而,限制在欧几里德凸性概念上可能限制了该框架的适用性。例如,在许多显著的统计估计器和机器学习应用中的矩阵值(子)程序中,通常是欧几里德非凸的,但通过更一般的黎曼透镜表现出测地凸性。在这项工作中,我们通过引入纪律测地凸规划(DGCP)将纪律编程扩展到这种设置。我们确定了一般Cartan-Hadamard流形上测地凸函数的保凸组合和变换,以及对于对称正定矩阵这种常见的矩阵值优化设置的特殊情况。对于后者,我们还定义了一组基本的原子。我们的论文附带一个Julia软件包SymbolicAnalysis.jl,提供测试和认证DGCP兼容表达式的功能。我们的库与流形优化软件接口,允许直接解决经过验证的测地凸程序。
更新时间: 2024-07-07 05:13:51
领域: math.OC,cs.LG,cs.MS,stat.ML
Multi-scale Conditional Generative Modeling for Microscopic Image Restoration
The advance of diffusion-based generative models in recent years has revolutionized state-of-the-art (SOTA) techniques in a wide variety of image analysis and synthesis tasks, whereas their adaptation on image restoration, particularly within computational microscopy remains theoretically and empirically underexplored. In this research, we introduce a multi-scale generative model that enhances conditional image restoration through a novel exploitation of the Brownian Bridge process within wavelet domain. By initiating the Brownian Bridge diffusion process specifically at the lowest-frequency subband and applying generative adversarial networks at subsequent multi-scale high-frequency subbands in the wavelet domain, our method provides significant acceleration during training and sampling while sustaining a high image generation quality and diversity on par with SOTA diffusion models. Experimental results on various computational microscopy and imaging tasks confirm our method's robust performance and its considerable reduction in its sampling steps and time. This pioneering technique offers an efficient image restoration framework that harmonizes efficiency with quality, signifying a major stride in incorporating cutting-edge generative models into computational microscopy workflows.
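A minimal sketch of the forward process follows, assuming PyWavelets for the decomposition and a standard Brownian bridge schedule; the variance scale s is an assumption of this sketch, while the subband split (diffusion on the approximation band, GANs on the detail bands) follows the paper's description.

import numpy as np
import pywt

rng = np.random.default_rng(0)
clean = rng.random((64, 64))           # stand-ins for a clean/degraded image pair
degraded = clean + 0.1 * rng.normal(size=clean.shape)

# One-level 2D DWT: low-frequency approximation + three high-frequency subbands
cA_x, highs_x = pywt.dwt2(degraded, "haar")
cA_y, _ = pywt.dwt2(clean, "haar")

def bridge_sample(x0, y, t, T=1000, s=1.0):
    """Forward Brownian bridge state at step t, pinned at x0 (t=0) and y (t=T)."""
    m = t / T
    var = 2.0 * s * m * (1.0 - m)      # assumed bridge variance schedule
    return (1 - m) * x0 + m * y + np.sqrt(var) * rng.normal(size=x0.shape)

xt = bridge_sample(cA_x, cA_y, t=500)  # diffusion runs only on the lowest band;
# the high-frequency subbands (highs_x) would be handled by per-scale GANs.

Because the bridge is pinned at both endpoints, its variance vanishes at t=0 and t=T, which is what makes conditioning on the degraded input natural in this setting.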
Updated: 2024-07-07 05:11:00
标题: 多尺度条件生成建模用于微观图像恢复
摘要: 近年来扩散基础生成模型的进步已经在各种图像分析和合成任务的最新技术中产生了革命性的影响,然而它们在图像恢复领域的应用,特别是在计算显微镜领域,仍然在理论和实证方面尚未得到充分探索。在这项研究中,我们引入了一个多尺度生成模型,通过在小波域内新颖地利用布朗桥过程,增强了条件图像恢复。通过在最低频子带处启动布朗桥扩散过程,并在小波域内的后续多尺度高频子带上应用生成对抗网络,我们的方法在训练和采样过程中提供了显著的加速,同时保持了与最新扩散模型相当的高图像生成质量和多样性。对各种计算显微镜和成像任务的实验结果证实了我们方法的稳健性能以及在采样步骤和时间上的显著减少。这一开创性技术提供了一个高效的图像恢复框架,将效率与质量融为一体,标志着将尖端生成模型纳入计算显微镜工作流程的重要进展。
更新时间: 2024-07-07 05:11:00
领域: eess.IV,cs.AI,cs.CV,cs.LG
TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
Large language models (LLMs) have raised concerns about potential security threats despite performing significantly in Natural Language Processing (NLP). Backdoor attacks initially verified that LLM is doing substantial harm at all stages, but the cost and robustness have been criticized. Attacking LLMs is inherently risky in security review, while prohibitively expensive. Besides, the continuous iteration of LLMs will degrade the robustness of backdoors. In this paper, we propose TrojanRAG, which employs a joint backdoor attack in the Retrieval-Augmented Generation, thereby manipulating LLMs in universal attack scenarios. Specifically, the adversary constructs elaborate target contexts and trigger sets. Multiple pairs of backdoor shortcuts are orthogonally optimized by contrastive learning, thus constraining the triggering conditions to a parameter subspace to improve the matching. To improve the recall of the RAG for the target contexts, we introduce a knowledge graph to construct structured data to achieve hard matching at a fine-grained level. Moreover, we normalize the backdoor scenarios in LLMs to analyze the real harm caused by backdoors from both attackers' and users' perspectives and further verify whether the context is a favorable tool for jailbreaking models. Extensive experimental results on truthfulness, language understanding, and harmfulness show that TrojanRAG exhibits versatility threats while maintaining retrieval capabilities on normal queries.
Updated: 2024-07-07 05:03:53
标题: 特洛伊木马RAG:检索增强生成可以成为大型语言模型中的后门驱动程序
摘要: 大型语言模型(LLMs)在自然语言处理(NLP)中表现显著,但引发了对潜在安全威胁的担忧。后门攻击最初证实了LLM在所有阶段都造成了实质性伤害,但成本和鲁棒性受到了批评。攻击LLMs在安全审查中本质上是有风险的,而且成本高昂。此外,LLMs的持续迭代将降低后门的鲁棒性。在本文中,我们提出了TrojanRAG,该方法在检索增强生成中采用联合后门攻击,从而在通用攻击场景中操纵LLMs。具体而言,对手构建精心设计的目标上下文和触发器集。通过对比学习,多对后门捷径被正交优化,从而将触发条件限制在参数子空间中以改善匹配。为了提高RAG对目标上下文的召回率,我们引入知识图构建结构化数据,以在细粒度水平上实现硬匹配。此外,我们将LLMs中的后门场景进行标准化,从攻击者和用户的角度分析后门造成的真实伤害,并进一步验证上下文是否是破解模型的有利工具。大量实验结果表明,TrojanRAG在维持正常查询的检索能力的同时展示了多样化的威胁。
更新时间: 2024-07-07 05:03:53
领域: cs.CR,cs.CL
OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
Binary Neural Networks (BNNs) have been proven highly effective for deploying deep neural networks on mobile and embedded platforms. Most existing works focus on minimizing quantization errors, improving representation ability, or designing gradient approximations to alleviate gradient mismatch in BNNs, while leaving weight sign flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the efficiency of weight sign updates in BNNs. We observe that, for vanilla BNNs, over 50% of the weights keep their signs unchanged during training, and these weights are not only distributed at the tails of the weight distribution but also universally present in the vicinity of zero. We refer to these weights as "silent weights", which slow down convergence and lead to significant accuracy degradation. Theoretically, we reveal this is due to the independence of the BNN gradient from the latent weight distribution. To address the issue, we propose Overcome Silent Weights (OvSW). OvSW first employs Adaptive Gradient Scaling (AGS) to establish a relationship between the gradient and the latent weight distribution, thereby improving the overall efficiency of weight sign updates. Additionally, we design Silence Awareness Decaying (SAD) to automatically identify "silent weights" by tracking the weight flipping state, and apply an additional penalty to "silent weights" to facilitate their flipping. By efficiently updating weight signs, our method achieves faster convergence and state-of-the-art performance on the CIFAR10 and ImageNet1K datasets with various architectures. For example, OvSW obtains 61.6% and 65.5% top-1 accuracy on ImageNet1K using binarized ResNet18 and ResNet34 architectures respectively. Code is available at https://github.com/JingyangXiang/OvSW.
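The bookkeeping that identifies silent weights can be sketched as follows; the toy objective, the zero-flip threshold, and the optimizer are assumptions of this sketch, and AGS/SAD themselves are not reproduced.

import torch

class FlipTracker:
    """Tracks how often each latent weight changes sign between updates."""
    def __init__(self, weight):
        self.prev_sign = torch.sign(weight.detach())
        self.flips = torch.zeros_like(weight)
    def update(self, weight):
        sign = torch.sign(weight.detach())
        self.flips += (sign != self.prev_sign).float()
        self.prev_sign = sign
    def silent_mask(self):
        # "silent weights": no sign flip so far (threshold is an assumption)
        return self.flips == 0

w = torch.randn(256, requires_grad=True)   # latent (real-valued) weights
tracker = FlipTracker(w)
opt = torch.optim.SGD([w], lr=0.1)
target = torch.randn(256)
for _ in range(100):
    opt.zero_grad()
    loss = ((torch.tanh(w) - target) ** 2).mean()   # stand-in for a BNN objective
    loss.backward()
    opt.step()
    tracker.update(w)
print("fraction silent:", tracker.silent_mask().float().mean().item())

SAD would apply its extra decay penalty to exactly the weights selected by silent_mask(), pushing them across zero so their signs can flip.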
Updated: 2024-07-07 05:01:20
标题: OvSW:克服沉默权重以实现精确的二值神经网络
摘要: 二进制神经网络(BNNs)已被证明对于在移动和嵌入式平台上部署深度神经网络非常有效。大多数现有的研究侧重于最小化量化误差,提高表示能力,或设计梯度近似以减轻BNN中的梯度不匹配,而忽略了权重符号翻转,这是实现强大BNN的关键因素。在本文中,我们调查了BNN中权重符号更新的效率。我们观察到,对于普通BNNs,在训练过程中有超过50%的权重保持其符号不变,这些权重不仅分布在重量分布的尾部,还普遍存在于接近零点的位置。我们将这些权重称为“沉默权重”,它们减慢了收敛速度并导致了显著的准确性降级。从理论上讲,我们揭示了这是由于BNNs的梯度与潜在权重分布的独立性。为了解决这个问题,我们提出了Overcome Silent Weights(OvSW)。OvSW首先采用自适应梯度缩放(AGS)建立了梯度与潜在权重分布之间的关系,从而提高了权重符号更新的整体效率。此外,我们设计了沉默感知衰减(SAD)来自动识别“沉默权重”,通过跟踪权重翻转状态,并对“沉默权重”施加额外惩罚以促进它们的翻转。通过高效地更新权重符号,我们的方法在CIFAR10和ImageNet1K数据集上以各种体系结构实现了更快的收敛速度和最先进的性能。例如,OvSW在使用二值化ResNet18和ResNet34体系结构时,在ImageNet1K上分别获得了61.6%和65.5%的top-1准确率。代码可在\url{https://github.com/JingyangXiang/OvSW}上找到。
更新时间: 2024-07-07 05:01:20
领域: cs.CV,cs.AI
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
Open-vocabulary 3D object detection (OV-3DDet) aims to localize and recognize both seen and previously unseen object categories within any new 3D scene. While language and vision foundation models have achieved success in handling various open-vocabulary tasks with abundant training data, OV-3DDet faces a significant challenge due to the limited availability of training data. Although some pioneering efforts have integrated vision-language models (VLM) knowledge into OV-3DDet learning, the full potential of these foundational models has yet to be fully exploited. In this paper, we unlock the textual and visual wisdom to tackle the open-vocabulary 3D detection task by leveraging the language and vision foundation models. We leverage a vision foundation model to provide image-wise guidance for discovering novel classes in 3D scenes. Specifically, we utilize a object detection vision foundation model to enable the zero-shot discovery of objects in images, which serves as the initial seeds and filtering guidance to identify novel 3D objects. Additionally, to align the 3D space with the powerful vision-language space, we introduce a hierarchical alignment approach, where the 3D feature space is aligned with the vision-language feature space using a pre-trained VLM at the instance, category, and scene levels. Through extensive experimentation, we demonstrate significant improvements in accuracy and generalization, highlighting the potential of foundation models in advancing open-vocabulary 3D object detection in real-world scenarios.
Updated: 2024-07-07 04:50:04
标题: 解锁文本和视觉智慧:通过来自文本和图像的全面指导增强的开放词汇3D物体检测
摘要: 开放词汇的三维物体检测(OV-3DDet)旨在在任何新的三维场景中定位和识别已知和以前未见过的物体类别。虽然语言和视觉基础模型在处理各种具有丰富训练数据的开放词汇任务方面取得了成功,但由于训练数据的有限性,OV-3DDet面临着重大挑战。尽管一些开拓性工作已经将视觉语言模型(VLM)的知识整合到OV-3DDet学习中,但这些基础模型的全部潜力尚未完全发挥出来。在本文中,我们利用语言和视觉基础模型解锁文本和视觉智慧,以解决开放词汇的三维检测任务。我们利用视觉基础模型为发现三维场景中的新类提供基于图像的指导。具体来说,我们利用物体检测视觉基础模型使图像中的物体进行零样本发现,这作为识别新的三维物体的初始种子和过滤指导。此外,为了将三维空间与强大的视觉语言空间对齐,我们引入了一种分层对齐方法,其中使用预训练的VLM将三维特征空间与视觉语言特征空间在实例、类别和场景级别进行对齐。通过广泛的实验,我们展示了在准确性和泛化方面的显著改进,突显了基础模型在推动实际场景中的开放词汇三维物体检测方面的潜力。
更新时间: 2024-07-07 04:50:04
领域: cs.CV,cs.AI
How to characterize imprecision in multi-view clustering?
It is still challenging to cluster multi-view data since existing methods can only assign an object to a specific (singleton) cluster when combining different view information. As a result, it fails to characterize imprecision of objects in overlapping regions of different clusters, thus leading to a high risk of errors. In this paper, we thereby want to answer the question: how to characterize imprecision in multi-view clustering? Correspondingly, we propose a multi-view low-rank evidential c-means based on entropy constraint (MvLRECM). The proposed MvLRECM can be considered as a multi-view version of evidential c-means based on the theory of belief functions. In MvLRECM, each object is allowed to belong to different clusters with various degrees of support (masses of belief) to characterize uncertainty when decision-making. Moreover, if an object is in the overlapping region of several singleton clusters, it can be assigned to a meta-cluster, defined as the union of these singleton clusters, to characterize the local imprecision in the result. In addition, entropy-weighting and low-rank constraints are employed to reduce imprecision and improve accuracy. Compared to state-of-the-art methods, the effectiveness of MvLRECM is demonstrated based on several toy and UCI real datasets.
Updated: 2024-07-07 04:47:49
标题: 如何表征多视角聚类中的不精确性?
摘要: 对于多视图数据进行聚类仍然具有挑战性,因为现有方法只能在结合不同视图信息时将对象分配到特定(单例)聚类中。因此,它无法描述不同聚类重叠区域中对象的不确定性,从而导致高错误风险。本文旨在回答一个问题:如何描述多视图聚类中的不确定性?因此,我们提出了一种基于熵约束的多视图低秩证据c-均值(MvLRECM)。提出的MvLRECM可以被视为基于信念函数理论的多视图版本的证据c-均值。在MvLRECM中,每个对象被允许属于具有不同支持程度(信念质量)的不同聚类,以描述决策时的不确定性。此外,如果一个对象位于几个单例聚类的重叠区域中,它可以被分配到一个元聚类,定义为这些单例聚类的并集,以描述结果中的局部不确定性。此外,还利用熵加权和低秩约束来减少不确定性并提高准确性。通过在几个玩具和UCI真实数据集上的实验,证明了MvLRECM的有效性优于现有方法。
更新时间: 2024-07-07 04:47:49
领域: cs.LG
FedCG: Leverage Conditional GAN for Protecting Privacy and Maintaining Competitive Performance in Federated Learning
Federated learning (FL) aims to protect data privacy by enabling clients to build machine learning models collaboratively without sharing their private data. Recent works demonstrate that information exchanged during FL is subject to gradient-based privacy attacks, and consequently, a variety of privacy-preserving methods have been adopted to thwart such attacks. However, these defensive methods either introduce orders of magnitude more computational and communication overheads (e.g., with homomorphic encryption) or incur substantial model performance losses in terms of prediction accuracy (e.g., with differential privacy). In this work, we propose FedCG, a novel federated learning method that leverages conditional generative adversarial networks to achieve high-level privacy protection while still maintaining competitive model performance. FedCG decomposes each client's local network into a private extractor and a public classifier and keeps the extractor local to protect privacy. Instead of exposing extractors, FedCG shares clients' generators with the server for aggregating clients' shared knowledge, aiming to enhance the performance of each client's local networks. Extensive experiments demonstrate that FedCG can achieve competitive model performance compared with FL baselines, and privacy analysis shows that FedCG has a high-level privacy-preserving capability. Code is available at https://github.com/yankang18/FedCG
Updated: 2024-07-07 03:57:12
标题: FedCG:利用条件生成对抗网络保护隐私并在联邦学习中保持竞争性能
摘要: 联邦学习(FL)的目标是通过使客户能够在不共享其私人数据的情况下协作构建机器学习模型来保护数据隐私。最近的研究表明,在FL过程中交换的信息容易受到基于梯度的隐私攻击的影响,因此,已经采用了各种隐私保护方法来防范此类攻击。然而,这些防御方法要么引入数量级更多的计算和通信开销(例如,使用同态加密),要么在预测准确性方面造成实质性的模型性能损失(例如,使用差分隐私)。在这项工作中,我们提出了一种名为$\textsc{FedCG}$的新型联邦学习方法,利用条件生成对抗网络实现高水平的隐私保护,同时仍保持竞争性的模型性能。$\textsc{FedCG}$将每个客户端的本地网络分解为私有提取器和公共分类器,并将提取器保持在本地以保护隐私。$\textsc{FedCG}$不会暴露提取器,而是与服务器共享客户端的生成器,以聚合客户端的共享知识,旨在提升每个客户端本地网络的性能。大量实验证明,与FL基线相比,$\textsc{FedCG}$可以实现具有竞争性的模型性能,并且隐私分析表明$\textsc{FedCG}$具有高水平的隐私保护能力。代码可在https://github.com/yankang18/FedCG获取。
更新时间: 2024-07-07 03:57:12
领域: cs.LG
Bidirectional Uncertainty-Based Active Learning for Open Set Annotation
Active learning (AL) in open set scenarios presents a novel challenge of identifying the most valuable examples in an unlabeled data pool that comprises data from both known and unknown classes. Traditional methods prioritize selecting informative examples with low confidence, with the risk of mistakenly selecting unknown-class examples with similarly low confidence. Recent methods favor the most probable known-class examples, with the risk of picking simple already mastered examples. In this paper, we attempt to query examples that are both likely from known classes and highly informative, and propose a Bidirectional Uncertainty-based Active Learning (BUAL) framework. Specifically, we achieve this by first pushing the unknown class examples toward regions with high-confidence predictions, i.e., the proposed Random Label Negative Learning method. Then, we propose a Bidirectional Uncertainty sampling strategy by jointly estimating uncertainty posed by both positive and negative learning to perform consistent and stable sampling. BUAL successfully extends existing uncertainty-based AL methods to complex open-set scenarios. Extensive experiments on multiple datasets with varying openness demonstrate that BUAL achieves state-of-the-art performance. The code is available at https://github.com/chenchenzong/BUAL.
Updated: 2024-07-07 03:48:33
标题: 双向基于不确定性的主动学习用于开放集注释
摘要: 主动学习(AL)在开放集场景中提出了一个新的挑战,即在一个包含来自已知和未知类别数据的未标记数据池中识别最有价值的示例。传统方法优先选择具有低置信度的信息示例,存在误选类似低置信度的未知类示例的风险。最近的方法倾向于选择最可能的已知类示例,存在选择简单已掌握示例的风险。在本文中,我们尝试查询可能来自已知类别且具有高信息量的示例,并提出了双向不确定性主导的主动学习(BUAL)框架。具体来说,我们首先通过将未知类示例推向具有高置信度预测的区域来实现这一目标,即提出的随机标签负学习方法。然后,我们提出了一种双向不确定性抽样策略,通过同时估计正向和负向学习引起的不确定性来执行一致和稳定的抽样。BUAL成功将现有的基于不确定性的AL方法扩展到复杂的开放式场景。对多个数据集进行的广泛实验表明,BUAL实现了最先进的性能。代码可在https://github.com/chenchenzong/BUAL上找到。
更新时间: 2024-07-07 03:48:33
领域: cs.LG
JSCDS: A Core Data Selection Method with Jensen-Shannon Divergence for Efficient Learning on Caries RGB Images
Deep learning-based RGB caries detection improves the efficiency of caries identification and is crucial for preventing oral diseases. The performance of deep learning models depends on high-quality data and requires substantial training resources, making efficient deployment challenging. Core data selection, by eliminating low-quality and confusing data, aims to enhance training efficiency without significantly compromising model performance. However, distance-based data selection methods struggle to distinguish dependencies among high-dimensional caries data. To address this issue, we propose a Core Data Selection Method with Jensen-Shannon Divergence (JSCDS) for efficient caries image learning and caries classification. We describe the core data selection criterion as the distribution of samples in different classes. JSCDS calculates the cluster centers by sample embedding representation in the caries classification network and utilizes Jensen-Shannon Divergence to compute the mutual information between data samples and cluster centers, capturing nonlinear dependencies among high-dimensional data. The average mutual information is calculated to fit the above distribution, serving as the criterion for constructing the core set for model training. Extensive experiments on RGB caries datasets show that JSCDS outperforms other data selection methods in prediction performance and time consumption. Notably, JSCDS exceeds the performance of the full dataset model with only 50% of the core data, with its performance advantage becoming more pronounced in the 70% of core data.
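A simplified sketch of the selection criterion follows, using SciPy's Jensen-Shannon distance between each sample's predicted distribution and its class's mean distribution. The paper's mutual-information averaging over cluster centers is reduced here to a nearest-center score, so treat this as a stand-in rather than the published method.

import numpy as np
from scipy.spatial.distance import jensenshannon

def core_select(probs, labels, keep=0.5):
    """Keep the `keep` fraction of samples whose predicted distribution is
    closest (in Jensen-Shannon distance) to their class's mean distribution."""
    centers = {c: probs[labels == c].mean(axis=0) for c in np.unique(labels)}
    scores = np.array([jensenshannon(p, centers[c]) for p, c in zip(probs, labels)])
    n_keep = int(keep * len(probs))
    return np.argsort(scores)[:n_keep]           # indices of the core set

rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 3))               # stand-in network outputs
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
labels = probs.argmax(1)
idx = core_select(probs, labels, keep=0.5)
print(len(idx), "core samples selected")

Unlike a Euclidean distance on embeddings, the Jensen-Shannon score compares whole distributions, which is how the method captures nonlinear dependencies among high-dimensional samples.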
Updated: 2024-07-07 03:36:14
标题: JSCDS:一种基于Jensen-Shannon散度的龋齿RGB图像高效学习核心数据选择方法
摘要: 基于深度学习的RGB龋齿检测提高了龋齿识别的效率,对预防口腔疾病至关重要。深度学习模型的性能取决于高质量的数据,并需要大量的训练资源,使得有效部署具有挑战性。通过消除低质量和令人困惑的数据,核心数据选择旨在提高训练效率,同时不显著影响模型性能。然而,基于距离的数据选择方法很难区分高维龋齿数据之间的依赖关系。为了解决这个问题,我们提出了一种基于Jensen-Shannon散度(JSCDS)的核心数据选择方法,用于有效的龋齿图像学习和龋齿分类。我们将核心数据选择标准描述为不同类别样本的分布。JSCDS通过在龋齿分类网络中计算样本嵌入表示来计算群集中心,并利用Jensen-Shannon散度计算数据样本和群集中心之间的互信息,捕捉高维数据之间的非线性依赖关系。平均互信息被计算以适应上述分布,作为构建模型训练核心集的标准。对RGB龋齿数据集进行的大量实验表明,JSCDS在预测性能和时间消耗方面优于其他数据选择方法。值得注意的是,JSCDS仅使用50%的核心数据就超过了完整数据集模型的性能,其性能优势在70%核心数据中变得更加显著。
更新时间: 2024-07-07 03:36:14
领域: cs.CV,cs.AI
Deep Probability Aggregation Clustering
Combining machine clustering with deep models has shown remarkable superiority in deep clustering. It splits the data processing pipeline into two alternating phases: feature clustering and model training. However, such an alternating schedule may lead to instability and computational burden. We propose a centerless clustering algorithm called Probability Aggregation Clustering (PAC) to proactively adapt deep learning technologies, enabling easy deployment in online deep clustering. PAC circumvents the cluster center and aligns the probability space with the distribution space by formulating clustering as an optimization problem with a novel objective function. Based on the computation mechanism of PAC, we propose a general online probability aggregation module to perform stable and flexible feature clustering over mini-batch data, and further construct a deep visual clustering framework, deep PAC (DPAC). Extensive experiments demonstrate that PAC has superior clustering robustness and performance and that DPAC remarkably outperforms state-of-the-art deep clustering methods.
Updated: 2024-07-07 03:31:00
标题: 深度概率聚合聚类
摘要: 将机器聚类与深度模型结合在一起在深度聚类中表现出卓越的优势。它将数据处理流程修改为两个交替阶段:特征聚类和模型训练。然而,这种交替安排可能导致不稳定性和计算负担问题。我们提出了一种无中心聚类算法,称为概率聚合聚类(PAC),以主动适应深度学习技术,实现在线深度聚类的简单部署。PAC绕过聚类中心,通过制定一个新颖的目标函数将概率空间和分布空间进行对齐,将聚类表述为一个优化问题。基于PAC的计算机制,我们提出了一个通用的在线概率聚合模块,以在小批量数据上执行稳定和灵活的特征聚类,并进一步构建一个深度视觉聚类框架深度PAC(DPAC)。大量实验证明PAC具有优越的聚类鲁棒性和性能,DPAC明显优于最先进的深度聚类方法。
更新时间: 2024-07-07 03:31:00
领域: cs.LG,cs.CV
Some Issues in Predictive Ethics Modeling: An Annotated Contrast Set of "Moral Stories"
Models like Delphi have been able to label ethical dilemmas as moral or immoral with astonishing accuracy. This paper challenges accuracy as a holistic metric for ethics modeling by identifying issues with translating moral dilemmas into text-based input. It demonstrates these issues with contrast sets that substantially reduce the performance of classifiers trained on the dataset Moral Stories. Ultimately, we obtain concrete estimates for how much specific forms of data misrepresentation harm classifier accuracy. Specifically, label-changing tweaks to the descriptive content of a situation (as small as 3-5 words) can reduce classifier accuracy to as low as 51%, almost half the initial accuracy of 99.8%. Associating situations with a misleading social norm lowers accuracy to 98.8%, while adding textual bias (i.e. an implication that a situation already fits a certain label) lowers accuracy to 77%. These results suggest not only that many ethics models have substantially overfit, but that several precautions are required to ensure that input accurately captures a moral dilemma. This paper recommends re-examining the structure of a social norm, training models to ask for context with defeasible reasoning, and filtering input for textual bias. Doing so not only gives us the first concrete estimates of the average cost to accuracy of misrepresenting ethics data, but gives researchers practical tips for considering these estimates in research.
Updated: 2024-07-07 03:22:49
标题: 一些预测伦理建模中的问题:一组“道德故事”的注释对比集合
摘要: 像Delphi这样的模型已经能够以惊人的准确性将道德困境标记为道德或不道德。本文挑战将准确性作为伦理建模的整体度量标准,通过识别将道德困境转化为基于文本的输入时存在的问题。它通过对比集展示了这些问题,这些问题显著降低了在数据集《道德故事》上训练的分类器的性能。最终,我们获得了关于特定形式的数据误传对分类器准确性的伤害程度的具体估计。具体而言,对情境的描述内容进行标签更改的微调(只需3-5个单词)可以将分类器的准确性降至51%,几乎是初始准确性99.8%的一半。将情境与误导性社会规范联系起来会降低准确性至98.8%,而添加文本偏见(即暗示情境已经符合某个标签)会将准确性降至77%。 这些结果不仅表明许多伦理模型已经显著过拟合,而且还需要采取几项预防措施,以确保输入准确捕捉道德困境。本文建议重新审视社会规范的结构,训练模型使用可推翻性推理来要求上下文,并过滤具有文本偏见的输入。这样做不仅为我们提供了关于误传伦理数据对准确性的平均成本的第一个具体估计,还为研究人员提供了在研究中考虑这些估计的实用建议。
更新时间: 2024-07-07 03:22:49
领域: cs.AI,cs.CL
Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
Updated: 2024-07-07 03:20:26
标题: 命令学习:用于机器人自主性的自监督神经-符号学习框架
摘要: 数据驱动方法,如强化学习和模仿学习,在机器人自主性方面取得了显著成功。然而,它们以数据为中心的本质仍然阻碍了它们很好地泛化到不断变化的环境。此外,为机器人任务收集大量数据集通常是不切实际和昂贵的。为了克服这些挑战,我们引入了一种新的自监督神经符号(NeSy)计算框架,即命令学习(IL),用于机器人自主性,利用符号推理的泛化能力。IL框架由三个主要组件组成:神经模块、推理引擎和记忆系统。我们将IL定义为一种特殊的双层优化(BLO),它使三个模块之间实现相互学习。这克服了与数据驱动方法相关的标签密集型障碍,并利用符号推理涉及逻辑推理、物理原理、几何分析等。我们讨论了IL的几种优化技术,并验证了它们在包括路径规划、规则归纳、最优控制、视觉里程计和多机器人路由在内的五个不同机器人自主任务中的有效性。通过各种实验,我们展示了IL可以显著增强机器人自主性能力,并预计它将催生跨不同领域的进一步研究。
更新时间: 2024-07-07 03:20:26
领域: cs.RO,cs.AI,cs.CV,cs.LG
Blending Data-Driven Priors in Dynamic Games
As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic game with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors versus non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/.
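The single-step building block behind such a formulation is the KL-regularized best response, which has a closed form; the sketch below shows how a tunable temperature modulates between task-driven and data-driven behavior. The game-theoretic machinery of KLGame (feedback Nash equilibria over trajectories) is not reproduced, and the numbers are illustrative.

import numpy as np

def kl_regularized_policy(q_values, ref_policy, lam):
    """Single-step KL-regularized best response:
    pi maximizes E_pi[Q(a)] - lam * KL(pi || pi_ref),
    whose closed form is pi(a) proportional to pi_ref(a) * exp(Q(a) / lam)."""
    logits = np.log(ref_policy) + q_values / lam
    p = np.exp(logits - logits.max())   # subtract max for numerical stability
    return p / p.sum()

q = np.array([1.0, 0.2, -0.5])          # task-driven action values
pi_ref = np.array([0.1, 0.7, 0.2])      # data-driven (e.g., learned) reference
print(kl_regularized_policy(q, pi_ref, lam=10.0))  # large lam: reference-dominated
print(kl_regularized_policy(q, pi_ref, lam=0.1))   # small lam: task-dominated

The per-agent tunable parameter in the paper plays the role of lam here, blending optimal play with noisily-rational, data-driven behavior.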
Updated: 2024-07-07 02:54:35
标题: 在动态游戏中融合数据驱动的先验知识
摘要: 随着智能机器人如自动驾驶车辆在人群中的部署越来越多,这些系统应该利用基于模型的博弈理论规划者还是基于数据驱动策略进行安全、交互感知的运动规划的程度仍然是一个开放的问题。现有的动态博弈形式假设所有代理都是任务驱动的,并且行为最优化。然而,在现实中,人类往往偏离这些模型所规定的决策,他们的行为更适合在一个带有噪声的理性范式下近似。在这项工作中,我们研究了一种合理的方法,将基于数据驱动的参考策略与基于优化的博弈论策略相结合。我们制定了KLGame算法,用Kullback-Leibler(KL)正则化来解决与一般、随机、可能是多模态的参考策略相关的非合作动态博弈。我们的方法为每个决策者提供了一个可调参数,允许在任务驱动和数据驱动行为之间进行调节。我们提出了一种有效的算法,用于实时计算KLGame的多模态近似反馈纳什均衡策略。通过一系列模拟和真实世界的自动驾驶场景,我们展示了KLGame策略可以更有效地融入参考策略的指导,并考虑到有噪声的理性人类行为,相对于非正则化的基线。网站包含额外信息、视频和代码:https://kl-games.github.io/。
更新时间: 2024-07-07 02:54:35
领域: cs.RO,cs.AI,cs.SY,eess.SY,math.OC
Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses
Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume the loss function is Lipschitz continuous and convex, sample the batches randomly with replacement, or omit the gradient clipping step. In this work, we analyze the most commonly used variant of DP-SGD, in which we sample batches cyclically with replacement, perform gradient clipping, and only release the last DP-SGD iterate. More specifically, without assuming convexity, smoothness, or Lipschitz continuity of the loss function, we establish new Rényi differential privacy (RDP) bounds for the last DP-SGD iterate under the mild assumptions that (i) the DP-SGD stepsize is small relative to the topological constants in the loss function, and (ii) the loss function is weakly convex. Moreover, we show that our bounds converge to previously established convex bounds when the weak-convexity parameter of the objective function approaches zero. In the case of non-Lipschitz smooth loss functions, we provide a weaker bound that scales well in terms of the number of DP-SGD iterations.
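The analyzed variant is easy to state in code: cyclic batch order, per-example clipping, Gaussian noise, and release of only the last iterate. In the sketch below the quadratic per-example loss and all hyperparameters are placeholders, and no privacy accounting is included.

import numpy as np

rng = np.random.default_rng(0)
n, d, B = 1000, 10, 100                 # dataset size, dimension, batch size
X = rng.normal(size=(n, d)); y = rng.normal(size=n)
w = np.zeros(d)
C, sigma, eta = 1.0, 1.0, 0.05          # clip norm, noise multiplier, stepsize

def per_example_grads(w, xb, yb):
    # gradients of the placeholder loss 0.5 * (x.w - y)^2, one row per example
    resid = xb @ w - yb
    return resid[:, None] * xb

for epoch in range(5):
    for start in range(0, n, B):        # cyclic batch order, as in the analyzed variant
        xb, yb = X[start:start + B], y[start:start + B]
        g = per_example_grads(w, xb, yb)
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g / np.maximum(1.0, norms / C)           # per-example clipping to norm C
        noise = sigma * C * rng.normal(size=d)       # Gaussian mechanism
        w = w - eta * (g.sum(axis=0) + noise) / B
print("released last iterate:", w[:3])  # intermediate iterates are never published

Releasing only the final iterate, rather than the whole trajectory, is precisely what the paper's last-iterate RDP bounds account for.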
Updated: 2024-07-07 02:35:55
标题: 循环采样DP-SGD在非凸复合损失函数上的最后迭代隐私
摘要: 差分隐私随机梯度下降(DP-SGD)是一类通过差分隐私会计技术提供一定级别差分隐私保证的优化算法族。然而,当前会计技术假设与实际DP-SGD实现存在显著差异。例如,它们可能假设损失函数是Lipschitz连续和凸函数,随机以替换方式抽取批次,或者省略梯度裁剪步骤。 在本研究中,我们分析了最常用的DP-SGD变体,其中我们以替换方式循环抽取批次,执行梯度裁剪,并且仅发布最后一个DP-SGD迭代。更具体地说,在不假设损失函数的凸性、光滑性或Lipschitz连续性的情况下,我们建立了对最后一个DP-SGD迭代的新的R\'enyi差分隐私(RDP)界限,只要(i)DP-SGD步长相对于损失函数中的拓扑常数很小,且(ii)损失函数是弱凸函数的温和假设。此外,我们展示了当目标函数的弱凸性参数接近零时,我们的界限趋于先前建立的凸界限。在非Lipschitz光滑损失函数的情况下,我们提供了一个在DP-SGD迭代次数方面表现良好的较弱界限。
更新时间: 2024-07-07 02:35:55
领域: cs.LG,cs.DS,math.OC,stat.ML,65K10 (Primary), 60G15, 68P27,G.3; G.1.6
Advancing Prompt Recovery in NLP: A Deep Dive into the Integration of Gemma-2b-it and Phi2 Models
Prompt recovery, a crucial task in natural language processing, entails the reconstruction of prompts or instructions that language models use to convert input text into a specific output. Although pivotal, the design and effectiveness of prompts represent a challenging and relatively untapped field within NLP research. This paper delves into an exhaustive investigation of prompt recovery methodologies, employing a spectrum of pre-trained language models and strategies. Our study is a comparative analysis aimed at gauging the efficacy of various models on a benchmark dataset, with the goal of pinpointing the most proficient approach for prompt recovery. Through meticulous experimentation and detailed analysis, we elucidate the outstanding performance of the Gemma-2b-it + Phi2 model + Pretrain. This model surpasses its counterparts, showcasing its exceptional capability in accurately reconstructing prompts for text transformation tasks. Our findings offer a significant contribution to the existing knowledge on prompt recovery, shedding light on the intricacies of prompt design and offering insightful perspectives for future innovations in text rewriting and the broader field of natural language processing.
Updated: 2024-07-07 02:15:26
标题: 推动NLP中的提示恢复:深入探讨Gemma-2b-it和Phi2模型的整合
摘要: 提示恢复是自然语言处理中一项至关重要的任务,它涉及重建语言模型用于将输入文本转换为特定输出所使用的提示或指令。尽管至关重要,提示的设计和有效性仍是自然语言处理研究中一个具有挑战性且相对未被开发的领域。本文深入研究了提示恢复方法,采用了一系列预训练语言模型和策略。我们的研究是一项比较分析,旨在评估各种模型在基准数据集上的有效性,以确定最有效的提示恢复方法。通过细致的实验和详细的分析,我们阐明了Gemma-2b-it + Phi2模型 + 预训练的卓越表现。该模型超越了其他模型,展示了其在准确重建文本转换任务提示方面的卓越能力。我们的发现为现有的提示恢复知识做出了重要贡献,揭示了提示设计的复杂性,并为文本重写和更广泛的自然语言处理领域的未来创新提供了富有洞见的观点。
更新时间: 2024-07-07 02:15:26
领域: cs.CL,cs.AI
PAPM: A Physics-aware Proxy Model for Process Systems
In the context of proxy modeling for process systems, traditional data-driven deep learning approaches frequently encounter significant challenges, such as substantial training costs induced by large amounts of data, and limited generalization capabilities. As a promising alternative, physics-aware models incorporate partial physics knowledge to ameliorate these challenges. Although demonstrating efficacy, they fall short in terms of exploration depth and universality. To address these shortcomings, we introduce a physics-aware proxy model (PAPM) that fully incorporates partial prior physics of process systems, which includes multiple input conditions and the general form of conservation relations, resulting in better out-of-sample generalization. Additionally, PAPM contains a holistic temporal-spatial stepping module for flexible adaptation across various process systems. Through systematic comparisons with state-of-the-art pure data-driven and physics-aware models across five two-dimensional benchmarks in nine generalization tasks, PAPM notably achieves an average performance improvement of 6.7%, while requiring fewer FLOPs, and just 1% of the parameters compared to the prior leading method. The code is available at https://github.com/pengwei07/PAPM.
Updated: 2024-07-07 02:10:05
标题: PAPM:过程系统的物理感知代理模型
摘要: 在过程系统的代理建模领域,传统的数据驱动深度学习方法经常遇到重大挑战,如由大量数据引起的巨大训练成本和有限的泛化能力。作为一种有前途的替代方法,具有物理意识的模型整合了部分物理知识以改善这些挑战。尽管表现出有效性,但它们在探索深度和普遍性方面存在不足。为了解决这些缺点,我们引入了一种完全整合了过程系统部分先验物理的物理感知代理模型(PAPM),其中包括多个输入条件和守恒关系的一般形式,从而实现更好的样本外泛化。此外,PAPM包含一个全面的时空步进模块,可以灵活适应各种过程系统。通过在九个泛化任务中的五个二维基准上与最先进的纯数据驱动和具有物理意识的模型进行系统比较,PAPM显着实现了平均性能提升6.7%,同时需要更少的FLOPs,并且与先前领先方法相比,仅占参数的1%。代码可在https://github.com/pengwei07/PAPM获得。
更新时间: 2024-07-07 02:10:05
领域: cs.LG
Semi-adaptive Synergetic Two-way Pseudoinverse Learning System
Deep learning has become a crucial technology for making breakthroughs in many fields. Nevertheless, it still faces two important challenges, one theoretical and one applied. The first lies in the shortcomings of gradient-descent-based learning schemes, which are time-consuming and make it difficult to determine the learning-control hyperparameters. The second is that the architectural design of the model is usually tricky. In this paper, we propose a semi-adaptive synergetic two-way pseudoinverse learning system, wherein each subsystem encompasses forward learning, backward learning, and feature concatenation modules. The whole system is trained using a non-gradient-descent learning algorithm. It simplifies hyperparameter tuning while improving training efficiency. The architecture of the subsystems is designed using a data-driven approach that enables automated determination of the depth of the subsystems. We compare our method with mainstream non-gradient-descent baselines, and the results demonstrate the effectiveness of our proposed method. The source code for this paper is available at http://github.com/B-berrypie/Semi-adaptive-Synergetic-Two-way-Pseudoinverse-Learning-System.
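As a sketch of the non-gradient core of such systems, the following ELM-style routine builds hidden layers without backpropagation and solves the output layer in closed form with a pseudoinverse. The random hidden weights, layer widths, and toy task are assumptions of this sketch; the proposed system's forward/backward learning and feature-concatenation modules are not reproduced.

import numpy as np

def pil_train(X, Y, widths=(128, 64), seed=0):
    """Non-gradient training: random-feature hidden layers, then a closed-form
    least-squares (pseudoinverse) solve for the output weights."""
    rng = np.random.default_rng(seed)
    H, layers = X, []
    for w in widths:
        W = rng.normal(scale=1.0 / np.sqrt(H.shape[1]), size=(H.shape[1], w))
        layers.append(W)
        H = np.tanh(H @ W)             # hidden features, no gradient descent
    W_out = np.linalg.pinv(H) @ Y      # pseudoinverse solve for the output layer
    return layers, W_out

def pil_predict(X, layers, W_out):
    H = X
    for W in layers:
        H = np.tanh(H @ W)
    return H @ W_out

X = np.random.default_rng(1).normal(size=(500, 20))
Y = np.sin(X[:, :1])                   # toy regression target
layers, W_out = pil_train(X, Y)
print("train MSE:", float(((pil_predict(X, layers, W_out) - Y) ** 2).mean()))

Because every layer is determined either randomly or by a linear solve, there are no learning-rate or momentum hyperparameters to tune, which is the efficiency argument the abstract makes.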
Updated: 2024-07-07 02:02:44
标题: 半自适应协同双向伪逆学习系统
摘要: 深度学习已经成为许多领域取得突破的关键技术。然而,在理论和应用方面仍面临两个重要挑战。首先是基于梯度下降的学习方案存在的缺陷,这些方案耗时且难以确定学习控制超参数。其次,模型的架构设计通常比较棘手。本文提出了一种半自适应协同双向伪逆学习系统,其中每个子系统包括前向学习、反向学习和特征串联模块。整个系统使用非梯度下降学习算法进行训练,简化了超参数调整,同时提高了训练效率。子系统的架构设计采用数据驱动方法,实现了子系统深度的自动确定。我们将我们的方法与主流非梯度下降方法的基线进行比较,结果表明我们提出的方法的有效性。本文的源代码可在http://github.com/B-berrypie/Semi-adaptive-Synergetic-Two-way-Pseudoinverse-Learning-System找到。
更新时间: 2024-07-07 02:02:44
领域: cs.LG
HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning
The deployment of pre-trained models (PTMs) has greatly advanced the field of continual learning (CL), enabling positive knowledge transfer and resilience to catastrophic forgetting. To sustain these advantages for sequentially arriving tasks, a promising direction involves keeping the pre-trained backbone frozen while employing parameter-efficient tuning (PET) techniques to instruct representation learning. Despite the popularity of Prompt-based PET for CL, its empirical design often leads to sub-optimal performance in our evaluation of different PTMs and target tasks. To this end, we propose a unified framework for CL with PTMs and PET that provides both theoretical and empirical advancements. We first perform an in-depth theoretical analysis of the CL objective in a pre-training context, decomposing it into hierarchical components namely within-task prediction, task-identity inference and task-adaptive prediction. We then present Hierarchical Decomposition PET (HiDe-PET), an innovative approach that explicitly optimizes the decomposed objective through incorporating task-specific and task-shared knowledge via mainstream PET techniques along with efficient recovery of pre-trained representations. Leveraging this framework, we delve into the distinct impacts of implementation strategy, PET technique and PET architecture, as well as adaptive knowledge accumulation amidst pronounced distribution changes. Finally, across various CL scenarios, our approach demonstrates remarkably superior performance over a broad spectrum of recent strong baselines.
Updated: 2024-07-07 01:50:25
标题: HiDe-PET:通过参数高效调整的分层分解实现持续学习
摘要: 预训练模型(PTMs)的部署极大地推动了持续学习(CL)领域的发展,实现了正向的知识转移和对灾难性遗忘的韧性。为了在顺序到达的任务中保持这些优势,一个有前途的方向涉及保持预训练的骨干结构冻结,同时利用参数高效调整(PET)技术来指导表示学习。尽管Prompt-based PET在CL中很受欢迎,但其经验设计经常导致我们在不同的PTMs和目标任务的评估中表现出次优性能。为此,我们提出了一个统一的CL框架,其中包括PTMs和PET,提供了理论和实证方面的进展。我们首先对预训练背景下的CL目标进行了深入的理论分析,将其分解为层次化组件,即任务内预测、任务身份推断和任务自适应预测。然后,我们提出了层次分解PET(HiDe-PET),这是一种创新方法,通过结合主流PET技术以及有效恢复预训练表示,明确优化分解的目标。利用这一框架,我们探讨了实施策略、PET技术和PET架构的不同影响,以及在明显的分布变化中的自适应知识积累。最后,在各种CL场景中,我们的方法展示了比近期强大基线更为显著的性能。
更新时间: 2024-07-07 01:50:25
领域: cs.LG
MenuCraft: Interactive Menu System Design with Large Language Models
Menu system design for user interfaces is a challenging task involving many design options and various human factors. For example, one crucial factor that designers need to consider is the semantic and systematic relation of menu commands. However, capturing these relations can be challenging due to limited available resources. Large language models can be helpful in this regard, using their pre-training knowledge to design and refine menu systems. In this paper, we propose MenuCraft, an AI-assisted designer for menu design that enables collaboration between the designer and a dialogue system to design menus. MenuCraft offers an interactive language-based menu design tool that simplifies the menu design process and enables easy customization of design options. MenuCraft supports a variety of interactions through dialog that allows performing in-context learning.
Updated: 2024-07-07 01:21:07
标题: MenuCraft:使用大型语言模型设计交互式菜单系统
摘要: 用户界面的菜单系统设计是一个具有挑战性的任务,涉及许多设计选项和各种人因素。例如,设计师需要考虑的一个关键因素是菜单命令的语义和系统化关系。然而,由于资源有限,捕捉这些关系可能是具有挑战性的。大型语言模型在这方面可能是有帮助的,利用它们的预训练知识来设计和完善菜单系统。在本文中,我们提出了MenuCraft,一个用于菜单设计的AI辅助设计师,它使设计师和对话系统之间能够协作设计菜单。MenuCraft提供了一个交互式基于语言的菜单设计工具,简化了菜单设计过程,并实现了设计选项的轻松定制。MenuCraft通过对话支持各种交互方式,允许进行上下文学习。
更新时间: 2024-07-07 01:21:07
领域: cs.CL,cs.AI,cs.HC
On the importance of learning non-local dynamics for stable data-driven climate modeling: A 1D gravity wave-QBO testbed
Machine learning (ML) techniques, especially neural networks (NNs), have shown promise in learning subgrid-scale (SGS) parameterizations for climate modeling. However, a major problem with data-driven parameterizations, particularly those learned with supervised algorithms, is instability when integrated with numerical solvers of large-scale processes. Current remedies are often ad-hoc and lack a theoretical foundation. Here, we combine ML theory and climate physics to address a source of instability in NN-based parameterization. We demonstrate the importance of learning spatially non-local dynamics using a 1D model of the quasi-biennial oscillation (QBO) with gravity wave (GW) parameterization as a testbed. While common offline metrics fail to identify shortcomings in learning non-local dynamics, we show that the receptive field (RF)-the region of the input an NN uses to predict an output-can identify instability a-priori. We find that NN-based parameterizations that seem to accurately predict GW forcings from wind profiles ($\mathbf{R^2 \approx 0.99}$) cause unstable simulations when RF is too small to capture the non-local dynamics, while NNs of the same size but large-enough RF are stable. Some architectures, e.g., Fourier neural operators, have inherently large RF. We also demonstrate that learning non-local dynamics can be crucial for the stability and accuracy of a data-driven spatiotemporal emulator of the entire zonal wind field. Given the ubiquity of non-local dynamics in the climate system, we expect the use of effective RF, which can be computed for any NN architecture, to be important for many applications. This work highlights the need to integrate ML theory with physics for designing/analyzing data-driven algorithms for weather/climate modeling.
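The effective receptive field can be measured for any architecture by checking which inputs receive a nonzero gradient from a probed output. A 1D PyTorch sketch follows, with two assumed toy CNNs whose dilations give deliberately small and large receptive fields; the probe location, thresholds, and architectures are illustrative.

import torch
import torch.nn as nn

def effective_rf(model, length=256):
    """Counts inputs that influence the center output, via input gradients."""
    x = torch.randn(1, 1, length, requires_grad=True)
    y = model(x)
    y[0, 0, length // 2].backward()               # probe the center output
    grad = x.grad[0, 0].abs()
    return int((grad > 1e-12).sum().item())       # number of influential inputs

small = nn.Sequential(nn.Conv1d(1, 8, 5, padding=2), nn.ReLU(),
                      nn.Conv1d(8, 1, 5, padding=2))
large = nn.Sequential(nn.Conv1d(1, 8, 5, padding=2), nn.ReLU(),
                      nn.Conv1d(8, 8, 5, padding=8, dilation=4), nn.ReLU(),
                      nn.Conv1d(8, 1, 5, padding=32, dilation=16))
print(effective_rf(small), effective_rf(large))   # roughly the analytic RFs, 9 vs. 85

An NN whose RF (here, 9 grid points) is narrower than the physical non-local interactions can still score well offline yet destabilize the coupled simulation, which is the a-priori diagnostic the paper advocates.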
Updated: 2024-07-07 01:15:52
标题: 关于学习非局部动力学对稳定数据驱动气候建模的重要性:一个一维重力波-QBO试验平台
摘要: 机器学习(ML)技术,特别是神经网络(NNs),在学习气候建模中的亚网格尺度(SGS)参数化方面表现出潜力。然而,数据驱动参数化的一个主要问题,特别是那些通过监督算法学习的参数化,是在与大尺度过程的数值求解器耦合时出现不稳定性。目前的解决方法通常是临时性的,缺乏理论基础。在这里,我们结合ML理论和气候物理学,以解决基于NN的参数化中不稳定性的一个来源。我们以带有重力波(GW)参数化的准二年振荡(QBO)一维模型为试验平台,展示了学习空间非局部动力学的重要性。虽然常见的离线指标未能识别学习非局部动力学方面的不足,但我们表明,感受野(RF),即NN用于预测输出的输入区域,可以先验地识别不稳定性。我们发现,看似能够从风廓线准确预测GW强迫($\mathbf{R^2 \approx 0.99}$)的基于NN的参数化,在RF太小而无法捕捉非局部动力学时会导致不稳定的模拟,而相同规模但RF足够大的NN则是稳定的。一些架构,例如傅里叶神经算子,天然具有较大的RF。我们还证明,学习非局部动力学对于整个纬向风场的数据驱动时空仿真器的稳定性和准确性至关重要。鉴于非局部动力学在气候系统中的普遍存在,我们预计有效RF(可对任何NN架构计算)的使用对许多应用都很重要。这项工作强调了在设计/分析用于天气/气候建模的数据驱动算法时,需要将ML理论与物理学相结合。
更新时间: 2024-07-07 01:15:52
领域: physics.ao-ph,cs.LG
PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering
Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current efficient answer correctness (AC) metrics do not align with human judgments, particularly for verbose, free-form answers from large language models (LLMs). There are two challenges: a lack of diverse evaluation data, and models that are too big and non-transparent; LLM-based scorers correlate better with humans, but this expensive task has only been tested on limited QA datasets. We rectify these issues by providing guidelines and datasets for evaluating machine QA, adopted from the human QA community. We also propose an efficient, low-resource, and interpretable QA evaluation method that is more stable than exact match and neural methods.
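As an example of the kind of lightweight, interpretable matcher at issue, here is token-level F1, a standard answer-correctness score that tolerates verbosity better than exact match; PEDANTS' own rubric and guidelines are not reproduced here.

import re
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a free-form prediction and a gold answer."""
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", gold.lower())
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(pred), overlap / len(ref)
    return 2 * p * r / (p + r)

print(token_f1("He was born in Honolulu, Hawaii.", "Honolulu"))  # ~0.29

Exact match would score this verbose but correct answer 0, which is exactly the misalignment with human judgment that motivates better efficient AC metrics.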
Updated: 2024-07-07 01:14:16
标题: 书呆子(对吝啬鬼的多样答案提供者文本进行精确评估):开放领域问答的高效评估分析和基准测试
摘要: 问答(QA)只有在我们知道答案是否正确时才能取得进展,但对于许多最具挑战性和有趣的QA示例,当前高效的答案正确性(AC)度量与人类判断不一致,特别是来自大型语言模型(LLMs)的冗长、自由形式的答案。存在两个挑战:缺乏多样化的评估数据以及模型过于庞大且不透明;基于LLM的评分器与人类更相关,但这种昂贵的任务仅在有限的QA数据集上进行过测试。我们通过提供从人类QA社区采纳的评估机器QA的指导方针和数据集来纠正这些问题。我们还提出了一种高效、低资源、可解释的QA评估方法,比精确匹配和神经方法更稳定。
更新时间: 2024-07-07 01:14:16
领域: cs.CL,cs.AI
Larimar: Large Language Models with Episodic Memory Control
Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 8-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar
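A toy sketch of the interface such a memory exposes: one-shot writes and associative reads against fixed random addresses. Everything below (the softmax addressing, the delta-rule write) is an illustrative stand-in; Larimar's actual memory is a learned, distributed module coupled to the LLM.

import numpy as np

class EpisodicMemory:
    """Minimal key-value memory with one-shot writes and associative reads."""
    def __init__(self, slots=32, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.K = rng.normal(size=(slots, dim))   # fixed random addresses
        self.V = np.zeros((slots, dim))          # writable contents

    def _weights(self, query):
        sims = self.K @ query                    # address by similarity
        w = np.exp(sims - sims.max())
        return w / w.sum()

    def write(self, key, value):
        # one-shot update: move the addressed slots toward the new value
        w = self._weights(key)
        self.V += np.outer(w, value - w @ self.V)

    def read(self, query):
        return self._weights(query) @ self.V

mem = EpisodicMemory()
k = np.random.default_rng(1).normal(size=16)
mem.write(k, np.ones(16))            # store a fact in one shot, no retraining
print(np.round(mem.read(k)[:4], 2))  # recalls an approximation of the stored value

Because the write is a linear update rather than gradient descent on model weights, editing (or forgetting, by overwriting with zeros) costs one memory operation, which is the speed argument of the abstract.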
Updated: 2024-07-07 00:51:44
标题: Larimar:具有情节记忆控制的大型语言模型
摘要: 大型语言模型(LLMs)中存储的知识的高效且准确更新是当今最紧迫的研究挑战之一。本文介绍了Larimar - 一种新颖的、受大脑启发的架构,用于增强LLMs的分布式情景记忆。Larimar的记忆允许动态、一次性更新知识,无需进行计算昂贵的重新训练或微调。在多个事实编辑基准测试上的实验结果表明,Larimar在挑战性的顺序编辑设置中达到了与大多数竞争基线相当的准确性,但在速度上也表现出色 - 取决于基础LLM,速度提高了8-10倍,同时由于所提出的架构简单、与LLM无关,因此具有灵活性和普遍性。我们进一步提供了选择性事实遗忘、信息泄漏预防以及输入上下文长度泛化的机制,并展示了它们的有效性。我们的代码可在https://github.com/IBM/larimar 上找到。
更新时间: 2024-07-07 00:51:44
领域: cs.LG,cs.AI