Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
To remove redundant components of large language models (LLMs) without incurring significant computational costs, this work focuses on single-shot pruning without a retraining phase. We simplify the pruning process for Transformer-based LLMs by identifying a depth-2 pruning structure that functions independently. Additionally, we propose two inference-aware pruning criteria derived from the optimization perspective of output approximation, which outperform traditional training-aware metrics such as gradient and Hessian. We also introduce a two-step reconstruction technique to mitigate pruning errors without model retraining. Experimental results demonstrate that our approach significantly reduces computational costs and hardware requirements while maintaining superior performance across various datasets and models.
Updated: 2024-07-26 23:53:59
Categories: cs.AI
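For intuition, here is a minimal sketch of the output-approximation view of pruning: components are removed greedily so that the layer's output on a small calibration batch changes as little as possible. The function names and the per-input-feature granularity are illustrative assumptions; the paper's depth-2 structures, its two criteria, and the two-step reconstruction are not reproduced here.
```python
# Illustrative greedy output-approximation pruning of one linear layer:
# drop the input features whose deletion least perturbs the layer output
# on a calibration batch. Not the paper's implementation.
import numpy as np

def greedy_prune(W, X, n_prune):
    """W: (d_in, d_out) weights; X: (n, d_in) calibration inputs."""
    keep = list(range(W.shape[0]))
    Y_ref = X @ W                                    # dense reference output
    for _ in range(n_prune):
        best_err, best_i = np.inf, None
        for i in keep:                               # try deleting each feature
            trial = [j for j in keep if j != i]
            err = np.linalg.norm(Y_ref - X[:, trial] @ W[trial])
            if err < best_err:
                best_err, best_i = err, i
        keep.remove(best_i)                          # greedily drop the cheapest

    return keep

rng = np.random.default_rng(0)
X, W = rng.normal(size=(64, 16)), rng.normal(size=(16, 8))
print(greedy_prune(W, X, n_prune=4))
```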
LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins
Large language model (LLM) platforms, such as ChatGPT, have recently begun offering an app ecosystem to interface with third-party services on the internet. While these apps extend the capabilities of LLM platforms, they are developed by arbitrary third parties and thus cannot be implicitly trusted. Apps also interface with LLM platforms and users using natural language, which can have imprecise interpretations. In this paper, we propose a framework that lays a foundation for LLM platform designers to analyze and improve the security, privacy, and safety of current and future third-party integrated LLM platforms. Our framework is a formulation of an attack taxonomy that is developed by iteratively exploring how LLM platform stakeholders could leverage their capabilities and responsibilities to mount attacks against each other. As part of our iterative process, we apply our framework in the context of OpenAI's plugin (apps) ecosystem. We uncover plugins that concretely demonstrate the potential for the types of issues that we outline in our attack taxonomy. We conclude by discussing novel challenges and by providing recommendations to improve the security, privacy, and safety of present and future LLM-based computing platforms.
Updated: 2024-07-26 23:50:35
Categories: cs.CR,cs.AI,cs.CL,cs.CY,cs.LG
Binary Bleed: Fast Distributed and Parallel Method for Automatic Model Selection
In several Machine Learning (ML) clustering and dimensionality reduction approaches, such as non-negative matrix factorization (NMF), RESCAL, and K-Means clustering, users must select a hyper-parameter k to define the number of clusters or components that yield an ideal separation of samples or clean clusters. This selection, while difficult, is crucial to avoid overfitting or underfitting the data. Several ML applications use scoring methods (e.g., Silhouette and Davies-Bouldin scores) to evaluate the cluster pattern stability for a specific k. The score is calculated for different trials over a range of k, and the ideal k is heuristically selected as the value before the model starts overfitting, indicated by a drop or increase in the score resembling an elbow curve plot. While the grid-search method can be used to accurately find a good k value, visiting a range of k can become time-consuming and computationally resource-intensive. In this paper, we introduce the Binary Bleed method based on binary search, which significantly reduces the k search space for these grid-search ML algorithms by truncating the target k values from the search space using a heuristic with thresholding over the scores. Binary Bleed is designed to work with single-node serial, single-node multi-processing, and distributed computing resources. In our experiments, we demonstrate the reduced search space gain over a naive sequential search of the ideal k and the accuracy of Binary Bleed in identifying the correct k for NMFk, K-Means pyDNMFk, and pyDRESCALk with Silhouette and Davies-Bouldin scores. We make our implementation of Binary Bleed for the NMF algorithm available on GitHub.
Updated: 2024-07-26 23:48:51
Categories: cs.DC,cs.AI,cs.PF
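As a rough serial illustration of the idea (not the paper's distributed implementation), the search might look like the following. The score function, the fixed threshold, and the assumption that the score stays above it until k passes the ideal value are all simplifications:
```python
# Illustrative sketch of Binary Bleed: binary-search the k range, pruning
# half the remaining values at each probe via a score threshold, instead of
# visiting every k as grid search does.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def binary_bleed(k_min, k_max, score_fn, threshold):
    best_k, lo, hi = None, k_min, k_max
    while lo <= hi:
        mid = (lo + hi) // 2
        if score_fn(mid) >= threshold:   # cluster quality still high
            best_k, lo = mid, mid + 1    # ideal k is at or above mid
        else:
            hi = mid - 1                 # bleed off the upper half
    return best_k

X, _ = make_blobs(n_samples=300, centers=5, random_state=0)
score = lambda k: silhouette_score(
    X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
print(binary_bleed(2, 12, score, threshold=0.5))
```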
RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records
We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model co-trained with consistency regularization to capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks. The code will be published at \url{https://github.com/ritaranx/RAM-EHR}.
Updated: 2024-07-26 23:24:39
Categories: cs.CL,cs.AI,cs.IR,q-bio.OT
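A hedged sketch of what co-training with consistency regularization can look like, assuming the two branches are the local EHR predictor and its knowledge-augmented counterpart; the exact loss used in RAM-EHR may differ:
```python
# Co-training loss with consistency regularization between two predictors
# (names and the KL-based consistency term are illustrative assumptions).
import torch
import torch.nn.functional as F

def cotrain_loss(logits_local, logits_aug, labels, lam=1.0):
    ce = F.cross_entropy(logits_local, labels) + F.cross_entropy(logits_aug, labels)
    log_p_local = F.log_softmax(logits_local, dim=-1)
    p_aug = F.softmax(logits_aug, dim=-1)
    # align the two predictive distributions
    consistency = F.kl_div(log_p_local, p_aug, reduction="batchmean")
    return ce + lam * consistency

labels = torch.tensor([0, 1])
print(cotrain_loss(torch.randn(2, 3), torch.randn(2, 3), labels))
```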
Task Offloading in Fog Computing with Deep Reinforcement Learning: Future Research Directions Based on Security and Efficiency Enhancements
The surge in Internet of Things (IoT) devices and data generation highlights the limitations of traditional cloud computing in meeting demands for immediacy, Quality of Service, and location-aware services. Fog computing emerges as a solution, bringing computation, storage, and networking closer to data sources. This study explores the role of Deep Reinforcement Learning in enhancing fog computing's task offloading, aiming for operational efficiency and robust security. By reviewing current strategies and proposing future research directions, the paper shows the potential of Deep Reinforcement Learning in optimizing resource use, speeding up responses, and securing against vulnerabilities. It suggests advancing Deep Reinforcement Learning for fog computing, exploring blockchain for better security, and seeking energy-efficient models to improve the Internet of Things ecosystem. Incorporating artificial intelligence, our results indicate potential improvements in key metrics, such as task completion time, energy consumption, and security incident reduction. These findings provide a concrete foundation for future research and practical applications in optimizing fog computing architectures.
Updated: 2024-07-26 22:54:26
Categories: cs.CR
Accuracy-Privacy Trade-off in the Mitigation of Membership Inference Attack in Federated Learning
Over the last few years, federated learning (FL) has emerged as a prominent method in machine learning, emphasizing privacy preservation by allowing multiple clients to collaboratively build a model while keeping their training data private. Despite this focus on privacy, FL models are susceptible to various attacks, including membership inference attacks (MIAs), posing a serious threat to data confidentiality. In a recent study, Rezaei et al. revealed the existence of an accuracy-privacy trade-off in deep ensembles and proposed a few fusion strategies to overcome it. In this paper, we aim to explore the relationship between deep ensembles and FL. Specifically, we investigate whether confidence-based metrics derived from deep ensembles apply to FL and whether there is a trade-off between accuracy and privacy in FL with respect to MIA. Empirical investigations illustrate a lack of a non-monotonic correlation between the number of clients and the accuracy-privacy trade-off. By experimenting with different numbers of federated clients, datasets, and confidence-metric-based fusion strategies, we identify and analytically justify the clear existence of the accuracy-privacy trade-off.
Updated: 2024-07-26 22:44:41
Categories: cs.LG,cs.AI,cs.CR
Large Language Models as Co-Pilots for Causal Inference in Medical Studies
The validity of medical studies based on real-world clinical data, such as observational studies, depends on critical assumptions necessary for drawing causal conclusions about medical interventions. Many published studies are flawed because they violate these assumptions and entail biases such as residual confounding, selection bias, and misalignment between treatment and measurement times. Although researchers are aware of these pitfalls, they continue to occur because anticipating and addressing them in the context of a specific study can be challenging without a large, often unwieldy, interdisciplinary team with extensive expertise. To address this expertise gap, we explore the use of large language models (LLMs) as co-pilot tools to assist researchers in identifying study design flaws that undermine the validity of causal inferences. We propose a conceptual framework for LLMs as causal co-pilots that encode domain knowledge across various fields, engaging with researchers in natural language interactions to provide contextualized assistance in study design. We provide illustrative examples of how LLMs can function as causal co-pilots, propose a structured framework for their grounding in existing causal inference frameworks, and highlight the unique challenges and opportunities in adapting LLMs for reliable use in epidemiological research.
Updated: 2024-07-26 22:43:15
Categories: cs.AI
Towards Scalable and Stable Parallelization of Nonlinear RNNs
Conventional nonlinear RNNs are not naturally parallelizable across the sequence length, whereas transformers and linear RNNs are. Lim et al. [2024] therefore tackle parallelized evaluation of nonlinear RNNs by posing it as a fixed point problem, solved with Newton's method. By deriving and applying a parallelized form of Newton's method, they achieve huge speedups over sequential evaluation. However, their approach inherits cubic computational complexity and numerical instability. We tackle these weaknesses. To reduce the computational complexity, we apply quasi-Newton approximations and show they converge comparably to full-Newton, use less memory, and are faster. To stabilize Newton's method, we leverage a connection between Newton's method damped with trust regions and Kalman smoothing. This connection allows us to stabilize Newton's method, per the trust region, while using efficient parallelized Kalman algorithms to retain performance. We compare these methods empirically, and highlight the use cases where each algorithm excels.
Updated: 2024-07-26 22:38:11
Categories: cs.LG,I.2.6
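To see why evaluating a nonlinear RNN can be posed as a fixed-point problem, consider iterating all hidden states jointly until the recurrence is satisfied. The toy below uses plain fixed-point (Picard) sweeps, which converge to the sequential result after at most T sweeps; the paper's contribution is replacing such sweeps with (quasi-)Newton updates stabilized via the Kalman-smoothing connection, which is not shown here.
```python
# Parallel evaluation of a nonlinear RNN as a fixed-point problem: each sweep
# updates every hidden state at once from the previous guess.
import numpy as np

def rnn_fixed_point(x, W, U, h0, iters=50):
    T, d = x.shape[0], h0.shape[0]
    H = np.zeros((T, d))                      # guess for all hidden states at once
    for _ in range(iters):
        H_prev = np.vstack([h0, H[:-1]])      # shift: h_{t-1} for every t
        H = np.tanh(H_prev @ W.T + x @ U.T)   # one parallel sweep over all timesteps
    return H

rng = np.random.default_rng(1)
T, d = 32, 4
x = rng.normal(size=(T, d))
W = 0.5 * rng.normal(size=(d, d))
U = rng.normal(size=(d, d))
h0 = np.zeros(d)
H_par = rnn_fixed_point(x, W, U, h0)
# sequential reference for comparison
h, H_seq = h0, []
for t in range(T):
    h = np.tanh(W @ h + U @ x[t]); H_seq.append(h)
print(np.allclose(H_par, np.array(H_seq), atol=1e-6))  # True
```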
To which reference class do you belong? Measuring racial fairness of reference classes with normative modeling
Reference classes in healthcare establish healthy norms, such as pediatric growth charts of height and weight, and are used to chart deviations from these norms which represent potential clinical risk. How the demographics of the reference class influence clinical interpretation of deviations is unknown. Using normative modeling, a method for building reference classes, we evaluate the fairness (racial bias) in reference models of structural brain images that are widely used in psychiatry and neurology. We test whether including race in the model creates fairer models. We predict self-reported race using the deviation scores from three different reference class normative models, to better understand bias in an integrated, multivariate sense. Across all of these tasks, we uncover racial disparities that are not easily addressed with existing data or commonly used modeling techniques. Our work suggests that deviations from the norm could be due to demographic mismatch with the reference class, and assigning clinical meaning to these deviations should be done with caution. Our approach also suggests that acquiring more representative samples is an urgent research priority.
Updated: 2024-07-26 22:34:05
Categories: cs.LG,cs.CV,cs.CY
GPT Deciphering Fedspeak: Quantifying Dissent Among Hawks and Doves
Markets and policymakers around the world hang on the consequential monetary policy decisions made by the Federal Open Market Committee (FOMC). Publicly available textual documentation of their meetings provides insight into members' attitudes about the economy. We use GPT-4 to quantify dissent among members on the topic of inflation. We find that transcripts and minutes reflect the diversity of member views about the macroeconomic outlook in a way that is lost or omitted from the public statements. In fact, diverging opinions that shed light upon the committee's "true" attitudes are almost entirely omitted from the final statements. Hence, we argue that forecasting FOMC sentiment based solely on statements will not sufficiently reflect dissent among the hawks and doves.
Updated: 2024-07-26 22:16:40
Categories: cs.AI
Surveys Considered Harmful? Reflecting on the Use of Surveys in AI Research, Development, and Governance
Calls for engagement with the public in Artificial Intelligence (AI) research, development, and governance are increasing, leading to the use of surveys to capture people's values, perceptions, and experiences related to AI. In this paper, we critically examine the state of human participant surveys associated with these topics. Through both a reflexive analysis of a survey pilot spanning six countries and a systematic literature review of 44 papers featuring public surveys related to AI, we explore prominent perspectives and methodological nuances associated with surveys to date. We find that public surveys on AI topics are vulnerable to specific Western knowledge, values, and assumptions in their design, including in their positioning of ethical concepts and societal values, lack sufficient critical discourse surrounding deployment strategies, and demonstrate inconsistent forms of transparency in their reporting. Based on our findings, we distill provocations and heuristic questions for our community, to recognize the limitations of surveys for meeting the goals of engagement, and to cultivate shared principles to design, deploy, and interpret surveys cautiously and responsibly.
Updated: 2024-07-26 22:10:49
Categories: cs.CY,cs.AI,cs.HC
FedAR: Addressing Client Unavailability in Federated Learning with Local Update Approximation and Rectification
Federated learning (FL) enables clients to collaboratively train machine learning models under the coordination of a server in a privacy-preserving manner. One of the main challenges in FL is that the server may not receive local updates from each client in each round due to client resource limitations and intermittent network connectivity. The existence of unavailable clients severely deteriorates the overall FL performance. In this paper, we propose FedAR, a novel client update Approximation and Rectification algorithm for FL to address the client unavailability issue. FedAR can get all clients involved in the global model update to achieve a high-quality global model on the server, which also furnishes accurate predictions for each client. To this end, the server uses the latest update from each client as a surrogate for its current update. It then assigns a different weight to each client's surrogate update to derive the global model, in order to guarantee contributions from both available and unavailable clients. Our theoretical analysis proves that FedAR achieves optimal convergence rates on non-IID datasets for both convex and non-convex smooth loss functions. Extensive empirical studies show that FedAR comprehensively outperforms state-of-the-art FL baselines including FedAvg, MIFA, FedVARP and Scaffold in terms of the training loss, test accuracy, and bias mitigation. Moreover, FedAR also depicts impressive performance in the presence of a large number of clients with severe client unavailability.
Updated: 2024-07-26 21:56:52
Categories: cs.LG,cs.DC
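A simplified sketch of the surrogate-update mechanism: the server substitutes each unavailable client's most recent update and weights clients individually. The staleness-based weighting below is an assumption for illustration, not FedAR's actual weighting rule:
```python
# Server-side aggregation that reuses each client's latest known update as a
# surrogate, down-weighting stale surrogates (weighting scheme is assumed).
import numpy as np

def aggregate(latest_updates, staleness, base_lr=1.0):
    """latest_updates: {client_id: np.ndarray}; staleness: {client_id: rounds old}."""
    w = {c: 1.0 / (1 + staleness[c]) for c in latest_updates}  # fresher counts more
    total = sum(w.values())
    delta = sum(w[c] / total * latest_updates[c] for c in latest_updates)
    return base_lr * delta

updates = {0: np.array([0.1, -0.2]), 1: np.array([0.3, 0.0]), 2: np.array([-0.1, 0.4])}
stale = {0: 0, 1: 2, 2: 5}   # client 0 reported this round; 1 and 2 are surrogates
print(aggregate(updates, stale))
```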
Training Neural Networks from Scratch with Parallel Low-Rank Adapters
The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication. Although methods like low-rank adaptation (LoRA) have reduced the cost of model finetuning, its application in model pre-training remains largely unexplored. This paper explores extending LoRA to model pre-training, identifying the inherent constraints and limitations of standard LoRA in this context. We introduce LoRA-the-Explorer (LTE), a novel bi-level optimization algorithm designed to enable parallel training of multiple low-rank heads across computing nodes, thereby reducing the need for frequent synchronization. Our approach includes extensive experimentation on vision transformers using various vision datasets, demonstrating that LTE is competitive with standard pre-training.
Updated: 2024-07-26 21:56:47
Categories: cs.LG,cs.AI,cs.CV
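A minimal sketch of parallel low-rank heads on a single frozen linear layer, in the spirit of LTE. The simple averaging merge is an assumption; LTE's actual bi-level optimization and synchronization schedule are not reproduced:
```python
# One linear layer with several LoRA-style heads whose averaged low-rank
# update can be folded back into the frozen base weight.
import torch
import torch.nn as nn

class ParallelLoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=4, heads=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)            # frozen base
        self.A = nn.Parameter(torch.randn(heads, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(heads, d_out, rank))

    def forward(self, x):                                 # x: (batch, d_in)
        delta = torch.einsum('hor,hri->hoi', self.B, self.A).mean(0)
        return x @ (self.base.weight + delta).T

    @torch.no_grad()
    def merge(self):
        """Fold the averaged low-rank update into the base weight."""
        delta = torch.einsum('hor,hri->hoi', self.B, self.A).mean(0)
        self.base.weight += delta
        self.B.zero_()

layer = ParallelLoRALinear(16, 8)
print(layer(torch.randn(2, 16)).shape)   # torch.Size([2, 8])
```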
Cutting through the noise to motivate people: A comprehensive analysis of COVID-19 social media posts de/motivating vaccination
The COVID-19 pandemic exposed significant weaknesses in the healthcare information system. The overwhelming volume of misinformation on social media and other socioeconomic factors created extraordinary challenges to motivate people to take proper precautions and get vaccinated. In this context, our work explored a novel direction by analyzing an extensive dataset collected over two years, identifying the topics de/motivating the public about COVID-19 vaccination. We analyzed these topics based on time, geographic location, and political orientation. We noticed that while the motivating topics remain the same over time and geographic location, the demotivating topics change rapidly. We also identified that intrinsic motivation, rather than external mandate, is more advantageous to inspire the public. This study addresses scientific communication and public motivation in social media. It can help public health officials, policymakers, and social media platforms develop more effective messaging strategies to cut through the noise of misinformation and educate the public about scientific findings.
Updated: 2024-07-26 21:51:19
Categories: cs.CY,cs.CL,cs.LG,cs.SI
On Consistency of Signature Using Lasso
Signatures are iterated path integrals of continuous and discrete-time processes, and their universal nonlinearity linearizes the problem of feature selection in time series data analysis. This paper studies the consistency of signature using Lasso regression, both theoretically and numerically. We establish conditions under which the Lasso regression is consistent both asymptotically and in finite sample. Furthermore, we show that the Lasso regression is more consistent with the Itô signature for time series and processes that are closer to the Brownian motion and with weaker inter-dimensional correlations, while it is more consistent with the Stratonovich signature for mean-reverting time series and processes. We demonstrate that signature can be applied to learn nonlinear functions and option prices with high accuracy, and the performance depends on properties of the underlying process and the choice of the signature.
Updated: 2024-07-26 21:29:58
Categories: stat.ML,cs.LG,math.ST,stat.AP,stat.TH
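For concreteness, a depth-2 truncated signature can be computed from path increments and fed to Lasso as follows; real applications typically truncate at higher depth with a dedicated signature library, and the toy target here is fabricated for the demo:
```python
# Depth-2 signature features (level-1 increments and level-2 iterated
# integrals) of a d-dimensional path, used as a linear feature map for Lasso.
import numpy as np
from sklearn.linear_model import Lasso

def signature_depth2(path):
    """path: (T, d) array. Returns level-1 and level-2 signature terms."""
    dx = np.diff(path, axis=0)                 # increments
    S1 = dx.sum(axis=0)                        # level 1: total increment
    # level 2: discrete iterated integrals of dX^i against dX^j
    csum = np.vstack([np.zeros(path.shape[1]), np.cumsum(dx, axis=0)[:-1]])
    S2 = csum.T @ dx                           # (d, d) matrix
    return np.concatenate([S1, S2.ravel()])

rng = np.random.default_rng(2)
paths = [np.cumsum(rng.normal(size=(50, 3)), axis=0) for _ in range(200)]
X = np.array([signature_depth2(p) for p in paths])
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=len(X))  # toy target
print(Lasso(alpha=0.05).fit(X, y).coef_.round(2)[:5])
```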
Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation
The rapid advancement of AI technologies yields numerous future impacts on individuals and society. Policymakers are tasked to react quickly and establish policies that mitigate those impacts. However, anticipating the effectiveness of policies is a difficult task, as some impacts might only be observable in the future and respective policies might not be applicable to the future development of AI. In this work we develop a method for using large language models (LLMs) to evaluate the efficacy of a given piece of policy at mitigating specified negative impacts. We do so by using GPT-4 to generate scenarios both pre- and post-introduction of policy and translating these vivid stories into metrics based on human perceptions of impacts. We leverage an already established taxonomy of impacts of generative AI in the media environment to generate a set of scenario pairs both mitigated and non-mitigated by the transparency policy in Article 50 of the EU AI Act. We then run a user study (n=234) to evaluate these scenarios across four risk-assessment dimensions: severity, plausibility, magnitude, and specificity to vulnerable populations. We find that this transparency legislation is perceived to be effective at mitigating harms in areas such as labor and well-being, but largely ineffective in areas such as social cohesion and security. Through this case study we demonstrate the efficacy of our method as a tool to iterate on the effectiveness of policy for mitigating various negative impacts. We expect this method to be useful to researchers or other stakeholders who want to brainstorm the potential utility of different pieces of policy or other mitigation strategies.
Updated: 2024-07-26 21:23:14
Categories: cs.CL,cs.AI
NARVis: Neural Accelerated Rendering for Real-Time Scientific Point Cloud Visualization
Exploring scientific datasets with billions of samples in real-time visualization presents a challenge - balancing high-fidelity rendering with speed. This work introduces a novel renderer - Neural Accelerated Renderer (NAR), that uses the neural deferred rendering framework to visualize large-scale scientific point cloud data. NAR augments a real-time point cloud rendering pipeline with high-quality neural post-processing, making the approach ideal for interactive visualization at scale. Specifically, we train a neural network to learn the point cloud geometry from a high-performance multi-stream rasterizer and capture the desired postprocessing effects from a conventional high-quality renderer. We demonstrate the effectiveness of NAR by visualizing complex multidimensional Lagrangian flow fields and photometric scans of a large terrain and compare the renderings against the state-of-the-art high-quality renderers. Through extensive evaluation, we demonstrate that NAR prioritizes speed and scalability while retaining high visual fidelity. We achieve competitive frame rates of $>$ 126 fps for interactive rendering of $>$ 350M points (i.e., an effective throughput of $>$ 44 billion points per second) using $\sim$12 GB of memory on RTX 2080 Ti GPU. Furthermore, we show that NAR is generalizable across different point clouds with similar visualization needs and the desired post-processing effects could be obtained with substantial high quality even at lower resolutions of the original point cloud, further reducing the memory requirements.
Updated: 2024-07-26 21:21:13
Categories: cs.GR,cs.CV,cs.HC,cs.LG
Solving Robotics Problems in Zero-Shot with Vision-Language Models
We introduce Wonderful Team, a multi-agent visual LLM (VLLM) framework for solving robotics problems in the zero-shot regime. By zero-shot we mean that, for a novel environment, we feed a VLLM an image of the robot's environment and a description of the task, and have the VLLM output the sequence of actions necessary for the robot to complete the task. Prior work on VLLMs in robotics has largely focused on settings where some part of the pipeline is fine-tuned, such as tuning an LLM on robot data or training a separate vision encoder for perception and action generation. Surprisingly, due to recent advances in the capabilities of VLLMs, this type of fine-tuning may no longer be necessary for many tasks. In this work, we show that with careful engineering, we can prompt a single off-the-shelf VLLM to handle all aspects of a robotics task, from high-level planning to low-level location-extraction and action-execution. Wonderful Team builds on recent advances in multi-agent LLMs to partition tasks across an agent hierarchy, making it self-corrective and able to effectively partition and solve even long-horizon tasks. Extensive experiments on VIMABench and real-world robotic environments demonstrate the system's capability to handle a variety of robotic tasks, including manipulation, visual goal-reaching, and visual reasoning, all in a zero-shot manner. These results underscore a key point: vision-language models have progressed rapidly in the past year, and should strongly be considered as a backbone for robotics problems going forward.
Updated: 2024-07-26 21:18:57
Categories: cs.AI,cs.RO
Boosted generalized normal distributions: Integrating machine learning with operations knowledge
Applications of machine learning (ML) techniques to operational settings often face two challenges: i) ML methods mostly provide point predictions whereas many operational problems require distributional information; and ii) They typically do not incorporate the extensive body of knowledge in the operations literature, particularly the theoretical and empirical findings that characterize specific distributions. We introduce a novel and rigorous methodology, the Boosted Generalized Normal Distribution ($b$GND), to address these challenges. The Generalized Normal Distribution (GND) encompasses a wide range of parametric distributions commonly encountered in operations, and $b$GND leverages gradient boosting with tree learners to flexibly estimate the parameters of the GND as functions of covariates. We establish $b$GND's statistical consistency, thereby extending this key property to special cases studied in the ML literature that lacked such guarantees. Using data from a large academic emergency department in the United States, we show that the distributional forecasting of patient wait and service times can be meaningfully improved by leveraging findings from the healthcare operations literature. Specifically, $b$GND performs 6% and 9% better than the distribution-agnostic ML benchmark used to forecast wait and service times respectively. Further analysis suggests that these improvements translate into a 9% increase in patient satisfaction and a 4% reduction in mortality for myocardial infarction patients. Our work underscores the importance of integrating ML with operations knowledge to enhance distributional forecasts.
Updated: 2024-07-26 21:18:26
Categories: cs.LG,stat.ME,stat.ML,60E05, 62G07, 62F99, 68T01, 90B22, 90B50
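The generalized normal density underlying $b$GND is $f(x;\mu,\alpha,\beta) = \frac{\beta}{2\alpha\Gamma(1/\beta)} \exp\!\left(-(|x-\mu|/\alpha)^\beta\right)$, which recovers the Laplace at $\beta = 1$ and the Gaussian at $\beta = 2$. Below is a sketch of the negative log-likelihood that a boosted learner would minimize when the parameters are functions of covariates; the gradient-boosting loop itself is omitted:
```python
# Negative log-likelihood of the generalized normal distribution (GND).
import numpy as np
from scipy.special import gammaln

def gnd_nll(x, mu, alpha, beta):
    """NLL of f(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x-mu|/alpha)**beta)."""
    log_norm = np.log(beta) - np.log(2 * alpha) - gammaln(1.0 / beta)
    return -(log_norm - (np.abs(x - mu) / alpha) ** beta).sum()

x = np.random.default_rng(3).normal(size=1000)
# beta = 2 recovers a Gaussian (with alpha = sqrt(2) * sigma)
print(gnd_nll(x, mu=0.0, alpha=np.sqrt(2.0), beta=2.0))
```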
Many-Shot In-Context Learning for Molecular Inverse Design
Large Language Models (LLMs) have demonstrated great performance in few-shot In-Context Learning (ICL) for a variety of generative and discriminative chemical design tasks. The newly expanded context windows of LLMs can further improve ICL capabilities for molecular inverse design and lead optimization. To take full advantage of these capabilities we developed a new semi-supervised learning method that overcomes the lack of experimental data available for many-shot ICL. Our approach involves iterative inclusion of LLM generated molecules with high predicted performance, along with experimental data. We further integrated our method in a multi-modal LLM which allows for the interactive modification of generated molecular structures using text instructions. As we show, the new method greatly improves upon existing ICL methods for molecular design while being accessible and easy to use for scientists.
Updated: 2024-07-26 21:10:50
Categories: cs.CL,cs.AI
Super Resolution for Renewable Energy Resource Data With Wind From Reanalysis Data (Sup3rWind) and Application to Ukraine
With an increasing share of the electricity grid relying on wind to provide generating capacity and energy, there is an expanding global need for historically accurate high-resolution wind data. Conventional downscaling methods for generating these data have a high computational burden and require extensive tuning for historical accuracy. In this work, we present a novel deep learning-based spatiotemporal downscaling method, using generative adversarial networks (GANs), for generating historically accurate high-resolution wind resource data from the European Centre for Medium-Range Weather Forecasting Reanalysis version 5 data (ERA5). We achieve results comparable in historical accuracy and spatiotemporal variability to conventional downscaling by training a GAN model with ERA5 low-resolution input and high-resolution targets from the Wind Integration National Dataset, while reducing computational costs over dynamical downscaling by two orders of magnitude. Spatiotemporal cross-validation shows low error and high correlations with observations and excellent agreement with holdout data across distributions of physical metrics. We apply this approach to downscale 30-km hourly ERA5 data to 2-km 5-minute wind data for January 2000 through December 2023 at multiple hub heights over Eastern Europe. Uncertainty is estimated over the period with observational data by additionally downscaling the members of the European Centre for Medium-Range Weather Forecasting Ensemble of Data Assimilations. Comparisons against observational data from the Meteorological Assimilation Data Ingest System and multiple wind farms show comparable performance to the CONUS validation. This 24-year data record is the first member of the super resolution for renewable energy resource data with wind from reanalysis data dataset (Sup3rWind).
Updated: 2024-07-26 21:07:17
Categories: physics.ao-ph,cs.LG
Regularized Multi-Decoder Ensemble for an Error-Aware Scene Representation Network
Feature grid Scene Representation Networks (SRNs) have been applied to scientific data as compact functional surrogates for analysis and visualization. As SRNs are black-box lossy data representations, assessing the prediction quality is critical for scientific visualization applications to ensure that scientists can trust the information being visualized. Currently, existing architectures do not support inference time reconstruction quality assessment, as coordinate-level errors cannot be evaluated in the absence of ground truth data. We propose a parameter-efficient multi-decoder SRN (MDSRN) ensemble architecture consisting of a shared feature grid with multiple lightweight multi-layer perceptron decoders. MDSRN can generate a set of plausible predictions for a given input coordinate to compute the mean as the prediction of the multi-decoder ensemble and the variance as a confidence score. The coordinate-level variance can be rendered along with the data to inform the reconstruction quality, or be integrated into uncertainty-aware volume visualization algorithms. To prevent the misalignment between the quantified variance and the prediction quality, we propose a novel variance regularization loss for ensemble learning that promotes the Regularized multi-decoder SRN (RMDSRN) to obtain a more reliable variance that correlates closely to the true model error. We comprehensively evaluate the quality of variance quantification and data reconstruction of Monte Carlo Dropout, Mean Field Variational Inference, Deep Ensemble, and Predicting Variance compared to the proposed MDSRN and RMDSRN across diverse scalar field datasets. We demonstrate that RMDSRN attains the most accurate data reconstruction and competitive variance-error correlation among uncertain SRNs under the same neural network parameter budgets.
Updated: 2024-07-26 21:02:11
Categories: cs.LG,cs.AI,cs.CV,cs.GR,cs.HC
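A hedged sketch of the multi-decoder idea: a shared feature grid feeds several lightweight MLP decoders, with the ensemble mean as the prediction and the variance as the confidence score. The 1-D nearest-feature grid below is a simplification, and the paper's variance regularization loss is omitted:
```python
# Shared feature grid with multiple small MLP decoders; the ensemble mean is
# the prediction and the variance a per-coordinate confidence score.
import torch
import torch.nn as nn

class MultiDecoderSRN(nn.Module):
    def __init__(self, grid_res=32, feat_dim=16, n_decoders=4):
        super().__init__()
        self.grid = nn.Parameter(torch.randn(grid_res, feat_dim) * 0.1)
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))
            for _ in range(n_decoders))

    def forward(self, coords):                    # coords in [0, 1], shape (N,)
        idx = (coords * (self.grid.shape[0] - 1)).long()
        feats = self.grid[idx]                    # shared features, one per coordinate
        preds = torch.stack([d(feats) for d in self.decoders])  # (n_dec, N, 1)
        return preds.mean(0), preds.var(0)        # prediction, confidence score

model = MultiDecoderSRN()
mean, var = model(torch.rand(8))
print(mean.shape, var.shape)
```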
Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning
Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber. Understanding the impact of lever budget changes and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value. We introduce an end-to-end machine learning and optimization procedure to automate budget decision-making for cities, relying on a feature store, model training and serving, optimizers, and backtesting. Proposing a state-of-the-art deep learning (DL) estimator based on S-Learner and a novel tensor B-Spline regression model, we solve high-dimensional optimization with ADMM and primal-dual interior point convex optimization, substantially improving Uber's resource allocation efficiency.
Updated: 2024-07-26 20:51:37
Categories: cs.LG,stat.ML,62J99
On the Conflict of Robustness and Learning in Collaborative Machine Learning
Collaborative Machine Learning (CML) allows participants to jointly train a machine learning model while keeping their training data private. In many scenarios where CML is seen as the solution to privacy issues, such as health-related applications, safety is also a primary concern. To ensure that CML processes produce models that output correct and reliable decisions even in the presence of potentially untrusted participants, researchers propose to use robust aggregators to filter out malicious contributions that negatively influence the training process. In this work, we formalize the two prevalent forms of robust aggregators in the literature. We then show that neither can provide the intended protection: either they use distance-based metrics that cannot reliably identify malicious inputs to training; or use metrics based on the behavior of the loss function which create a conflict with the ability of CML participants to learn, i.e., they cannot eliminate the risk of compromise without preventing learning.
Updated: 2024-07-26 20:29:44
Categories: cs.LG,cs.CR
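For concreteness, the two aggregator families can be sketched generically: distance-based filtering of client updates, and filtering based on loss behavior. Both sketches are illustrative examples of the two forms, not the exact aggregators formalized in the paper:
```python
# Generic sketches of the two robust-aggregator families.
import numpy as np

def distance_filtered_mean(updates, keep_frac=0.7):
    """Form 1: drop the updates farthest from the coordinate-wise mean."""
    U = np.asarray(updates)
    dists = np.linalg.norm(U - U.mean(axis=0), axis=1)
    keep = np.argsort(dists)[: max(1, int(len(U) * keep_frac))]
    return U[keep].mean(axis=0)

def loss_filtered_mean(updates, loss_fn, keep_frac=0.7):
    """Form 2: keep updates whose application yields the lowest loss on a
    server-side validation batch (loss_fn stands in for that evaluation)."""
    U = np.asarray(updates)
    losses = np.array([loss_fn(u) for u in U])
    keep = np.argsort(losses)[: max(1, int(len(U) * keep_frac))]
    return U[keep].mean(axis=0)

updates = [np.array([0.1, 0.2]), np.array([0.2, 0.1]), np.array([5.0, -4.0])]
print(distance_filtered_mean(updates))
print(loss_filtered_mean(updates, loss_fn=lambda u: np.abs(u).sum()))
```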
RobustNeRF: Ignoring Distractors with Robust Losses
Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors, which are not persistent during image capture (moving objects, lighting variations, shadows), artifacts appear as view-dependent effects or 'floaters'. To cope with distractors, we advocate a form of robust estimation for NeRF training, modeling distractors in training data as outliers of an optimization problem. Our method successfully removes outliers from a scene and improves upon our baselines, on synthetic and real-world scenes. Our technique is simple to incorporate in modern NeRF frameworks, with few hyper-parameters. It does not assume a priori knowledge of the types of distractors, and is instead focused on the optimization problem rather than pre-processing or modeling transient objects. More results on our page https://robustnerf.github.io.
Updated: 2024-07-26 19:34:31
Categories: cs.CV,cs.LG
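A minimal stand-in for the robust-loss idea: treat the highest photometric residuals as distractor outliers and exclude them from the loss. RobustNeRF's actual estimator is more sophisticated, so this trimmed loss is only a sketch of the flavor:
```python
# Trimmed photometric loss: keep only the lowest per-pixel residuals so that
# transient distractors (high-residual pixels) do not drive the gradients.
import torch

def trimmed_photometric_loss(pred, target, inlier_frac=0.8):
    res = (pred - target).pow(2).mean(dim=-1)                 # per-pixel squared error
    k = int(res.numel() * inlier_frac)
    inliers, _ = torch.topk(res.flatten(), k, largest=False)  # keep lowest residuals
    return inliers.mean()

pred, target = torch.rand(64, 64, 3), torch.rand(64, 64, 3)
print(trimmed_photometric_loss(pred, target).item())
```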
Mathematical Programming Algorithms for Convex Hull Approximation with a Hyperplane Budget
We consider the following problem in computational geometry: given, in the d-dimensional real space, a set of points marked as positive and a set of points marked as negative, such that the convex hull of the positive set does not intersect the negative set, find K hyperplanes that separate, if possible, all the positive points from the negative ones. That is, we search for a convex polyhedron with at most K faces, containing all the positive points and no negative point. The problem is known in the literature for pure convex polyhedral approximation; our interest stems from its possible applications in constraint learning, where points are feasible or infeasible solutions of a Mixed Integer Program, and the K hyperplanes are linear constraints to be found. We cast the problem as an optimization one, minimizing the number of negative points inside the convex polyhedron, whenever exact separation cannot be achieved. We introduce models inspired by support vector machines and we design two mathematical programming formulations with binary variables. We exploit Dantzig-Wolfe decomposition to obtain extended formulations, and we devise column generation algorithms with ad-hoc pricing routines. We compare computing time and separation error values obtained by all our approaches on synthetic datasets, with number of points from hundreds up to a few thousands, showing our approaches to perform better than existing ones from the literature. Furthermore, we observe that key computational differences arise, depending on whether the budget K is sufficient to completely separate the positive points from the negative ones or not. On 8-dimensional instances (and over), existing convex hull algorithms become computationally inapplicable, while our algorithms allow us to identify good convex hull approximations in minutes of computation.
Updated: 2024-07-26 19:34:11
Categories: math.OC,cs.CG,cs.LG
Effective Large Language Model Debugging with Best-first Tree Search
Large Language Models (LLMs) show promise in code generation tasks. However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A fundamental difference with how an LLM writes code, compared to a human programmer, is that it cannot consistently spot and fix bugs. Debugging is a crucial skill for programmers and it enables iterative code refinement towards a correct implementation. In this work, we propose a novel algorithm to enable LLMs to debug their code via self-reflection and search where a model attempts to identify its previous mistakes. Our key contributions are 1) a best-first tree search algorithm with self-reflections (BESTER) that achieves state-of-the-art Pass@1 in three code generation benchmarks. BESTER maintains its superiority when we measure pass rates taking into account additional inference costs incurred by tree search. 2) A novel interpretability study on what self-reflections attend to in buggy programs and how they impact bug fixes, which provides a deeper understanding of the debugging process. 3) An extensive study on when self-reflections are effective in finding bugs.
Updated: 2024-07-26 19:26:00
Categories: cs.SE,cs.AI,cs.LG
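A skeleton of the best-first search component, with `score` and `generate_children` as stubs: in BESTER terms, `generate_children` would call the LLM to revise a candidate program guided by a self-reflection on the failing tests, and `score` would run the test suite. Both stubs are assumptions here:
```python
# Best-first tree search over candidate programs: always expand the
# highest-scoring frontier node, generating revised children from it.
import heapq
import itertools

def best_first_debug(root, score, generate_children, budget=20):
    counter = itertools.count()                   # tie-breaker for the heap
    frontier = [(-score(root), next(counter), root)]
    best = root
    while frontier and budget > 0:
        neg_s, _, node = heapq.heappop(frontier)  # highest-scoring candidate
        if -neg_s > score(best):
            best = node
        for child in generate_children(node):     # e.g. LLM revision via self-reflection
            heapq.heappush(frontier, (-score(child), next(counter), child))
            budget -= 1
    return best
```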
Flusion: Integrating multiple data sources for accurate influenza predictions
Over the last ten years, the US Centers for Disease Control and Prevention (CDC) has organized an annual influenza forecasting challenge with the motivation that accurate probabilistic forecasts could improve situational awareness and yield more effective public health actions. Starting with the 2021/22 influenza season, the forecasting targets for this challenge have been based on hospital admissions reported in the CDC's National Healthcare Safety Network (NHSN) surveillance system. Reporting of influenza hospital admissions through NHSN began within the last few years, and as such only a limited amount of historical data are available for this signal. To produce forecasts in the presence of limited data for the target surveillance system, we augmented these data with two signals that have a longer historical record: 1) ILI+, which estimates the proportion of outpatient doctor visits where the patient has influenza; and 2) rates of laboratory-confirmed influenza hospitalizations at a selected set of healthcare facilities. Our model, Flusion, is an ensemble that combines gradient boosting quantile regression models with a Bayesian autoregressive model. The gradient boosting models were trained on all three data signals, while the autoregressive model was trained on only the target signal; all models were trained jointly on data for multiple locations. Flusion was the top-performing model in the CDC's influenza prediction challenge for the 2023/24 season. In this article we investigate the factors contributing to Flusion's success, and we find that its strong performance was primarily driven by the use of a gradient boosting model that was trained jointly on data from multiple surveillance signals and locations. These results indicate the value of sharing information across locations and surveillance signals, especially when doing so adds to the pool of available training data.
Updated: 2024-07-26 19:24:02
Categories: stat.ML,cs.LG,q-bio.PE,stat.AP
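One ingredient is easy to show concretely: gradient boosting quantile regression, which yields the distributional (quantile) forecasts that the ensemble combines. The feature matrix below is synthetic; in Flusion it would hold lagged NHSN admissions, ILI+, laboratory-confirmed rates, and location information:
```python
# Gradient boosting quantile regression: one model per target quantile.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 6))        # stand-in for lagged signals + location features
y = 3 * X[:, 0] + rng.normal(scale=1.0, size=500)

quantile_preds = {}
for q in (0.05, 0.5, 0.95):
    m = GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
    quantile_preds[q] = m.predict(X[:3])
print({q: v.round(2) for q, v in quantile_preds.items()})
```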
Actively Learning Combinatorial Optimization Using a Membership Oracle
We consider solving a combinatorial optimization problem with an unknown linear constraint using a membership oracle that, given a solution, determines whether it is feasible or infeasible with absolute certainty. The goal of the decision maker is to find the best possible solution subject to a budget on the number of oracle calls. Inspired by active learning based on Support Vector Machines (SVMs), we adapt a classical framework in order to solve the problem by learning and exploiting a surrogate linear constraint. The resulting new framework includes training a linear separator on the labeled points and selecting new points to be labeled, which is achieved by applying a sampling strategy and solving a 0-1 integer linear program. Following the active learning literature, one can consider using SVM as a linear classifier and the information-based sampling strategy known as Simple margin. We improve on both sides: we propose an alternative sampling strategy based on mixed-integer quadratic programming and a linear separation method inspired by an algorithm for convex optimization in the oracle model. We conduct experiments on the pure knapsack problem and on a college study plan problem from the literature to show how different linear separation methods and sampling strategies influence the quality of the results in terms of objective value.
Updated: 2024-07-26 19:14:26
Categories: cs.LG,math.OC
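A sketch of the classical loop the paper starts from: train a linear SVM on the labeled points and query the pool point closest to the hyperplane (Simple margin). The random candidate pool replaces the paper's 0-1 ILP / mixed-integer quadratic selection step, and the hidden constraint is a toy:
```python
# Active learning of a hidden linear constraint with a membership oracle,
# using SVM plus Simple-margin sampling over a finite candidate pool.
import numpy as np
from sklearn.svm import SVC

def simple_margin_loop(oracle, pool, n_init=6, n_queries=20):
    rng = np.random.default_rng(0)
    idx = list(rng.choice(len(pool), n_init, replace=False))
    y = [oracle(pool[i]) for i in idx]            # 1 = feasible, 0 = infeasible
    while len(set(y)) < 2:                        # SVM needs both classes
        j = int(rng.integers(len(pool)))
        idx.append(j); y.append(oracle(pool[j]))
    clf = None
    for _ in range(n_queries):
        clf = SVC(kernel="linear").fit(pool[idx], y)
        margins = np.abs(clf.decision_function(pool))
        margins[idx] = np.inf                     # skip already-labeled points
        j = int(np.argmin(margins))               # Simple margin selection
        idx.append(j); y.append(oracle(pool[j]))
    return clf

pool = np.random.default_rng(1).uniform(0.0, 1.0, size=(300, 2))
oracle = lambda x: int(x[0] + 2 * x[1] <= 1.4)    # hidden linear constraint (toy)
clf = simple_margin_loop(oracle, pool)
print(clf.coef_, clf.intercept_)
```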
Towards a Transformer-Based Pre-trained Model for IoT Traffic Classification
The classification of IoT traffic is important to improve the efficiency and security of IoT-based networks. As the state-of-the-art classification methods are based on Deep Learning, most of the current results require a large amount of data to be trained. Thus, in real-life situations where IoT traffic data is scarce, the models would not perform well: they underperform outside their initial training conditions and fail to capture the complex characteristics of network traffic, rendering them inefficient and unreliable in real-world applications. In this paper, we propose IoT Traffic Classification Transformer (ITCT), a novel approach that utilizes the state-of-the-art transformer-based model named TabTransformer. ITCT, which is pre-trained on a large labeled MQTT-based IoT traffic dataset and may be fine-tuned with a small set of labeled data, showed promising results in various traffic classification tasks. Our experiments demonstrated that the ITCT model significantly outperforms existing models, achieving an overall accuracy of 82%. To support reproducibility and collaborative development, all associated code has been made publicly available.
Updated: 2024-07-26 19:13:11
Categories: cs.NI,cs.AI
Rapid Likelihood Free Inference of Compact Binary Coalescences using Accelerated Hardware
We report a gravitational-wave parameter estimation algorithm, AMPLFI, based on likelihood-free inference using normalizing flows. The focus of AMPLFI is to perform real-time parameter estimation for candidates detected by machine-learning based compact binary coalescence search, Aframe. We present details of our algorithm and optimizations done related to data-loading and pre-processing on accelerated hardware. We train our model using binary black-hole (BBH) simulations on real LIGO-Virgo detector noise. Our model has $\sim 6$ million trainable parameters with training times $\lesssim 24$ hours. Based on online deployment on a mock data stream of LIGO-Virgo data, Aframe + AMPLFI is able to pick up BBH candidates and infer parameters for real-time alerts from data acquisition with a net latency of $\sim 6$s.
Updated: 2024-07-26 19:07:18
Categories: gr-qc,astro-ph.IM,cs.LG
Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
We introduce a novel yet straightforward neural network initialization scheme that modifies conventional methods like Xavier and Kaiming initialization. Inspired by the concept of emergence and leveraging the emergence measures proposed by Li (2023), our method adjusts the layer-wise weight scaling factors to achieve higher emergence values. This enhancement is easy to implement, requiring no additional optimization steps for initialization compared to GradInit. We evaluate our approach across various architectures, including MLP and convolutional architectures for image recognition, and transformers for machine translation. We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization. The simplicity, theoretical innovation, and demonstrable empirical advantages of our method make it a potent enhancement to neural network initialization practices. These results suggest a promising direction for leveraging emergence to improve neural network training methodologies. Code is available at: https://github.com/johnnyjingzeli/EmergenceInit.
Updated: 2024-07-26 18:56:47
Areas: cs.LG,cs.CV
Optimizing Numerical Estimation and Operational Efficiency in the Legal Domain through Large Language Models
The legal landscape encompasses a wide array of lawsuit types, presenting lawyers with challenges in delivering timely and accurate information to clients, particularly concerning critical aspects like potential imprisonment duration or financial repercussions. Compounded by the scarcity of legal experts, there's an urgent need to enhance the efficiency of traditional legal workflows. Recent advances in deep learning, especially Large Language Models (LLMs), offer promising solutions to this challenge. Leveraging LLMs' mathematical reasoning capabilities, we propose a novel approach integrating LLM-based methodologies with specially designed prompts to address precision requirements in legal Artificial Intelligence (LegalAI) applications. The proposed work seeks to bridge the gap between traditional legal practices and modern technological advancements, paving the way for a more accessible, efficient, and equitable legal system. To validate this method, we introduce a curated dataset tailored to precision-oriented LegalAI tasks, serving as a benchmark for evaluating LLM-based approaches. Extensive experimentation confirms the efficacy of our methodology in generating accurate numerical estimates within the legal domain, emphasizing the role of LLMs in streamlining legal processes and meeting the evolving demands of LegalAI.
Updated: 2024-07-26 18:46:39
Areas: cs.AI,cs.CL
A Fault Prognostic System for the Turbine Guide Bearings of a Hydropower Plant Using Long-Short Term Memory (LSTM)
Hydroelectricity, being a renewable source of energy, globally fulfills the electricity demand. Hence, Hydropower Plants (HPPs) have always been in the limelight of research. The fast-paced technological advancement is enabling us to develop state-of-the-art power generation machines. This has not only resulted in improved turbine efficiency but has also increased the complexity of these systems. In lieu thereof, efficient Operation & Maintenance (O&M) of such intricate power generation systems has become a more challenging task. Therefore, there has been a shift from conventional reactive approaches to more intelligent predictive approaches in maintaining the HPPs. The research is therefore targeted to develop an artificially intelligent fault prognostics system for the turbine bearings of an HPP. The proposed method utilizes the Long Short-Term Memory (LSTM) algorithm in developing the model. Initially, the model is trained and tested with bearing vibration data from a test rig. Subsequently, it is further trained and tested with realistic bearing vibration data obtained from an HPP operating in Pakistan via the Supervisory Control and Data Acquisition (SCADA) system. The model demonstrates highly effective predictions of bearing vibration values, achieving a remarkably low RMSE.
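As a rough illustration of the modelling step (not the authors' implementation; the window length, layer sizes and synthetic signal below are assumptions), a Keras LSTM regressor over sliding windows of past vibration readings might look like this:

import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
# Synthetic stand-in for a bearing vibration channel.
signal = np.sin(np.linspace(0, 100, 5000)) + 0.1 * rng.standard_normal(5000)
window = 50
X = np.stack([signal[i:i + window] for i in range(len(signal) - window)])[..., None]
y = signal[window:]

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(window, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2)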
Updated: 2024-07-26 18:45:42
Areas: cs.AI,cs.SY,eess.SY
GraphBPE: Molecular Graphs Meet Byte-Pair Encoding
With the increasing attention to molecular machine learning, various innovations have been made in designing better models or proposing more comprehensive benchmarks. However, the data preprocessing schedule for molecular graphs has received less study, even though a different view of the molecular graph could potentially boost a model's performance. Inspired by the Byte-Pair Encoding (BPE) algorithm, a subword tokenization method popularly adopted in Natural Language Processing, we propose GraphBPE, which tokenizes a molecular graph into different substructures and acts as a preprocessing schedule independent of the model architectures. Our experiments on 3 graph-level classification and 3 graph-level regression datasets show that data preprocessing can boost the performance of models for molecular graphs; GraphBPE is effective for small classification datasets, and it performs on par with other tokenization methods across different model architectures.
Updated: 2024-07-26 18:45:09
Areas: cs.LG,cs.AI,physics.chem-ph,q-bio.BM
Learning in Mean Field Games: A Survey
Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases. Introduced by Lasry and Lions, and Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely on solving partial or stochastic differential equations with a full knowledge of the model. Recently, Reinforcement Learning (RL) has appeared promising to solve complex problems at scale. The combination of RL and MFGs is promising to solve games at a very large scale both in terms of population size and environment complexity. In this survey, we review the quickly growing recent literature on RL methods to learn equilibria and social optima in MFGs. We first identify the most common settings (static, stationary, and evolutive) of MFGs. We then present a general framework for classical iterative methods (based on best-response computation or policy evaluation) to solve MFGs in an exact way. Building on these algorithms and the connection with Markov Decision Processes, we explain how RL can be used to learn MFG solutions in a model-free way. Last, we present numerical illustrations on a benchmark problem, and conclude with some perspectives.
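To make the best-response iteration concrete, here is a minimal toy (not from the survey itself; the congestion reward, softened best response and damping factor are assumptions) of a damped fixed-point scheme for a static MFG:

import numpy as np

n_states = 5
mu = np.ones(n_states) / n_states          # mean field: population distribution

def reward(mu):
    return -np.log(mu + 1e-8)              # toy congestion cost: avoid crowds

for it in range(200):
    r = reward(mu)
    br = np.exp(r - r.max())               # softened best response of one agent
    br /= br.sum()                         # (a stand-in for solving its MDP)
    mu_next = 0.9 * mu + 0.1 * br          # damped Picard / fixed-point update
    if np.abs(mu_next - mu).max() < 1e-10:
        break
    mu = mu_next

print("approximate equilibrium mean field:", mu.round(3))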
Updated: 2024-07-26 18:20:46
Areas: cs.LG,cs.AI,cs.GT,math.OC
Artificial neural networks on graded vector spaces
We develop new artificial neural network models for graded vector spaces, which are suitable when different features in the data have different significance (weights). This is the first time that such models have been designed mathematically, and they are expected to perform better than neural networks over ordinary vector spaces, which are recovered as the special case in which all gradings equal 1.
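The paper gives no implementation, but one minimal way to picture a graded layer (purely an illustrative assumption) is a linear map whose inputs are rescaled by their grades, so that grades of all 1s recover an ordinary layer:

import torch
import torch.nn as nn

class GradedLinear(nn.Module):
    # Rescale each input feature by its grade before the linear map;
    # grades = [1, 1, ..., 1] reduces to a plain nn.Linear.
    def __init__(self, in_features, out_features, grades):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.register_buffer("grades", torch.as_tensor(grades, dtype=torch.float32))

    def forward(self, x):
        return self.linear(x * self.grades)

layer = GradedLinear(4, 2, grades=[1.0, 1.0, 2.0, 3.0])
out = layer(torch.randn(8, 4))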
Updated: 2024-07-26 18:17:58
Areas: cs.AI
Supervised Learning based Method for Condition Monitoring of Overhead Line Insulators using Leakage Current Measurement
As a new practical and economical solution to the aging problem of overhead line (OHL) assets, the technical policies of most power grid companies in the world experienced a gradual transition from scheduled preventive maintenance to a risk-based approach in asset management. Even though the accumulation of contamination is predictable within a certain degree, there are currently no effective ways to identify the risk of the insulator flashover in order to plan its replacement. This paper presents a novel machine learning (ML) based method for estimating the flashover probability of the cup-and-pin glass insulator string. The proposed method is based on the Extreme Gradient Boosting (XGBoost) supervised ML model, in which the leakage current (LC) features and applied voltage are used as the inputs. The established model can estimate the critical flashover voltage (U50%) for various designs of OHL insulators with different voltage levels. The proposed method is also able to accurately determine the condition of the insulator strings and instruct asset management engineers to take appropriate actions.
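A hedged sketch of the modelling step (the feature names and synthetic labels are hypothetical; the paper's leakage-current feature engineering is richer), using the standard XGBoost scikit-learn interface:

import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Columns: [LC_peak, LC_rms, LC_3rd_harmonic, LC_5th_harmonic, U_applied]
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 4] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X, y)
p_flashover = clf.predict_proba(X[:5])[:, 1]   # flashover probability per string

An estimate of U50% can then be read off by sweeping the applied-voltage input and locating where the predicted flashover probability crosses 0.5.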
Updated: 2024-07-26 18:11:49
Areas: cs.LG,stat.ML
Instruction Mining: Instruction Data Selection for Tuning Large Language Models
Large language models (LLMs) are initially pretrained for broad capabilities and then finetuned with instruction-following datasets to improve their performance in interacting with humans. Despite advances in finetuning, a standardized guideline for selecting high-quality datasets to optimize this process remains elusive. In this paper, we first propose InstructMining, an innovative method designed for automatically selecting premium instruction-following data for finetuning LLMs. Specifically, InstructMining utilizes natural language indicators as a measure of data quality, applying them to evaluate unseen datasets. During experimentation, we discover that double descent phenomenon exists in large language model finetuning. Based on this observation, we further leverage BlendSearch to help find the best subset among the entire dataset (i.e., 2,532 out of 100,000). Experiment results show that InstructMining-7B achieves state-of-the-art performance on two of the most popular benchmarks: LLM-as-a-judge and Huggingface OpenLLM leaderboard.
Updated: 2024-07-26 18:09:11
Areas: cs.CL,cs.AI,cs.LG
SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments
This work compares ways of extending Reinforcement Learning algorithms to Partially Observed Markov Decision Processes (POMDPs) with options. One view of options is as temporally extended action, which can be realized as a memory that allows the agent to retain historical information beyond the policy's context window. While option assignment could be handled using heuristics and hand-crafted objectives, learning temporally consistent options and associated sub-policies without explicit supervision is a challenge. Two algorithms, PPOEM and SOAP, are proposed and studied in depth to address this problem. PPOEM applies the forward-backward algorithm (for Hidden Markov Models) to optimize the expected returns for an option-augmented policy. However, this learning approach is unstable during on-policy rollouts. It is also unsuited for learning causal policies without the knowledge of future trajectories, since option assignments are optimized for offline sequences where the entire episode is available. As an alternative approach, SOAP evaluates the policy gradient for an optimal option assignment. It extends the concept of the generalized advantage estimation (GAE) to propagate option advantages through time, which is an analytical equivalent to performing temporal back-propagation of option policy gradients. This option policy is only conditional on the history of the agent, not future actions. Evaluated against competing baselines, SOAP exhibited the most robust performance, correctly discovering options for POMDP corridor environments, as well as on standard benchmarks including Atari and MuJoCo, outperforming PPOEM, as well as LSTM and Option-Critic baselines. The open-sourced code is available at https://github.com/shuishida/SoapRL.
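For context, the standard GAE recursion that SOAP generalizes to option advantages fits in a few lines (a generic sketch of GAE itself, not the SOAP propagation):

import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    # Backward pass: A_t = delta_t + gamma * lam * A_{t+1},
    # with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    T = len(rewards)
    adv, last = np.zeros(T), 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv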
Updated: 2024-07-26 17:59:55
Areas: cs.LG,cs.AI
Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation
The efficiency and scalability of graph convolution networks (GCNs) in training recommender systems (RecSys) have been persistent concerns, hindering their deployment in real-world applications. This paper presents a critical examination of the necessity of graph convolutions during the training phase and introduces an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equation (LightGODE). Our investigation reveals that the benefits of GCNs are more pronounced during testing rather than training. Motivated by this, LightGODE utilizes a novel post-training graph convolution method that bypasses the computation-intensive message passing of GCNs and employs a non-parametric continuous graph ordinary-differential-equation (ODE) to dynamically model node representations. This approach drastically reduces training time while achieving fine-grained post-training graph convolution to avoid the distortion of the original training embedding space, termed the embedding discrepancy issue. We validate our model across several real-world datasets of different scales, demonstrating that LightGODE not only outperforms GCN-based models in terms of efficiency and effectiveness but also significantly mitigates the embedding discrepancy commonly associated with deeper graph convolution layers. Our LightGODE challenges the prevailing paradigms in RecSys training and suggests re-evaluating the role of graph convolutions, potentially guiding future developments of efficient large-scale graph-based RecSys.
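A loose sketch of the post-training step (the propagation rule below is an assumption for illustration, not the paper's exact formulation): after training an embedding model without message passing, a non-parametric graph ODE can be integrated over the frozen embeddings, e.g. with a few Euler steps.

import torch

def post_training_graph_ode(emb, adj_norm, t=1.0, steps=10):
    # Euler integration of e'(tau) = A_norm @ e(tau): no learnable
    # parameters, applied once after training instead of per-epoch
    # message passing.
    e = emb.clone()
    h = t / steps
    for _ in range(steps):
        e = e + h * (adj_norm @ e)
    return e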
Updated: 2024-07-26 17:59:32
Areas: cs.LG,cs.IR
Hybrid summary statistics: neural weak lensing inference beyond the power spectrum
In inference problems, we often have domain knowledge which allows us to define summary statistics that capture most of the information content in a dataset. In this paper, we present a hybrid approach, where such physics-based summaries are augmented by a set of compressed neural summary statistics that are optimised to extract the extra information that is not captured by the predefined summaries. The resulting statistics are very powerful inputs to simulation-based or implicit inference of model parameters. We apply this generalisation of Information Maximising Neural Networks (IMNNs) to parameter constraints from tomographic weak gravitational lensing convergence maps to find summary statistics that are explicitly optimised to complement angular power spectrum estimates. We study several dark matter simulation resolutions in low- and high-noise regimes. We show that i) the information-update formalism extracts at least $3\times$ and up to $8\times$ as much information as the angular power spectrum in all noise regimes, ii) the network summaries are highly complementary to existing 2-point summaries, and iii) our formalism allows for networks with smaller, physically-informed architectures to match much larger regression networks with far fewer simulations needed to obtain asymptotically optimal inference.
Updated: 2024-07-26 17:59:26
Areas: astro-ph.CO,cs.LG,physics.comp-ph,stat.ML,stat.OT
Wolf: Captioning Everything with a World Summarization Framework
We propose Wolf, a WOrLd summarization Framework for accurate video captioning. Wolf is an automated captioning framework that adopts a mixture-of-experts approach, leveraging complementary strengths of Vision Language Models (VLMs). By utilizing both image and video models, our framework captures different levels of information and summarizes them efficiently. Our approach can be applied to enhance video understanding, auto-labeling, and captioning. To evaluate caption quality, we introduce CapScore, an LLM-based metric to assess the similarity and quality of generated captions compared to the ground truth captions. We further build four human-annotated datasets in three domains: autonomous driving, general scenes, and robotics, to facilitate comprehensive comparisons. We show that Wolf achieves superior captioning performance compared to state-of-the-art approaches from the research community (VILA1.5, CogAgent) and commercial solutions (Gemini-Pro-1.5, GPT-4V). For instance, in comparison with GPT-4V, Wolf improves CapScore both quality-wise by 55.6% and similarity-wise by 77.4% on challenging driving videos. Finally, we establish a benchmark for video captioning and introduce a leaderboard, aiming to accelerate advancements in video understanding, captioning, and data alignment. Leaderboard: https://wolfv0.github.io/leaderboard.html.
Updated: 2024-07-26 17:59:09
Areas: cs.LG,cs.CL,cs.CV
Forecasting Automotive Supply Chain Shortfalls with Heterogeneous Time Series
Operational disruptions can significantly impact companies' performance. Ford, with its 37 plants globally, uses 17 billion parts annually to manufacture six million cars and trucks. With up to ten tiers of suppliers between the company and raw materials, any extended disruption in this supply chain can cause substantial financial losses. Therefore, the ability to forecast and identify such disruptions early is crucial for maintaining seamless operations. In this study, we demonstrate how we construct a dataset consisting of many multivariate time series to forecast first-tier supply chain disruptions, utilizing features related to capacity, inventory, utilization, and processing, as outlined in the classical Factory Physics framework. This dataset is technically challenging due to its vast scale of over five hundred thousand time series. Furthermore, these time series, while exhibiting certain similarities, also display heterogeneity within specific subgroups. To address these challenges, we propose a novel methodology that integrates an enhanced Attention Sequence to Sequence Deep Learning architecture, using Neural Network Embeddings to model group effects, with a Survival Analysis model. This model is designed to learn intricate heterogeneous data patterns related to operational disruptions. Our model has demonstrated strong performance, achieving 0.85 precision and 0.8 recall during the Quality Assurance (QA) phase across Ford's five North American plants. Additionally, to address the common criticism of Machine Learning models as black boxes, we show how the SHAP framework can be used to generate feature importance from the model predictions. It offers valuable insights that can lead to actionable strategies and highlights the potential of advanced machine learning for managing and mitigating supply chain risks in the automotive industry.
Updated: 2024-07-26 17:59:02
Areas: stat.ML,cs.LG
A Scalable Quantum Non-local Neural Network for Image Classification
Non-local operations play a crucial role in computer vision enabling the capture of long-range dependencies through weighted sums of features across the input, surpassing the constraints of traditional convolution operations that focus solely on local neighborhoods. Non-local operations typically require computing pairwise relationships between all elements in a set, leading to quadratic complexity in terms of time and memory. Due to the high computational and memory demands, scaling non-local neural networks to large-scale problems can be challenging. This article introduces a hybrid quantum-classical scalable non-local neural network, referred to as Quantum Non-Local Neural Network (QNL-Net), to enhance pattern recognition. The proposed QNL-Net relies on inherent quantum parallelism to allow the simultaneous processing of a large number of input features enabling more efficient computations in quantum-enhanced feature space and involving pairwise relationships through quantum entanglement. We benchmark our proposed QNL-Net with other quantum counterparts to binary classification with datasets MNIST and CIFAR-10. The simulation findings showcase our QNL-Net achieves cutting-edge accuracy levels in binary image classification among quantum classifiers while utilizing fewer qubits.
Updated: 2024-07-26 17:58:57
Areas: cs.CV,cs.AI,cs.IT,cs.LG,math.IT,quant-ph
Lessons from Learning to Spin "Pens"
In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy using these real-world trajectories to adapt it to the real world dynamics. With less than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development.
Updated: 2024-07-26 17:56:01
Areas: cs.RO,cs.AI,cs.LG
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Autonomous agents that address day-to-day digital tasks (e.g., ordering groceries for a household), must not only operate multiple apps (e.g., notes, messaging, shopping app) via APIs, but also generate rich code with complex control flow in an iterative manner based on their interaction with the environment. However, existing benchmarks for tool use are inadequate, as they only cover tasks that require a simple sequence of API calls. To remedy this gap, we built $\textbf{AppWorld Engine}$, a high-quality execution environment (60K lines of code) of 9 day-to-day apps operable via 457 APIs and populated with realistic digital activities simulating the lives of ~100 fictitious users. We then created $\textbf{AppWorld Benchmark}$ (40K lines of code), a suite of 750 natural, diverse, and challenging autonomous agent tasks requiring rich and interactive code generation. It supports robust programmatic evaluation with state-based unit tests, allowing for different ways of completing a task while also checking for unexpected changes, i.e., collateral damage. The state-of-the-art LLM, GPT-4o, solves only ~49% of our 'normal' tasks and ~30% of 'challenge' tasks, while other models solve at least 16% fewer. This highlights the benchmark's difficulty and AppWorld's potential to push the frontiers of interactive coding agents. The project website is available at https://appworld.dev/.
Updated: 2024-07-26 17:55:45
Areas: cs.SE,cs.AI,cs.CL,cs.LG
Reinforcement learning for anisotropic p-adaptation and error estimation in high-order solvers
We present a novel approach to automate and optimize anisotropic p-adaptation in high-order h/p solvers using Reinforcement Learning (RL). The dynamic RL adaptation uses the evolving solution to adjust the high-order polynomials. We develop an offline training approach, decoupled from the main solver, which shows minimal overcost when performing simulations. In addition, we derive a RL-based error estimation approach that enables the quantification of local discretization errors. The proposed methodology is agnostic to both the computational mesh and the partial differential equation being solved. The application of RL to mesh adaptation offers several benefits. It enables automated, adaptive mesh refinement, reducing the need for manual intervention. It optimizes computational resources by dynamically allocating high-order polynomials where necessary and minimizing refinement in stable regions. This leads to computational cost savings while maintaining solution accuracy. Furthermore, RL allows for the exploration of unconventional mesh adaptations, potentially enhancing the accuracy and robustness of simulations. This work extends our original research, offering a more robust, reproducible, and generalizable approach applicable to complex three-dimensional problems. We provide validation for laminar and turbulent cases: circular cylinders, Taylor Green Vortex and a 10MW wind turbine to illustrate the flexibility of the proposed approach.
Updated: 2024-07-26 17:55:23
Areas: physics.flu-dyn,cs.LG,physics.comp-ph
Physics-Guided Actor-Critic Reinforcement Learning for Swimming in Turbulence
Turbulent diffusion causes particles placed in proximity to separate. We investigate the required swimming efforts to maintain a particle close to its passively advected counterpart. We explore optimally balancing these efforts with the intended goal by developing and comparing a novel Physics-Informed Reinforcement Learning (PIRL) strategy with prescribed control (PC) and standard physics-agnostic Reinforcement Learning strategies. Our PIRL scheme, coined the Actor-Physicist, is an adaptation of the Actor-Critic algorithm in which the Neural Network parameterized Critic is replaced with an analytically derived physical heuristic function (the physicist). This strategy is then compared with an analytically computed optimal PC policy derived from a stochastic optimal control formulation and standard physics-agnostic Actor-Critic type algorithms.
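The Actor-Physicist pattern is easy to sketch generically (the quadratic value heuristic below is a made-up stand-in; the paper derives its physicist analytically from the flow physics):

import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def physicist_value(state):
    # Analytic stand-in for the critic: penalize separation distance.
    return -(state ** 2).sum(dim=-1)

def update(states, actions, rewards, next_states, gamma=0.99):
    mean = actor(states)
    logp = torch.distributions.Normal(mean, 1.0).log_prob(actions).sum(-1)
    with torch.no_grad():
        # TD advantage computed from the analytic value, not a learned critic.
        adv = rewards + gamma * physicist_value(next_states) - physicist_value(states)
    loss = -(logp * adv).mean()
    opt.zero_grad(); loss.backward(); opt.step()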
Updated: 2024-07-26 17:54:59
Areas: eess.SY,cs.LG,cs.SY,nlin.CD,physics.flu-dyn,stat.ML
Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits
General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power. However, the governance and development of AI still remain in the hands of a few, and the pace of development is accelerating without a comprehensive assessment of risks. As a first step towards democratic risk assessment and design of general purpose AI, we introduce PARTICIP-AI, a carefully designed framework for laypeople to speculate and assess AI use cases and their impacts. Our framework allows us to study more nuanced and detailed public opinions on AI through collecting use cases, surfacing diverse harms through risk assessment under alternate scenarios (i.e., developing and not developing a use case), and illuminating tensions over AI development through making a concluding choice on its development. To showcase the promise of our framework towards informing democratic AI development, we run a medium-scale study with inputs from 295 demographically diverse participants. Our analyses show that participants' responses emphasize applications for personal life and society, contrasting with most current AI development's business focus. We also surface diverse set of envisioned harms such as distrust in AI and institutions, complementary to those defined by experts. Furthermore, we found that perceived impact of not developing use cases significantly predicted participants' judgements of whether AI use cases should be developed, and highlighted lay users' concerns of techno-solutionism. We conclude with a discussion on how frameworks like PARTICIP-AI can further guide democratic AI development and governance.
Updated: 2024-07-26 17:52:46
Areas: cs.CY,cs.AI
Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain. This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation, and a minimum amount of annotation budget is available in the target domain. Without referencing the source data, new challenges emerge in identifying the most informative target samples for labeling, establishing cross-domain alignment during adaptation, and ensuring continuous performance improvements through the iterative query-and-adaptation process. In response, we present learn from the learnt (LFTL), a novel paradigm for SFADA to leverage the learnt knowledge from the source pretrained model and actively iterated models without extra overhead. We propose Contrastive Active Sampling to learn from the hypotheses of the preceding model, thereby querying target samples that are both informative to the current model and persistently challenging throughout active learning. During adaptation, we learn from features of actively selected anchors obtained from previous intermediate models, so that the Visual Persistence-guided Adaptation can facilitate feature distribution alignment and active sample exploitation. Extensive experiments on three widely-used benchmarks show that our LFTL achieves state-of-the-art performance, superior computational efficiency and continuous improvements as the annotation budget increases. Our code is available at https://github.com/lyumengyao/lftl.
Updated: 2024-07-26 17:51:58
Areas: cs.CV,cs.AI,cs.LG
Small Molecule Optimization with Large Language Models
Recent advancements in large language models have opened new possibilities for generative molecular drug design. We present Chemlactica and Chemma, two language models fine-tuned on a novel corpus of 110M molecules with computed properties, totaling 40B tokens. These models demonstrate strong performance in generating molecules with specified properties and predicting new molecular characteristics from limited samples. We introduce a novel optimization algorithm that leverages our language models to optimize molecules for arbitrary properties given limited access to a black box oracle. Our approach combines ideas from genetic algorithms, rejection sampling, and prompt optimization. It achieves state-of-the-art performance on multiple molecular optimization benchmarks, including an 8% improvement on Practical Molecular Optimization compared to previous methods. We publicly release the training corpus, the language models and the optimization algorithm.
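A schematic of such an oracle-budgeted loop (sample_molecules and oracle_score are hypothetical stand-ins for the model's generation interface and the black-box oracle; the released code defines its own API):

import heapq

def optimize(sample_molecules, oracle_score, budget=1000, pool_size=50, batch=20):
    pool = []  # min-heap of (score, smiles): the surviving "population"
    for _ in range(budget // batch):
        # Prompt the language model with the current best molecules
        # (the prompt-optimization ingredient).
        prompts = [s for _, s in heapq.nlargest(10, pool)] or [None]
        for smi in sample_molecules(prompts, n=batch):
            score = oracle_score(smi)            # one black-box oracle call
            if len(pool) < pool_size:
                heapq.heappush(pool, (score, smi))
            elif score > pool[0][0]:             # rejection-style filtering
                heapq.heapreplace(pool, (score, smi))
    return max(pool)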
Updated: 2024-07-26 17:51:33
Areas: cs.LG,cs.NE,q-bio.QM
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
A central piece in enabling intelligent agentic behavior in foundation models is to make them capable of introspecting upon their behavior, reasoning, and correcting their mistakes as more computation or interaction is available. Even the strongest proprietary large language models (LLMs) do not quite exhibit the ability of continually improving their responses sequentially, even in scenarios where they are explicitly told that they are making a mistake. In this paper, we develop RISE: Recursive IntroSpEction, an approach for fine-tuning LLMs to introduce this capability, despite prior work hypothesizing that this capability may not be possible to attain. Our approach prescribes an iterative fine-tuning procedure, which attempts to teach the model how to alter its response after having executed previously unsuccessful attempts to solve a hard test-time problem, with optionally additional environment feedback. RISE poses fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt. Inspired by principles in online imitation learning and reinforcement learning, we propose strategies for multi-turn data collection and training so as to imbue an LLM with the capability to recursively detect and correct its previous mistakes in subsequent iterations. Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation. We also find that RISE scales well, often attaining larger benefits with more capable models. Our analysis shows that RISE makes meaningful improvements to responses to arrive at the correct solution for challenging prompts, without disrupting one-turn abilities as a result of expressing more complex distributions.
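A rough sketch of the multi-turn data-collection idea (generate, is_correct and improve_target are hypothetical stand-ins; the actual RISE recipe also covers reward-weighted targets and optional environment feedback):

def collect_rise_data(problems, generate, is_correct, improve_target, turns=3):
    data = []
    for prob in problems:
        history = [prob]                      # the prompt is the initial MDP state
        for _ in range(turns):
            attempt = generate(history)
            if is_correct(prob, attempt):
                break
            # Supervision target: an improved response given the failed attempt.
            target = improve_target(history, attempt)
            data.append((history + [attempt], target))
            history = history + [attempt, "Your answer is wrong; try again."]
    return data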
Updated: 2024-07-26 17:50:27
Areas: cs.LG,cs.AI,cs.CL
SHANGUS: Deep Reinforcement Learning Meets Heuristic Optimization for Speedy Frontier-Based Exploration of Autonomous Vehicles in Unknown Spaces
This paper introduces SHANGUS, an advanced framework combining Deep Reinforcement Learning (DRL) with heuristic optimization to improve frontier-based exploration efficiency in unknown environments, particularly for intelligent vehicles in autonomous air services, search and rescue operations, and space exploration robotics. SHANGUS harnesses DRL's adaptability and heuristic prioritization, markedly enhancing exploration efficiency, reducing completion time, and minimizing travel distance. The strategy involves a frontier selection node to identify unexplored areas and a DRL navigation node using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for robust path planning and dynamic obstacle avoidance. Extensive experiments in ROS2 and Gazebo simulation environments show SHANGUS surpasses representative traditional methods like the Nearest Frontier (NF), Novel Frontier-Based Exploration Algorithm (CFE), and Goal-Driven Autonomous Exploration (GDAE) algorithms, especially in complex scenarios, excelling in completion time, travel distance, and exploration rate. This scalable solution is suitable for real-time autonomous navigation in fields such as industrial automation, autonomous driving, household robotics, and space exploration. Future research will integrate additional sensory inputs and refine heuristic functions to further boost SHANGUS's efficiency and robustness.
Updated: 2024-07-26 17:42:18
Areas: cs.RO,cs.AI,cs.SY,eess.SY
On the Pros and Cons of Active Learning for Moral Preference Elicitation
Computational preference elicitation methods are tools used to learn people's preferences quantitatively in a given context. Recent works on preference elicitation advocate for active learning as an efficient method to iteratively construct queries (framed as comparisons between context-specific cases) that are likely to be most informative about an agent's underlying preferences. In this work, we argue that the use of active learning for moral preference elicitation relies on certain assumptions about the underlying moral preferences, which can be violated in practice. Specifically, we highlight the following common assumptions (a) preferences are stable over time and not sensitive to the sequence of presented queries, (b) the appropriate hypothesis class is chosen to model moral preferences, and (c) noise in the agent's responses is limited. While these assumptions can be appropriate for preference elicitation in certain domains, prior research on moral psychology suggests they may not be valid for moral judgments. Through a synthetic simulation of preferences that violate the above assumptions, we observe that active learning can have similar or worse performance than a basic random query selection method in certain settings. Yet, simulation results also demonstrate that active learning can still be viable if the degree of instability or noise is relatively small and when the agent's preferences can be approximately represented with the hypothesis class used for learning. Our study highlights the nuances associated with effective moral preference elicitation in practice and advocates for the cautious use of active learning as a methodology to learn moral preferences.
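A toy simulation in this spirit (the logistic preference model, drift rate and noise level below are assumptions, not the paper's setup) compares uncertainty sampling against random queries when pairwise judgments are noisy and slowly drifting:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_pool = 5, 500
w_true = rng.normal(size=d)
pairs = rng.normal(size=(n_pool, d))   # feature differences between two cases

def respond(x, t):
    # Judgments drift over time and carry response noise, violating the
    # stability and limited-noise assumptions discussed above.
    u = (w_true + 0.01 * t) @ x + rng.normal(scale=0.5)
    return int(u > 0)

def run(active, n_queries=60):
    X, y, clf = [], [], LogisticRegression()
    for t in range(n_queries):
        if active and len(set(y)) == 2:
            clf.fit(X, y)
            i = int(np.argmin(np.abs(clf.decision_function(pairs))))  # most uncertain
        else:
            i = int(rng.integers(n_pool))
        X.append(pairs[i]); y.append(respond(pairs[i], t))
    clf.fit(X, y)
    w_hat = clf.coef_[0]
    return float(w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true)))

print("active:", run(True), "random:", run(False))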
Updated: 2024-07-26 17:40:52
Areas: cs.HC,cs.CY,cs.LG
Embedding And Clustering Your Data Can Improve Contrastive Pretraining
Recent studies of large-scale contrastive pretraining in the text embedding domain show that using single-source minibatches, rather than mixed-source minibatches, can substantially improve overall model accuracy. In this work, we explore extending training data stratification beyond source granularity by leveraging a pretrained text embedding model and the classic k-means clustering algorithm to further split training data apart by the semantic clusters within each source. Experimentally, we observe a notable increase in NDCG@10 when pretraining a BERT-based text embedding model on query-passage pairs from the MSMARCO passage retrieval dataset. Additionally, we conceptually connect our clustering approach to both the Topic Aware Sampling (TAS) aspect of the TAS-B methodology and the nearest-neighbor-based hard-negative mining aspect of the ANCE methodology and discuss how this unified view motivates future lines of research on the organization of contrastive pretraining data.
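A minimal sketch of the stratification step (cluster count and batch size are illustrative; the embeddings would come from a pretrained text-embedding model applied to the training pairs of a single source):

import numpy as np
from sklearn.cluster import KMeans

def cluster_stratified_batches(embeddings, n_clusters=8, batch_size=32, seed=0):
    # Assign each training pair to a semantic cluster, then yield
    # minibatches drawn from one cluster at a time.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embeddings)
    rng = np.random.default_rng(seed)
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        for start in range(0, len(idx) - batch_size + 1, batch_size):
            yield idx[start:start + batch_size]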
Updated: 2024-07-26 17:36:40
Areas: cs.LG,cs.CL
Regression prediction algorithm for energy consumption regression in cloud computing based on horned lizard algorithm optimised convolutional neural network-bidirectional gated recurrent unit
In this paper, we study the prediction of cloud computing energy consumption by optimising a data regression algorithm, a Convolutional Neural Network-Bidirectional Gated Recurrent Unit (CNN-BiGRU), with the horned lizard optimisation algorithm. First, through Spearman correlation analysis of CPU usage, memory usage, network traffic, power consumption, number of instructions executed, execution time, and energy efficiency, we found that power consumption has the highest degree of positive correlation with energy efficiency, while CPU usage has the highest degree of negative correlation with energy efficiency. In our experiments, we compared a random forest model against the model optimised with the horned lizard algorithm, and the results show that the optimised model achieves better predictions than the random forest model. Specifically, the mean square error (MSE) of the optimised model is 0.01 smaller than that of the random forest model, and the mean absolute error (MAE) is 0.01 smaller than that of the random forest. The combined metrics show that the optimised algorithm performs more accurately and reliably in predicting energy efficiency. These results provide new ideas and methods for improving the energy efficiency of cloud computing systems. This research not only expands the scope of application in the field of cloud computing, but also provides strong support for improving the energy use efficiency of such systems.
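The screening step is a set of pairwise Spearman rank correlations; on synthetic stand-in data (the generating model below is invented purely for illustration) it looks like this:

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
power = rng.uniform(100, 400, 200)                    # synthetic power draw
cpu = rng.uniform(0, 100, 200)                        # synthetic CPU usage
efficiency = 2.0 * power - 1.5 * cpu + rng.normal(0, 20, 200)

rho_power, _ = spearmanr(power, efficiency)
rho_cpu, _ = spearmanr(cpu, efficiency)
print(f"power vs efficiency: {rho_power:+.2f}, cpu vs efficiency: {rho_cpu:+.2f}")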
Updated: 2024-07-26 17:35:20
Areas: cs.DC,cs.AI,cs.LG
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities
Long sequences occur in abundance within real-world scenarios, hence properly modelling them opens numerous downstream use-cases. Deep neural networks, however, have often struggled with these for a variety of reasons. Recent advances, both in system engineering as well as model design, have enabled the scaling up of models that are purported to support extended context lengths. In particular, the state-space and linear recurrent neural network families of models can hypothetically extend to infinite sequence length. However, is this too good to be true? We conduct an evaluation to show that while such claims may be sound theoretically, there remain large practical gaps that are empirically observed. In particular, recurrent models still suffer in the same settings as long-context LLMs with attention. We further show that different inductive biases have inconsistent extrapolation capabilities, highlighting the need to further study such paradigms and investigate why long-context models seemingly fail to behave as one might expect.
Updated: 2024-07-26 17:31:51
Areas: cs.LG,cs.AI,cs.CL
Variational Inference via Smoothed Particle Hydrodynamics
A new variational inference method, SPH-ParVI, based on smoothed particle hydrodynamics (SPH), is proposed for sampling partially known densities (e.g., densities known only up to a normalizing constant) or for sampling using gradients. SPH-ParVI simulates the flow of a fluid under external effects driven by the target density; the transient or steady state of the fluid approximates the target density. The continuum fluid is modelled as an interacting particle system (IPS) via SPH, where each particle carries smoothed properties and interacts and evolves as per the Navier-Stokes equations. This mesh-free, Lagrangian simulation method offers fast, flexible, scalable and deterministic sampling and inference for a class of probabilistic models such as those encountered in Bayesian inference and generative modelling.
Updated: 2024-07-26 17:26:45
Areas: cs.AI,cs.LG,stat.ML
Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
This paper explores the use of TTS-synthesized training data for the KWS (keyword spotting) task while minimizing development cost and time. Keyword spotting models require a huge amount of training data to be accurate, and obtaining such training data can be costly. In the current state of the art, TTS models can generate large amounts of natural-sounding data, which can help reduce the cost and time of KWS model development. Still, TTS-generated data can lack diversity compared to real data. To maximize KWS model accuracy under the constraints of limited resources and current TTS capability, we explored various strategies for mixing TTS data and real human speech data, with a focus on minimizing real data use and maximizing the diversity of TTS output. Our experimental results indicate that a relatively small amount of real audio data with speaker diversity (100 speakers, 2k utterances) combined with large amounts of TTS-synthesized data can achieve reasonably high accuracy (within 3x the error rate of the baseline), compared to a baseline trained with 3.8M real positive utterances.
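The data recipe itself is simple to sketch (the 5% real fraction below is illustrative; the paper sweeps several mixes of real and synthesized utterances):

import random

def mix_training_set(real_utts, tts_utts, real_fraction=0.05, total=100_000, seed=0):
    # Mostly TTS-synthesized utterances, plus a small, speaker-diverse
    # slice of real audio; sampling is with replacement.
    rng = random.Random(seed)
    n_real = int(total * real_fraction)
    mixed = rng.choices(real_utts, k=n_real) + rng.choices(tts_utts, k=total - n_real)
    rng.shuffle(mixed)
    return mixed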
Updated: 2024-07-26 17:24:50
Areas: cs.SD,cs.LG,eess.AS
Variational Inference Using Material Point Method
A new gradient-based particle sampling method, MPM-ParVI, based on material point method (MPM), is proposed for variational inference. MPM-ParVI simulates the deformation of a deformable body (e.g. a solid or fluid) under external effects driven by the target density; transient or steady configuration of the deformable body approximates the target density. The continuum material is modelled as an interacting particle system (IPS) using MPM, each particle carries full physical properties, interacts and evolves following conservation dynamics. This easy-to-implement ParVI method offers deterministic sampling and inference for a class of probabilistic models such as those encountered in Bayesian inference (e.g. intractable densities) and generative modelling (e.g. score-based).
Updated: 2024-07-26 17:19:50
Areas: cs.AI,stat.CO,stat.ML
An Accelerated Multi-level Monte Carlo Approach for Average Reward Reinforcement Learning with General Policy Parametrization
In our study, we delve into average-reward reinforcement learning with general policy parametrization. Within this domain, current guarantees either fall short with suboptimal guarantees or demand prior knowledge of mixing time. To address these issues, we introduce Randomized Accelerated Natural Actor Critic, a method that integrates Multi-level Monte-Carlo and Natural Actor Critic. Our approach is the first to achieve global convergence rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$ without requiring knowledge of mixing time, significantly surpassing the state-of-the-art bound of $\tilde{\mathcal{O}}(1/T^{1/4})$.
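The multi-level ingredient can be sketched with a randomized-level estimator (a generic MLMC sketch over placeholder samples, not the paper's actor-critic gradient estimator):

import numpy as np

rng = np.random.default_rng(0)

def mlmc_estimate(max_level=10):
    # Draw a level L with geometric probability, then return a telescoping
    # correction between estimates from 2^L and 2^(L-1) samples of one
    # trajectory, importance-weighted by the level probability.
    p = 0.5 ** np.arange(1, max_level + 1)
    p /= p.sum()
    L = int(rng.choice(max_level, p=p)) + 1
    x = rng.standard_normal(2 ** L)        # placeholder trajectory statistics
    coarse, fine = x[: 2 ** (L - 1)].mean(), x.mean()
    return x[0] + (fine - coarse) / p[L - 1]

est = np.mean([mlmc_estimate() for _ in range(1000)])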
Updated: 2024-07-26 17:16:31
Areas: cs.LG
Generative Adversarial Networks for Imputing Sparse Learning Performance
Learning performance data, such as correct or incorrect responses to questions in Intelligent Tutoring Systems (ITSs) is crucial for tracking and assessing the learners' progress and mastery of knowledge. However, the issue of data sparsity, characterized by unexplored questions and missing attempts, hampers accurate assessment and the provision of tailored, personalized instruction within ITSs. This paper proposes using the Generative Adversarial Imputation Networks (GAIN) framework to impute sparse learning performance data, reconstructed into a three-dimensional (3D) tensor representation across the dimensions of learners, questions and attempts. Our customized GAIN-based method computational process imputes sparse data in a 3D tensor space, significantly enhanced by convolutional neural networks for its input and output layers. This adaptation also includes the use of a least squares loss function for optimization and aligns the shapes of the input and output with the dimensions of the questions-attempts matrices along the learners' dimension. Through extensive experiments on six datasets from various ITSs, including AutoTutor, ASSISTments and MATHia, we demonstrate that the GAIN approach generally outperforms existing methods such as tensor factorization and other generative adversarial network (GAN) based approaches in terms of imputation accuracy. This finding enhances comprehensive learning data modeling and analytics in AI-based education.
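A compact sketch of GAIN-style imputation with the least-squares objective mentioned above (simplified here to a 2-D matrix with MLPs; the paper reshapes the data into a learners x questions x attempts tensor and uses convolutional input/output layers):

import torch
import torch.nn as nn

d = 16                                       # flattened feature dimension (toy)
G = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, d), nn.Sigmoid())
D = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, d), nn.Sigmoid())
opt_g, opt_d = torch.optim.Adam(G.parameters()), torch.optim.Adam(D.parameters())

x = torch.rand(128, d)                       # performance values scaled to [0, 1]
m = (torch.rand(128, d) > 0.5).float()       # 1 = observed, 0 = missing

for step in range(200):
    z = torch.rand(128, d)                   # noise fills the missing slots
    g = G(torch.cat([m * x + (1 - m) * z, m], dim=1))
    x_hat = m * x + (1 - m) * g              # impute only the gaps
    hint = m * (torch.rand(128, d) < 0.9).float()   # simplified hint vector
    loss_d = ((D(torch.cat([x_hat.detach(), hint], dim=1)) - m) ** 2).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    d_prob = D(torch.cat([x_hat, hint], dim=1))
    loss_g = (((1 - m) * (d_prob - 1)) ** 2).mean() \
             + 10 * ((m * (x - g)) ** 2).mean()     # least-squares + reconstruction
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()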
Updated: 2024-07-26 17:09:48
标题: 生成对抗网络用于填补稀疏学习性能
摘要: 学习表现数据,如在智能辅导系统(ITSs)中对问题的正确或错误回答,对于跟踪和评估学习者的进步和知识掌握至关重要。然而,数据稀疏性的问题,表现为未探索的问题和缺失的尝试,阻碍了在ITSs内进行准确评估和提供定制、个性化指导。本文提出使用生成对抗填补网络(GAIN)框架来填补稀疏的学习表现数据,将其重构为跨学习者、问题和尝试维度的三维(3D)张量表示。我们基于定制的GAIN方法计算过程在3D张量空间中填补稀疏数据,通过卷积神经网络显著增强了其输入和输出层。这种适应性还包括使用最小二乘损失函数进行优化,并将输入和输出的形状与问题-尝试矩阵的维度沿着学习者的维度对齐。通过对包括AutoTutor、ASSISTments和MATHia在内的各种ITSs的六个数据集进行广泛实验,我们证明了GAIN方法在填补准确性方面通常优于张量分解和其他生成对抗网络(GAN)方法。这一发现增强了基于人工智能的教育中的全面学习数据建模和分析。
更新时间: 2024-07-26 17:09:48
领域: cs.LG,cs.AI
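The following PyTorch sketch illustrates the GAIN-style imputation loop described in the abstract above, with a mask-guessing discriminator and least-squares (LSGAN-style) losses; for brevity it uses MLPs on a flattened toy tensor rather than the paper's convolutional layers, and all shapes and hyperparameters are illustrative guesses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the learners x (questions x attempts) data, flattened.
L, Q, A = 64, 10, 5
X = torch.rand(L, Q * A)                  # "true" performance values in [0, 1]
M = (torch.rand(L, Q * A) < 0.6).float()  # 1 = observed, 0 = missing

G = nn.Sequential(nn.Linear(2 * Q * A, 128), nn.ReLU(), nn.Linear(128, Q * A), nn.Sigmoid())
D = nn.Sequential(nn.Linear(Q * A, 128), nn.ReLU(), nn.Linear(128, Q * A), nn.Sigmoid())
g_opt = torch.optim.Adam(G.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(500):
    noise = torch.rand(L, Q * A)
    x_in = M * X + (1 - M) * noise           # missing cells filled with noise
    x_hat = G(torch.cat([x_in, M], dim=1))   # generator proposes all values
    x_imp = M * X + (1 - M) * x_hat          # keep observed entries as-is

    # Discriminator tries to tell observed from imputed cells (recover the mask).
    d_loss = ((D(x_imp.detach()) - M) ** 2).mean()
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    g_adv = (((D(x_imp) - 1.0) ** 2) * (1 - M)).mean()  # fool D on imputed cells
    g_rec = (((x_hat - X) * M) ** 2).mean()             # fit observed cells
    g_loss = g_adv + 10.0 * g_rec
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print(f"final generator loss: {g_loss.item():.4f}")
```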
Engaging with Children's Artwork in Mixed Visual-Ability Families
We present two studies exploring how blind or low-vision (BLV) family members engage with their sighted children's artwork, strategies to support understanding and interpretation, and the potential role of technology, such as AI, therein. Our first study involved 14 BLV individuals, and the second included five groups of BLV individuals with their children. Through semi-structured interviews with AI descriptions of children's artwork and multi-sensory design probes, we found that BLV family members value artwork engagement as a bonding opportunity, preferring the child's storytelling and interpretation over other nonvisual representations. Additionally, despite some inaccuracies, BLV family members felt that AI-generated descriptions could facilitate dialogue with their children and aid self-guided art discovery. We close with specific design considerations for supporting artwork engagement in mixed visual-ability families, including enabling artwork access through various methods, supporting children's corrections of AI output, and distinguishing context vs. content and interpretation vs. description of children's artwork.
Updated: 2024-07-26 17:08:53
标题: 参与混合视觉能力家庭中儿童艺术作品
摘要: 我们提出了两项研究,探讨盲人或视力低下(BLV)家庭成员如何与他们的视力正常孩子的艺术作品交流,支持理解和解释的策略,以及技术(如人工智能)在其中的潜在作用。我们的第一项研究涉及14名BLV个体,第二项研究包括五组BLV个体与他们的孩子。通过半结构化访谈、AI描述儿童艺术作品和多感官设计探针,我们发现BLV家庭成员将参与艺术作品视为一种增进情感联系的机会,更倾向于孩子的叙述和解释而非其他非视觉表现。此外,尽管存在一些不准确性,BLV家庭成员认为AI生成的描述可以促进与孩子的对话,帮助他们自主探索艺术作品。我们最后提出了支持混合视力能力家庭中艺术作品参与的具体设计考虑,包括通过各种方法实现艺术作品的访问,支持孩子对AI输出的纠正,以及区分儿童艺术作品的背景与内容、解释与描述。
更新时间: 2024-07-26 17:08:53
领域: cs.HC,cs.AI
Distilling Multi-Scale Knowledge for Event Temporal Relation Extraction
Event Temporal Relation Extraction (ETRE) is paramount but challenging. Within a discourse, event pairs are situated at different distances, or so-called proximity bands. The temporal ordering of event pairs at more remote (i.e., "long") or less remote (i.e., "short") proximity bands is encoded differently. SOTA models have tended to perform well on events situated at either short or long proximity bands, but not both. Nonetheless, real-world, natural texts contain all types of temporal event pairs. In this paper, we present MulCo: Distilling Multi-Scale Knowledge via Contrastive Learning, a knowledge co-distillation approach that shares knowledge across multiple event-pair proximity bands to improve performance on all types of temporal datasets. Our experimental results show that MulCo successfully integrates linguistic cues pertaining to temporal reasoning across both short and long proximity bands and achieves new state-of-the-art results on several ETRE benchmark datasets.
Updated: 2024-07-26 17:04:53
标题: 提取事件时间关系的多尺度知识
摘要: 事件时间关系提取(ETRE)至关重要但具有挑战性。在一个话语中,事件对位于不同距离或所谓的接近度带。事件对的时间排序在更遥远(即“长”)或不那么遥远(即“短”)的接近度带上被编码得不同。SOTA模型往往只在短距离或长距离接近度带之一上表现良好,而非两者兼顾。然而,现实世界中的自然文本包含所有类型的时间事件对。在本文中,我们提出了MulCo:通过对比学习提炼多尺度知识,这是一种知识共同蒸馏方法,通过在多个事件对接近度带之间共享知识,以提高所有类型的时间数据集上的性能。我们的实验结果表明,MulCo成功地整合了跨越短距离和长距离接近度带的时间推理语言线索,并在几个ETRE基准数据集上取得了新的最先进的结果。
更新时间: 2024-07-26 17:04:53
领域: cs.CL,cs.AI,cs.LG
Downlink CCM Estimation via Representation Learning with Graph Regularization
In this paper, we propose an algorithm for downlink (DL) channel covariance matrix (CCM) estimation for frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) communication systems where the base station (BS) possesses a uniform linear array (ULA) antenna structure. We make use of the inherent similarity between the uplink (UL) CCM and the DL CCM due to angular reciprocity. We consider a setting where the UL CCM is mapped to the DL CCM by a mapping function. We first present a theoretical error analysis of learning a nonlinear embedding by constructing a mapping function, which points to the importance of the Lipschitz regularity of the mapping function for achieving high estimation performance. Then, based on this theoretical ground, we propose a representation learning algorithm as a solution for the estimation problem, where Gaussian RBF kernel interpolators are chosen to map UL CCMs to their DL counterparts. The proposed algorithm is based on the optimization of an objective function that fits a regression model between the DL CCM and UL CCM samples in the training dataset and preserves the local geometric structure of the data in the UL CCM space, while explicitly regulating the Lipschitz continuity of the mapping function in light of our theoretical findings. The proposed algorithm surpasses benchmark methods in terms of three error metrics, as shown by simulations.
Updated: 2024-07-26 16:52:30
标题: 通过图正则化的表示学习实现下行链路CCM估计
摘要: 在本文中,我们提出了一种用于频分双工(FDD)大规模多输入多输出(MIMO)通信系统下行(DL)信道协方差矩阵(CCM)估计的算法,基站(BS)具有均匀线性阵列(ULA)天线结构。我们利用上行(UL)CCM和DL CCM之间由于角度互易性的固有相似性。我们考虑UL CCM通过映射函数映射到DL CCM的设置。我们首先对通过构建映射函数学习非线性嵌入的理论误差进行分析,指出映射函数的Lipschitz正则性对于实现高估计性能的重要性。然后,基于理论基础,我们提出了一种表示学习算法作为估计问题的解决方案,其中选择高斯RBF核插值器将UL CCM映射到它们的DL对应物。所提出的算法基于优化一个目标函数,适配训练数据集中DL CCM和UL CCM样本之间的回归模型,并保留UL CCM空间的数据的局部几何结构,同时根据我们的理论发现明确调节映射函数的Lipschitz连续性。通过模拟结果显示,所提出的算法在三个误差度量方面超过了基准方法。
更新时间: 2024-07-26 16:52:30
领域: cs.LG,eess.SP
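A minimal sketch of the regression step described in the abstract above: Gaussian RBF kernel interpolation from UL features to DL targets, where a ridge penalty on the coefficients stands in for the paper's explicit Lipschitz regularization (data, shapes, and names are illustrative, not the paper's).

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Toy stand-ins: vectorized UL CCM features and DL CCM targets
# (real CCMs are Hermitian matrices; flat features are used for brevity).
n, d = 100, 8
X_ul = rng.normal(size=(n, d))
Y_dl = np.tanh(X_ul @ rng.normal(size=(d, d))) + 0.05 * rng.normal(size=(n, d))

sigma, lam = 1.0, 1e-2
K = rbf_kernel(X_ul, X_ul, sigma)
# The ridge term bounds the RKHS norm of the interpolator, which controls
# its Lipschitz constant; a stand-in for the paper's explicit regularizer.
C = np.linalg.solve(K + lam * np.eye(n), Y_dl)

def predict(X_new):
    return rbf_kernel(X_new, X_ul, sigma) @ C

print(predict(rng.normal(size=(5, d))).shape)  # (5, 8)
```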
HADES: Detecting Active Directory Attacks via Whole Network Provenance Analytics
Due to its crucial role in identity and access management in modern enterprise networks, Active Directory (AD) is a top target of Advanced Persistent Threat (APT) actors. Conventional intrusion detection systems (IDS) excel at identifying malicious behaviors caused by malware, but often fail to detect stealthy attacks launched by APT actors. Recent advances in provenance-based IDS (PIDS) show promise by exposing malicious system activities in causal attack graphs. However, existing approaches are restricted to intra-machine tracing, and unable to reveal the scope of attackers' traversal inside a network. We propose HADES, the first PIDS capable of performing accurate causality-based cross-machine tracing by leveraging a novel concept called logon session based execution partitioning to overcome several challenges in cross-machine tracing. We design HADES as an efficient on-demand tracing system, which performs whole-network tracing only when it first identifies an authentication anomaly signifying an ongoing AD attack, for which we introduce a novel lightweight authentication anomaly detection model rooted in our extensive analysis of AD attacks. To triage attack alerts, we present a new algorithm integrating two key insights we identified in AD attacks. Our evaluations show that HADES outperforms both popular open source detection systems and a prominent commercial AD attack detector.
Updated: 2024-07-26 16:46:29
标题: HADES:通过整个网络溯源分析检测Active Directory攻击
摘要: 由于在现代企业网络中的身份和访问管理中起着至关重要的作用,Active Directory(AD)是高级持续威胁(APT)行为者的主要目标。传统入侵检测系统(IDS)擅长识别由恶意软件引起的恶意行为,但往往无法检测由APT行为者发起的隐秘攻击。最近在基于溯源的IDS(PIDS)方面取得了进展,通过暴露因果攻击图中的恶意系统活动,显示出了希望。然而,现有方法局限于机器内部追踪,无法揭示攻击者在网络内部的遍历范围。我们提出了HADES,这是第一个能够利用一种称为基于登录会话的执行分区的新颖概念,从而克服了跨机器追踪中的几个挑战,从而执行准确的基于因果关系的跨机器追踪的PIDS。我们将HADES设计为一种高效的按需追踪系统,仅在首次识别到表示正在进行的AD攻击的身份验证异常时才执行整个网络追踪,为此我们引入了一种根植于我们对AD攻击的广泛分析的新颖轻量级身份验证异常检测模型。为了对攻击警报进行分类,我们提出了一种整合我们在AD攻击中识别出的两个关键见解的新算法。我们的评估结果表明,HADES优于流行的开源检测系统和一种知名的商业AD攻击检测器。
更新时间: 2024-07-26 16:46:29
领域: cs.CR
Semantic Prototypes: Enhancing Transparency Without Black Boxes
As machine learning (ML) models and datasets increase in complexity, the demand for methods that enhance explainability and interpretability becomes paramount. Prototypes, by encapsulating essential characteristics within data, offer insights that enable tactical decision-making and enhance transparency. Traditional prototype methods often rely on sub-symbolic raw data and opaque latent spaces, reducing explainability and increasing the risk of misinterpretations. This paper presents a novel framework that utilizes semantic descriptions to define prototypes and provide clear explanations, effectively addressing the shortcomings of conventional methods. Our approach leverages concept-based descriptions to cluster data on the semantic level, ensuring that prototypes not only represent underlying properties intuitively but are also straightforward to interpret. Our method simplifies the interpretative process and effectively bridges the gap between complex data structures and human cognitive processes, thereby enhancing transparency and fostering trust. Our approach outperforms existing widely-used prototype methods in facilitating human understanding and informativeness, as validated through a user survey.
Updated: 2024-07-26 16:37:52
标题: 语义原型:在不使用黑匣子的情况下增强透明度
摘要: 随着机器学习(ML)模型和数据集的复杂性不断增加,需求日益迫切地需要提高解释性和可解释性的方法。原型通过将数据中的关键特征封装在一起,提供了启示,使战术决策变得更加透明。传统的原型方法通常依赖于亚符号原始数据和不透明的潜在空间,降低了可解释性,增加了误解的风险。本文提出了一种利用语义描述来定义原型并提供清晰解释的新框架,有效地解决了传统方法的缺点。我们的方法利用基于概念的描述在语义级别上对数据进行聚类,确保原型不仅直观地代表潜在属性,而且易于解释。我们的方法简化了解释过程,有效地弥合了复杂数据结构与人类认知过程之间的差距,从而提高了透明度并增进了信任。我们的方法在促进人类理解和信息性方面优于现有广泛使用的原型方法,通过用户调查得到验证。
更新时间: 2024-07-26 16:37:52
领域: cs.LG,cs.AI
Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment
Image classification models often demonstrate unstable performance in real-world applications due to variations in image information, driven by differing visual perspectives of subject objects and lighting discrepancies. To mitigate these challenges, existing studies commonly incorporate additional modal information matching the visual data to regularize the model's learning process, enabling the extraction of high-quality visual features from complex image regions. Specifically, in the realm of multimodal learning, cross-modal alignment is recognized as an effective strategy, harmonizing different modal information by learning a domain-consistent latent feature space for visual and semantic features. However, this approach may face limitations due to the heterogeneity between multimodal information, such as differences in feature distribution and structure. To address this issue, we introduce a Multimodal Alignment and Reconstruction Network (MARNet), designed to enhance the model's resistance to visual noise. Importantly, MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains. Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model. It is a plug-and-play framework that can be rapidly integrated into various image classification frameworks, boosting model performance.
Updated: 2024-07-26 16:30:18
标题: 利用扩散模型统一视觉和语义特征空间,以增强跨模态对齐
摘要: 图像分类模型在真实世界应用中常常表现出不稳定的性能,这是由于图像信息的变化,受主体对象的视觉角度和光照差异驱动。为了缓解这些挑战,现有研究通常会引入额外的模态信息,将视觉数据匹配以规范模型的学习过程,从而使复杂图像区域的高质量视觉特征得以提取。具体来说,在多模态学习领域,跨模态对齐被认为是一种有效的策略,通过学习一个领域一致的潜在特征空间,协调不同模态信息的视觉和语义特征。然而,这种方法可能会面临限制,因为多模态信息之间存在异质性,比如特征分布和结构的差异。为了解决这个问题,我们引入了一个名为多模态对齐和重构网络(MARNet)的模型,旨在增强模型对视觉噪音的抵抗力。重要的是,MARNet包括一个跨模态扩散重构模块,可以平稳稳定地混合不同领域的信息。在两个基准数据集Vireo-Food172和Ingredient-101上进行的实验表明,MARNet有效地提高了模型提取的图像信息的质量。它是一个即插即用的框架,可以快速集成到各种图像分类框架中,提升模型性能。
更新时间: 2024-07-26 16:30:18
领域: cs.CV,cs.AI
On TinyML and Cybersecurity: Electric Vehicle Charging Infrastructure Use Case
As technology advances, the use of Machine Learning (ML) in cybersecurity is becoming increasingly crucial to tackle the growing complexity of cyber threats. While traditional ML models can enhance cybersecurity, their high energy and resource demands limit their applications, leading to the emergence of Tiny Machine Learning (TinyML) as a more suitable solution for resource-constrained environments. TinyML, which focuses on optimizing ML algorithms for small, low-power devices and enables intelligent data processing directly on edge devices, is widely applied in areas such as smart homes, healthcare, and industrial automation. This paper provides a comprehensive review of common challenges of TinyML techniques, such as power consumption, limited memory, and computational constraints; it also explores potential solutions to these challenges, such as energy harvesting, computational optimization techniques, and transfer learning for privacy preservation. This paper then discusses TinyML's applications in advancing cybersecurity for Electric Vehicle Charging Infrastructures (EVCIs) as a representative use case. It presents an experimental case study that enhances cybersecurity in EVCI using TinyML, evaluated against traditional ML in terms of reduced delay and memory usage, with a slight trade-off in accuracy. Additionally, the study includes a practical setup using the ESP32 microcontroller in the PlatformIO environment, which provides a hands-on assessment of TinyML's application in cybersecurity for EVCI.
Updated: 2024-07-26 16:25:15
标题: 关于TinyML和网络安全:电动汽车充电基础设施应用案例
摘要: 随着技术的进步,机器学习(ML)在网络安全领域的应用变得日益重要,以应对网络威胁日益复杂的挑战。传统的ML模型虽然能够增强网络安全,但其高能耗和资源需求限制了其应用范围,导致微型机器学习(TinyML)的出现,成为资源受限环境中更为合适的解决方案。TinyML广泛应用于智能家居、医疗保健和工业自动化等领域。TinyML专注于优化小型、低功耗设备的ML算法,使其能够在边缘设备上直接进行智能数据处理。本文综合审查了TinyML技术的常见挑战,如能耗、内存有限和计算约束;同时探讨了这些挑战的潜在解决方案,如能量收集、计算优化技术和用于隐私保护的迁移学习。另一方面,本文讨论了TinyML在推进电动车充电基础设施(EVCIs)网络安全方面的应用,以此为代表性案例。它提出了一个通过TinyML增强EVCI网络安全的实验案例研究,并与传统ML进行了评估,结果表明在减少延迟和内存使用方面有所改善,但在准确性上略有折衷。此外,该研究包括在PlatformIO环境中使用ESP32微控制器的实际设置,以便对TinyML在EVCI网络安全中的应用进行实际评估。
更新时间: 2024-07-26 16:25:15
领域: cs.CR,cs.AI,cs.LG
Repairing Networks of $\mathcal{EL_\perp}$ Ontologies using Weakening and Completing -- Extended version
The quality of ontologies and their alignments is crucial for developing high-quality semantics-based applications. Traditional debugging techniques repair ontology networks by removing unwanted axioms and mappings, but may thereby remove consequences that are correct in the domain of the ontology network. In this paper we propose a framework for repairing ontology networks that deals with this issue. It defines basic operations such as debugging, weakening and completing. Further, it defines combination operators that reflect choices in how and when to use the basic operators, as well as choices regarding the autonomy level of the ontologies and alignments in the ontology network. We show the influence of the combination operators on the quality of the repaired network and present an implemented tool. By using our framework together with existing algorithms for debugging, weakening and completing, we essentially provide a blueprint for extending previous work and systems.
Updated: 2024-07-26 16:15:33
标题: 使用削弱和完善修复$\mathcal{EL_\perp}$本体网络 -- 扩展版本
摘要: 本文提出了一个修复本体网络的框架,该框架解决了传统调试技术修复本体网络时可能会移除正确领域内推论的问题。该框架定义了调试、弱化和完善等基本操作,并定义了反映基本操作如何以及何时使用的组合运算符,以及对本体网络中的本体和对齐的自治水平进行选择的选择。我们展示了组合运算符对修复网络质量的影响,并提出了一个已实现的工具。通过将我们的框架与现有的调试、弱化和完善算法一起使用,我们基本上为扩展先前的工作和系统提供了一个蓝图。
更新时间: 2024-07-26 16:15:33
领域: cs.AI,cs.LO
Enhancing material property prediction with ensemble deep graph convolutional networks
Machine learning (ML) models have emerged as powerful tools for accelerating materials discovery and design by enabling accurate predictions of properties from compositional and structural data. These capabilities are vital for developing advanced technologies across fields such as energy, electronics, and biomedicine, potentially reducing the time and resources needed for new material exploration and promoting rapid innovation cycles. Recent efforts have focused on employing advanced ML algorithms, including deep learning (DL)-based graph neural networks, for property prediction. Additionally, ensemble models have proven to enhance the generalizability and robustness of ML and DL models. However, the use of such ensemble strategies in deep graph networks for material property prediction remains underexplored. Our research provides an in-depth evaluation of ensemble strategies in DL-based graph neural networks, specifically targeting material property prediction tasks. By testing the Crystal Graph Convolutional Neural Network (CGCNN) and its multitask version, MT-CGCNN, we demonstrated that ensemble techniques, especially prediction averaging, substantially improve precision beyond traditional metrics for key properties like formation energy per atom ($\Delta E^{f}$), band gap ($E_{g}$) and density ($\rho$) in 33,990 stable inorganic materials. These findings support the broader application of ensemble methods to enhance predictive accuracy in the field.
Updated: 2024-07-26 16:12:06
标题: 使用集成深度图卷积网络提升材料属性预测
摘要: 机器学习(ML)模型已经成为加速材料发现和设计的强大工具,通过从组成和结构数据中准确预测性能。这些能力对于开发能源、电子和生物医学等领域的先进技术至关重要,可能减少新材料探索所需的时间和资源,并促进快速创新周期。最近的努力集中在使用先进的ML算法,包括基于深度学习的图神经网络,用于性能预测。此外,集成模型已被证明可以增强ML和DL的泛化能力和稳健性。然而,在深度图网络中使用这种集成策略进行材料性能预测仍未得到充分探讨。我们的研究对深度学习 - 基于图神经网络中的集成策略进行了深入评估,特别针对材料性能预测任务。通过测试晶体图卷积神经网络(CGCNN)及其多任务版本MT-CGCNN,我们证明了集成技术,尤其是预测平均法,可以大幅提高33,990种稳定无机材料的关键性能(如每个原子的形成能量ΔEf、带隙Eg和密度ρ)的精确度,超越传统指标。这些发现支持在该领域中广泛应用集成方法以增强预测准确性。
更新时间: 2024-07-26 16:12:06
领域: cs.LG,cs.AI
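The prediction-averaging strategy highlighted in the abstract above is simple to reproduce; the sketch below uses sklearn MLPs on synthetic data as stand-ins for separately trained CGCNN ensemble members.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))  # stand-in structure descriptors
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=500)  # toy "formation energy"

# Train identically configured models with different seeds and average
# their predictions; the MLPs stand in for CGCNN instances.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=s).fit(X[:400], y[:400])
    for s in range(5)
]
preds = np.stack([m.predict(X[400:]) for m in ensemble])
print("single-model MAE:", np.abs(preds[0] - y[400:]).mean().round(3))
print("ensemble MAE:    ", np.abs(preds.mean(axis=0) - y[400:]).mean().round(3))
```

The per-model spread (preds.std(axis=0)) also gives a cheap uncertainty signal, which is a common side benefit of this kind of ensembling.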
On The Expressive Power of Knowledge Graph Embedding Methods
Knowledge Graph Embedding (KGE) is a popular approach that aims to represent the entities and relations of a knowledge graph in latent spaces; these representations are known as embeddings. To measure the plausibility of triplets, score functions are defined over embedding spaces. Despite the wide adoption of KGE across various tasks, KGE methods have limited reasoning abilities. In this paper we propose a mathematical framework to compare the reasoning abilities of KGE methods. We show that STransE has a higher capability than TransComplEx, and then present the new STransCoRe method, which improves STransE by combining it with TransCoRe insights, reducing STransE's space complexity.
Updated: 2024-07-26 16:11:23
标题: 关于知识图谱嵌入方法的表达能力
摘要: 知识图谱嵌入(KGE)是一种流行的方法,旨在在潜在空间中表示知识图谱的实体和关系。它们的表示被称为嵌入。为了衡量三元组的合理性,在嵌入空间上定义了得分函数。尽管KGE在各种任务中广泛传播,但KGE方法在推理能力方面存在局限性。在本文中,我们提出了一个数学框架来比较KGE方法的推理能力。我们展示了STransE比TransComplEx具有更高的能力,然后提出了新的STransCoRe方法,通过将其与TransCoRe的见解结合,改进了STransE,可以降低STransE的空间复杂度。
更新时间: 2024-07-26 16:11:23
领域: cs.AI,cs.LG,MCS 68T30,I.2.4
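For reference, here is a sketch of the published STransE scoring function discussed in the abstract above, which extends TransE with relation-specific projections of head and tail; the STransCoRe construction itself is not reproduced here, and the random embeddings are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # embedding dimension

def stranse_score(h, r, t, W1, W2, ord=1):
    """STransE plausibility score: -|| W1 @ h + r - W2 @ t ||.

    With W1 = W2 = I this reduces to TransE; the relation-specific
    projections are what give STransE its extra expressive power.
    """
    return -np.linalg.norm(W1 @ h + r - W2 @ t, ord=ord)

h, t = rng.normal(size=d), rng.normal(size=d)   # head / tail entity embeddings
r = rng.normal(size=d)                          # relation embedding
W_r1, W_r2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
print(f"triplet score: {stranse_score(h, r, t, W_r1, W_r2):.3f}")
```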
CGGM: A conditional graph generation model with adaptive sparsity for node anomaly detection in IoT networks
Dynamic graphs are extensively employed for detecting anomalous behavior in nodes within the Internet of Things (IoT). Generative models are often used to address the issue of imbalanced node categories in dynamic graphs. Nevertheless, these models face constraints including the monotonicity of adjacency relationships, the difficulty of constructing multi-dimensional features for nodes, and the lack of a method for end-to-end generation of multiple categories of nodes. This paper presents a novel graph generation model, called CGGM, designed specifically to generate a larger number of nodes belonging to the minority class. The mechanism for generating an adjacency matrix through adaptive sparsity enhances the flexibility of its structure. A feature generation module, the multidimensional features generator (MFG), generates node features along with topological information. Labels are transformed into embedding vectors, serving as conditional constraints to control the generation of synthetic data across multiple categories. Using a multi-stage loss, the distribution of synthetic data is adjusted to closely resemble that of real data. In extensive experiments, we show that CGGM's synthetic data outperforms state-of-the-art methods across various metrics. Our results demonstrate efficient generation of diverse data categories, robustly enhancing multi-category classification model performance.
Updated: 2024-07-26 16:06:39
标题: CGGM:一种带有自适应稀疏度的条件图生成模型,用于物联网网络中节点异常检测
摘要: 动态图广泛用于检测物联网(IoT)中节点的异常行为。生成模型通常用于解决动态图中节点类别不平衡的问题。然而,它所面临的约束包括邻接关系的单调性,构建多维特征节点的困难,以及缺乏一种用于端到端生成多类节点的方法。本文提出了一种新颖的图生成模型,称为CGGM,专门设计用于生成属于少数类的更多节点。通过自适应稀疏性生成邻接矩阵的机制增强了其结构的灵活性。特征生成模块称为多维特征生成器(MFG)用于生成节点特征以及拓扑信息。标签被转换为嵌入向量,作为条件约束来控制跨多个类别生成合成数据。使用多阶段损失,合成数据的分布被调整以更接近真实数据。在广泛的实验中,我们展示了CGGM合成数据在各种指标上优于最先进的方法。我们的结果表明,有效生成多样化数据类别,稳健地提高多类别分类模型的性能。
更新时间: 2024-07-26 16:06:39
领域: cs.RO,cs.LG
QT-TDM: Planning with Transformer Dynamics Model and Autoregressive Q-Learning
Inspired by the success of the Transformer architecture in natural language processing and computer vision, we investigate the use of Transformers in Reinforcement Learning (RL), specifically in modeling the environment's dynamics using Transformer Dynamics Models (TDMs). We evaluate the capabilities of TDMs for continuous control in real-time planning scenarios with Model Predictive Control (MPC). While Transformers excel in long-horizon prediction, their tokenization mechanism and autoregressive nature lead to costly planning over long horizons, especially as the environment's dimensionality increases. To alleviate this issue, we use a TDM for short-term planning, and learn an autoregressive discrete Q-function using a separate Q-Transformer (QT) model to estimate a long-term return beyond the short-horizon planning. Our proposed method, QT-TDM, integrates the robust predictive capabilities of Transformers as dynamics models with the efficacy of a model-free Q-Transformer to mitigate the computational burden associated with real-time planning. Experiments in diverse state-based continuous control tasks show that QT-TDM is superior in performance and sample efficiency compared to existing Transformer-based RL models while achieving fast and computationally efficient inference.
Updated: 2024-07-26 16:05:26
标题: QT-TDM:使用Transformer动态模型和自回归Q学习进行规划
摘要: 受到Transformer架构在自然语言处理和计算机视觉中取得成功的启发,我们研究了在强化学习(RL)中使用Transformer的可能性,特别是在使用Transformer动态模型(TDMs)对环境动态进行建模方面。我们评估了TDMs在实时规划场景中连续控制的能力,采用模型预测控制(MPC)。虽然Transformer在长期预测方面表现出色,但其标记化机制和自回归性质导致在长期规划中的成本昂贵,特别是当环境维度增加时。为了缓解这一问题,我们使用TDM进行短期规划,并使用单独的Q-Transformer(QT)模型学习自回归离散Q函数,以估计超出短期规划的长期回报。我们提出的方法,QT-TDM,将Transformer作为动态模型的强大预测能力与无模型Q-Transformer的有效性相结合,以减轻与实时规划相关的计算负担。在各种基于状态的连续控制任务中进行的实验表明,相比现有基于Transformer的RL模型,QT-TDM在性能和样本效率上表现更优秀,同时实现了快速和高效的推断。
更新时间: 2024-07-26 16:05:26
领域: cs.LG
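A minimal sketch of the planning loop described in the QT-TDM abstract above: short-horizon rollouts with the dynamics model, bootstrapped with a terminal Q-value; the random-shooting planner and all toy functions are our simplifications, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def plan(state, dynamics, reward, q_value, horizon=5, n_candidates=256, action_dim=2):
    """Random-shooting MPC with a terminal Q-value bootstrap (sketch).

    `dynamics` stands in for the learned TDM rollout and `q_value` for the
    Q-Transformer's long-term return estimate beyond the short horizon.
    """
    actions = rng.uniform(-1, 1, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i in range(n_candidates):
        s = state
        for t in range(horizon):
            returns[i] += reward(s, actions[i, t])
            s = dynamics(s, actions[i, t])        # short-horizon model rollout
        returns[i] += q_value(s, actions[i, -1])  # value beyond the horizon
    return actions[np.argmax(returns), 0]         # execute only the first action

# Toy stand-ins: steer a point toward the origin.
dyn = lambda s, a: s + 0.1 * a
rew = lambda s, a: -np.linalg.norm(s)
qv = lambda s, a: -10.0 * np.linalg.norm(s)
print(plan(np.array([1.0, -1.0]), dyn, rew, qv))
```

Keeping the model rollout short while delegating the long-term return to the Q-function is exactly what avoids the expensive long-horizon autoregressive planning the abstract warns about.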
The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning
This paper introduces a new empirical methodology, the Cross-environment Hyperparameter Setting Benchmark (CHS), that compares RL algorithms across environments using a single hyperparameter setting, encouraging algorithmic development which is insensitive to hyperparameters. We demonstrate that this benchmark is robust to statistical noise and obtains qualitatively similar results across repeated applications, even when using few samples. This robustness makes the benchmark computationally cheap to apply, allowing statistically sound insights at low cost. We demonstrate two example instantiations of the CHS, on a set of six small control environments (SC-CHS) and on the entire DM Control suite of 28 environments (DMC-CHS). Finally, to illustrate the applicability of the CHS to modern RL algorithms on challenging environments, we conduct a novel empirical study of an open question in the continuous control literature. We show, with high confidence, that there is no meaningful difference in performance between Ornstein-Uhlenbeck noise and uncorrelated Gaussian noise for exploration with the DDPG algorithm on the DMC-CHS.
Updated: 2024-07-26 16:04:40
标题: 跨环境强化学习超参数设置基准测试
摘要: 本文介绍了一种新的实证方法,即跨环境超参数设置基准(Cross-environment Hyperparameter Setting Benchmark),该方法通过使用单一超参数设置比较强化学习算法在不同环境中的表现,鼓励对超参数不敏感的算法开发。我们证明这一基准对统计噪声具有鲁棒性,并在重复应用时获得定性相似的结果,即使使用少量样本也是如此。这种鲁棒性使得基准在计算上具有较低成本,可以以低成本获得统计上可靠的见解。我们展示了两个CHS的示例实例,一个是在六个小型控制环境上(SC-CHS),另一个是在包含28个环境的整个DM Control套件上(DMC-CHS)。最后,为了说明CHS在现代强化学习算法在具有挑战性的环境中的适用性,我们进行了一项针对连续控制文献中一个开放问题的新型实证研究。我们展示了通过对DMC-CHS上使用DDPG算法进行探索时,奥恩斯坦-乌伦贝克噪声和不相关的高斯噪声之间的表现没有实质性差异。
更新时间: 2024-07-26 16:04:40
领域: cs.LG
The Role of Temporal Hierarchy in Spiking Neural Networks
Spiking Neural Networks (SNNs) have the potential for rich spatio-temporal signal processing thanks to exploiting both spatial and temporal parameters. Temporal dynamics such as the time constants of synapses and neurons, as well as delays, have recently been shown to have computational benefits that help reduce the overall number of parameters required in the network and increase the accuracy of SNNs in solving temporal tasks. Optimizing such temporal parameters, for example through gradient descent, gives rise to a temporal architecture for different problems. As has been shown in machine learning, to reduce the cost of optimization, architectural biases can be applied, in this case in the temporal domain. Such inductive biases in temporal parameters have been found in neuroscience studies, highlighting a hierarchy of temporal structure and input representation in different layers of the cortex. Motivated by this, we propose to impose a hierarchy of temporal representation in the hidden layers of SNNs, highlighting that such an inductive bias improves their performance. We demonstrate the positive effects of temporal hierarchy in the time constants of feed-forward SNNs applied to temporal tasks (Multi-Time-Scale XOR and Keyword Spotting, with a benefit of up to 4.1% in classification accuracy). Moreover, we show that such architectural biases, i.e. a hierarchy of time constants, naturally emerge when optimizing the time constants through gradient descent, initialized as homogeneous values. We further pursue this proposal in temporal convolutional SNNs, by introducing the hierarchical bias in the size and dilation of temporal kernels, giving rise to competitive results on popular temporal spike-based datasets.
Updated: 2024-07-26 16:00:20
标题: 脉冲神经网络中时间层次结构的作用
摘要: 脉冲神经网络(SNN)由于利用空间和时间参数,具有丰富的时空信号处理潜力。最近展示了突触和神经元的时间常数以及延迟等时间动态具有计算优势,有助于减少网络所需的参数总数,并提高SNN在解决时间任务时的准确性。通过梯度下降等优化这些时间参数,例如产生不同问题的时间架构。正如在机器学习中所示,为了减少优化成本,可以应用架构偏差,即在时间领域中。神经科学研究发现了这种时间参数中的归纳偏差,突出了皮层不同层中的时间结构和输入表示的层次。受此启发,我们提议在SNN的隐藏层中施加时间表示的层次结构,强调这种归纳偏差改善了它们的性能。我们展示了在应用于时间任务(多时间尺度XOR和关键词识别)的前馈SNN的时间常数中时间层次结构的积极效果(在分类准确性上的受益高达4.1%)。此外,我们展示了这种架构偏差,即时间常数的层次结构,在通过梯度下降优化时间常数时会自然出现,并初始化为均匀值。我们进一步在时间卷积SNN中探索这一提议,通过引入时间内核的大小和扩张的层次偏差,在流行的基于时间脉冲的数据集中取得了竞争性结果。
更新时间: 2024-07-26 16:00:20
领域: cs.NE,cs.LG
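The following numpy sketch illustrates the central idea of the abstract above: a feed-forward stack of leaky integrate-and-fire layers whose membrane time constants grow with depth. The specific time constants, layer sizes, and input statistics are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def lif_layer(spikes_in, W, tau, dt=1.0, v_th=1.0):
    """Leaky integrate-and-fire layer with membrane time constant tau."""
    T = spikes_in.shape[0]
    v = np.zeros(W.shape[1])
    out = np.zeros((T, W.shape[1]))
    decay = np.exp(-dt / tau)
    for t in range(T):
        v = decay * v + spikes_in[t] @ W
        out[t] = (v >= v_th).astype(float)
        v = np.where(out[t] > 0, 0.0, v)  # reset membrane after a spike
    return out

# Hierarchy of time constants: deeper layers integrate over longer windows.
taus = [2.0, 8.0, 32.0]                   # illustrative values
sizes = [20, 30, 30, 10]
Ws = [rng.normal(scale=0.5, size=(sizes[i], sizes[i + 1])) for i in range(3)]

h = (rng.random((100, 20)) < 0.2).astype(float)  # Poisson-like input spikes
for W, tau in zip(Ws, taus):
    h = lif_layer(h, W, tau)
print("output spike rates:", h.mean(axis=0).round(2))
```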
Learning to Visually Connect Actions and their Effects
We introduce the novel concept of visually Connecting Actions and Their Effects (CATE) in video understanding. CATE can have applications in areas like task planning and learning from demonstration. We identify and explore two different aspects of the concept of CATE: Action Selection (AS) and Effect-Affinity Assessment (EAA), where video understanding models connect actions and effects at semantic and fine-grained levels, respectively. We design various baseline models for AS and EAA. Despite the intuitive nature of the task, we observe that models struggle, and humans outperform them by a large margin. Our experiments show that in solving AS and EAA, models learn intuitive properties like object tracking and pose encoding without explicit supervision. We demonstrate that CATE can be an effective self-supervised task for learning video representations from unlabeled videos. The study aims to showcase the fundamental nature and versatility of CATE, with the hope of inspiring advanced formulations and models.
Updated: 2024-07-26 16:00:07
标题: 学习将动作与其效果视觉连接起来
摘要: 我们引入了视频理解中连接动作和其效果的新概念——CATE。CATE可以在任务规划和从演示中学习等领域应用。我们确定并探索了CATE概念的两个不同方面:动作选择(AS)和效果亲和性评估(EAA),其中视频理解模型分别在语义和细粒度级别连接动作和效果。我们为AS和EAA设计了各种基线模型。尽管任务的直觉性质,我们观察到模型面临困难,而人类则以较大的优势表现。我们的实验表明,在解决AS和EAA时,模型学习了直观属性,如对象跟踪和姿势编码,而无需显式监督。我们展示了CATE可以成为从未标记视频中学习视频表示的有效自监督任务。该研究旨在展示CATE的基本性质和多功能性,希望激发先进的表述和模型。
更新时间: 2024-07-26 16:00:07
领域: cs.CV,cs.AI,cs.LG,cs.RO
Accurate and Scalable Detection and Investigation of Cyber Persistence Threats
In Advanced Persistent Threat (APT) attacks, achieving stealthy persistence within target systems is often crucial for an attacker's success. This persistence allows adversaries to maintain prolonged access, often evading detection mechanisms. Recognizing its pivotal role in the APT lifecycle, this paper introduces Cyber Persistence Detector (CPD), a novel system dedicated to detecting cyber persistence through provenance analytics. CPD is founded on the insight that persistent operations typically manifest in two phases: the "persistence setup" and the subsequent "persistence execution". By causally relating these phases, we enhance our ability to detect persistent threats. First, CPD discerns setups signaling an impending persistent threat and then traces processes linked to remote connections to identify persistence execution activities. A key feature of our system is the introduction of pseudo-dependency edges (pseudo-edges), which effectively connect these disjoint phases using data provenance analysis, and expert-guided edges, which enable faster tracing and reduced log size. These edges empower us to detect persistence threats accurately and efficiently. Moreover, we propose a novel alert triage algorithm that further reduces false positives associated with persistence threats. Evaluations conducted on well-known datasets demonstrate that our system reduces the average false positive rate by 93% compared to state-of-the-art methods.
Updated: 2024-07-26 15:51:49
标题: 准确且可扩展的检测和调查网络持久性威胁
摘要: 在高级持续性威胁(APT)攻击中,达到对目标系统的隐蔽持久性通常对攻击者的成功至关重要。这种持久性使对手能够保持长时间的访问,往往可以规避检测机制。本文认识到在APT生命周期中的关键作用,介绍了一种名为网络持久性检测器(CPD)的新型系统,专门用于通过溯源分析检测网络持久性。CPD建立在一种洞察力之上,即持续性操作通常表现为两个阶段:建立“持久性设置”和随后的“持久性执行”。通过因果关系这些阶段,我们增强了检测持久性威胁的能力。首先,CPD识别出预示即将出现的持久性威胁的设置,然后跟踪与远程连接相关的进程,以识别持久性执行活动。我们系统的一个关键特性是引入了伪依赖边(伪边),通过数据溯源分析有效地连接这些不连续的阶段,以及专家指导的边,使追踪更加快速,日志大小更小。这些边使我们能够准确高效地检测持久性威胁。此外,我们提出了一种新颖的警报分类算法,进一步减少与持久性威胁相关的误报。对知名数据集进行的评估表明,与最先进的方法相比,我们的系统将平均假阳性率降低了93%。
更新时间: 2024-07-26 15:51:49
领域: cs.CR
VACoDe: Visual Augmented Contrastive Decoding
Despite the astonishing performance of recent Large Vision-Language Models (LVLMs), these models often generate inaccurate responses. To address this issue, previous studies have focused on mitigating hallucinations by employing contrastive decoding (CD) with augmented images, which amplifies the contrast with the original image. However, these methods have limitations, including reliance on a single augmentation, which is restrictive for certain tasks, as well as the high cost of using external knowledge. In this study, we address these limitations by exploring how to utilize multiple image augmentations. Through extensive experiments, we observed that different augmentations produce varying levels of contrast depending on the task. Based on this observation, we introduce a novel method called VACoDe, Visual Augmented Contrastive Decoding. This method adaptively selects the augmentation with the highest contrast for each task using the proposed softmax distance metric. Our empirical tests show that VACoDe outperforms previous methods and improves output quality in various vision-language tasks. Additionally, VACoDe can be universally applied across different model types and sizes without additional training or the use of external models and data.
Updated: 2024-07-26 15:49:31
标题: VACoDe:视觉增强对比解码
摘要: 尽管最近的大型视觉语言模型(LVLMs)表现出色,但这些模型通常会生成不准确的响应。为了解决这个问题,先前的研究集中在通过使用对比解码(CD)和增强图像来减轻幻觉,从而增加与原始图像的对比。然而,这些方法存在局限性,包括依赖单一增强,对于某些任务来说受限制,并且使用外部知识的成本较高。在这项研究中,我们通过探索如何利用多个图像增强来解决这些局限性。通过大量实验,我们观察到不同的增强根据任务产生不同程度的对比。基于这一观察,我们引入了一种新方法称为VACoDe,即视觉增强对比解码。该方法使用提出的softmax距离度量,为每个任务自适应选择具有最高对比度的增强。我们的实证测试显示,该算法优于先前的方法,并在各种视觉语言任务中提高了输出质量。此外,VACoDe可以在不需要额外训练或使用外部模型和数据的情况下,在不同的模型类型和大小上普遍应用。
更新时间: 2024-07-26 15:49:31
领域: cs.CV,cs.AI,68T01,I.2.0
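A sketch of one VACoDe-style decoding step as we read the abstract above: the augmentation whose output distribution is farthest from the original (largest softmax distance) is selected and contrasted against the original logits. The L1 softmax distance and the standard contrastive-decoding combination used here are our assumptions, not the paper's exact formulas.

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def vacode_step(logits_orig, logits_augs, alpha=1.0):
    """One contrastive decoding step with adaptive augmentation selection.

    Each row of logits_augs comes from the same model conditioned on a
    differently augmented image; the highest-contrast view is chosen.
    """
    p = softmax(logits_orig)
    dists = [np.abs(softmax(la) - p).sum() for la in logits_augs]
    best = int(np.argmax(dists))                         # highest-contrast view
    contrasted = (1 + alpha) * logits_orig - alpha * logits_augs[best]
    return softmax(contrasted), best

rng = np.random.default_rng(0)
orig = rng.normal(size=32)          # vocabulary logits for the original image
augs = rng.normal(size=(3, 32))     # logits under three image augmentations
probs, chosen = vacode_step(orig, augs)
print("selected augmentation:", chosen, "| top token:", int(probs.argmax()))
```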
Weyl Calculus and Exactly Solvable Schrödinger Bridges with Quadratic State Cost
Schr\"{o}dinger bridge--a stochastic dynamical generalization of optimal mass transport--exhibits a learning-control duality. Viewed as a stochastic control problem, the Schr\"{o}dinger bridge finds an optimal control policy that steers a given joint state statistics to another while minimizing the total control effort subject to controlled diffusion and deadline constraints. Viewed as a stochastic learning problem, the Schr\"{o}dinger bridge finds the most-likely distribution-valued trajectory connecting endpoint distributional observations, i.e., solves the two point boundary-constrained maximum likelihood problem over the manifold of probability distributions. Recent works have shown that solving the Schr\"{o}dinger bridge problem with state cost requires finding the Markov kernel associated with a reaction-diffusion PDE where the state cost appears as a state-dependent reaction rate. We explain how ideas from Weyl calculus in quantum mechanics, specifically the Weyl operator and the Weyl symbol, can help determine such Markov kernels. We illustrate these ideas by explicitly finding the Markov kernel for the case of quadratic state cost via Weyl calculus, recovering our earlier results but avoiding tedious computation with Hermite polynomials.
Updated: 2024-07-26 15:48:23
标题: Weyl微积分与具有二次状态成本的可精确求解薛定谔桥
摘要: Schr\"{o}dinger bridge-一种随机动态的最佳质量传输的推广-展现了学习控制二重性。将Schr\"{o}dinger bridge视为随机控制问题,可以找到一个最优控制策略,将给定的联合状态统计引导到另一个状态,同时最小化总控制努力,受控扩散和截止约束。将Schr\"{o}dinger bridge视为随机学习问题,可以找到连接端点分布观测的最可能的分布值轨迹,即在概率分布流形上解决两点边界约束最大似然问题。最近的研究表明,解决具有状态成本的Schr\"{o}dinger bridge问题需要找到与反应扩散PDE相关联的马尔可夫核,其中状态成本出现为状态相关的反应速率。我们解释了如何使用量子力学中的Weyl微积分思想,特别是Weyl算子和Weyl符号,可以帮助确定这种马尔可夫核。我们通过明确找到二次状态成本情况下的Weyl微积分,展示了这些思想,恢复了我们早期的结果,避免使用Hermite多项式进行繁琐计算。
更新时间: 2024-07-26 15:48:23
领域: math.OC,cs.LG,cs.SY,eess.SY,math-ph,math.MP,stat.ML
Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models
Data-driven research in Additive Manufacturing (AM) has achieved significant success in recent years, leading to the emergence of a plethora of scientific literature. The knowledge in these works spans AM and Artificial Intelligence (AI) contexts that have not been mined and formalized in an integrated way, and extracting scientific information from them requires substantial effort and time. AM domain experts have contributed over two dozen review papers to summarize these works. However, information specific to AM and AI contexts still requires manual effort to extract. The recent success of foundation models such as BERT (Bidirectional Encoder Representations for Transformers) or GPT (Generative Pre-trained Transformers) on textual data has opened the possibility of expediting scientific information extraction. We propose a framework that enables collaboration between AM and AI experts to continuously extract scientific information from data-driven AM literature. A demonstration tool is implemented based on the proposed framework and a case study is conducted to extract information relevant to the datasets, modeling, sensing, and AM system categories. We show the ability of LLMs (Large Language Models) to expedite the extraction of relevant information from data-driven AM literature. In the future, the framework can be used to extract information from the broader design and manufacturing literature in the engineering discipline.
Updated: 2024-07-26 15:43:52
标题: 人工智能与人类团队合作,利用大型语言模型从数据驱动的增材制造研究中提取科学信息
摘要: 近年来,基于数据驱动的增材制造(AM)研究取得了显著成功。这导致大量科学文献的出现。这些作品中的知识涵盖了尚未以整合方式开发的AM和人工智能(AI)背景。从这些作品中提取科学信息需要大量的努力和时间。AM领域专家已经撰写了两打以上的综述论文,以概括这些作品。然而,特定于AM和AI背景的信息仍需要手动提取。最近BERT(双向编码器表示转换器)或GPT(生成式预训练转换器)等基础模型在文本数据上取得了成功,为加速科学信息提取开辟了可能性。我们提出了一个框架,使AM和AI专家能够协作,持续从数据驱动的AM文献中提取科学信息。基于提出的框架实现了一个演示工具,并进行了一个案例研究,以提取与数据集、建模、传感和AM系统类别相关的信息。我们展示了LLMs(大型语言模型)加速从数据驱动的AM文献中提取相关信息的能力。未来,该框架可以用于从工程学科中更广泛的设计和制造文献中提取信息。
更新时间: 2024-07-26 15:43:52
领域: cs.IR,cs.AI
Diffusion MRI with Machine Learning
Diffusion-weighted magnetic resonance imaging (dMRI) offers unique capabilities including noninvasive probing of brain's tissue microstructure and structural connectivity. It is widely used for clinical assessment of brain pathologies and for neuroscience research. Analyzing the dMRI data to extract useful information for medical and scientific purposes can be challenging. The dMRI measurements often suffer from strong noise and artifacts, there is usually high inter-session and inter-scanner variability in the data, and considerable inter-subject heterogeneity in brain structure. Moreover, the relationship between measurements and the phenomena of interest can be highly complex. Recent years have witnessed increasing use of machine learning methods for dMRI analysis. This manuscript aims to assess these efforts, with a focus on methods that have addressed data preprocessing and harmonization, microstructure mapping, tractography, and white matter tract analysis. We study the main findings, strengths, and weaknesses of the existing methods and suggest topics for future research. We find that machine learning may be exceptionally suited to tackle some of the difficult tasks in dMRI analysis. However, for this to happen, several shortcomings of existing methods and critical unresolved issues need to be addressed. These include deficient evaluation practices, lack of rich training datasets and validation benchmarks, as well as model generalizability, reliability, and explainability concerns.
Updated: 2024-07-26 15:39:03
标题: 扩散磁共振成像与机器学习
摘要: 扩散加权磁共振成像(dMRI)提供了独特的能力,包括无创探测大脑组织微结构和结构连接性。它被广泛用于临床评估大脑病变和神经科学研究。分析dMRI数据以提取医学和科学目的的有用信息可能具有挑战性。dMRI测量通常受到强噪声和伪影的影响,数据中通常存在高的会话间和扫描仪间变异性,以及大量不同受试者之间的大脑结构异质性。此外,测量与感兴趣现象之间的关系可能非常复杂。近年来,机器学习方法在dMRI分析中的应用日益增多。本文旨在评估这些努力,重点关注已解决数据预处理和协调、微结构映射、径迹学和白质径迹分析的方法。我们研究现有方法的主要发现、优点和缺点,并提出未来研究的主题。我们发现机器学习可能特别适合解决dMRI分析中的一些困难任务。然而,要实现这一点,需要解决现有方法的若干不足之处和关键未解决的问题。这些问题包括评估实践的不足、缺乏丰富的训练数据集和验证基准,以及模型的泛化能力、可靠性和可解释性方面的担忧。
更新时间: 2024-07-26 15:39:03
领域: eess.IV,cs.CV,cs.LG
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Recent advancements have significantly enhanced the capabilities of Multimodal Large Language Models (MLLMs) in generating and understanding image-to-text content. Despite these successes, progress is predominantly limited to English due to the scarcity of high quality multimodal resources in other languages. This limitation impedes the development of competitive models in languages such as Arabic. To alleviate this situation, we introduce an efficient Arabic multimodal assistant, dubbed Dallah, that utilizes an advanced language model based on LLaMA-2 to facilitate multimodal interactions. Dallah demonstrates state-of-the-art performance in Arabic MLLMs. Through fine-tuning six Arabic dialects, Dallah showcases its capability to handle complex dialectal interactions incorporating both textual and visual elements. The model excels in two benchmark tests: one evaluating its performance on Modern Standard Arabic (MSA) and another specifically designed to assess dialectal responses. Beyond its robust performance in multimodal interaction tasks, Dallah has the potential to pave the way for further development of dialect-aware Arabic MLLMs.
Updated: 2024-07-26 15:34:12
标题: Dallah:一种针对阿拉伯语的方言感知的多模态大型语言模型
摘要: 最近的进展显著增强了多模态大型语言模型(MLLMs)在生成和理解图像到文本内容方面的能力。尽管取得了这些成功,但进展主要受限于英语,因为其他语言中高质量的多模态资源稀缺。这种限制阻碍了在阿拉伯语等语言中开发竞争性模型的进展。为了缓解这种情况,我们引入了一款名为Dallah的高效阿拉伯语多模态助手,利用基于LLaMA-2的先进语言模型来促进多模态交互。Dallah在阿拉伯语MLLMs方面展现了最新的性能。通过对六种阿拉伯方言进行微调,Dallah展示了其处理包含文本和视觉元素的复杂方言交互的能力。该模型在两个基准测试中表现出色:一个评估其在现代标准阿拉伯语(MSA)上的表现,另一个专门设计用于评估方言回应。除了在多模态交互任务中表现出强大的性能外,Dallah还有潜力为进一步发展方言感知的阿拉伯语MLLMs铺平道路。
更新时间: 2024-07-26 15:34:12
领域: cs.CL,cs.AI
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models
Disentangled representation learning (DRL) aims to identify and decompose underlying factors behind observations, thus facilitating data perception and generation. However, current DRL approaches often rely on the unrealistic assumption that semantic factors are statistically independent. In reality, these factors may exhibit correlations, which off-the-shelf solutions have yet to properly address. To tackle this challenge, we introduce a bidirectional weighted graph-based framework to learn factorized attributes and their interrelations within complex data. Specifically, we propose a $\beta$-VAE based module to extract factors as the initial nodes of the graph, and leverage the multimodal large language model (MLLM) to discover and rank latent correlations, thereby updating the weighted edges. By integrating these complementary modules, our model successfully achieves fine-grained, practical and unsupervised disentanglement. Experiments demonstrate our method's superior performance in disentanglement and reconstruction. Furthermore, the model inherits enhanced interpretability and generalizability from MLLMs.
Updated: 2024-07-26 15:32:21
标题: 基于图的无监督解缠表示学习:通过多模态大型语言模型
摘要: 解缠结表示学习(DRL)旨在识别和分解观察背后的潜在因素,从而促进数据感知和生成。然而,当前的DRL方法往往依赖于语义因素统计独立的不切实际假设。实际上,这些因素可能存在相关性,而现有解决方案尚未妥善解决这一问题。为了解决这一挑战,我们引入了一个双向加权图形框架,用于学习复杂数据中的因素化属性及其相互关系。具体而言,我们提出了一个基于$\beta$-VAE的模块来提取因素作为图中的初始节点,并利用多模态大型语言模型(MLLM)来发现和排名潜在相关性,从而更新加权边。通过集成这些互补模块,我们的模型成功实现了细粒度、实用和无监督的解缠。实验证明了我们的方法在解缠和重建方面的优越性能。此外,该模型从MLLM中继承了增强的解释性和泛化能力。
更新时间: 2024-07-26 15:32:21
领域: cs.CV,cs.LG
Deep Companion Learning: Enhancing Generalization Through Historical Consistency
We propose Deep Companion Learning (DCL), a novel training method for Deep Neural Networks (DNNs) that enhances generalization by penalizing inconsistent model predictions compared to its historical performance. To achieve this, we train a deep-companion model (DCM), by using previous versions of the model to provide forecasts on new inputs. This companion model deciphers a meaningful latent semantic structure within the data, thereby providing targeted supervision that encourages the primary model to address the scenarios it finds most challenging. We validate our approach through both theoretical analysis and extensive experimentation, including ablation studies, on a variety of benchmark datasets (CIFAR-100, Tiny-ImageNet, ImageNet-1K) using diverse architectural models (ShuffleNetV2, ResNet, Vision Transformer, etc.), demonstrating state-of-the-art performance.
Updated: 2024-07-26 15:31:13
标题: 深度伴侣学习:通过历史一致性增强泛化
摘要: 我们提出了深度伴侣学习(DCL),这是一种新颖的用于深度神经网络(DNNs)的训练方法,通过惩罚与其历史表现不一致的模型预测来增强泛化能力。为了实现这一目标,我们训练一个深度伴侣模型(DCM),利用先前版本的模型对新输入进行预测。这个伴侣模型解析了数据中的有意义的潜在语义结构,从而提供有针对性的监督,鼓励主要模型解决它发现最具挑战性的场景。我们通过理论分析和广泛的实验验证了我们的方法,包括对各种基准数据集(CIFAR-100、Tiny-ImageNet、ImageNet-1K)使用不同架构模型(ShuffleNetV2、ResNet、Vision Transformer等)进行消融研究,展示了最先进的性能。
更新时间: 2024-07-26 15:31:13
领域: cs.CV,cs.LG
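A minimal PyTorch sketch of the historical-consistency idea described in the abstract above: a frozen snapshot of an earlier model supervises the current one through a consistency penalty. The KL term, refresh schedule, and loss weight are our illustrative choices, not the paper's actual DCM training procedure.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
companion = copy.deepcopy(model)  # frozen snapshot acting as the "companion"
for p in companion.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(128, 32)
y = torch.randint(0, 10, (128,))

for step in range(100):
    logits = model(x)
    with torch.no_grad():
        hist = companion(x)  # historical forecast on the same inputs
    task = F.cross_entropy(logits, y)
    # Penalize disagreement with the historical prediction.
    consistency = F.kl_div(
        F.log_softmax(logits, -1), F.softmax(hist, -1), reduction="batchmean"
    )
    loss = task + 0.5 * consistency
    opt.zero_grad(); loss.backward(); opt.step()
    if (step + 1) % 20 == 0:
        companion.load_state_dict(model.state_dict())  # refresh the snapshot

print(f"final loss: {loss.item():.3f}")
```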
Blockchain for Large Language Model Security and Safety: A Holistic Survey
With the advent of accessible interfaces for interacting with large language models, there has been an associated explosion in both their commercial and academic interest. Consequently, there has also been a sudden burst of novel attacks associated with large language models, jeopardizing user data on a massive scale. Situated at a comparable crossroads in its development, and rivaling LLMs in its rampant growth, blockchain has emerged in recent years as a disruptive technology with the potential to redefine how we approach data handling. In particular, and due to its strong guarantees about data immutability and irrefutability as well as inherent data provenance assurances, blockchain has attracted significant attention as a means to better defend against the array of attacks affecting LLMs and further improve the quality of their responses. In this survey, we holistically evaluate current research on how blockchains are being used to help protect against LLM vulnerabilities, as well as analyze how they may further be used in novel applications. To better serve these ends, we introduce a taxonomy of blockchain for large language models (BC4LLM) and also develop various definitions to precisely capture the nature of different bodies of research in these areas. Moreover, throughout the paper, we present frameworks to contextualize broader research efforts, and in order to motivate the field further, we identify future research goals as well as challenges present in the blockchain for large language model (BC4LLM) space.
Updated: 2024-07-26 15:24:01
标题: 区块链技术用于大型语言模型安全性与安全性:综合调查
摘要: 随着可访问的界面出现,用于与大型语言模型进行交互,它们的商业和学术兴趣也出现了爆炸性增长。因此,与大型语言模型相关的新型攻击也突然出现,大规模威胁着用户数据。区块链处于其发展的一个类似的交叉点,其迅猛增长与LLMs不相上下,近年来作为一项颠覆性技术出现,有潜力重新定义我们处理数据的方式。特别是由于其对数据不可变性和不可辩驳性的强有力保证以及固有的数据溯源保障,区块链作为一种更好地防御影响LLMs的各类攻击并进一步提高其响应质量的手段,吸引了大量关注。在这项调查中,我们全面评估了针对LLM漏洞使用区块链的当前研究,并分析它们如何进一步应用于新颖应用。为了更好地实现这些目标,我们介绍了用于大型语言模型的区块链(BC4LLM)的分类法,并提出了各种定义,以精确刻画这些领域不同研究的性质。此外,在整个论文中,我们提供了框架来将更广泛的研究努力置于上下文中,并为了进一步激励该领域,我们确定了未来的研究目标以及区块链大型语言模型(BC4LLM)领域存在的挑战。
更新时间: 2024-07-26 15:24:01
领域: cs.CR,cs.AI,cs.DC,cs.LG
Online Planning in POMDPs with State-Requests
In key real-world problems, full state information is sometimes available but only at a high cost, like activating precise yet energy-intensive sensors or consulting humans, thereby compelling the agent to operate under partial observability. For this scenario, we propose AEMS-SR (Anytime Error Minimization Search with State Requests), a principled online planning algorithm tailored for POMDPs with state requests. By representing the search space as a graph instead of a tree, AEMS-SR avoids the exponential growth of the search space originating from state requests. Theoretical analysis demonstrates AEMS-SR's $\varepsilon$-optimality, ensuring solution quality, while empirical evaluations illustrate its effectiveness compared with AEMS and POMCP, two SOTA online planning algorithms. AEMS-SR enables efficient planning in domains characterized by partial observability and costly state requests, offering practical benefits across various applications.
Updated: 2024-07-26 15:20:50
标题: POMDPs中带有状态请求的在线规划
摘要: 在关键的现实世界问题中,有时可以获得完整的状态信息,但成本很高,比如激活精确但能源密集型的传感器或请教人类,从而迫使代理在部分可观察性下运行。针对这种情况,我们提出了一种名为AEMS-SR(带状态请求的任意错误最小化搜索)的在线规划算法,专为具有状态请求的POMDPs量身定制。通过将搜索空间表示为图而不是树,AEMS-SR避免了由于状态请求而导致的搜索空间的指数增长。理论分析证明了AEMS-SR的ε-最优性,确保解决方案的质量,而实证评估则展示了与AEMS和POMCP两种SOTA在线规划算法相比的有效性。AEMS-SR使得在部分可观察性和昂贵的状态请求特征的领域中进行有效规划,从而在各种应用中提供实际的好处。
更新时间: 2024-07-26 15:20:50
领域: cs.LG,cs.AI
Interpreting artificial neural networks to detect genome-wide association signals for complex traits
Investigating the genetic architecture of complex diseases is challenging due to the highly polygenic and interactive landscape of genetic and environmental factors. Although genome-wide association studies (GWAS) have identified thousands of variants for multiple complex phenotypes, conventional statistical approaches can be limited by simplified assumptions such as linearity and lack of epistasis models. In this work, we trained artificial neural networks for predicting complex traits using both simulated and real genotype/phenotype datasets. We extracted feature importance scores via different post hoc interpretability methods to identify potentially associated loci (PAL) for the target phenotype. Simulations we performed with various parameters demonstrated that associated loci can be detected with good precision using strict selection criteria, but downstream analyses are required for fine-mapping the exact variants due to linkage disequilibrium, similarly to conventional GWAS. By applying our approach to the schizophrenia cohort in the Estonian Biobank, we were able to detect multiple PAL related to this highly polygenic and heritable disorder. We also performed enrichment analyses with PAL in genic regions, which predominantly identified terms associated with brain morphology. With further improvements in model optimization and confidence measures, artificial neural networks can enhance the identification of genomic loci associated with complex diseases, providing a more comprehensive approach for GWAS and serving as initial screening tools for subsequent functional studies. Keywords: Deep learning, interpretability, genome-wide association studies, complex diseases
Updated: 2024-07-26 15:20:42
标题: 解释人工神经网络以检测复杂性状的全基因组关联信号
摘要: 研究复杂疾病的遗传结构是具有挑战性的,因为遗传和环境因素的高度多基因和互动性景观。虽然全基因组关联研究(GWAS)已经鉴定出数千个变体用于多个复杂表型,但传统统计方法可能受到简化假设(如线性和缺乏上位基因模型)的限制。在这项工作中,我们使用模拟和真实的基因型/表型数据集训练人工神经网络来预测复杂特征。我们通过不同的事后解释方法提取特征重要性评分,以识别目标表型的潜在相关位点(PAL)。我们用不同参数进行的模拟表明,使用严格的选择标准可以很好地精确检测相关位点,但由于连锁不平衡,需要进行下游分析来精确定位变体,类似于传统GWAS。通过将我们的方法应用于爱沙尼亚生物库中的精神分裂症队列,我们能够检测到与这种高度多基因和遗传性疾病相关的多个PAL。我们还在基因区域的PAL中进行富集分析,主要识别与大脑形态相关的术语。通过进一步改进模型优化和置信度度量,人工神经网络可以增强与复杂疾病相关的基因位点的识别,为GWAS提供更全面的方法,并作为后续功能研究的初始筛选工具。 关键词:深度学习,可解释性,全基因组关联研究,复杂疾病
更新时间: 2024-07-26 15:20:42
领域: q-bio.GN,cs.LG,q-bio.QM
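As one concrete example of the post hoc interpretability step described in the abstract above, the sketch below trains a small network on simulated genotypes (with an epistatic interaction) and scores loci by permutation importance; the paper evaluates several such attribution methods, and all data and parameters here are synthetic stand-ins.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, p = 2000, 50
G = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
# Phenotype driven by SNPs 3 and 7 plus their interaction (toy architecture).
logit = 0.8 * G[:, 3] + 0.8 * G[:, 7] + 0.6 * G[:, 3] * G[:, 7] - 2.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0).fit(G, y)

def permutation_importance(model, X, y, n_repeats=5):
    base = model.score(X, y)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the SNP-phenotype link
            drops.append(base - model.score(Xp, y))
        imp[j] = np.mean(drops)
    return imp

imp = permutation_importance(clf, G, y)
print("top candidate loci:", np.argsort(imp)[::-1][:5])  # should include 3 and 7
```

As the abstract notes, loci flagged this way still need downstream fine-mapping because linkage disequilibrium spreads importance across correlated variants.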
Learning Chaotic Systems and Long-Term Predictions with Neural Jump ODEs
The Path-dependent Neural Jump ODE (PD-NJ-ODE) is a model for online prediction of generic (possibly non-Markovian) stochastic processes with irregular (in time) and potentially incomplete (with respect to coordinates) observations. It is a model for which convergence to the $L^2$-optimal predictor, which is given by the conditional expectation, is established theoretically. Thereby, the training of the model is solely based on a dataset of realizations of the underlying stochastic process, without the need of knowledge of the law of the process. In the case where the underlying process is deterministic, the conditional expectation coincides with the process itself. Therefore, this framework can equivalently be used to learn the dynamics of ODE or PDE systems solely from realizations of the dynamical system with different initial conditions. We showcase the potential of our method by applying it to the chaotic system of a double pendulum. When training the standard PD-NJ-ODE method, we see that the prediction starts to diverge from the true path after about half of the evaluation time. In this work we enhance the model with two novel ideas, which independently of each other improve the performance of our modelling setup. The resulting dynamics match the true dynamics of the chaotic system very closely. The same enhancements can be used to provably enable the PD-NJ-ODE to learn long-term predictions for general stochastic datasets, where the standard model fails. This is verified in several experiments.
Updated: 2024-07-26 15:18:29
标题: 使用神经跳跃ODE学习混沌系统和长期预测
摘要: Path-dependent Neural Jump ODE(PD-NJ-ODE)是一种在线预测一般(可能是非马尔可夫)随机过程的模型,该过程具有不规则(在时间上)和潜在不完整(相对于坐标)的观测。该模型在理论上已经建立了收敛到$L^2$-最优预测器的条件期望,这个预测器是由条件期望给出的。因此,该模型的训练仅基于基础随机过程的一组实现数据,而无需了解过程的规律。在基础过程是确定性的情况下,条件期望与过程本身一致。因此,这个框架可以等效地用于仅从不同初始条件的动力系统实现数据中学习ODE或PDE系统的动态。我们通过将其应用于双摆混沌系统展示了我们方法的潜力。在训练标准PD-NJ-ODE方法时,我们发现在评估时间的约一半后,预测开始偏离真实路径。在这项工作中,我们提出了两种新颖的想法来增强模型,这两种想法独立地改善了我们建模设置的性能。由此产生的动态与混沌系统的真实动态非常接近。相同的增强措施可以被用来明确地使PD-NJ-ODE能够学习一般随机数据集的长期预测,在这种情况下,标准模型会失败。这在几项实验中得到验证。
更新时间: 2024-07-26 15:18:29
领域: stat.ML,cs.AI,cs.LG,math.DS,math.PR
Robust Learning in Bayesian Parallel Branching Graph Neural Networks: The Narrow Width Limit
The infinite width limit of random neural networks is known to result in Neural Networks as Gaussian Process (NNGP) (Lee et al. [2018]), characterized by task-independent kernels. It is widely accepted that larger network widths contribute to improved generalization (Park et al. [2019]). However, this work challenges this notion by investigating the narrow width limit of the Bayesian Parallel Branching Graph Neural Network (BPB-GNN), an architecture that resembles residual networks. We demonstrate that when the width of a BPB-GNN is significantly smaller compared to the number of training examples, each branch exhibits more robust learning due to a symmetry breaking of branches in kernel renormalization. Surprisingly, the performance of a BPB-GNN in the narrow width limit is generally superior or comparable to that achieved in the wide width limit in bias-limited scenarios. Furthermore, the readout norms of each branch in the narrow width limit are mostly independent of the architectural hyperparameters but generally reflective of the nature of the data. Our results characterize a newly defined narrow-width regime for parallel branching networks in general.
Updated: 2024-07-26 15:14:22
标题: 贝叶斯并行分支图神经网络中的鲁棒学习:窄宽度极限
摘要: 已知随机神经网络的无限宽度极限会导致神经网络成为高斯过程(NNGP)(Lee等人[2018]),其特征是与任务无关的核。普遍认为,更大的网络宽度有助于改善泛化能力(Park等人[2019])。然而,这项工作通过研究贝叶斯并行分支图神经网络(BPB-GNN)的窄宽度极限挑战了这一观念,该架构类似残差网络。我们证明,当BPB-GNN的宽度明显小于训练样本数量时,由于核重归一化中分支的对称性破坏,每个分支都表现出更强大的学习能力。令人惊讶的是,在偏置受限的情况下,窄宽度极限下BPB-GNN的性能通常优于或与宽宽度极限下达到的性能相媲美。此外,在窄宽度极限下,每个分支的输出范数大多独立于架构超参数,但通常反映了数据的特性。我们的结果为并行分支网络的新定义窄宽度区域提供了描述。
更新时间: 2024-07-26 15:14:22
领域: cs.LG,cs.AI
EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records
In this paper, we introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to address critical yet underexplored aspects in text-to-SQL parsing: interactivity, compositionality, and efficiency. To the best of our knowledge, EHR-SeqSQL is not only the largest but also the first medical text-to-SQL dataset benchmark to include sequential and contextual questions. We provide a data split and the new test set designed to assess compositional generalization ability. Our experiments demonstrate the superiority of a multi-turn approach over a single-turn approach in learning compositionality. Additionally, our dataset integrates specially crafted tokens into SQL queries to improve execution efficiency. With EHR-SeqSQL, we aim to bridge the gap between practical needs and academic research in the text-to-SQL domain. EHR-SeqSQL is available at https://github.com/seonhee99/EHR-SeqSQL.
Updated: 2024-07-26 15:13:08
标题: EHR-SeqSQL:一个用于交互式探索电子健康记录的顺序文本到SQL数据集
摘要: 在本文中,我们介绍了EHR-SeqSQL,这是一种针对电子健康记录(EHR)数据库的新颖的顺序文本到SQL数据集。EHR-SeqSQL旨在解决文本到SQL解析中关键但尚未充分探索的方面:交互性、组合性和效率。据我们所知,EHR-SeqSQL不仅是最大的医学文本到SQL数据集基准,也是第一个包含顺序和上下文问题的数据集。我们提供了一个数据分割和新的测试集,旨在评估组成泛化能力。我们的实验表明,多轮方法在学习组合性方面优于单轮方法。此外,我们的数据集将特别设计的令牌集成到SQL查询中,以提高执行效率。通过EHR-SeqSQL,我们旨在弥合文本到SQL领域的实际需求和学术研究之间的差距。EHR-SeqSQL可在 https://github.com/seonhee99/EHR-SeqSQL 获取。
更新时间: 2024-07-26 15:13:08
领域: cs.CL,cs.AI,cs.DB,cs.IR
Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements
Empathetic dialogue is an indispensable part of building harmonious social relationships and contributes to the development of helpful AI. Previous approaches are mainly based on fine-tuned small-scale language models. With the advent of ChatGPT, the effectiveness of applying large language models (LLMs) in this field has attracted great attention. This work empirically investigates the performance of LLMs in generating empathetic responses and proposes three improvement methods: semantically similar in-context learning, two-stage interactive generation, and combination with a knowledge base. Extensive experiments show that LLMs can significantly benefit from our proposed methods and are able to achieve state-of-the-art performance in both automatic and human evaluations. Additionally, we explore the possibility of GPT-4 simulating human evaluators.
Updated: 2024-07-26 15:07:01
标题: 利用大型语言模型的能力进行共情回应生成:实证研究和改进
摘要: 共情性对话是建立和谐社会关系的必不可少部分,并有助于开发一个有帮助的人工智能。先前的方法主要基于精细的小规模语言模型。随着ChatGPT的出现,大型语言模型(LLMs)在这一领域的应用效果引起了极大关注。这项工作在实证上调查了LLMs在生成共情性回应方面的表现,并提出了三种改进方法:语义上类似的上下文学习、两阶段交互生成和与知识库结合。广泛的实验表明,LLMs能够从我们提出的方法中获益,并能够在自动和人类评估中实现最先进的表现。此外,我们探讨了GPT-4模拟人类评估者的可能性。
更新时间: 2024-07-26 15:07:01
领域: cs.CL,cs.AI
Log-Concave Coupling for Sampling Neural Net Posteriors
In this work, we present a sampling algorithm for single hidden layer neural networks. This algorithm is built upon a recursive series of Bayesian posteriors using a method we call Greedy Bayes. Sampling the Bayesian posterior for neuron weight vectors $w$ of dimension $d$ is challenging because of its multimodality. Our algorithm to tackle this problem is based on a coupling of the posterior density for $w$ with an auxiliary random variable $\xi$. The resulting reverse conditional $w|\xi$ of neuron weights given the auxiliary random variable is shown to be log concave. In the construction of the posterior distributions we provide some freedom in the choice of the prior. In particular, for Gaussian priors on $w$ with suitably small variance, the resulting marginal density of the auxiliary variable $\xi$ is proven to be strictly log concave for all dimensions $d$. For a uniform prior on the unit $\ell_1$ ball, evidence is given that the density of $\xi$ is again strictly log concave for sufficiently large $d$. The score of the marginal density of the auxiliary random variable $\xi$ is determined by an expectation over $w|\xi$ and thus can be computed by various rapidly mixing Markov chain Monte Carlo methods. Moreover, the computation of the score of $\xi$ permits sampling $\xi$ by a stochastic diffusion (Langevin dynamics) with drift function built from this score. With such dynamics, information-theoretic methods pioneered by Bakry and Emery show that accurate sampling of $\xi$ is obtained rapidly when its density is indeed strictly log concave. After this, one more draw from $w|\xi$ produces neuron weights $w$ whose marginal distribution is the desired posterior.
Updated: 2024-07-26 15:05:41
标题: 对神经网络后验分布采样的对数凹耦合
摘要: 在这项工作中,我们提出了一种用于单隐藏层神经网络的采样算法。该算法建立在一系列递归的贝叶斯后验之上,使用了我们称之为贪婪贝叶斯的方法。由于神经元权重向量$w$的后验具有多峰性,对维度为$d$的神经元权重向量$w$进行采样是具有挑战性的。我们解决这一问题的算法基于将$w$的后验密度与一个辅助随机变量$\xi$进行耦合。 结果显示,给定辅助随机变量的神经元权重的逆条件$w|\xi$是对数凹的。在构建后验分布时,我们在选择先验方面提供了一定的自由度。特别是,对于具有适当小方差的$w$的高斯先验,辅助变量$\xi$的边际密度被证明对于所有维度$d$都是严格对数凹的。对于单位$\ell_1$球上的均匀先验,证据表明当$d$足够大时,$\xi$的密度再次严格对数凹。 辅助随机变量$\xi$的边际密度的得分是通过对$w|\xi$的期望确定的,因此可以通过各种快速混合的马尔可夫链蒙特卡洛方法来计算。此外,计算$\xi$的得分允许通过从这个得分构建的漂移函数进行随机扩散(朗之万动力学)来对$\xi$进行采样。通过这种动力学,由Bakry和Emery开创的信息论方法表明,在其密度确实严格对数凹时,可以快速获得对$\xi$的准确采样。之后,再从$w|\xi$中抽取一个样本,即可产生边际分布来自所需后验的神经元权重$w$。
更新时间: 2024-07-26 15:05:41
领域: stat.ML,cs.IT,cs.LG,math.IT
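For readers following the entry above, its two central objects can be written out explicitly. The first display is Fisher's identity for the score of the auxiliary variable's marginal density, and the second is the Langevin diffusion driven by that score; both follow directly from the abstract's description, though the notation here is ours, not the paper's:

$$ \nabla_{\xi} \log p(\xi) \;=\; \mathbb{E}_{\,w \mid \xi}\big[\, \nabla_{\xi} \log p(\xi \mid w) \,\big], \qquad \mathrm{d}\xi_t \;=\; \nabla_{\xi} \log p(\xi_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t . $$

The inner expectation over $w \mid \xi$ is exactly what the log-concave reverse conditional makes cheap to estimate by MCMC, and strict log-concavity of $p(\xi)$ is what lets the Bakry-Emery machinery certify rapid mixing of the diffusion.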
Towards a Cyber Information Ontology
This paper introduces a set of terms that are intended to act as an interface between cyber ontologies (like a file system ontology or a data fusion ontology) and top- and mid-level ontologies, specifically Basic Formal Ontology and the Common Core Ontologies. These terms center on what makes cyberinformation management unique: numerous acts of copying items of information, the aggregates of copies that result from those acts, and the faithful members of those aggregates that represent all other members.
Updated: 2024-07-26 14:59:00
标题: 走向网络信息本体论
摘要: 这篇论文介绍了一组术语,旨在充当网络本体论(如文件系统本体论或数据融合本体论)与顶层和中层本体论(具体指基本形式本体论和共同核心本体论)之间的接口。这些术语聚焦于网络信息管理的独特之处:大量复制信息项的行为,由这些行为产生的复制品集合,以及代表所有其他成员的忠实成员。
更新时间: 2024-07-26 14:59:00
领域: cs.AI
Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging
Medical imaging cohorts are often confounded by factors such as acquisition devices, hospital sites, patient backgrounds, and many more. As a result, deep learning models tend to learn spurious correlations instead of causally related features, limiting their generalizability to new and unseen data. This problem can be addressed by minimizing dependence measures between intermediate representations of task-related and non-task-related variables. These measures include mutual information, distance correlation, and the performance of adversarial classifiers. Here, we benchmark such dependence measures for the task of preventing shortcut learning. We study a simplified setting using Morpho-MNIST and a medical imaging task with CheXpert chest radiographs. Our results provide insights into how to mitigate confounding factors in medical imaging.
Updated: 2024-07-26 14:54:16
标题: Benchmarking依赖性测量方法以预防医学影像中的捷径学习
摘要: 医学影像队列经常受到采集设备、医院地点、患者背景等因素的混杂影响。因此,深度学习模型往往学习到虚假相关性而非因果相关的特征,从而限制了它们对新的、未见数据的泛化能力。这个问题可以通过最小化任务相关和非任务相关变量的中间表示之间的依赖度量来解决。这些度量包括互信息、距离相关性和对抗分类器的性能。在这里,我们针对防止捷径学习的任务对这些依赖度量进行了基准测试。我们在使用Morpho-MNIST的简化设置以及基于CheXpert胸部X光片的医学影像任务中进行了研究。我们的结果为如何减轻医学影像中的混杂因素提供了见解。
更新时间: 2024-07-26 14:54:16
领域: cs.CV,cs.LG
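Of the dependence measures benchmarked in the entry above, distance correlation is the most self-contained to sketch. The NumPy function below is a minimal implementation of the biased empirical estimator; in a shortcut-learning setting one would penalize its value between intermediate representations and a confounder such as hospital site. The variable names and toy check are illustrative, not taken from the paper:

```python
import numpy as np

def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Biased empirical distance correlation between samples x (n, dx) and y (n, dy)."""
    def doubly_centered_distances(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    a = doubly_centered_distances(x)
    b = doubly_centered_distances(y)
    dcov2 = (a * b).mean()  # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0

# Toy check: a nonlinearly dependent pair scores high, an independent pair near zero.
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
print(distance_correlation(z, z ** 2))                       # high
print(distance_correlation(z, rng.normal(size=(256, 8))))    # near zero
```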
Understanding XAI Through the Philosopher's Lens: A Historical Perspective
Although explainable AI (XAI) has recently become a hot topic and several different approaches have been developed, there is still a widespread belief that the field lacks a convincing unifying foundation. On the other hand, over the past centuries, the very concept of explanation has been the subject of extensive philosophical analysis in an attempt to address the fundamental question of "why" in the context of scientific law. However, this discussion has rarely been connected with XAI. This paper tries to fill in this gap and aims to explore the concept of explanation in AI through an epistemological lens. By comparing the historical development of both the philosophy of science and AI, an intriguing picture emerges. Specifically, we show that a gradual progression has independently occurred in both domains from logical-deductive to statistical models of explanation, thereby experiencing in both cases a paradigm shift from deterministic to nondeterministic and probabilistic causality. Interestingly, we also notice that similar concepts have independently emerged in both realms such as, for example, the relation between explanation and understanding and the importance of pragmatic factors. Our study aims to be the first step towards understanding the philosophical underpinnings of the notion of explanation in AI, and we hope that our findings will shed some fresh light on the elusive nature of XAI.
Updated: 2024-07-26 14:44:49
标题: 透过哲学家的视角理解可解释人工智能:历史透视
摘要: 尽管可解释人工智能(XAI)最近成为热门话题,并且已经发展出了几种不同的方法,但普遍认为它缺乏令人信服的统一基础。另一方面,在过去的几个世纪中,解释的概念一直是哲学分析的对象,试图解决科学定律背景下的“为什么”这一根本问题。然而,这种讨论很少与XAI联系起来。本文试图填补这一空白,旨在通过认识论的角度探索人工智能中的解释概念。通过比较科学哲学和人工智能的历史发展,一个有趣的图景浮现出来。具体来说,我们展示了从逻辑演绎到统计模型的解释在两个领域中独立发生的逐渐进步,从确定性到非确定性和概率因果关系的范式转变。有趣的是,我们还注意到在两个领域中类似的概念独立出现,比如解释与理解之间的关系以及实用因素的重要性。我们的研究旨在是理解人工智能中解释概念的哲学基础的第一步,希望我们的发现能为XAI难以捉摸的本质带来一些新的启示。
更新时间: 2024-07-26 14:44:49
领域: cs.AI
Learning production functions for supply chains with graph neural networks
The global economy relies on the flow of goods over supply chain networks, with nodes as firms and edges as transactions between firms. While we may observe these external transactions, they are governed by unseen production functions, which determine how firms internally transform the input products they receive into output products that they sell. In this setting, it can be extremely valuable to infer these production functions, to better understand and improve supply chains, and to forecast future transactions more accurately. However, existing graph neural networks (GNNs) cannot capture these hidden relationships between nodes' inputs and outputs. Here, we introduce a new class of models for this setting, by combining temporal GNNs with a novel inventory module, which learns production functions via attention weights and a special loss function. We evaluate our models extensively on real supply chains data, along with data generated from our new open-source simulator, SupplySim. Our models successfully infer production functions, with a 6-50% improvement over baselines, and forecast future transactions on real and synthetic data, outperforming baselines by 11-62%.
Updated: 2024-07-26 14:32:18
标题: 用图神经网络学习供应链的生产函数
摘要: 全球经济依赖于货物在供应链网络中的流动,其中节点是公司,边是公司之间的交易。虽然我们可以观察到这些外部交易,但它们受到看不见的生产函数的控制,这些生产函数决定了公司如何将它们接收到的输入产品内部转化为它们销售的输出产品。在这种情况下,推断这些生产函数可能非常有价值,以更好地理解和改进供应链,并更准确地预测未来的交易。然而,现有的图神经网络(GNNs)无法捕捉节点输入和输出之间的这些隐藏关系。在这里,我们针对这一设置引入了一类新模型,将时序GNN与一个新颖的库存模块相结合,该模块通过注意力权重和一个特殊的损失函数来学习生产函数。我们在真实供应链数据以及从我们的新开源模拟器SupplySim生成的数据上对我们的模型进行了广泛评估。我们的模型成功地推断出生产函数,比基线提高了6-50%,并在真实和合成数据上预测未来交易,表现优于基线11-62%。
更新时间: 2024-07-26 14:32:18
领域: cs.LG,cs.CY,cs.SI
Any four real numbers are on all fours with analogy
This work presents a formalization of analogy on numbers that relies on generalized means. It is motivated by recent advances in artificial intelligence and applications of machine learning, where the notion of analogy is used to infer results, create data and even as an assessment tool of object representations, or embeddings, that are basically collections of numbers (vectors, matrices, tensors). This extended analogy use asks for mathematical foundations and clear understanding of the notion of analogy between numbers. We propose a unifying view of analogies that relies on generalized means defined in terms of a power parameter. In particular, we show that any four increasing positive real numbers is an analogy in a unique suitable power. In addition, we show that any such analogy can be reduced to an equivalent arithmetic analogy and that any analogical equation has a solution for increasing numbers, which generalizes without restriction to complex numbers. These foundational results provide a better understanding of analogies in areas where representations are numerical.
Updated: 2024-07-26 14:30:35
标题: 任意四个实数之间都存在类比关系
摘要: 这项工作提出了一个基于广义均值的数字类比的形式化方法。这是受到人工智能的最新进展和机器学习应用的启发,其中类比的概念被用来推断结果,创建数据,甚至作为对象表示或嵌入的评估工具,这些表示基本上是数字的集合(向量,矩阵,张量)。这种扩展的类比使用要求数学基础和对数字之间类比概念的明确理解。我们提出了一个依赖于幂参数定义的广义均值的类比的统一观点。特别地,我们展示了任意四个递增的正实数在一个唯一合适的幂下是一个类比。此外,我们展示了任何这样的类比都可以简化为一个等效的算术类比,并且任何类比方程对于递增的数字都有解,这一概念可以无限制地推广到复数。这些基础性的结果为在表示是数字的领域提供了对类比更好的理解。
更新时间: 2024-07-26 14:30:35
领域: cs.AI,68Txx
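The abstract above does not spell out its formalization, but a natural reconstruction in terms of generalized (power) means, offered here only as a reading aid and not as a quotation of the paper, is: $a : b :: c : d$ holds in power $p$ when the generalized means agree,

$$ M_p(a, d) = M_p(b, c), \qquad M_p(x, y) = \left( \frac{x^p + y^p}{2} \right)^{1/p} . $$

At $p = 1$ this is the arithmetic analogy $a - b = c - d$, and in the limit $p \to 0$ it becomes the geometric analogy $a/b = c/d$ (since $M_p \to \sqrt{xy}$). The headline result then says that any four increasing positive reals satisfy such an identity for exactly one suitable power.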
TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals
Efforts directed towards promoting Open Government Data (OGD) have gained significant traction across various governmental tiers since the mid-2000s. As more datasets are published on OGD portals, finding specific data becomes harder, leading to information overload. Complete and accurate documentation of datasets, including the association of proper tags with datasets, is key to improving dataset findability and accessibility. Analysis conducted on the Estonian Open Data Portal revealed that 11% of datasets have no associated tags, while 26% have only one tag assigned to them. This underscores challenges in data findability and accessibility within the portal, even though the portal is considered a trend-setter according to the recent Open Data Maturity Report. The aim of this study is to propose an automated solution for tagging datasets to improve data findability on OGD portals. This paper presents Tagify, a prototype tagging interface that employs large language models (LLMs) such as GPT-3.5-turbo and GPT-4 to automate dataset tagging, generating tags for datasets in English and Estonian, thereby augmenting metadata preparation by data publishers and improving data findability on OGD portals for data users. The developed solution was evaluated by users, and their feedback was collected to define an agenda for future prototype improvements.
Updated: 2024-07-26 14:22:30
标题: TAGIFY:基于LLM的标记界面,以提高OGD门户网站上的数据可查性
摘要: 自2000年代中期以来,为促进开放政府数据(OGD)而采取的努力在各级政府部门中得到了显著推动。随着越来越多的数据集发布在OGD门户上,寻找特定数据变得更加困难,导致信息过载。完整准确地记录数据集,包括为数据集关联适当标签,是提高数据集可找性和可访问性的关键。对爱沙尼亚开放数据门户进行的分析显示,11%的数据集没有关联标签,而26%的数据集只有一个标签,这突显了门户内数据可找性和可访问性面临的挑战,而根据最近的开放数据成熟度报告,该门户被认为是引领潮流的。本研究旨在提出一种自动化解决方案,通过为OGD门户上的数据集打标签以改善数据可找性。本文介绍了Tagify - 一个利用大型语言模型(LLM)如GPT-3.5-turbo和GPT-4来自动化数据集标记的原型界面,为英语和爱沙尼亚语数据集生成标签,从而通过数据发布者增强元数据准备并通过数据用户提高在OGD门户上的数据可找性。开发的解决方案通过用户评估,并收集了他们的反馈,以制定未来原型改进的议程。
更新时间: 2024-07-26 14:22:30
领域: cs.CY,cs.AI,cs.ET,cs.HC
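The tagging step Tagify automates is straightforward to prototype. The sketch below uses the OpenAI chat-completions client; the prompt wording, output format, and parsing are invented for illustration, since the paper's actual prompts and post-processing are not given in the abstract:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def suggest_tags(title: str, description: str, n_tags: int = 5) -> list[str]:
    """Ask an LLM for dataset tags in English and Estonian (illustrative prompt)."""
    prompt = (
        "You annotate open-government datasets. Propose up to "
        f"{n_tags} short tags in English and {n_tags} in Estonian, comma-separated "
        f"on two lines, for this dataset.\nTitle: {title}\nDescription: {description}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    lines = response.choices[0].message.content.strip().splitlines()
    return [t.strip() for line in lines for t in line.split(",") if t.strip()]
```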
Java-Class-Hijack: Software Supply Chain Attack for Java based on Maven Dependency Resolution and Java Classloading
We introduce Java-Class-Hijack, a novel software supply chain attack that enables an attacker to inject malicious code by crafting a class that shadows a legitimate class that is in the dependency tree. We describe the attack, provide a proof-of-concept demonstrating its feasibility, and replicate it in the German Corona-Warn-App server application. The proof-of-concept illustrates how a transitive dependency deep within the dependency tree can hijack a class from a direct dependency and entirely alter its behavior, posing a significant security risk to Java applications. The replication on the Corona-Warn-App demonstrates how compromising a small JSON validation library could result in a complete database takeover.
Updated: 2024-07-26 14:17:47
标题: Java-Class-Hijack:基于Maven依赖解析和Java类加载的Java软件供应链攻击
摘要: 我们介绍了Java-Class-Hijack,这是一种新颖的软件供应链攻击,使攻击者能够通过创建一个类来注入恶意代码,该类会覆盖依赖树中的一个合法类。我们描述了该攻击,提供了一个可行性证明,并在德国Corona-Warn-App服务器应用程序中复现了该攻击。该可行性证明说明了依赖树深处的一个传递性依赖如何可以劫持直接依赖中的一个类,并完全改变其行为,给Java应用程序带来重大安全风险。在Corona-Warn-App上的复现展示了攻陷一个小型JSON验证库如何可能导致数据库被完全接管。
更新时间: 2024-07-26 14:17:47
领域: cs.CR,cs.SE
Unsupervised Reservoir Computing for Multivariate Denoising of Severely Contaminated Signals
The interdependence and high dimensionality of multivariate signals present significant challenges for denoising, as conventional univariate methods often struggle to capture the complex interactions between variables. A successful approach must consider not only the multivariate dependencies of the desired signal but also the multivariate dependencies of the interfering noise. In our previous research, we introduced a method using machine learning to extract the maximum portion of "predictable information" from a univariate signal. We extend this approach to multivariate signals, with the key idea being to properly incorporate the interdependencies of the noise back into the interdependent reconstruction of the signal. The method works successfully for various multivariate signals, including chaotic signals and highly oscillating sinusoidal signals which are corrupted by spatially correlated intense noise. It consistently outperforms other existing multivariate denoising methods across a wide range of scenarios.
Updated: 2024-07-26 14:14:57
标题: 无监督的水库计算用于多变量去噪严重受污染信号
摘要: 多变量信号的相互依赖性和高维度给降噪带来了巨大挑战,因为传统的单变量方法往往难以捕捉变量之间复杂的相互作用。一种成功的方法必须考虑所需信号的多变量依赖性,同时也必须考虑干扰噪声的多变量依赖性。在我们之前的研究中,我们介绍了一种利用机器学习从单变量信号中提取"可预测信息"的方法。我们将这种方法扩展到多变量信号上,关键思想是正确地将噪声的相互依赖性纳入信号的相互依赖重建中。该方法成功地应用于各种多变量信号,包括受空间相关强噪声污染的混沌信号和高度振荡的正弦信号。在各种情景下,它始终优于其他现有的多变量降噪方法。
更新时间: 2024-07-26 14:14:57
领域: cs.LG,nlin.CD
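As a rough illustration of the reservoir-computing ingredient in the entry above, here is a minimal echo state network that regresses a clean multivariate target on reservoir states driven by the noisy input. Note the caveat: the paper's method is unsupervised and explicitly models noise interdependencies, whereas this supervised toy only shows the reservoir mechanics; all sizes and constants are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res, T = 3, 200, 2000

# Clean multivariate signal plus spatially correlated noise (toy data).
t = np.linspace(0, 40 * np.pi, T)
clean = np.stack([np.sin(t), np.sin(2 * t), np.cos(3 * t)], axis=1)
noise = rng.normal(size=(T, n_in)) @ rng.normal(size=(n_in, n_in)) * 0.3
noisy = clean + noise

# Random reservoir, rescaled to spectral radius < 1 (echo state property).
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for i in range(T):
    x = np.tanh(W_in @ noisy[i] + W @ x)
    states[i] = x

# Ridge-regression readout from reservoir states to the clean signal.
lam = 1e-2
W_out = np.linalg.solve(states.T @ states + lam * np.eye(n_res), states.T @ clean)
denoised = states @ W_out
print("input MSE :", np.mean((noisy - clean) ** 2))
print("output MSE:", np.mean((denoised - clean) ** 2))
```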
Large Language Model for Table Processing: A Survey
Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet manipulations, web table question answering, and image table information extraction. Automating these table-centric tasks with Large Language Models (LLMs) or Visual Language Models (VLMs) offers significant public benefits, garnering interest from academia and industry. This survey provides a comprehensive overview of table-related tasks, examining both user scenarios and technical aspects. It covers traditional tasks like table question answering as well as emerging fields such as spreadsheet manipulation and table data analysis. We summarize the training techniques for LLMs and VLMs tailored for table processing. Additionally, we discuss prompt engineering, particularly the use of LLM-powered agents, for various table-related tasks. Finally, we highlight several challenges, including processing implicit user intentions and extracting information from various table sources.
Updated: 2024-07-26 14:12:33
标题: 大型语言模型用于表格处理:一项调查
摘要: 表格通常是二维的、用于存储大量数据的结构化形式,对于数据库查询、电子表格操作、网页表格问答和图像表格信息提取等日常活动至关重要。利用大型语言模型(LLMs)或视觉语言模型(VLMs)自动化这些以表格为中心的任务,可以为公众带来重大利益,引起了学术界和工业界的兴趣。本调查提供了有关与表格相关任务的综合概述,考察了用户场景和技术方面。它涵盖了传统任务,如表格问题回答,以及电子表格操作和表格数据分析等新兴领域。我们总结了为表格处理量身定制的LLMs和VLMs的训练技术。此外,我们讨论了提示工程,特别是LLM驱动的代理人在各种与表格相关任务中的使用。最后,我们突出了几个挑战,包括处理隐含用户意图和从各种表格来源提取信息。
更新时间: 2024-07-26 14:12:33
领域: cs.AI,cs.CL
Evaluating Human Trajectory Prediction with Metamorphic Testing
The prediction of human trajectories is important for planning in autonomous systems that act in the real world, e.g. automated driving or mobile robots. Human trajectory prediction is a noisy process, and no prediction precisely matches any future trajectory. It is therefore approached as a stochastic problem, where the goal is to minimise the error between the true and the predicted trajectory. In this work, we explore the application of metamorphic testing for human trajectory prediction. Metamorphic testing is designed to handle unclear or missing test oracles. It is well-suited to human trajectory prediction, where there is no clear criterion of correct or incorrect human behaviour. Metamorphic relations rely on transformations over source test cases and exploit invariants; this again suits human trajectory prediction, where expected human behaviour exhibits many symmetries under variations of the input, e.g. mirroring and rescaling of the input data. We discuss how metamorphic testing can be applied to stochastic human trajectory prediction and introduce the Wasserstein Violation Criterion to statistically assess whether a follow-up test case violates a label-preserving metamorphic relation.
Updated: 2024-07-26 14:10:14
标题: 用变形测试评估人类轨迹预测
摘要: 人类轨迹预测对于在真实世界中行动的自主系统的规划是重要的,例如自动驾驶或移动机器人。人类轨迹预测是一个嘈杂的过程,没有任何预测能够精确匹配任何未来轨迹。因此,它被视为一个随机问题,其目标是最小化真实轨迹和预测轨迹之间的误差。在这项工作中,我们探讨了应用变形测试来进行人类轨迹预测的可能性。变形测试旨在处理不明确或缺失的测试预言。它非常适用于人类轨迹预测,因为在那里没有明确的正确或错误的人类行为标准。变形关系依赖于对源测试用例的转换和利用不变性。这种设计适用于人类轨迹预测,因为在输入变化时,例如镜像和重新缩放输入数据,预期人类行为有许多对称性。我们讨论了变形测试如何应用于随机人类轨迹预测,并引入了Wasserstein违反准则,用于统计评估后续测试用例是否违反了保留标签的变形关系。
更新时间: 2024-07-26 14:10:14
领域: cs.SE,cs.AI
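A mirror relation of the kind described in the entry above is easy to state concretely. The sketch below checks, for a stochastic predictor `predict` returning sampled future trajectories, whether mirrored inputs yield (after un-mirroring) the same predictive distribution, using SciPy's one-dimensional Wasserstein distance per coordinate as a stand-in; the paper's Wasserstein Violation Criterion additionally wraps this in a statistical test, which is omitted here:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def mirror_x(traj: np.ndarray) -> np.ndarray:
    """Mirror (x, y) trajectories across the y-axis. traj: (..., T, 2)."""
    out = traj.copy()
    out[..., 0] *= -1.0
    return out

def mirror_violation(predict, history: np.ndarray, n_samples: int = 100) -> float:
    """Largest per-coordinate 1-D Wasserstein gap between the original predictive
    distribution and the un-mirrored predictions on the mirrored input."""
    base = np.stack([predict(history) for _ in range(n_samples)])            # (n, T, 2)
    follow = np.stack([predict(mirror_x(history)) for _ in range(n_samples)])
    follow = mirror_x(follow)                                                # undo mirroring
    gaps = [
        wasserstein_distance(base[:, t, d], follow[:, t, d])
        for t in range(base.shape[1]) for d in range(2)
    ]
    return max(gaps)

# Hypothetical usage with a stochastic model whose sampler maps (T_obs, 2) -> (T, 2):
# score = mirror_violation(model.sample_future, observed_track)
```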
Unsqueeze [CLS] Bottleneck to Learn Rich Representations
Distillation-based self-supervised learning typically leads to more compressed representations due to its radical clustering process and the implementation of a sharper target distribution. To overcome this limitation and preserve more information from input, we introduce UDI, conceptualized as Unsqueezed Distillation-based self-supervised learning (SSL). UDI enriches the learned representation by encouraging multimodal prediction distilled from a consolidated profile of local predictions that are derived via stratified sampling. Our evaluations show that UDI not only promotes semantically meaningful representations at instance level, delivering superior or competitive results to state-of-the-art SSL methods in image classification, but also effectively preserves the nuisance of input, which yields significant improvement in dense prediction tasks, including object detection and segmentation. Additionally, UDI performs competitively in low-shot image classification, improving the scalability of joint-embedding pipelines. Various visualizations and ablation studies are presented to further elucidate the mechanisms behind UDI. Our source code is available at https://github.com/ISL-CV/udi.
Updated: 2024-07-26 14:09:08
标题: 解压[CLS]瓶颈以学习丰富的表示
摘要: 基于蒸馏的自监督学习通常会导致更加压缩的表示,这是由于其激进的聚类过程和更为尖锐的目标分布的实现。为了克服这一限制并保留更多来自输入的信息,我们引入了UDI,概念化为未压缩的基于蒸馏的自监督学习(SSL)。UDI通过鼓励从通过分层抽样导出的本地预测的综合配置中蒸馏的多模态预测来丰富学习到的表示。我们的评估显示,UDI不仅在实例级别促进语义上有意义的表示,在图像分类中提供了优越或竞争力强的结果,并且有效地保留了输入的冗余信息,在包括目标检测和分割在内的密集预测任务中取得了显著的改进。此外,UDI在低样本图像分类中表现出有竞争力的结果,提高了联合嵌入流水线的可扩展性。我们还提供了各种可视化和消融研究,以进一步阐明UDI背后的机制。我们的源代码可在https://github.com/ISL-CV/udi 获取。
更新时间: 2024-07-26 14:09:08
领域: cs.CV,cs.LG
Score matching through the roof: linear, nonlinear, and latent variables causal discovery
Causal discovery from observational data holds great promise, but existing methods rely on strong assumptions about the underlying causal structure, often requiring full observability of all relevant variables. We tackle these challenges by leveraging the score function $\nabla \log p(X)$ of observed variables for causal discovery and propose the following contributions. First, we generalize the existing results of identifiability with the score to additive noise models with minimal requirements on the causal mechanisms. Second, we establish conditions for inferring causal relations from the score even in the presence of hidden variables; this result is two-faced: we demonstrate the score's potential as an alternative to conditional independence tests to infer the equivalence class of causal graphs with hidden variables, and we provide the necessary conditions for identifying direct causes in latent variable models. Building on these insights, we propose a flexible algorithm for causal discovery across linear, nonlinear, and latent variable models, which we empirically validate.
Updated: 2024-07-26 14:09:06
标题: 得分匹配飙升:线性、非线性和潜在变量因果发现
摘要: 从观测数据中发现因果关系具有巨大的潜力,但现有方法依赖于对潜在因果结构的强假设,通常要求所有相关变量完全可观测。我们通过利用观察变量的得分函数$\nabla \log p(X)$来解决这些挑战,并提出以下贡献。首先,我们将利用得分进行可识别性分析的现有结果推广到加性噪声模型,并将对因果机制的要求降到最低。其次,我们建立了在存在隐藏变量的情况下从得分推断因果关系的条件;这个结果是双重的:我们展示了得分作为条件独立性检验的替代方法、用于推断具有隐藏变量的因果图等价类的潜力,并提供了在潜在变量模型中识别直接原因的必要条件。基于这些见解,我们提出了一种灵活的算法,可以跨线性、非线性和潜在变量模型进行因果发现,并通过实证验证了这一算法。
更新时间: 2024-07-26 14:09:06
领域: stat.ML,cs.AI,stat.ME
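For intuition on how the score $\nabla \log p(X)$ exposes causal structure, it helps to recall the special case this entry generalizes (Rolland et al., 2022): in a nonlinear additive *Gaussian* noise model, a variable $X_j$ is a leaf of the causal graph precisely when the corresponding diagonal entry of the score's Jacobian does not vary with $X$,

$$ X_j \ \text{is a leaf} \iff \operatorname{Var}_{X}\!\left[ \frac{\partial^2}{\partial x_j^2} \log p(X) \right] = 0, $$

so leaves can be peeled off one at a time from score estimates alone. The entry above extends this style of argument to additive noise models with minimal assumptions on the mechanisms, and further to the latent-variable setting.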
Attacks on fairness in Federated Learning
Federated Learning is an important emerging distributed training paradigm that keeps data private on clients. It is now well understood that by controlling only a small subset of FL clients, it is possible to introduce a backdoor to a federated learning model, in the presence of certain attributes. In this paper, we present a new type of attack that compromises the fairness of the trained model. Fairness is understood to be the attribute-level performance distribution of a trained model. It is particularly salient in domains where, for example, skewed accuracy discrimination between subpopulations could have disastrous consequences. We find that by employing a threat model similar to that of a backdoor attack, an attacker is able to influence the aggregated model to have an unfair performance distribution between any given set of attributes. Furthermore, we find that this attack is possible by controlling only a single client. While combating naturally induced unfairness in FL has previously been discussed in depth, its artificially induced kind has been neglected. We show that defending against attacks on fairness should be a critical consideration in any situation where unfairness in a trained model could benefit a user who participated in its training.
Updated: 2024-07-26 14:08:00
标题: 对联邦学习中公平性的攻击
摘要: 联邦学习是一种重要的新兴分布式训练范式,可以在客户端保持数据私密性。现在人们已经很清楚,通过控制仅一小部分联邦学习客户端,可以在某些属性存在的情况下向联邦学习模型引入后门。在本文中,我们提出了一种破坏训练模型公平性的新型攻击。公平性被理解为训练模型的属性级性能分布。在某些领域中,公平性尤为突出,例如,在亚群体之间存在偏差的准确性歧视可能会产生灾难性后果。我们发现,通过采用类似后门攻击的威胁模型,攻击者能够影响聚合模型在任意给定属性集之间具有不公平的性能分布。此外,我们发现这种攻击仅通过控制单个客户端就是可能的。尽管先前已经深入讨论了在联邦学习中自然产生的不公平性,但其人为诱导的种类已被忽视。我们表明,在任何情况下,防御针对公平性的攻击应该是一个重要考虑因素,在训练模型中存在不公平性可能会使参与其训练的用户受益。
更新时间: 2024-07-26 14:08:00
领域: cs.LG,cs.CR
Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery
Causal discovery aims to estimate causal structures among variables based on observational data. Large Language Models (LLMs) offer a fresh perspective to tackle the causal discovery problem by reasoning on the metadata associated with variables rather than their actual data values, an approach referred to as knowledge-based causal discovery. In this paper, we investigate the capabilities of Small Language Models (SLMs, defined as LLMs with fewer than 1 billion parameters) with prompt-based learning for knowledge-based causal discovery. Specifically, we present KG Structure as Prompt, a novel approach for integrating structural information from a knowledge graph, such as common neighbor nodes and metapaths, into prompt-based learning to enhance the capabilities of SLMs. Experimental results on three types of biomedical and open-domain datasets under few-shot settings demonstrate the effectiveness of our approach, surpassing most baselines and even conventional fine-tuning approaches trained on full datasets. Our findings further highlight the strong capabilities of SLMs: in combination with knowledge graphs and prompt-based learning, SLMs demonstrate the potential to surpass LLMs with larger number of parameters. Our code and datasets are available on GitHub.
Updated: 2024-07-26 14:07:00
标题: 知识图谱结构作为提示:提高基于知识的因果发现的小语言模型能力
摘要: 因果发现旨在基于观测数据估计变量之间的因果结构。大型语言模型(LLMs)为解决因果发现问题提供了一种新的视角,通过对与变量相关的元数据进行推理,而不是它们的实际数据值,这种方法被称为基于知识的因果发现。在本文中,我们研究了具有基于提示学习的小型语言模型(SLMs,定义为具有少于10亿个参数的LLMs)用于基于知识的因果发现的能力。具体来说,我们提出了KG Structure as Prompt,这是一种将知识图中的结构信息(如共同邻居节点和元路径)集成到基于提示学习中的新方法,以增强SLMs的能力。在少样本设置下,对三种生物医学和开放领域数据集的实验结果表明了我们方法的有效性,超过了大多数基线甚至是在完整数据集上训练的传统微调方法。我们的研究结果进一步突显了SLMs的强大能力:结合知识图和基于提示学习,SLMs表现出超越具有更多参数的LLMs的潜力。我们的代码和数据集可在GitHub上找到。
更新时间: 2024-07-26 14:07:00
领域: cs.CL,cs.AI
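The prompt-construction idea in the KG Structure as Prompt entry above can be illustrated with a toy builder; the template, field names, and verbalization below are our own invention rather than the paper's actual format:

```python
def build_kg_prompt(head: str, tail: str, neighbors: list[str], metapaths: list[str]) -> str:
    """Verbalize knowledge-graph structure around a variable pair into a prompt
    for a small language model (illustrative template)."""
    lines = [
        f"Question: does '{head}' cause '{tail}'?",
        "Knowledge graph context:",
        f"- common neighbors: {', '.join(neighbors) or 'none'}",
    ]
    lines += [f"- metapath: {p}" for p in metapaths]
    lines.append("Answer (yes/no):")
    return "\n".join(lines)

print(build_kg_prompt(
    "smoking", "lung cancer",
    neighbors=["tar exposure"],
    metapaths=["smoking -> contains -> nicotine -> affects -> lung cancer"],
))
```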
FLUE: Federated Learning with Un-Encrypted model weights
Federated Learning enables diverse devices to collaboratively train a shared model while keeping training data locally stored, avoiding the need for centralized cloud storage. Despite existing privacy measures, concerns arise from potential reverse engineering of gradients, even with added noise, revealing private data. To address this, recent research emphasizes using encrypted model parameters during training. This paper introduces a novel federated learning algorithm, leveraging coded local gradients without encryption, exchanging coded proxies for model parameters, and injecting surplus noise for enhanced privacy. Two algorithm variants are presented, showcasing convergence and learning rates adaptable to coding schemes and raw data characteristics. Two encryption-free implementations with fixed and random coding matrices are provided, demonstrating promising simulation results from both federated optimization and machine learning perspectives.
Updated: 2024-07-26 14:04:57
标题: FLUE:未加密模型权重的联邦学习
摘要: 联邦学习使不同的设备能够共同训练一个共享模型,同时保持训练数据在本地存储,避免了对集中式云存储的需求。尽管已有隐私保护措施,但梯度仍可能被逆向工程,即使添加了噪声,也可能因此泄露私人数据,这引发了担忧。为了解决这个问题,最近的研究强调在训练过程中使用加密的模型参数。本文提出了一种新颖的联邦学习算法,利用编码的本地梯度而不使用加密,以编码代理替代模型参数进行交换,并注入额外的噪声以增强隐私保护。文中介绍了两种算法变体,展示了可适应编码方案和原始数据特征的收敛性和学习速率。文中还提供了两种不使用加密的实现,分别使用固定和随机编码矩阵,从联邦优化和机器学习两个角度展示了有前途的模拟结果。
更新时间: 2024-07-26 14:04:57
领域: cs.LG
Multi-Robot System Architecture design in SysML and BPMN
A Multi-Robot System (MRS) is a complex system that contains many different software and hardware components. The main problem addressed in this article is MRS design complexity. The proposed solution provides a modular modeling and simulation technique based on formal systems engineering methods, so that the MRS design complexity is decomposed and reduced. The MRS is modeled via two formal Architecture Description Languages (ADLs), namely the Systems Modeling Language (SysML) and Business Process Model and Notation (BPMN), to design the system blueprints. By using these abstract design ADLs, the implementation of the project becomes technology agnostic, which allows the design concept to be transferred from one programming language to another. During the simulation phase, a multi-agent environment is used to simulate the MRS blueprints. The simulation has been implemented in the Java Agent Development (JADE) middleware, so its results can be used to analyze and verify the proposed MRS model in the form of a performance evaluation matrix.
Updated: 2024-07-26 14:04:40
标题: 在SysML和BPMN中的多机器人系统架构设计
摘要: 多机器人系统(MRS)是一个包含许多不同软件和硬件组件的复杂系统。本文所讨论的主要问题是MRS设计复杂性。提出的解决方案提供了一种基于形式系统工程方法的模块化建模和仿真技术,因此MRS设计复杂性得到了分解和减少。通过两种形式化架构描述语言(ADLs)对MRS进行建模,即系统建模语言(SysML)和业务流程模型和标注(BPMN),以设计系统蓝图。通过使用这些抽象设计ADLs,项目的实施变得技术无关。这允许将设计概念从一种编程语言转移到另一种。在仿真阶段,使用多代理环境来模拟MRS蓝图。仿真已经在Java代理开发(JADE)中间件中实现。因此,其结果可以以性能评估矩阵的形式用于分析和验证所提出的MRS模型。
更新时间: 2024-07-26 14:04:40
领域: cs.AI,cs.RO,cs.SE
FairAIED: Navigating Fairness, Bias, and Ethics in Educational AI Applications
The integration of Artificial Intelligence (AI) into education has transformative potential, providing tailored learning experiences and creative instructional approaches. However, the inherent biases in AI algorithms hinder this improvement by unintentionally perpetuating prejudice against specific demographics, especially in human-centered applications like education. This survey delves deeply into the developing topic of algorithmic fairness in educational contexts, providing a comprehensive evaluation of the diverse literature on fairness, bias, and ethics in AI-driven educational applications. It identifies the common forms of biases, such as data-related, algorithmic, and user-interaction, that fundamentally undermine the accomplishment of fairness in AI teaching aids. By outlining existing techniques for mitigating these biases, ranging from varied data gathering to algorithmic fairness interventions, the survey emphasizes the critical role of ethical considerations and legal frameworks in shaping a more equitable educational environment. Furthermore, it guides readers through the complexities of fairness measurements, methods, and datasets, shedding light on the way to bias reduction. Despite these gains, this survey highlights long-standing issues, such as achieving a balance between fairness and accuracy, as well as the need for diverse datasets. Overcoming these challenges and ensuring the ethical and fair use of AI's promise in education call for a collaborative, interdisciplinary approach.
Updated: 2024-07-26 13:59:20
标题: 公平的人工智能教育:在教育人工智能应用中导航公平性、偏见和伦理问题
摘要: 人工智能(AI)融入教育具有变革潜力,提供定制化学习体验和创新的教学方法。然而,AI算法中固有的偏见阻碍了这种改进,无意中延续了针对特定人群的偏见,尤其是在教育等以人为中心的应用中。本调查深入探讨了算法公平性在教育背景下的发展话题,全面评估了AI驱动教育应用中多样的公平、偏见和伦理文献。它识别了数据相关、算法和用户交互等常见偏见形式,这些偏见从根本上破坏了AI教学辅助工具实现公平的成就。通过概述现有的减少这些偏见的技术,从不同的数据收集到算法公平干预,该调查强调了伦理考虑和法律框架在塑造更公平的教育环境中的关键作用。此外,它指导读者了解公平度量、方法和数据集的复杂性,揭示了减少偏见的方法。尽管取得这些进展,该调查强调了长期存在的问题,如公平性和准确性之间的平衡以及对多样化数据集的需求。克服这些挑战,确保AI在教育中的承诺得到道德和公平的使用,需要采取协作的跨学科方法。
更新时间: 2024-07-26 13:59:20
领域: cs.LG
MUVO: A Multimodal World Model with Spatial Representations for Autonomous Driving
Learning unsupervised world models for autonomous driving has the potential to improve the reasoning capabilities of today's systems dramatically. However, most work neglects the physical attributes of the world and focuses on sensor data alone. We propose MUVO, a MUltimodal World Model with spatial VOxel representations, to address this challenge. We utilize raw camera and lidar data to learn a sensor-agnostic geometric representation of the world. We demonstrate multimodal future predictions and show that our spatial representation improves the prediction quality of both camera images and lidar point clouds.
Updated: 2024-07-26 13:52:14
标题: MUVO:具有空间表示的自动驾驶多模态世界模型
摘要: 学习无监督的自动驾驶世界模型有潜力显著提高当今系统的推理能力。然而,大多数工作忽视了世界的物理属性,而是集中在传感器数据上。我们提出了MUVO,一个具有空间体素表示的多模态世界模型,以解决这一挑战。我们利用原始摄像机和激光雷达数据来学习一个传感器不可知的世界的几何表示。我们展示了多模态未来预测,并展示了我们的空间表示如何改善摄像机图像和激光雷达点云的预测质量。
更新时间: 2024-07-26 13:52:14
领域: cs.LG,cs.RO
Towards Generalized Offensive Language Identification
The prevalence of offensive content on the internet, encompassing hate speech and cyberbullying, is a pervasive issue worldwide. Consequently, it has garnered significant attention from the machine learning (ML) and natural language processing (NLP) communities. As a result, numerous systems have been developed to automatically identify potentially harmful content and mitigate its impact. These systems can follow two approaches: (1) use publicly available models and application endpoints, including prompting large language models (LLMs); or (2) annotate datasets and train ML models on them. However, it remains unclear how generalizable either approach is. Furthermore, the applicability of these systems is often questioned in off-domain and practical environments. This paper empirically evaluates the generalizability of offensive language detection models and datasets across a novel generalized benchmark. We answer three research questions on generalizability. Our findings will be useful in creating robust real-world offensive language detection systems.
Updated: 2024-07-26 13:50:22
标题: 朝向泛化的攻击性语言识别
摘要: 互联网上存在的具有攻击性内容,包括仇恨言论和网络欺凌,是一个全球性的普遍问题。因此,这引起了机器学习(ML)和自然语言处理(NLP)社区的重视。因此,已经开发了许多系统来自动识别潜在有害内容并减轻其影响。这些系统可以采用两种方法:(1)使用公开可用的模型和应用端点,包括提示大型语言模型(LLMs)(2)对数据集进行标注并在其上训练ML模型。然而,这两种方法都缺乏对它们的普遍性的理解。此外,这些系统在跨领域和实际环境中的适用性经常受到质疑。本文通过实证评估了冒犯性语言检测模型和数据集在一个新颖的广义基准上的普适性。我们回答了三个关于普适性的研究问题。我们的发现将有助于创建强大的现实世界冒犯性语言检测系统。
更新时间: 2024-07-26 13:50:22
领域: cs.CL,cs.AI
Dynamics of Moral Behavior in Heterogeneous Populations of Learning Agents
Growing concerns about safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents: a promising solution is the use of learning from experience, i.e., Reinforcement Learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many of the existing studies rely on simulated social dilemma environments to study the interactions of independent learning agents; however, they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., focused on maximizing outcomes over time), norm-based (i.e., conforming to specific norms), or virtue-based (i.e., considering a combination of different virtues). The extent to which agents' co-development may be impacted by such moral heterogeneity in populations is not well understood. In this paper, we present a study of the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using an Iterated Prisoner's Dilemma environment with a partner selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in populations affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain types of moral agents are able to steer selfish agents towards more cooperative behavior.
Updated: 2024-07-26 13:47:17
标题: 学习代理人异质种群中道德行为的动态特征
摘要: 随着对人工智能系统安全性和对齐性的日益关注,强调了在人工代理中嵌入道德能力的重要性:一个有前途的解决方案是利用经验学习,即强化学习。在多代理(社会)环境中,复杂的群体级现象可能会从个体学习代理之间的互动中出现。许多现有研究依赖于模拟的社会困境环境来研究独立学习代理的相互作用;然而,它们往往忽视了实际上存在于代理社会中的道德异质性。例如,在不同时间点,单个学习代理可能会面对后果主义者(即,专注于随着时间最大化结果)、基于规范的(即,遵从特定规范)或基于美德的(即,考虑不同美德的结合)对手。代理人的共同发展可能如何受到群体中这种道德异质性的影响尚不清楚。在本文中,我们提出了一个研究,在社会困境环境中交互的道德异质人口的学习动态。利用一个带有伴侣选择机制的迭代囚徒困境环境,我们调查了群体中不同道德代理的盛行程度如何影响个体代理的学习行为和新兴的群体级结果。我们观察到了亲社会和反社会代理之间的几种非平凡互动,并发现某些类型的道德代理能够引导自私代理走向更合作的行为。
更新时间: 2024-07-26 13:47:17
领域: cs.MA,cs.AI,cs.CY,cs.LG
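To make the agent types in the entry above concrete, here is how consequentialist, norm-based, and virtue-based intrinsic rewards might be attached to a single Prisoner's Dilemma step. The payoff matrix is the standard one; the shaping terms and constants are illustrative guesses, not the paper's exact definitions:

```python
# Actions: 0 = cooperate, 1 = defect. PAYOFF[a_self][a_other] is the game reward.
PAYOFF = [[3, 0],
          [5, 1]]

def consequentialist_reward(a_self: int, a_other: int) -> float:
    """Outcome-focused: intrinsic reward equals the game payoff itself."""
    return PAYOFF[a_self][a_other]

def norm_based_reward(a_self: int, a_other: int, penalty: float = 4.0) -> float:
    """Norm-conforming: the game payoff minus a fixed penalty for defecting."""
    return PAYOFF[a_self][a_other] - (penalty if a_self == 1 else 0.0)

def virtue_based_reward(a_self: int, a_other: int, w: float = 0.5) -> float:
    """A weighted mix of outcome and norm considerations."""
    return w * consequentialist_reward(a_self, a_other) + \
        (1 - w) * norm_based_reward(a_self, a_other)

# In the paper's setting such rewards would drive independent RL learners in an
# iterated game with partner selection; here we only evaluate one defection step.
print(consequentialist_reward(1, 0), norm_based_reward(1, 0))  # 5.0 vs 1.0
```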
Outlier detection by ensembling uncertainty with negative objectness
Outlier detection is an essential capability in safety-critical applications of supervised visual recognition. Most of the existing methods deliver best results by encouraging standard closed-set models to produce low-confidence predictions in negative training data. However, that approach conflates prediction uncertainty with recognition of the negative class. We therefore reconsider direct prediction of K+1 logits that correspond to K groundtruth classes and one outlier class. This setup allows us to formulate a novel anomaly score as an ensemble of in-distribution uncertainty and the posterior of the outlier class, which we term negative objectness. Now outliers can be independently detected due to i) high prediction uncertainty or ii) similarity with negative data. We embed our method into a dense prediction architecture with mask-level recognition over K+2 classes. The training procedure encourages the novel K+2-th class to learn negative objectness on pasted negative instances. Our models outperform the current state-of-the-art on standard benchmarks for image-wide and pixel-level outlier detection with and without training on real negative data.
Updated: 2024-07-26 13:47:11
标题: 使用不确定性与负对象性合成的异常值检测
摘要: 异常值检测是在监督式视觉识别的安全关键应用中的一个关键能力。大多数现有的方法通过鼓励标准的封闭集模型在负训练数据中产生低置信度的预测来获得最佳结果。然而,这种方法将预测的不确定性与负类别的识别混淆在一起。因此,我们重新考虑直接预测对应于K个真实类别和一个异常值类别的K+1个logits。这种设置使我们能够制定一个新的异常分数,作为内部分布不确定性和异常类别的后验的集成,我们将其称为负对象性。现在,异常值可以独立检测,因为i)具有高预测不确定性或ii)与负数据相似。我们将我们的方法嵌入到一个密集预测架构中,该架构在K+2个类别上进行掩膜级别的识别。训练过程鼓励新增的第K+2类在粘贴的负实例上学习负对象性。我们的模型在图像级和像素级异常值检测的标准基准上表现优于当前的最新技术,无论是否在真实负数据上进行训练。
更新时间: 2024-07-26 13:47:11
领域: cs.CV,cs.LG
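The two detection routes in the entry above suggest a simple ensemble score over the K+2 logits. The abstract does not give the exact formula, so the combination below (normalized entropy of the renormalized inlier distribution plus the outlier-class posterior) is only one plausible reading:

```python
import torch
import torch.nn.functional as F

def anomaly_score(logits: torch.Tensor, num_inlier: int) -> torch.Tensor:
    """logits: (..., K+2) with classes [0, K) inlier and index K the outlier class.
    Returns a per-sample/per-pixel score; higher means more anomalous.
    The 0.5/0.5 weighting is an assumption, not the paper's formula."""
    probs = F.softmax(logits, dim=-1)
    inlier = probs[..., :num_inlier]
    inlier = inlier / inlier.sum(dim=-1, keepdim=True)            # renormalize over inliers
    entropy = -(inlier * inlier.clamp_min(1e-12).log()).sum(dim=-1)
    uncertainty = entropy / torch.log(torch.tensor(float(num_inlier)))  # scaled to [0, 1]
    negative_objectness = probs[..., num_inlier]                   # posterior of outlier class
    return 0.5 * (uncertainty + negative_objectness)

scores = anomaly_score(torch.randn(4, 10 + 2), num_inlier=10)
```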
Coordinated Flaw Disclosure for AI: Beyond Security Vulnerabilities
Harm reporting in Artificial Intelligence (AI) currently lacks a structured process for disclosing and addressing algorithmic flaws, relying largely on an ad-hoc approach. This contrasts sharply with the well-established Coordinated Vulnerability Disclosure (CVD) ecosystem in software security. While global efforts to establish frameworks for AI transparency and collaboration are underway, the unique challenges presented by machine learning (ML) models demand a specialized approach. To address this gap, we propose implementing a Coordinated Flaw Disclosure (CFD) framework tailored to the complexities of ML and AI issues. This paper reviews the evolution of ML disclosure practices, from ad hoc reporting to emerging participatory auditing methods, and compares them with cybersecurity norms. Our framework introduces innovations such as extended model cards, dynamic scope expansion, an independent adjudication panel, and an automated verification process. We also outline a forthcoming real-world pilot of CFD. We argue that CFD could significantly enhance public trust in AI systems. By balancing organizational and community interests, CFD aims to improve AI accountability in a rapidly evolving technological landscape.
Updated: 2024-07-26 13:45:36
标题: AI协调漏洞披露:超越安全漏洞
摘要: 在人工智能(AI)中的危害报告目前缺乏一个结构化的流程来披露和解决算法缺陷,主要依赖于临时方法。这与软件安全领域中已建立的协调漏洞披露(CVD)生态系统形成鲜明对比。虽然全球努力建立AI透明度和合作框架,但机器学习(ML)模型提出的独特挑战要求一种专门的方法。为了弥补这一差距,我们提出了一种专门针对ML和AI问题复杂性的协调缺陷披露(CFD)框架。本文回顾了ML披露实践的演变,从临时报告到新兴的参与审计方法,并将它们与网络安全规范进行比较。我们的框架引入了创新,如扩展模型卡片、动态范围扩展、独立裁决小组和自动验证流程。我们还概述了CFD即将进行的现实世界试点。我们认为,CFD可以显著提高公众对AI系统的信任。通过平衡组织和社区利益,CFD旨在改善在快速发展的技术领域中的AI问责制。
更新时间: 2024-07-26 13:45:36
领域: cs.AI,cs.CR,cs.CY
AutoRDF2GML: Facilitating RDF Integration in Graph Machine Learning
In this paper, we introduce AutoRDF2GML, a framework designed to convert RDF data into data representations tailored for graph machine learning tasks. AutoRDF2GML enables, for the first time, the creation of both content-based features -- i.e., features based on RDF datatype properties -- and topology-based features -- i.e., features based on RDF object properties. Characterized by automated feature extraction, AutoRDF2GML makes it possible even for users less familiar with RDF and SPARQL to generate data representations ready for graph machine learning tasks, such as link prediction, node classification, and graph classification. Furthermore, we present four new benchmark datasets for graph machine learning, created from large RDF knowledge graphs using our framework. These datasets serve as valuable resources for evaluating graph machine learning approaches, such as graph neural networks. Overall, our framework effectively bridges the gap between the Graph Machine Learning and Semantic Web communities, paving the way for RDF-based machine learning applications.
Updated: 2024-07-26 13:44:06
标题: AutoRDF2GML: 促进图机器学习中RDF集成
摘要: 本文介绍了AutoRDF2GML,这是一个旨在将RDF数据转换为适用于图机器学习任务的数据表示的框架。AutoRDF2GML首次实现了基于内容的特征和基于拓扑的特征的创建,即基于RDF数据类型属性和基于RDF对象属性的特征。AutoRDF2GML通过自动特征提取,使得即使对RDF和SPARQL不太熟悉的用户也能够生成适用于图机器学习任务的数据表示,如链接预测、节点分类和图分类。此外,我们提出了四个新的基准数据集,这些数据集是使用我们的框架从大型RDF知识图创建的。这些数据集可用作评估图机器学习方法(如图神经网络)的有价值资源。总的来说,我们的框架有效地搭建了图机器学习和语义网络社区之间的桥梁,为基于RDF的机器学习应用铺平了道路。
更新时间: 2024-07-26 13:44:06
领域: cs.LG,cs.AI,cs.IR
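The content/topology split in the AutoRDF2GML entry above maps cleanly onto RDF machinery: datatype properties (literal objects) become node features, object properties become edges. A minimal rdflib sketch of that split, with a placeholder input file standing in for a real knowledge graph:

```python
from rdflib import Graph, Literal

g = Graph()
g.parse("knowledge_graph.ttl")  # hypothetical input file

content_features, edges = {}, []
for s, p, o in g:
    if isinstance(o, Literal):
        # datatype property -> content-based feature
        content_features.setdefault(str(s), {})[str(p)] = str(o)
    else:
        # object property -> topology-based feature (an edge)
        edges.append((str(s), str(p), str(o)))

print(len(content_features), "nodes with literal features,", len(edges), "edges")
```

AutoRDF2GML itself automates considerably more than this (e.g. feature encoding and benchmark construction); the sketch only shows the basic separation.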
One Step to Efficient Synthetic Data
A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data, which is widely applicable for parametric models, has asymptotically efficient summary statistics, and is both easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data which satisfy the strong guarantee of differential privacy (DP), both with the same asymptotic guarantees. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform approximate hypothesis tests in the presence of intractable likelihood functions.
Updated: 2024-07-26 13:43:28
标题: 一步到位的高效合成数据
摘要: 一种常见的合成数据方法是从拟合模型中抽样。我们展示了在一般假设下,这种方法导致的样本具有低效的估计器,并且其联合分布与真实分布不一致。出于这个原因,我们提出了一种通用的合成数据生成方法,适用于参数模型,具有渐近有效的摘要统计量,并且易于实现且计算效率高。我们的方法允许构建既保留某些摘要统计量的部分合成数据集,也满足差分隐私(DP)强保证的完全合成数据,两者具有相同的渐近保证。我们还提供了理论和实证证据表明,我们的程序生成的分布收敛于真实分布。除了我们对合成数据的关注外,我们的程序还可用于在难以处理的似然函数存在的情况下执行近似假设检验。
更新时间: 2024-07-26 13:43:28
领域: math.ST,cs.CR,stat.CO,stat.TH
A Physics-Informed Neural Network-Based Approach for the Spatial Upsampling of Spherical Microphone Arrays
Spherical microphone arrays are convenient tools for capturing the spatial characteristics of a sound field. However, achieving superior spatial resolution requires arrays with numerous capsules, consequently leading to expensive devices. To address this issue, we present a method for spatially upsampling spherical microphone arrays with a limited number of capsules. Our approach exploits a physics-informed neural network with Rowdy activation functions, leveraging physical constraints to provide high-order microphone array signals, starting from low-order devices. Results show that, within its domain of application, our approach outperforms a state of the art method based on signal processing for spherical microphone arrays upsampling.
Updated: 2024-07-26 13:35:06
标题: 一种基于物理信息神经网络的方法用于球形麦克风阵列的空间上采样
摘要: 球形麦克风阵列是捕捉声音场的空间特性的便捷工具。然而,实现出色的空间分辨率需要具有大量胶囊的阵列,因此导致设备昂贵。为了解决这个问题,我们提出了一种用有限数量胶囊进行球形麦克风阵列空间上采样的方法。我们的方法利用一种具有Rowdy激活函数的物理信息神经网络,利用物理约束提供高阶麦克风阵列信号,从低阶设备开始。结果表明,在其应用领域内,我们的方法优于基于信号处理的球形麦克风阵列上采样的最先进方法。
更新时间: 2024-07-26 13:35:06
领域: eess.AS,cs.LG,cs.SD,eess.SP
Using representation balancing to learn conditional-average dose responses from clustered data
Estimating a unit's responses to interventions with an associated dose, the "conditional average dose response" (CADR), is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such a response typically needs to be estimated from observational data, which introduces several challenges. That is why the machine learning (ML) community has proposed several tailored CADR estimators. Yet, most of these methods require strong assumptions on the distribution of data and the assignment of interventions, which go beyond the standard assumptions in causal inference. Whereas previous works have focused on smooth shifts in covariate distributions across doses, in this work we study estimating CADR from clustered data in which different doses are assigned to different segments of a population. On a novel benchmarking dataset, we show the impacts of clustered data on model performance and propose an estimator, CBRNet, that learns cluster-agnostic and hence dose-agnostic covariate representations through representation balancing for unbiased CADR inference. We run extensive experiments to illustrate the workings of our method and compare it with the state of the art in ML for CADR estimation.
Updated: 2024-07-26 13:33:14
标题: 使用表示平衡来从聚类数据中学习条件平均剂量响应
摘要: 估计单位对伴随剂量的干预的响应,即“条件平均剂量响应”(CADR),在各个领域都很重要,从医疗保健到商业、经济等等。这种响应通常需要从观察数据中估计出来,这引入了一些挑战。这就是为什么机器学习(ML)社区提出了几种专门的CADR估计器。然而,大多数这些方法的提议都需要对数据分布和干预分配做出强烈的假设,这超出了因果推断中的标准假设。过去的研究迄今为止一直集中在各个剂量之间协变量分布的平滑变化,而在这项工作中,我们将研究从聚类数据中估计CADR,不同剂量分配给人群的不同部分。在一个新颖的基准数据集上,我们展示了聚类数据对模型性能的影响,并提出了一种估计器CBRNet,通过表示平衡学习集群无关且因此无关剂量的协变量表示,从而进行无偏CADR推断。我们进行了大量实验来说明我们的方法的工作原理,并将其与ML中CADR估计的最新技术进行比较。
更新时间: 2024-07-26 13:33:14
领域: cs.LG,stat.ME,62D20
MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction
Polymers are high-molecular-weight compounds constructed by the covalent bonding of numerous identical or similar monomers so that their 3D structures are complex yet exhibit unignorable regularity. Typically, the properties of a polymer, such as plasticity, conductivity, bio-compatibility, and so on, are highly correlated with its 3D structure. However, existing polymer property prediction methods heavily rely on the information learned from polymer SMILES sequences (P-SMILES strings) while ignoring crucial 3D structural information, resulting in sub-optimal performance. In this work, we propose MMPolymer, a novel multimodal multitask pretraining framework incorporating polymer 1D sequential and 3D structural information to encourage downstream polymer property prediction tasks. Besides, considering the scarcity of polymer 3D data, we further introduce the "Star Substitution" strategy to extract 3D structural information effectively. During pretraining, in addition to predicting masked tokens and recovering clear 3D coordinates, MMPolymer achieves the cross-modal alignment of latent representations. Then we further fine-tune the pretrained MMPolymer for downstream polymer property prediction tasks in the supervised learning paradigm. Experiments show that MMPolymer achieves state-of-the-art performance in downstream property prediction tasks. Moreover, given the pretrained MMPolymer, utilizing merely a single modality in the fine-tuning phase can also outperform existing methods, showcasing the exceptional capability of MMPolymer in polymer feature extraction and utilization.
Updated: 2024-07-26 13:24:41
标题: MMPolymer:用于聚合物性质预测的多模态多任务预训练框架
摘要: 聚合物是由许多相同或相似单体通过共价键构成的高分子化合物,因此它们的三维结构复杂且表现出不可忽视的规律性。通常,聚合物的性质,如可塑性、导电性、生物相容性等,与其三维结构密切相关。然而,现有的聚合物性质预测方法主要依赖于从聚合物SMILES序列(P-SMILES字符串)学习的信息,而忽视了关键的三维结构信息,导致性能亚优。在这项工作中,我们提出了MMPolymer,一种新颖的多模态多任务预训练框架,整合了聚合物1D序列和3D结构信息,以促进下游聚合物性质预测任务。此外,考虑到聚合物3D数据的稀缺性,我们进一步引入了“星形替代”策略,有效提取3D结构信息。在预训练过程中,除了预测掩模标记和恢复清晰的3D坐标外,MMPolymer实现了潜在表示的跨模态对齐。然后,我们进一步对预训练的MMPolymer进行微调,用于下游聚合物性质预测任务的监督学习范式。实验证明,MMPolymer在下游性质预测任务中实现了最先进的性能。此外,利用预训练的MMPolymer,在微调阶段仅利用单一模态也能胜过现有方法,展示了MMPolymer在聚合物特征提取和利用方面的卓越能力。
更新时间: 2024-07-26 13:24:41
领域: cs.LG,cond-mat.soft,cs.AI
A maturity framework for data driven maintenance
Maintenance decisions range from the simple detection of faults to ultimately predicting future failures and solving the problem. These traditionally human decisions are nowadays increasingly supported by data and the ultimate aim is to make them autonomous. This paper explores the challenges encountered in data driven maintenance, and proposes to consider four aspects in a maturity framework: data / decision maturity, the translation from the real world to data, the computability of decisions (using models) and the causality in the obtained relations. After a discussion of the theoretical concepts involved, the exploration continues by considering a practical fault detection and identification problem. Two approaches, i.e. experience based and model based, are compared and discussed in terms of the four aspects in the maturity framework. It is observed that both approaches yield the same decisions, but still differ in the assignment of causality. This confirms that a maturity assessment not only concerns the type of decision, but should also include the other proposed aspects.
Updated: 2024-07-26 13:20:58
标题: 一个数据驱动维护的成熟度框架
摘要: 维护决策范围从简单的故障检测到最终预测未来故障并解决问题。这些传统上由人类做出的决策如今越来越受到数据的支持,其最终目标是使它们自主化。本文探讨了数据驱动维护所面临的挑战,并提出在成熟度框架中考虑四个方面:数据/决策成熟度,从现实世界到数据的转换,决策的可计算性(使用模型)以及获得关系中的因果关系。在讨论涉及的理论概念之后,探索继续通过考虑一个实际的故障检测和识别问题。两种方法,即基于经验和基于模型的方法,在成熟度框架的四个方面上进行了比较和讨论。观察到两种方法都得出了相同的决策,但在因果关系的分配上仍有所不同。这证实了成熟度评估不仅涉及决策类型,还应包括其他提出的方面。
更新时间: 2024-07-26 13:20:58
领域: cs.AI,cs.LG,cs.SY,eess.SY,E.1; F.2; G.3; I.2.8; I.6.4; J.6
LLASP: Fine-tuning Large Language Models for Answer Set Programming
Recently, Large Language Models (LLMs) have showcased their potential in various natural language processing tasks, including code generation. However, while significant progress has been made in adapting LLMs to generate code for several imperative programming languages and tasks, there remains a notable gap in their application to declarative formalisms, such as Answer Set Programming (ASP). In this paper, we move a step towards exploring the capabilities of LLMs for ASP code generation. First, we perform a systematic evaluation of several state-of-the-art LLMs. Despite their power in terms of number of parameters, training data and computational resources, empirical results demonstrate inadequate performances in generating correct ASP programs. Therefore, we propose LLASP, a fine-tuned lightweight model specifically trained to encode fundamental ASP program patterns. To this aim, we create an ad-hoc dataset covering a wide variety of fundamental problem specifications that can be encoded in ASP. Our experiments demonstrate that the quality of ASP programs generated by LLASP is remarkable. This holds true not only when compared to the non-fine-tuned counterpart but also when compared to the majority of eager LLM candidates, particularly from a semantic perspective. All the code and data used to perform the experiments are publicly available at https://anonymous.4open.science/r/LLASP-D86C/.
Updated: 2024-07-26 13:18:42
标题: LLASP:用于答案集编程的大语言模型微调
摘要: 最近,大型语言模型(LLMs)在各种自然语言处理任务中展示了它们的潜力,包括代码生成。然而,尽管在使LLMs适应为多种命令式编程语言和任务生成代码方面已取得重大进展,但它们在声明式形式化方法(如答案集编程,ASP)上的应用仍存在明显差距。在本文中,我们迈出了一步,探索LLMs在ASP代码生成方面的能力。首先,我们对几种最先进的LLMs进行了系统评估。尽管它们在参数数量、训练数据和计算资源方面具有强大的能力,但实证结果表明,它们在生成正确的ASP程序方面表现不佳。因此,我们提出了LLASP,这是一个经过精细调整的轻量级模型,专门用于编码基本ASP程序模式。为此,我们创建了一个特定的数据集,涵盖了可以在ASP中编码的各种基本问题规范。我们的实验表明,LLASP生成的ASP程序质量是显著的。这不仅适用于与未经过精细调整的对应物相比,还适用于与大多数急切的LLM候选者相比,特别是从语义角度来看。用于执行实验的所有代码和数据都可以在https://anonymous.4open.science/r/LLASP-D86C/上公开获取。
更新时间: 2024-07-26 13:18:42
领域: cs.LG,cs.LO
Neurosymbolic AI for Enhancing Instructability in Generative AI
Generative AI, especially via Large Language Models (LLMs), has transformed content creation across text, images, and music, showcasing capabilities in following instructions through prompting, largely facilitated by instruction tuning. Instruction tuning is a supervised fine-tuning method where LLMs are trained on datasets formatted with specific tasks and corresponding instructions. This method systematically enhances the model's ability to comprehend and execute the provided directives. Despite these advancements, LLMs still face challenges in consistently interpreting complex, multi-step instructions and generalizing them to novel tasks, which are essential for broader applicability in real-world scenarios. This article explores why neurosymbolic AI offers a better path to enhance the instructability of LLMs. We explore the use of a symbolic task planner to decompose high-level instructions into structured tasks, a neural semantic parser to ground these tasks into executable actions, and a neuro-symbolic executor to implement these actions while dynamically maintaining an explicit representation of state. We also seek to show that the neurosymbolic approach enhances the reliability and context-awareness of task execution, enabling LLMs to dynamically interpret and respond to a wider range of instructional contexts with greater precision and flexibility.
Updated: 2024-07-26 13:15:50
标题: 神经符号人工智能用于增强生成式人工智能的可教性
摘要: 生成式人工智能,特别是通过大型语言模型(LLMs),已经改变了文本、图像和音乐等内容创作领域,展示了通过提示来遵循指令的能力,这在很大程度上得益于指令调整。指令调整是一种监督微调方法,其中LLMs在格式化了特定任务和相应指令的数据集上进行训练。这种方法系统地增强了模型理解和执行提供的指令的能力。尽管取得了这些进展,LLMs仍然面临着在一致解释复杂、多步骤指令并将其推广到新任务方面的挑战,这对于在实际场景中更广泛地应用至关重要。本文探讨了神经符号人工智能为何提供了一条更好的路径来增强LLMs的可指导性。我们探讨了使用符号任务计划器将高级指令分解为结构化任务,使用神经语义解析器将这些任务搭建到可执行动作中,以及使用神经符号执行器来实施这些动作,同时动态地维护状态的显式表示。我们还试图表明神经符号方法增强了任务执行的可靠性和上下文感知性,使LLMs能够动态地解释和响应更广泛的指令环境,并具有更高的精确性和灵活性。
更新时间: 2024-07-26 13:15:50
领域: cs.AI
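As a toy of the planner / parser / executor split described in the entry above, with a hand-written plan library standing in for the symbolic planner, a trivial string function standing in for the neural semantic parser, and a dict as the explicit state (every name here is invented for illustration):

```python
# Symbolic task planner stand-in: decompose a high-level instruction into tasks.
PLAN_LIBRARY = {
    "make tea": ["boil water", "add tea bag", "pour water"],
}

# Neural semantic parser stand-in: ground a task into an executable action.
def ground(task: str) -> tuple[str, str]:
    verb, _, obj = task.partition(" ")
    return verb, obj

# Neuro-symbolic executor: apply actions while keeping an explicit, inspectable state.
def execute(instruction: str) -> dict:
    state = {"done": []}
    for task in PLAN_LIBRARY[instruction]:
        state["done"].append(ground(task))  # state is explicit at every step
    return state

print(execute("make tea"))
```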
Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing, and online advertising, where immediate feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays where the feedback delays are independent of the rewards, and reward-dependent delays where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.
Updated: 2024-07-26 13:02:24
标题: 基于功绩的公平组合式半强盗算法与无限制反馈延迟
摘要: 我们研究了在基于功绩的公平约束下、具有无限制反馈延迟的随机组合半强盗问题。这受到了众包和在线广告等应用的启发,其中即时反馈不可立即获得,不同选择(或臂)之间的公平至关重要。我们考虑两种类型的无限制反馈延迟:与奖励无关的延迟,其中反馈延迟与奖励无关;与奖励相关的延迟,其中反馈延迟与奖励相关。此外,我们引入了基于功绩的公平约束,以确保对臂的公平选择。我们定义了奖励遗憾和公平遗憾,并提出了新的强盗算法,根据臂的功绩在无限制反馈延迟下进行选择。我们证明我们的算法都实现了次线性的期望奖励遗憾和期望公平遗憾,其界依赖于延迟分布的分位数。我们还使用合成和真实数据进行了大量实验,并展示了我们的算法可以公平地选择具有不同反馈延迟的臂。
更新时间: 2024-07-26 13:02:24
领域: cs.LG,stat.ML
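One common way to write the merit-based constraint referenced above (our notation; the paper's exact definition may differ, e.g. over combinatorial super-arms): with a merit function $f > 0$ and mean rewards $\mu_k$, the fair target selects each arm with probability proportional to its merit, and the fairness regret accumulates the shortfall of the actual selection probabilities $p_k(t)$,

$$ p_k^{*} = \frac{f(\mu_k)}{\sum_{j=1}^{K} f(\mu_j)}, \qquad R_{\mathrm{fair}}(T) = \sum_{t=1}^{T} \sum_{k=1}^{K} \big( p_k^{*} - p_k(t) \big)^{+} . $$

The difficulty the entry addresses is that $\mu_k$ must be estimated from delayed feedback, so both this fairness regret and the usual reward regret have to be controlled in terms of quantiles of the delay distribution.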
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Research on scaling large language models (LLMs) has primarily focused on model parameters and training data size, overlooking the role of vocabulary size. We investigate how vocabulary size impacts LLM scaling laws by training models ranging from 33M to 3B parameters on up to 500B characters with various vocabulary configurations. We propose three complementary approaches for predicting the compute-optimal vocabulary size: IsoFLOPs analysis, derivative estimation, and parametric fit of the loss function. Our approaches converge on the same result that the optimal vocabulary size depends on the available compute budget and that larger models deserve larger vocabularies. However, most LLMs use too small vocabulary sizes. For example, we predict that the optimal vocabulary size of Llama2-70B should have been at least 216K, 7 times larger than its vocabulary of 32K. We validate our predictions empirically by training models with 3B parameters across different FLOPs budgets. Adopting our predicted optimal vocabulary size consistently improves downstream performance over commonly used vocabulary sizes. By increasing the vocabulary size from the conventional 32K to 43K, we improve performance on ARC-Challenge from 29.1 to 32.0 with the same 2.3e21 FLOPs. Our work emphasizes the necessity of jointly considering model parameters and vocabulary size for efficient scaling.
Updated: 2024-07-26 12:59:47
标题: 随着词汇量的增加,缩放定律:更大的模型应具有更大的词汇量
摘要: 对大型语言模型(LLMs)的扩展研究主要集中在模型参数和训练数据大小上,忽略了词汇量的作用。我们通过在多达500B个字符上训练参数从33M到3B的模型,探究了词汇量对LLM扩展定律的影响。我们提出了三种预测计算最优词汇量的互补方法:IsoFLOPs分析、导数估计和损失函数的参数拟合。我们的方法收敛于相同的结果,即最优词汇量取决于可用的计算预算,并且更大的模型需要更大的词汇量。然而,大多数LLMs使用的词汇量太小。例如,我们预测Llama2-70B的最优词汇量应至少为216K,比其32K的词汇量大7倍。我们通过在不同FLOPs预算下训练3B参数的模型来实证验证我们的预测。采用我们预测的最优词汇量一贯地提高了常用词汇量下游性能。通过将词汇量从传统的32K增加到43K,我们将ARC-Challenge的性能从29.1提高到32.0,FLOPs保持2.3e21不变。我们的工作强调了联合考虑模型参数和词汇量对有效扩展的必要性。
更新时间: 2024-07-26 12:59:47
领域: cs.CL,cs.AI
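The reason vocabulary size enters the compute budget at all can be seen from a textbook parameter count (a standard approximation, not the paper's fitted law): with model width $d$ and untied input/output embeddings, the vocabulary contributes up to $2Vd$ parameters, and training compute scales roughly as

$$ N \approx N_{\mathrm{non\text{-}emb}} + 2Vd, \qquad C \approx 6\,N\,D, $$

so for a fixed budget $C$ the split between $V$, the non-embedding parameters, and the training data $D$ is a genuine joint optimization, which the three methods in the entry above attack from different angles.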
Cluster-norm for Unsupervised Probing of Knowledge
The deployment of language models brings challenges in generating reliable information, especially when these models are fine-tuned using human preferences. To extract encoded knowledge without (potentially) biased human labels, unsupervised probing techniques like Contrast-Consistent Search (CCS) have been developed (Burns et al., 2022). However, salient but unrelated features in a given dataset can mislead these probes (Farquhar et al., 2023). Addressing this, we propose a cluster normalization method to minimize the impact of such features by clustering and normalizing activations of contrast pairs before applying unsupervised probing techniques. While this approach does not address the issue of differentiating between knowledge in general and simulated knowledge - a major issue in the literature of latent knowledge elicitation (Christiano et al., 2021) - it significantly improves the ability of unsupervised probes to identify the intended knowledge amidst distractions.
Updated: 2024-07-26 12:57:54
标题: Cluster-norm 用于无监督知识探测
摘要: 语言模型的部署在生成可靠信息方面带来了挑战,特别是当这些模型使用人类偏好进行微调时。为了在提取编码知识时避免(潜在的)偏见人类标签,已经开发了无监督的探测技术,如对比一致搜索(CCS)(Burns等人,2022)。然而,在给定数据集中突出但无关的特征可能会误导这些探测器(Farquhar等人,2023)。为了解决这个问题,我们提出了一种聚类归一化方法,通过对比对的激活进行聚类和归一化,以最小化这些特征的影响,然后应用无监督的探测技术。尽管这种方法并未解决区分一般知识和模拟知识的问题 - 这是潜在知识引发文献中的一个主要问题(Christiano等人,2021)- 但它显著提高了无监督探测器在干扰中识别出预期知识的能力。
更新时间: 2024-07-26 12:57:54
领域: cs.AI,cs.CL,cs.LG
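A minimal version of the cluster-normalization step described in the entry above, applied before any CCS-style probe: obtain cluster assignments with K-means, then standardize each half of the contrast pairs within each cluster. Details such as the clustering space and cluster count below are our guesses, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_normalize(acts_pos, acts_neg, n_clusters=5, seed=0):
    """Per-cluster standardization of contrast-pair activations, each (n, d) float."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(
        np.concatenate([acts_pos, acts_neg], axis=1)  # cluster on the joint pair (a guess)
    )
    pos, neg = acts_pos.copy(), acts_neg.copy()
    for c in range(n_clusters):
        m = labels == c
        for a in (pos, neg):
            a[m] = (a[m] - a[m].mean(axis=0)) / (a[m].std(axis=0) + 1e-8)
    return pos, neg

# An unsupervised probe is then fit on (pos - neg) or on the normalized pair,
# with cluster-specific salient features largely removed by the per-cluster z-scoring.
```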
Examining the Influence of Political Bias on Large Language Model Performance in Stance Classification
Large Language Models (LLMs) have demonstrated remarkable capabilities in executing tasks based on natural language queries. However, these models, trained on curated datasets, inherently embody biases ranging from racial to national and gender biases. It remains uncertain whether these biases impact the performance of LLMs for certain tasks. In this study, we investigate the political biases of LLMs within the stance classification task, specifically examining whether these models exhibit a tendency to more accurately classify politically-charged stances. Utilizing three datasets, seven LLMs, and four distinct prompting schemes, we analyze the performance of LLMs on politically oriented statements and targets. Our findings reveal a statistically significant difference in the performance of LLMs across various politically oriented stance classification tasks. Furthermore, we observe that this difference primarily manifests at the dataset level, with models and prompting schemes showing statistically similar performances across different stance classification datasets. Lastly, we observe that when there is greater ambiguity in the target the statement is directed towards, LLMs have poorer stance classification accuracy. Code & Dataset: http://doi.org/10.5281/zenodo.12938478
Updated: 2024-07-26 12:47:13
Categories: cs.CL,cs.AI
Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection
Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes with error bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $\epsilon >0$ our approach is able to return a mixture of Gaussian processes that is $\epsilon$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
Updated: 2024-07-26 12:45:53
Categories: cs.LG,stat.ML
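The Wasserstein machinery the authors rely on has a convenient closed form for Gaussians, which is the natural building block when comparing a layer's output distribution against a Gaussian approximation. A small sketch of that closed form (the function name is ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(mu1, cov1, mu2, cov2):
    """2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2)."""
    s2 = sqrtm(cov2)
    cross = sqrtm(s2 @ cov1 @ s2)
    # sqrtm may return negligible imaginary parts due to numerics.
    w2_sq = np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * np.real(cross))
    return float(np.sqrt(max(w2_sq, 0.0)))

print(gaussian_w2(np.zeros(3), np.eye(3), np.ones(3), 2 * np.eye(3)))
```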
LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation
Machine translation is indispensable in healthcare for enabling the global dissemination of medical knowledge across languages. However, complex medical terminology poses unique challenges to achieving adequate translation quality and accuracy. This study introduces a novel "LLMs-in-the-loop" approach to develop supervised neural machine translation models optimized specifically for medical texts. While large language models (LLMs) have demonstrated powerful capabilities, this research shows that small, specialized models trained on high-quality in-domain (mostly synthetic) data can outperform even vastly larger LLMs. Custom parallel corpora in six languages were compiled from scientific articles, synthetically generated clinical documents, and medical texts. Our LLMs-in-the-loop methodology employs synthetic data generation, rigorous evaluation, and agent orchestration to enhance performance. We developed small medical translation models using the MarianMT base model. We introduce a new medical translation test dataset to standardize evaluation in this domain. Assessed using BLEU, METEOR, ROUGE, and BERT scores on this test set, our MarianMT-based models outperform Google Translate, DeepL, and GPT-4-Turbo. Results demonstrate that our LLMs-in-the-loop approach, combined with fine-tuning high-quality, domain-specific data, enables specialized models to outperform general-purpose and some larger systems. This research, part of a broader series on expert small models, paves the way for future healthcare-related AI developments, including deidentification and bio-medical entity extraction models. Our study underscores the potential of tailored neural translation models and the LLMs-in-the-loop methodology to advance the field through improved data generation, evaluation, agent, and modeling techniques.
Updated: 2024-07-26 12:37:58
Categories: cs.CL,cs.AI,68T35
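For readers unfamiliar with the MarianMT base model the authors build on, the sketch below shows how such a model is loaded and queried via Hugging Face transformers. The checkpoint name is a generic public one; the paper's fine-tuned medical models are custom-trained and not implied to live at this path.

```python
from transformers import MarianMTModel, MarianTokenizer

# Generic public English-to-German checkpoint, for illustration only.
name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

batch = tokenizer(["The patient presented with acute myocardial infarction."],
                  return_tensors="pt", padding=True)
generated = model.generate(**batch, max_new_tokens=64)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```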
A Public Dataset For the ZKsync Rollup
Despite blockchain data being publicly available, practical challenges and high costs often hinder its effective use by researchers, thus limiting data-driven research and exploration in the blockchain space. This is especially true when it comes to Layer 2 (L2) ecosystems, and ZKsync in particular. To address these issues, we have curated a dataset from 1 year of activity extracted from a ZKsync Era archive node and made it freely available to external parties. In this paper, we provide details on this dataset and how it was created, showcase a few example analyses that can be performed with it, and discuss some future research directions. We also publish and share the code used in our analysis on GitHub to promote reproducibility and to support further research.
Updated: 2024-07-26 12:27:39
Categories: cs.CR,stat.AP
Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation
Decoding from the output distributions of large language models to produce high-quality text is a complex challenge in language modeling. Various approaches, such as beam search, temperature sampling, top-$k$ sampling, nucleus (top-$p$) sampling, typical decoding, contrastive decoding, and contrastive search, have been proposed to address this problem, aiming to improve coherence and diversity, as well as resemblance to human-generated text. In this study, we introduce adaptive contrastive search, a novel decoding strategy extending contrastive search by incorporating an adaptive degeneration penalty, guided by the estimated uncertainty of the model at each generation step. This strategy is designed to enhance both the creativity and diversity of the language modeling process while at the same time producing coherent and high-quality generated text output. Our findings indicate performance enhancement in both aspects, across different model architectures and datasets, underscoring the effectiveness of our method in text generation tasks. Our code base, datasets, and models are publicly available.
Updated: 2024-07-26 12:23:54
Categories: cs.CL,cs.LG,stat.ME,stat.ML
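A hedged sketch of one decoding step may help: classic contrastive search trades model confidence against a degeneration penalty, and the adaptive variant described above plausibly modulates that trade-off with the model's uncertainty. Tying the penalty weight to normalized entropy, as below, is an illustrative assumption, not the paper's exact schedule.

```python
import torch
import torch.nn.functional as F

def adaptive_contrastive_step(probs, cand_hidden, ctx_hidden, alpha_max=0.6):
    """One decoding step. probs: (k,) model confidence over the top-k
    candidate tokens; cand_hidden: (k, d) hidden state of each candidate;
    ctx_hidden: (T, d) hidden states of the context so far."""
    # Scale the degeneration penalty by the normalized entropy of probs
    # (the illustrative 'uncertainty-guided' part).
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    alpha = alpha_max * entropy / torch.log(torch.tensor(float(len(probs))))
    sim = F.cosine_similarity(cand_hidden.unsqueeze(1), ctx_hidden.unsqueeze(0), dim=-1)
    degeneration = sim.max(dim=1).values          # max similarity to the context
    scores = (1 - alpha) * probs - alpha * degeneration
    return int(torch.argmax(scores))              # index of the chosen candidate

choice = adaptive_contrastive_step(torch.tensor([0.5, 0.3, 0.2]),
                                   torch.randn(3, 8), torch.randn(6, 8))
```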
Deep learning for predicting the occurrence of tipping points
Tipping points occur in many real-world systems, at which the system shifts suddenly from one state to another. The ability to predict the occurrence of tipping points from time series data remains an outstanding challenge and a major interest in a broad range of research fields. Particularly, the widely used methods based on bifurcation theory are neither reliable in prediction accuracy nor applicable for irregularly-sampled time series, which are commonly observed in real-world systems. Here we address this challenge by developing a deep learning algorithm for predicting the occurrence of tipping points in untrained systems, by exploiting information about normal forms. Our algorithm not only outperforms traditional methods for regularly-sampled model time series but also achieves accurate predictions for irregularly-sampled model time series and empirical time series. Our ability to predict tipping points for complex systems paves the way for mitigating risks, preventing catastrophic failures, and restoring degraded systems, with broad applications in social science, engineering, and biology.
Updated: 2024-07-26 12:17:57
Categories: cs.LG,math.DS
Graph Neural Networks for Virtual Sensing in Complex Systems: Addressing Heterogeneous Temporal Dynamics
Real-time condition monitoring is crucial for the reliable and efficient operation of complex systems. However, relying solely on physical sensors can be limited due to their cost, placement constraints, or inability to directly measure certain critical parameters. Virtual sensing addresses these limitations by leveraging readily available sensor data and system knowledge to estimate inaccessible parameters or infer system states. The increasing complexity of industrial systems necessitates deployments of sensors with diverse modalities to provide a comprehensive understanding of system states. These sensors capture data at varying frequencies to monitor both rapid and slowly varying system dynamics, as well as local and global state evolutions of the systems. This leads to heterogeneous temporal dynamics, which, particularly under varying operational and environmental conditions, pose a significant challenge for accurate virtual sensing. To address this, we propose a Heterogeneous Temporal Graph Neural Network (HTGNN) framework. HTGNN explicitly models signals from diverse sensors and integrates operating conditions into the model architecture. We evaluate HTGNN using two newly released datasets: a bearing dataset with diverse load conditions for bearing load prediction and a year-long simulated dataset for predicting bridge live loads. Our results demonstrate that HTGNN significantly outperforms established baseline methods in both tasks, particularly under highly varying operating conditions. These results highlight HTGNN's potential as a robust and accurate virtual sensing approach for complex systems, paving the way for improved monitoring, predictive maintenance, and enhanced system performance.
Updated: 2024-07-26 12:16:53
Categories: cs.LG,cs.AI,cs.CE
Collaborative Evolving Strategy for Automatic Data-Centric Development
Artificial Intelligence (AI) significantly influences many fields, largely thanks to the vast amounts of high-quality data for machine learning models. The emphasis is now on a data-centric AI strategy, prioritizing data development over model design progress. Automating this process is crucial. In this paper, we are the first to introduce the automatic data-centric development (AD^2) task and to outline its core challenges, which require domain-experts-like task scheduling and implementation capability, largely unexplored by previous work. By leveraging the strong complex problem-solving capabilities of large language models (LLMs), we propose an LLM-based autonomous agent, equipped with a strategy named Collaborative Knowledge-STudying-Enhanced Evolution by Retrieval (Co-STEER), to simultaneously address all the challenges. Specifically, our proposed Co-STEER agent enriches its domain knowledge through our proposed evolving strategy and develops both its scheduling and implementation skills by accumulating and retrieving domain-specific practical experience. With an improved schedule, the capability for implementation accelerates. Simultaneously, as implementation feedback becomes more thorough, the scheduling accuracy increases. These two capabilities evolve together through practical feedback, enabling a collaborative evolution process. Extensive experimental results demonstrate that our Co-STEER agent breaks new ground in AD^2 research, possesses strong evolvable schedule and implementation ability, and demonstrates the significant effectiveness of its components. Our Co-STEER paves the way for AD^2 advancements.
Updated: 2024-07-26 12:16:47
Categories: cs.AI
Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks
Discovering novel drug candidate molecules is one of the most fundamental and critical steps in drug development. Generative deep learning models, which create synthetic data given a probability distribution, offer a high potential for designing de novo molecules. However, for them to be useful in real-life drug development pipelines, these models should be able to design drug-like and target-centric molecules. In this study, we propose an end-to-end generative system, DrugGEN, for the de novo design of drug candidate molecules that interact with intended target proteins. The proposed method represents molecules as graphs and processes them via a generative adversarial network comprising graph transformer layers. The system is trained using a large dataset of drug-like compounds and target-specific bioactive molecules to design effective inhibitory molecules against the AKT1 protein, which is critically important in developing treatments for various types of cancer. We conducted molecular docking and dynamics to assess the target-centric generation performance of the model, as well as attention score visualisation to examine model interpretability. Results indicate that our de novo molecules have a high potential for interacting with the AKT1 protein at the level of its native ligands. Using the open-access DrugGEN codebase, it is possible to easily train models for other druggable proteins, given a dataset of experimentally known bioactive molecules.
Updated: 2024-07-26 11:59:06
Categories: cs.LG,q-bio.BM,q-bio.QM
Rapid Object Annotation
In this report we consider the problem of rapidly annotating a video with bounding boxes for a novel object. We describe a UI and associated workflow designed to make this process fast for an arbitrary novel target.
Updated: 2024-07-26 11:56:23
Categories: cs.CV,cs.LG
On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs
Plan synthesis aims to generate a course of actions or policies to transition from given initial states to goal states, provided domain models that could be designed by experts or learnt from training data or interactions with the world. Intrigued by the claims of emergent planning capabilities in large language models (LLMs), works have been proposed to investigate the planning effectiveness of LLMs, without considering any utilization of off-the-shelf planning techniques in LLMs. In this paper, we aim to gain further insight into the planning capability of LLMs by investigating the roles of LLMs in off-the-shelf planning frameworks. To do this, we investigate the effectiveness of embedding LLMs into one of the well-known planning frameworks, graph-based planning, proposing a novel LLMs-based planning framework with LLMs embedded in two levels of planning graphs, i.e., the mutual constraints generation level and the constraints solving level. We empirically exhibit the effectiveness of our proposed framework in various planning domains.
Updated: 2024-07-26 11:54:04
Categories: cs.AI
Exploring Scaling Trends in LLM Robustness
Language model capabilities predictably improve from scaling a model's size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models are vulnerable to adversarial prompts, such as "jailbreaks" that hijack models to perform undesired behaviors, posing a significant risk of misuse. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically, finding that larger models respond substantially better to adversarial training, but there is little to no benefit from model scale in the absence of explicit defenses.
Updated: 2024-07-26 11:51:58
Categories: cs.LG,cs.AI,cs.CL,cs.CR,I.2.7
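As context for "adversarial training", the sketch below shows its generic continuous-perturbation form (PGD on inputs). The paper's LLM setting uses text attacks such as jailbreak prompts, so treat this purely as an illustration of the training loop whose benefits are being scaled.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=10):
    """L-infinity PGD: the standard inner maximization of adversarial training."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y):
    x_adv = pgd_attack(model, x, y)        # train on worst-case inputs
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```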
VeriCHERI: Exhaustive Formal Security Verification of CHERI at the RTL
Protecting data in memory from attackers continues to be a concern in computing systems. CHERI is a promising approach to achieve such protection, by providing and enforcing fine-grained memory protection directly in the hardware. Creating trust for the entire system stack, however, requires a gap-free verification of CHERI's hardware-based protection mechanisms. Existing verification methods for CHERI target the abstract ISA model rather than the underlying hardware implementation. Fully ensuring the CHERI security guarantees for a concrete RTL implementation is a challenge in previous flows and demands high manual efforts. This paper presents VeriCHERI, a novel approach to security verification. It is conceptionally different from previous works in that it does not require any ISA specification. Instead of checking compliance with a golden ISA model, we check against well-established global security objectives of confidentiality and integrity. Fully covering these objectives, VeriCHERI uses as few as four unbounded properties to exhaustively prove or disprove any vulnerability. We demonstrate the effectiveness and scalability of VeriCHERI on a RISC-V based processor implementing a CHERI variant.
Updated: 2024-07-26 11:48:55
Categories: cs.CR
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Reinforcement learning from human feedback (RLHF) aligns Large Language Models (LLMs) with human preferences. However, these preferences can often change over time due to external factors (e.g. environment change and societal influence). Consequently, what was wrong then might be right now. Current preference optimization algorithms do not account for temporal preference drift in their modeling, which can lead to severe misalignment. To address this limitation, we use a Dynamic Bradley-Terry model that models preferences via time-dependent reward functions, and propose Non-Stationary Direct Preference Optimisation (NS-DPO). By introducing a discount parameter in the loss function, NS-DPO applies exponential weighting, which proportionally focuses learning on more time-relevant datapoints. We theoretically analyse the convergence of NS-DPO in the offline setting, providing upper bounds on the estimation error caused by non-stationary preferences. Finally, we demonstrate the effectiveness of NS-DPO for fine-tuning LLMs in scenarios with drifting preferences. By simulating preference drift using renowned reward models and modifying popular LLM datasets accordingly, we show that NS-DPO fine-tuned LLMs remain robust under non-stationarity, significantly outperforming baseline algorithms that ignore temporal preference changes, without sacrificing performance in stationary cases.
Updated: 2024-07-26 11:38:18
Categories: cs.LG
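The described objective can be sketched as a standard DPO loss with exponentially discounted, time-indexed preference pairs. The placement of the discount below follows the abstract's wording and is an assumption, not the paper's verbatim loss.

```python
import torch
import torch.nn.functional as F

def ns_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, t, T, beta=0.1, gamma=0.95):
    """DPO logits reweighted so recent preference pairs dominate.
    logp_*: (B,) policy/reference log-likelihoods of the chosen (w) and
    rejected (l) responses; t: (B,) timestamps; T: current time."""
    logits = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    weights = gamma ** (T - t).float()        # exponential discount: older pairs count less
    return (weights * F.softplus(-logits)).mean()   # softplus(-z) == -log(sigmoid(z))

loss = ns_dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4),
                   t=torch.tensor([1, 5, 9, 10]), T=10)
```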
A dual ensemble classifier used to recognise contaminated multi-channel EMG and MMG signals in the control of upper limb bioprosthesis
Myopotential pattern recognition to decode the intent of the user is the most advanced approach to controlling a powered bioprosthesis. Unfortunately, many factors make this a difficult problem, and achieving acceptable recognition quality in real-world conditions is a serious challenge. The aim of the paper is to develop a recognition system that will mitigate factors related to the multimodality and multichannel recording of biosignals and their high susceptibility to contamination. The proposed method involves the use of two co-operating multiclassifier systems. The first system is composed of one-class classifiers related to individual electromyographic (EMG) and mechanomyographic (MMG) biosignal recording channels, and its task is to recognise contaminated channels. The role of the second system is to recognise the class of movement resulting from the patient's intention. The ensemble system consists of base classifiers using the representation (extracted features) of biosignals from different channels. The system uses a dynamic selection mechanism, eliminating those base classifiers that are associated with biosignal channels that are recognised by the one-class ensemble system as being contaminated. Experimental studies were conducted using signals from an able-bodied person with simulated amputation. The results obtained allow us to reject the null hypothesis that the application of the dual ensemble does not lead to improved classification quality.
Updated: 2024-07-26 11:36:05
Categories: cs.LG
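A minimal sketch of the two cooperating ensembles, using stand-in features and off-the-shelf scikit-learn components (the paper's actual classifiers, features, and channel count are not specified here):

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_channels, n_feat = 8, 16                             # illustrative sizes
X_train = rng.normal(size=(200, n_channels, n_feat))   # stand-in channel features
y_train = rng.integers(0, 4, size=200)                 # 4 movement classes

# One one-class detector per channel flags contaminated recordings;
# one base classifier per channel predicts the movement class.
detectors = [OneClassSVM(nu=0.05).fit(X_train[:, c]) for c in range(n_channels)]
bases = [LogisticRegression(max_iter=1000).fit(X_train[:, c], y_train)
         for c in range(n_channels)]

def predict(x):
    """x: (n_channels, n_feat). Dynamic selection: drop base classifiers
    whose channel the one-class ensemble marks as contaminated (label -1)."""
    votes = [bases[c].predict_proba(x[c][None])[0]
             for c in range(n_channels) if detectors[c].predict(x[c][None])[0] == 1]
    if not votes:                                      # all channels flagged
        votes = [bases[c].predict_proba(x[c][None])[0] for c in range(n_channels)]
    return int(np.argmax(np.mean(votes, axis=0)))

print(predict(rng.normal(size=(n_channels, n_feat))))
```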
A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention
Manually annotating nuclei from the gigapixel Hematoxylin and Eosin (H&E)-stained Whole Slide Images (WSIs) is a laborious and costly task, meaning automated algorithms for cell nuclei instance segmentation and classification could alleviate the workload of pathologists and clinical researchers and at the same time facilitate the automatic extraction of clinically interpretable features. But due to the high intra- and inter-class variability of nuclei morphological and chromatic features, as well as the susceptibility of H&E stains to artefacts, state-of-the-art algorithms cannot correctly detect and classify instances with the necessary performance. In this work, we hypothesise that context and attention inductive biases in artificial neural networks (ANNs) could increase the generalization of algorithms for cell nuclei instance segmentation and classification. We conduct a thorough survey on context and attention methods for cell nuclei instance segmentation and classification from H&E-stained microscopy imaging, while providing a comprehensive discussion of the challenges being tackled with context and attention. Besides, we illustrate some limitations of current approaches and present ideas for future research. As a case study, we extend both a general instance segmentation and classification method (Mask-RCNN) and a tailored cell nuclei instance segmentation and classification model (HoVer-Net) with context- and attention-based mechanisms, and perform a comparative analysis on a multi-centre colon nuclei identification and counting dataset. Although pathologists rely on context at multiple levels, while paying attention to specific Regions of Interest (RoIs), when analysing and annotating WSIs, our findings suggest that translating that domain knowledge into algorithm design is no trivial task; to fully exploit these mechanisms, the scientific understanding of these methods should be addressed.
Updated: 2024-07-26 11:30:22
Categories: cs.CV,cs.LG
Quality Assured: Rethinking Annotation Strategies in Imaging AI
This paper does not describe a novel method. Instead, it studies an essential foundation for reliable benchmarking and ultimately real-world application of AI-based image analysis: generating high-quality reference annotations. Previous research has focused on crowdsourcing as a means of outsourcing annotations. However, little attention has so far been given to annotation companies, specifically regarding their internal quality assurance (QA) processes. Therefore, our aim is to evaluate the influence of QA employed by annotation companies on annotation quality and devise methodologies for maximizing data annotation efficacy. Based on a total of 57,648 instance segmented images obtained from a total of 924 annotators and 34 QA workers from four annotation companies and Amazon Mechanical Turk (MTurk), we derived the following insights: (1) Annotation companies perform better both in terms of quantity and quality compared to the widely used platform MTurk. (2) Annotation companies' internal QA only provides marginal improvements, if any. However, improving labeling instructions instead of investing in QA can substantially boost annotation performance. (3) The benefit of internal QA depends on specific image characteristics. Our work could enable researchers to derive substantially more value from a fixed annotation budget and change the way annotation companies conduct internal QA.
Updated: 2024-07-26 11:26:43
Categories: cs.CV,cs.AI,cs.LG
AMIR: Automated MisInformation Rebuttal -- A COVID-19 Vaccination Datasets based Recommendation System
Misinformation has emerged as a major societal threat in recent years in general; specifically, in the context of the COVID-19 pandemic, it has wreaked havoc, for instance, by fuelling vaccine hesitancy. Cost-effective, scalable solutions for combating misinformation are the need of the hour. This work explored how existing information obtained from social media and augmented with more curated fact-checked data repositories can be harnessed to facilitate automated rebuttal of misinformation at scale. While the ideas herein can be generalized and reapplied in the broader context of misinformation mitigation using a multitude of information sources and catering to the spectrum of social media platforms, this work serves as a proof of concept, and as such, it is confined in its scope to only rebuttal of tweets, and in the specific context of misinformation regarding COVID-19. It leverages two publicly available datasets, viz. FaCov (fact-checked articles) and misleading (social media Twitter) data on COVID-19 Vaccination.
Updated: 2024-07-26 11:21:24
Categories: cs.AI,cs.IR,cs.SI
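The retrieval core of such a rebuttal recommender can be sketched with plain TF-IDF matching. The snippet below uses invented stand-in texts and a generic similarity search; it does not reproduce the paper's actual pipeline or datasets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in corpora: fact-checked rebuttals and a misleading tweet.
fact_checks = ["COVID-19 vaccines do not alter human DNA.",
               "mRNA vaccines were tested in large clinical trials."]
tweets = ["heard the vaccine rewrites your DNA??"]

vec = TfidfVectorizer(stop_words="english")
fc_mat = vec.fit_transform(fact_checks)       # index the fact-check repository
tw_mat = vec.transform(tweets)                # embed the incoming tweet
best = cosine_similarity(tw_mat, fc_mat).argmax(axis=1)
print("recommended rebuttal:", fact_checks[best[0]])
```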
Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models
Text-to-image diffusion models generate impressive and realistic images, but do they learn to represent the 3D world from only 2D supervision? We demonstrate that yes, certain 3D scene representations are encoded in the text embedding space of models like Stable Diffusion. Our approach, Viewpoint Neural Textual Inversion (ViewNeTI), is to discover 3D view tokens; these tokens control the 3D viewpoint - the rendering pose in a scene - of generated images. Specifically, we train a small neural mapper to take continuous camera viewpoint parameters and predict a view token (a word embedding). This token conditions diffusion generation via cross-attention to produce images with the desired camera viewpoint. Using ViewNeTI as an evaluation tool, we report two findings: first, the text latent space has a continuous view-control manifold for particular 3D scenes; second, we find evidence for a generalized view-control manifold for all scenes. We conclude that since the view token controls the 3D `rendering' viewpoint, there is likely a scene representation embedded in frozen 2D diffusion models. Finally, we exploit the 3D scene representations for 3D vision tasks, namely, view-controlled text-to-image generation, and novel view synthesis from a single image, where our approach sets state-of-the-art for LPIPS. Code available at https://github.com/jmhb0/view_neti
Updated: 2024-07-26 11:14:21
Categories: cs.CV,cs.AI,cs.LG
The SkipSponge Attack: Sponge Weight Poisoning of Deep Neural Networks
Sponge attacks aim to increase the energy consumption and computation time of neural networks deployed on hardware accelerators. Existing sponge attacks can be performed during inference via sponge examples or during training via Sponge Poisoning. Sponge examples leverage perturbations added to the model's input to increase energy and latency, while Sponge Poisoning alters the objective function of a model to induce inference-time energy effects. In this work, we propose a novel sponge attack called SkipSponge. SkipSponge is the first sponge attack that is performed directly on the parameters of a pre-trained model using only a few data samples. Our experiments show that SkipSponge can successfully increase the energy consumption of image classification models, GANs, and autoencoders with fewer samples required than Sponge Poisoning. We show that poisoning defenses are ineffective if not adjusted specifically for the defense against SkipSponge (i.e., they decrease target layer bias values). Our work shows that SkipSponge is more effective on the GANs and the autoencoders than the state-of-the-art. Additionally, SkipSponge is stealthier than the previous Sponge Poisoning attack as it does not require significant changes in the victim model's weights. Our experiments indicate that the SkipSponge attack can be performed even when an attacker has access to only 1% of the entire dataset and reaches up to 13% energy increase.
Updated: 2024-07-26 11:08:07
Categories: cs.CR,cs.LG
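A toy illustration of the bias-based sponge idea: shifting the biases of layers that feed ReLUs pushes pre-activations positive, so sparsity-exploiting accelerators do less zero-skipping and spend more energy per inference. The uniform shift below is a deliberate simplification; SkipSponge itself selects target layers and calibrates per-parameter changes from a few samples.

```python
import torch
import torch.nn as nn

def sponge_like_poison(model, shift=0.5):
    """Nudge every Linear bias upward so fewer pre-ReLU values are negative.
    Simplified stand-in for SkipSponge's calibrated parameter edits."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear) and module.bias is not None:
                module.bias += shift

net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x = torch.randn(8, 16)
before = (net[0](x) > 0).float().mean()
sponge_like_poison(net)
after = (net[0](x) > 0).float().mean()
print(f"fraction of positive pre-ReLU activations: {before:.2f} -> {after:.2f}")
```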
Comparative Analysis of AES, Blowfish, Twofish, Salsa20, and ChaCha20 for Image Encryption
Nowadays, cybersecurity has grown into a more significant and difficult scientific issue. Threats and attacks aimed at knowledge and safety on the internet are growing harder to detect. Since cybersecurity guarantees the privacy and security of data sent via the Internet, it is essential, while also providing protection against malicious attacks. Encryption has become an essential element of information security systems. To ensure the security of shared data, including text, images, or videos, it is essential to employ various methods and strategies. This study delves into the prevalent cryptographic methods and algorithms utilized for prevention and stream encryption, examining their encoding techniques, including the Advanced Encryption Standard (AES), Blowfish, Twofish, Salsa20, and ChaCha20. The primary objective of this research is to identify the optimal times and throughputs (speeds) for data encryption and decryption processes. The methodology of this study involved selecting five distinct types of images to compare the outcomes of the techniques evaluated in this research. The assessment focused on processing time and speed parameters, examining visual encoding and decoding using Java as the primary platform. A comparative analysis of several symmetric key ciphers was performed, focusing on handling large datasets. Despite this limitation, comparing different images helped evaluate the techniques' novelty. The results showed that ChaCha20 had the best average time for both encryption and decryption, being over 50% faster than some other algorithms. However, the Twofish algorithm had lower throughput during testing. The paper concludes with findings and suggestions for future improvements.
Updated: 2024-07-26 11:04:49
Categories: cs.CR,cs.AI
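The timing methodology can be reproduced in spirit with a few lines of Python (the study itself used Java). The cryptography package ships AES and ChaCha20; Twofish is not available there, which is why the sketch benchmarks only those two ciphers on random bytes standing in for image data.

```python
import os, time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

data = os.urandom(8 * 1024 * 1024)   # 8 MiB stand-in for image bytes
key = os.urandom(32)

def bench(name, make_cipher, repeats=5):
    best = float("inf")
    for _ in range(repeats):
        enc = make_cipher().encryptor()
        t0 = time.perf_counter()
        enc.update(data)
        enc.finalize()
        best = min(best, time.perf_counter() - t0)
    print(f"{name:10s} {best * 1e3:7.1f} ms  {len(data) / best / 1e6:8.1f} MB/s")

bench("AES-CTR", lambda: Cipher(algorithms.AES(key), modes.CTR(os.urandom(16))))
bench("ChaCha20", lambda: Cipher(algorithms.ChaCha20(key, os.urandom(16)), mode=None))
```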
TEDi Policy: Temporally Entangled Diffusion for Robotic Control
Diffusion models have been shown to excel in robotic imitation learning by mastering the challenge of modeling complex distributions. However, sampling speed has traditionally not been a priority due to their popularity for image generation, limiting their application to dynamical tasks. While recent work has improved the sampling speed of diffusion-based robotic policies, they are restricted to techniques from the image generation domain. We adapt Temporally Entangled Diffusion (TEDi), a framework specific for trajectory generation, to speed up diffusion-based policies for imitation learning. We introduce TEDi Policy, with novel regimes for training and sampling, and show that it drastically improves the sampling speed while remaining performant when applied to state-of-the-art diffusion-based imitation learning policies.
Updated: 2024-07-26 11:02:27
Categories: cs.RO,cs.AI
DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
Accurate real-time object detection is vital across numerous industrial applications, from safety monitoring to quality control. Traditional approaches, however, are hindered by arduous manual annotation and data collection, struggling to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an innovative automated end-to-end pipeline that revolutionizes object detection workflows from data collection to model evaluation. It eliminates the need for laborious human labeling and extensive data collection while achieving outstanding accuracy across diverse scenarios. DART encompasses four key stages: (1) Data Diversification using subject-driven image generation (DreamBooth with SDXL), (2) Annotation via open-vocabulary object detection (Grounding DINO) to generate bounding box and class labels (3) Review of generated images and pseudo-labels by large multimodal models (InternVL-1.5 and GPT-4o) to guarantee credibility, (4) Training of real-time object detectors (YOLOv8 and YOLOv10) using the verified data as ground truth. We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current instantiation of DART significantly increases average precision (AP) from 0.064 to 0.832. Its modular design ensures easy exchangeability and extensibility, allowing for future algorithm upgrades, seamless integration of new object categories, and adaptability to customized environments without manual labeling and additional data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
Updated: 2024-07-26 11:01:21
Categories: cs.CV,cs.AI
ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model
The field of integrated circuit (IC) design is highly specialized, presenting significant barriers to entry and research and development challenges. Although large language models (LLMs) have achieved remarkable success in various domains, existing LLMs often fail to meet the specific needs of students, engineers, and researchers. Consequently, the potential of LLMs in the IC design domain remains largely unexplored. To address these issues, we introduce ChipExpert, the first open-source, instructional LLM specifically tailored for the IC design field. ChipExpert is trained on one of the current best open-source base model (Llama-3 8B). The entire training process encompasses several key stages, including data preparation, continue pre-training, instruction-guided supervised fine-tuning, preference alignment, and evaluation. In the data preparation stage, we construct multiple high-quality custom datasets through manual selection and data synthesis techniques. In the subsequent two stages, ChipExpert acquires a vast amount of IC design knowledge and learns how to respond to user queries professionally. ChipExpert also undergoes an alignment phase, using Direct Preference Optimization, to achieve a high standard of ethical performance. Finally, to mitigate the hallucinations of ChipExpert, we have developed a Retrieval-Augmented Generation (RAG) system, based on the IC design knowledge base. We also released the first IC design benchmark ChipICD-Bench, to evaluate the capabilities of LLMs across multiple IC design sub-domains. Through comprehensive experiments conducted on this benchmark, ChipExpert demonstrated a high level of expertise in IC design knowledge Question-and-Answer tasks.
Updated: 2024-07-26 11:00:08
Categories: cs.AR,cs.AI,cs.LG
Real Time Multi Organ Classification on Computed Tomography Images
Organ segmentation is a fundamental task in medical imaging since it is useful for many clinical automation pipelines. However, some tasks do not require full segmentation. Instead, a classifier can identify the selected organ without segmenting the entire volume. In this study, we demonstrate a classifier based method to obtain organ labels in real time by using a large context size with a sparse data sampling strategy. Although our method operates as an independent classifier at query locations, it can generate full segmentations by querying grid locations at any resolution, offering faster performance than segmentation algorithms. We compared our method with existing segmentation techniques, demonstrating its superior runtime potential for practical applications in medical imaging.
Updated: 2024-07-26 10:50:43
Categories: cs.CV,cs.AI,cs.LG
Adversarial Robustification via Text-to-Image Diffusion Models
Adversarial robustness has been conventionally believed as a challenging property to encode for neural networks, requiring plenty of training data. In the recent paradigm of adopting off-the-shelf models, however, access to their training data is often infeasible or not practical, while most of such models are not originally trained concerning adversarial robustness. In this paper, we develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data. Our intuition is to view recent text-to-image diffusion models as "adaptable" denoisers that can be optimized to specify target tasks. Based on this, we propose: (a) to initiate a denoise-and-classify pipeline that offers provable guarantees against adversarial attacks, and (b) to leverage a few synthetic reference images generated from the text-to-image model that enables novel adaptation schemes. Our experiments show that our data-free scheme applied to the pre-trained CLIP could improve the (provable) adversarial robustness of its diverse zero-shot classification derivatives (while maintaining their accuracy), significantly surpassing prior approaches that utilize the full training data. Not only for CLIP, we also demonstrate that our framework is easily applicable for robustifying other visual classifiers efficiently.
Updated: 2024-07-26 10:49:14
Categories: cs.CV,cs.LG
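The denoise-and-classify pipeline is conceptually close to randomized smoothing, which the sketch below captures with assumed denoiser and classifier callables. Wiring in an actual text-to-image diffusion model and a CLIP-based zero-shot head is the paper's contribution and is not shown here.

```python
import torch
from collections import Counter

def denoise_and_classify(x, denoiser, classifier, sigma=0.25, n_votes=32):
    """Smoothing-style pipeline: perturb the input with Gaussian noise,
    map each noisy copy back with the denoiser, classify, majority-vote.
    `denoiser` and `classifier` are assumed callables."""
    votes = []
    for _ in range(n_votes):
        noisy = x + sigma * torch.randn_like(x)
        votes.append(int(classifier(denoiser(noisy)).argmax(dim=-1)))
    return Counter(votes).most_common(1)[0][0]

x = torch.randn(1, 3, 32, 32)
label = denoise_and_classify(x, denoiser=lambda z: z,
                             classifier=lambda z: z.mean(dim=(2, 3)))
```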
Aspects of importance sampling in parameter selection for neural networks using ridgelet transform
The choice of parameters in neural networks is crucial to performance, and an oracle distribution derived from the ridgelet transform enables us to obtain suitable initial parameters. In other words, the distribution of parameters is connected to the integral representation of target functions. The oracle distribution allows us to avoid the conventional backpropagation learning process; only a linear regression is enough to construct the neural network in simple cases. This study provides a new look at oracle distributions and ridgelet transforms, namely, an aspect of importance sampling. In addition, we propose extensions of the parameter sampling methods. We demonstrate the aspect of importance sampling and the proposed sampling algorithms via one-dimensional and high-dimensional examples; the results imply that the magnitude of weight parameters could be more crucial than the intercept parameters.
Updated: 2024-07-26 10:45:27
Categories: cs.LG
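The "sample hidden parameters, then solve a linear regression" construction reads as follows in code. The proposal distribution below is a generic stand-in; in the paper the hidden parameters would instead be importance-sampled from the ridgelet-derived oracle distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(512, 1))
y = np.sin(3 * X[:, 0])                      # toy target function

# Sample hidden parameters (W, b) from a proposal distribution; the oracle
# distribution would weight these toward parameters relevant to the target.
m = 200                                      # hidden width
W = rng.normal(scale=3.0, size=(m, 1))       # stand-in proposal, not the oracle
b = rng.uniform(-3, 3, size=m)

H = np.tanh(X @ W.T + b)                     # fixed hidden features
# No backpropagation: only the output layer is fit, by ridge regression.
lam = 1e-6
a = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ y)
print("train MSE:", float(np.mean((H @ a - y) ** 2)))
```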
Achieving interpretable machine learning by functional decomposition of black-box models into explainable predictor effects
Machine learning (ML) has seen significant growth in both popularity and importance. The high prediction accuracy of ML models is often achieved through complex black-box architectures that are difficult to interpret. This interpretability problem has been hindering the use of ML in fields like medicine, ecology and insurance, where an understanding of the inner workings of the model is paramount to ensure user acceptance and fairness. The need for interpretable ML models has boosted research in the field of interpretable machine learning (IML). Here we propose a novel approach for the functional decomposition of black-box predictions, which is considered a core concept of IML. The idea of our method is to replace the prediction function by a surrogate model consisting of simpler subfunctions. Similar to additive regression models, these functions provide insights into the direction and strength of the main feature contributions and their interactions. Our method is based on a novel concept termed stacked orthogonality, which ensures that the main effects capture as much functional behavior as possible and do not contain information explained by higher-order interactions. Unlike earlier functional IML approaches, it is neither affected by extrapolation nor by hidden feature interactions. To compute the subfunctions, we propose an algorithm based on neural additive modeling and an efficient post-hoc orthogonalization procedure.
Updated: 2024-07-26 10:37:29
Categories: stat.ML,cs.LG
Fast and Reliable Probabilistic Reflectometry Inversion with Prior-Amortized Neural Posterior Estimation
Reconstructing the structure of thin films and multilayers from measurements of scattered X-rays or neutrons is key to progress in physics, chemistry, and biology. However, finding all structures compatible with reflectometry data is computationally prohibitive for standard algorithms, which typically results in unreliable analysis with only a single potential solution identified. We address this lack of reliability with a probabilistic deep learning method that identifies all realistic structures in seconds, setting new standards in reflectometry. Our method, Prior-Amortized Neural Posterior Estimation (PANPE), combines simulation-based inference with novel adaptive priors that inform the inference network about known structural properties and controllable experimental conditions. PANPE networks support key scenarios such as high-throughput sample characterization, real-time monitoring of evolving structures, or the co-refinement of several experimental data sets, and can be adapted to provide fast, reliable, and flexible inference across many other inverse problems.
Updated: 2024-07-26 10:29:16
Categories: physics.app-ph,cond-mat.soft,cs.LG,stat.ML
Contrastive Learning of Asset Embeddings from Financial Time Series
Representation learning has emerged as a powerful paradigm for extracting valuable latent features from complex, high-dimensional data. In financial domains, learning informative representations for assets can be used for tasks like sector classification, and risk management. However, the complex and stochastic nature of financial markets poses unique challenges. We propose a novel contrastive learning framework to generate asset embeddings from financial time series data. Our approach leverages the similarity of asset returns over many subwindows to generate informative positive and negative samples, using a statistical sampling strategy based on hypothesis testing to address the noisy nature of financial data. We explore various contrastive loss functions that capture the relationships between assets in different ways to learn a discriminative representation space. Experiments on real-world datasets demonstrate the effectiveness of the learned asset embeddings on benchmark industry classification and portfolio optimization tasks. In each case our novel approaches significantly outperform existing baselines highlighting the potential for contrastive learning to capture meaningful and actionable relationships in financial data.
Updated: 2024-07-26 10:26:44
Categories: cs.LG,q-fin.ST
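The contrastive objective is presumably an InfoNCE-style loss over asset pairs deemed similar by the subwindow/hypothesis-testing procedure; the exact loss variants the paper compares are not reproduced here. A minimal sketch with a toy encoder:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.07):
    """anchor, positive: (B, d) embeddings of asset pairs judged similar
    (e.g., assets whose returns co-move over many subwindows). Other
    in-batch assets serve as negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.T / temperature            # (B, B) similarity matrix
    labels = torch.arange(a.size(0))          # diagonal entries are the matches
    return F.cross_entropy(logits, labels)

# toy usage: a shared encoder over per-window return features
encoder = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 16))
x_a, x_p = torch.randn(8, 32), torch.randn(8, 32)
info_nce(encoder(x_a), encoder(x_p)).backward()
```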
Model Composition for Multimodal Large Language Models
Recent developments in Multimodal Large Language Models (MLLMs) have shown rapid progress, moving towards the goal of creating versatile MLLMs that understand inputs from various modalities. However, existing methods typically rely on joint training with paired multimodal instruction data, which is resource-intensive and challenging to extend to new modalities. In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model. Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters. Furthermore, we introduce DAMC to address parameter interference and mismatch issues during the merging process, thereby enhancing the model performance. To facilitate research in this area, we propose MCUB, a benchmark for assessing ability of MLLMs to understand inputs from diverse modalities. Experiments on this benchmark and four other multimodal understanding tasks show significant improvements over baselines, proving that model composition can create a versatile model capable of processing inputs from multiple modalities.
Updated: 2024-07-26 10:15:38
Categories: cs.CV,cs.AI,cs.CL
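The parameter-merging step of such model composition can be sketched as a weighted average over the shared LLM weights, with each model keeping its own modality encoder. NaiveMC-style averaging is shown; DAMC's interference handling is only noted in a comment.

```python
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Average parameters shared by two backbones. A plain weighted average
    is shown; DAMC additionally decouples/rescales parameters to reduce
    interference and mismatch during merging."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k]
            for k in sd_a if k in sd_b and sd_a[k].shape == sd_b[k].shape}

# toy usage with two identically-shaped "LLM backbones"
net_a, net_b = torch.nn.Linear(8, 8), torch.nn.Linear(8, 8)
merged = merge_state_dicts(net_a.state_dict(), net_b.state_dict())
net_a.load_state_dict(merged)   # composed backbone, reusable with both encoders
```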
Deep Reinforcement Learning for Wireless Scheduling in Distributed Networked Control
We consider a joint uplink and downlink scheduling problem of a fully distributed wireless networked control system (WNCS) with a limited number of frequency channels. Using elements of stochastic systems theory, we derive a sufficient stability condition of the WNCS, which is stated in terms of both the control and communication system parameters. Once the condition is satisfied, there exists a stationary and deterministic scheduling policy that can stabilize all plants of the WNCS. By analyzing and representing the per-step cost function of the WNCS in terms of a finite-length countable vector state, we formulate the optimal transmission scheduling problem into a Markov decision process and develop a deep reinforcement learning (DRL) based framework for solving it. To tackle the challenges of a large action space in DRL, we propose novel action space reduction and action embedding methods for the DRL framework that can be applied to various algorithms, including Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). Numerical results show that the proposed algorithm significantly outperforms benchmark policies.
Updated: 2024-07-26 10:11:46
Categories: eess.SY,cs.AI,cs.IT,cs.SY,eess.SP,math.IT
Vulnerability Detection in Ethereum Smart Contracts via Machine Learning: A Qualitative Analysis
Smart contracts are central to a myriad of critical blockchain applications, from financial transactions to supply chain management. However, their adoption is hindered by security vulnerabilities that can result in significant financial losses. Most vulnerability detection tools and methods available nowadays leverage either static analysis methods or machine learning. Unfortunately, as valuable as they are, both approaches suffer from limitations that make them only partially effective. In this survey, we analyze the state of the art in machine-learning vulnerability detection for Ethereum smart contracts, by categorizing existing tools and methodologies, evaluating them, and highlighting their limitations. Our critical assessment unveils issues such as restricted vulnerability coverage and dataset construction flaws, providing us with new metrics to overcome the difficulties that restrain a sound comparison of existing solutions. Driven by our findings, we discuss best practices to enhance the accuracy, scope, and efficiency of vulnerability detection in smart contracts. Our guidelines address the known flaws while at the same time opening new avenues for research and development. By shedding light on current challenges and offering novel directions for improvement, we contribute to the advancement of secure smart contract development and blockchain technology as a whole.
Updated: 2024-07-26 10:09:44
Categories: cs.CR,cs.LG
Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy Constraint
Large language models internalize enormous parametric knowledge during pre-training. Concurrently, realistic applications necessitate external contextual knowledge to aid models on the underlying tasks. This raises a crucial dilemma known as knowledge conflicts, where the contextual knowledge clashes with the parametric knowledge. However, existing decoding works are specialized in resolving knowledge conflicts and could inadvertently deteriorate performance in the absence of conflicts. In this paper, we propose an adaptive decoding method, termed contextual information-entropy constraint decoding (COIECD), to discern whether knowledge conflicts occur and resolve them. It can improve the model's faithfulness to conflicting context, and simultaneously maintain high performance in non-conflicting contexts. Our experiments show that COIECD exhibits strong performance and robustness over knowledge conflicts in realistic datasets. Code is available.
Updated: 2024-07-26 10:00:52
Categories: cs.AI
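COIECD's exact constraint is not given in the abstract, so the sketch below shows only the underlying building block: the entropy of the next-token distribution computed with and without the external context. The conflict rule at the end is an illustrative assumption, not the paper's criterion.

```python
import torch
import torch.nn.functional as F

def token_entropy(logits):
    """Shannon entropy of the next-token distribution given raw logits."""
    p = F.softmax(logits, dim=-1)
    return -(p * torch.log(p + 1e-12)).sum(dim=-1)

# Toy logits from the same model, decoded with and without retrieved context.
logits_with_ctx = torch.randn(50257)
logits_without_ctx = torch.randn(50257)

# Hypothetical conflict signal: the context sharply shifts prediction entropy.
gap = (token_entropy(logits_with_ctx) - token_entropy(logits_without_ctx)).abs()
conflict = bool(gap > 1.0)  # the 1.0 threshold is a placeholder
```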
Robust VAEs via Generating Process of Noise Augmented Data
Advancing defensive mechanisms against adversarial attacks in generative models is a critical research topic in machine learning. Our study focuses on a specific type of generative models - Variational Auto-Encoders (VAEs). Contrary to common beliefs and existing literature which suggest that noise injection towards training data can make models more robust, our preliminary experiments revealed that naive usage of noise augmentation technique did not substantially improve VAE robustness. In fact, it even degraded the quality of learned representations, making VAEs more susceptible to adversarial perturbations. This paper introduces a novel framework that enhances robustness by regularizing the latent space divergence between original and noise-augmented data. Through incorporating a paired probabilistic prior into the standard variational lower bound, our method significantly boosts defense against adversarial attacks. Our empirical evaluations demonstrate that this approach, termed Robust Augmented Variational Auto-ENcoder (RAVEN), yields superior performance in resisting adversarial inputs on widely-recognized benchmark datasets.
Updated: 2024-07-26 09:55:34
Categories: cs.LG
Intersymbolic AI: Interlinking Symbolic AI and Subsymbolic AI
This perspective piece calls for the study of the new field of Intersymbolic AI, by which we mean the combination of symbolic AI, whose building blocks have inherent significance/meaning, with subsymbolic AI, whose entirety creates significance/effect despite the fact that individual building blocks escape meaning. Canonical kinds of symbolic AI are logic, games and planning. Canonical kinds of subsymbolic AI are (un)supervised machine learning and reinforcement learning. Intersymbolic AI interlinks the world of symbolic AI, with its compositional symbolic significance and meaning, and the world of subsymbolic AI, with its summative significance or effect, to enable culminations of insights from both worlds by moving between and across symbolic AI insights and subsymbolic AI techniques, the latter aided by symbolic AI principles. For example, Intersymbolic AI may start with symbolic AI to understand a dynamic system, continue with subsymbolic AI to learn its control, and end with symbolic AI to safely use the outcome of the learned subsymbolic AI controller in the dynamic system. The way Intersymbolic AI combines both symbolic and subsymbolic AI to increase the effectiveness of AI compared to either kind of AI alone is likened to the way that the combination of both conscious and subconscious thought increases the effectiveness of human thought compared to either kind of thought alone. Some successful contributions to the Intersymbolic AI paradigm are surveyed here, but many more are considered possible by advancing Intersymbolic AI.
Updated: 2024-07-26 09:52:15
Categories: cs.AI,68T01, 68T05, 68T07, 68T27, 68T30, 03B70,I.2.0; I.2.3; I.2.4; I.2.6; I.2.8
SWIFT: Semantic Watermarking for Image Forgery Thwarting
This paper proposes a novel approach towards image authentication and tampering detection by using watermarking as a communication channel for semantic information. We modify the HiDDeN deep-learning watermarking architecture to embed and extract high-dimensional real vectors representing image captions. Our method significantly improves robustness against both malign and benign edits. We also introduce a local confidence metric correlated with Message Recovery Rate, enhancing the method's practical applicability. This approach bridges the gap between traditional watermarking and passive forensic methods, offering a robust solution for image integrity verification.
Updated: 2024-07-26 09:50:13
Categories: cs.CR,cs.AI,cs.CV,cs.MM
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
Adversarial attacks can mislead automatic speech recognition (ASR) systems into predicting an arbitrary target text, thus posing a clear security threat. To prevent such attacks, we propose DistriBlock, an efficient detection strategy applicable to any ASR system that predicts a probability distribution over output tokens in each time step. We measure a set of characteristics of this distribution: the median, maximum, and minimum over the output probabilities, the entropy of the distribution, as well as the Kullback-Leibler and the Jensen-Shannon divergence with respect to the distributions of the subsequent time step. Then, by leveraging the characteristics observed for both benign and adversarial data, we apply binary classifiers, including simple threshold-based classification, ensembles of such classifiers, and neural networks. Through extensive analysis across different state-of-the-art ASR systems and language data sets, we demonstrate the strong performance of this approach, with a mean area under the receiver operating characteristic curve for distinguishing target adversarial examples against clean and noisy data of 99% and 97%, respectively. To assess the robustness of our method, we show that adaptive adversarial examples that can circumvent DistriBlock are much noisier, which makes them easier to detect through filtering and creates another avenue for preserving the system's robustness.
Updated: 2024-07-26 09:49:42
Categories: cs.SD,cs.CR,cs.LG,eess.AS
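The characteristics above are listed explicitly in the abstract, so they can be computed directly from a model's per-step output distributions; a minimal numpy/scipy sketch (the detection thresholds and downstream classifier are left out):

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

def distriblock_features(probs):
    """probs: (T, V) array of per-step token distributions from an ASR model.
    Returns time-averaged median/max/min probability, distribution entropy,
    and the KL and JS divergences between consecutive time steps."""
    med = np.median(probs, axis=1).mean()
    mx = probs.max(axis=1).mean()
    mn = probs.min(axis=1).mean()
    ent = entropy(probs, axis=1).mean()
    kl = np.mean([entropy(probs[t], probs[t + 1]) for t in range(len(probs) - 1)])
    js = np.mean([jensenshannon(probs[t], probs[t + 1]) ** 2  # squared distance = divergence
                  for t in range(len(probs) - 1)])
    return np.array([med, mx, mn, ent, kl, js])

# Toy usage: random distributions over a 100-token vocabulary for 20 steps.
feats = distriblock_features(np.random.dirichlet(np.ones(100), size=20))
```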
CardioLab: Laboratory Values Estimation from Electrocardiogram Features -- An Exploratory Study
Introduction: Laboratory values represent a cornerstone of medical diagnostics but suffer from slow turnaround times and high costs, and they only provide information about a single point in time. The continuous estimation of laboratory values from non-invasive data such as the electrocardiogram (ECG) would therefore mark a significant frontier in healthcare monitoring. Despite its transformative potential, this domain remains relatively underexplored within the medical community. Methods: In this preliminary study, we used a publicly available dataset (MIMIC-IV-ECG) to investigate the feasibility of inferring laboratory values from ECG features and patient demographics using tree-based models (XGBoost). We define the prediction task as a binary problem of predicting whether a lab value falls into low or high abnormalities. Model performance can then be assessed using AUROC. Results: Our findings demonstrate promising results in the estimation of laboratory values related to different organ systems based on a small yet comprehensive set of features. While further research and validation are warranted to fully assess the clinical utility and generalizability of ECG-based estimation in healthcare monitoring, our findings lay the groundwork for future investigations into approaches to laboratory value estimation using ECG data. Such advancements hold promise for revolutionizing predictive healthcare applications, offering faster, non-invasive, and more affordable means of patient monitoring.
Updated: 2024-07-26 09:40:30
Categories: eess.SP,cs.LG
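Since the abstract names the model family (XGBoost), the task framing (binary abnormality prediction), and the metric (AUROC), the pipeline can be sketched end to end; the features below are synthetic stand-ins because MIMIC-IV-ECG itself requires credentialed access:

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for ECG features plus demographics;
# y flags whether a lab value is abnormal.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 40))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```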
When Meta-Learning Meets Online and Continual Learning: A Survey
Over the past decade, deep neural networks have demonstrated significant success using the training scheme that involves mini-batch stochastic gradient descent on extensive datasets. Expanding upon this accomplishment, there has been a surge in research exploring the application of neural networks in other learning scenarios. One notable framework that has garnered significant attention is meta-learning. Often described as "learning to learn," meta-learning is a data-driven approach to optimize the learning algorithm. Other branches of interest are continual learning and online learning, both of which involve incrementally updating a model with streaming data. While these frameworks were initially developed independently, recent works have started investigating their combinations, proposing novel problem settings and learning algorithms. However, due to the elevated complexity and lack of unified terminology, discerning differences between the learning frameworks can be challenging even for experienced researchers. To facilitate a clear understanding, this paper provides a comprehensive survey that organizes various problem settings using consistent terminology and formal descriptions. By offering an overview of these learning paradigms, our work aims to foster further advancements in this promising area of research.
Updated: 2024-07-26 09:39:01
Categories: cs.LG,stat.ML
Multi-Agent Deep Reinforcement Learning for Energy Efficient Multi-Hop STAR-RIS-Assisted Transmissions
Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) provide a promising way to expand coverage in wireless communications. However, the limitations of a single STAR-RIS inspire us to integrate the concept of multi-hop transmissions, as has been explored for conventional RISs in existing research. Therefore, we propose the novel architecture of multi-hop STAR-RISs to achieve a wider range of full-plane service coverage. In this paper, we aim to solve the active beamforming of the base station and the passive beamforming of the STAR-RISs, maximizing energy efficiency under the hardware limitations of the STAR-RISs. Furthermore, we investigate the impact of the on-off state of STAR-RIS elements on energy efficiency. To tackle this complex problem, a Multi-Agent Global and locAl deep Reinforcement learning (MAGAR) algorithm is designed. The global agent elevates the collaboration among local agents, which focus on individual learning. In numerical results, we observe the significant improvement of MAGAR compared to the other benchmarks, including Q-learning, multi-agent deep Q network (DQN) with global reward, and multi-agent DQN with local rewards. Moreover, the proposed architecture of multi-hop STAR-RISs achieves the highest energy efficiency compared to mode-switching-based STAR-RISs, conventional RISs, and deployment without RISs or STAR-RISs.
Updated: 2024-07-26 09:35:50
Categories: cs.LG,eess.SP
Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models
This paper tackles a key issue in the interpretation of scientific figures: the fine-grained alignment of text and figures. It advances beyond prior research that primarily dealt with straightforward, data-driven visualizations such as bar and pie charts and only offered a basic understanding of diagrams through captioning and classification. We introduce a novel task, Figure Integrity Verification, designed to evaluate the precision of technologies in aligning textual knowledge with visual elements in scientific figures. To support this, we develop a semi-automated method for constructing a large-scale dataset, Figure-seg, specifically designed for this task. Additionally, we propose an innovative framework, Every Part Matters (EPM), which leverages Multimodal Large Language Models (MLLMs) to not only incrementally improve the alignment and verification of text-figure integrity but also enhance integrity through analogical reasoning. Our comprehensive experiments show that these innovations substantially improve upon existing methods, allowing for more precise and thorough analysis of complex scientific figures. This progress not only enhances our understanding of multimodal technologies but also stimulates further research and practical applications across fields requiring the accurate interpretation of complex visual data.
Updated: 2024-07-26 09:35:36
Categories: cs.CL,cs.AI,cs.CV,cs.DL,cs.MM
Topology Optimization of Random Memristors for Input-Aware Dynamic SNN
There is unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot parallel human brains in terms of energy efficiency and the streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run-time reconfigurability, and hardware architecture. To address these fundamental challenges, we introduce pruning optimization for input-aware dynamic memristive spiking neural network (PRIME). Signal representation-wise, PRIME employs leaky integrate-and-fire neurons to emulate the brain's inherent spiking mechanism. Drawing inspiration from the brain's structural plasticity, PRIME optimizes the topology of a random memristive spiking neural network without expensive memristor conductance fine-tuning. For runtime reconfigurability, inspired by the brain's dynamic adjustment of computational depth, PRIME employs an input-aware dynamic early stop policy to minimize latency during inference, thereby boosting energy efficiency without compromising performance. Architecture-wise, PRIME leverages memristive in-memory computing, mirroring the brain and mitigating the von Neumann bottleneck. We validated our system using a 40 nm 256 Kb memristor-based in-memory computing macro on neuromorphic image classification and image inpainting. Our results demonstrate the classification accuracy and Inception Score are comparable to the software baseline, while achieving maximal 62.50-fold improvements in energy efficiency, and maximal 77.0% computational load savings. The system also exhibits robustness against stochastic synaptic noise of analogue memristors. Our software-hardware co-designed model paves the way to future brain-inspired neuromorphic computing with brain-like energy efficiency and adaptivity.
Updated: 2024-07-26 09:35:02
Categories: cs.ET,cs.AI,cs.NE
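PRIME's leaky integrate-and-fire neurons follow standard textbook dynamics; the sketch below is that generic discrete-time update, not the paper's memristor-based implementation, and the constants are placeholders:

```python
import numpy as np

def lif_step(v, current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """One leaky integrate-and-fire update: the membrane potential leaks
    towards zero, integrates the input current, and a spike resets it."""
    v = v + dt * (-v / tau + current)
    spikes = v >= v_thresh
    return np.where(spikes, v_reset, v), spikes

# Toy usage: four neurons driven by constant currents for 100 steps.
v = np.zeros(4)
for _ in range(100):
    v, s = lif_step(v, current=np.array([0.02, 0.05, 0.08, 0.12]))
```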
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations. Unlike semi-supervised learning, one cannot select the most probable label as the pseudo-label in SSMLL due to multiple semantics contained in an instance. To solve this problem, the mainstream method developed an effective thresholding strategy to generate accurate pseudo-labels. Unfortunately, the method neglected the quality of model predictions and its potential impact on pseudo-labeling performance. In this paper, we propose a dual-perspective method to generate high-quality pseudo-labels. To improve the quality of model predictions, we perform dual-decoupling to boost the learning of correlative and discriminative features, while refining the generation and utilization of pseudo-labels. To obtain proper class-wise thresholds, we propose the metric-adaptive thresholding strategy to estimate the thresholds, which maximize the pseudo-label performance for a given metric on labeled data. Experiments on multiple benchmark datasets show the proposed method can achieve the state-of-the-art performance and outperform the comparative methods with a significant margin.
Updated: 2024-07-26 09:33:53
Categories: cs.LG
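The metric-adaptive thresholding idea can be sketched directly from the description above: for each class, sweep candidate thresholds on the labeled data and keep the one that maximizes the target metric (F1 is assumed here as the example metric):

```python
import numpy as np
from sklearn.metrics import f1_score

def metric_adaptive_thresholds(scores, labels, grid=np.linspace(0.05, 0.95, 19)):
    """scores, labels: (N, C) predicted probabilities and binary ground truth
    on labeled data. Returns one threshold per class, chosen to maximize the
    F1 of the induced pseudo-labels."""
    thresholds = np.zeros(scores.shape[1])
    for c in range(scores.shape[1]):
        f1s = [f1_score(labels[:, c], scores[:, c] >= t, zero_division=0)
               for t in grid]
        thresholds[c] = grid[int(np.argmax(f1s))]
    return thresholds

# Toy multi-label data with 5 classes; the thresholds would then be
# applied to scores on unlabeled data to generate pseudo-labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=(200, 5))
s = np.clip(y * 0.6 + rng.random((200, 5)) * 0.5, 0, 1)
taus = metric_adaptive_thresholds(s, y)
```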
Denoising Lévy Probabilistic Models
Investigating noise distributions beyond the Gaussian in diffusion generative models is an open problem. The Gaussian case has seen success experimentally and theoretically, fitting a unified SDE framework for score-based and denoising formulations. Recent studies suggest heavy-tailed noise distributions can address mode collapse and manage datasets with class imbalance, heavy tails, or outliers. Yoon et al. (NeurIPS 2023) introduced the Lévy-Ito model (LIM), extending the SDE framework to heavy-tailed SDEs with $\alpha$-stable noise. Despite its theoretical elegance and performance gains, LIM's complex mathematics may limit its accessibility and broader adoption. This study takes a simpler approach by extending the denoising diffusion probabilistic model (DDPM) with $\alpha$-stable noise, creating the denoising Lévy probabilistic model (DLPM). Using elementary proof techniques, we show that DLPM reduces to running vanilla DDPM with minimal changes, allowing existing implementations to be reused almost as-is. DLPM and LIM have different training algorithms and, unlike the Gaussian case, they admit different backward processes and sampling algorithms. Our experiments demonstrate that DLPM achieves better coverage of the data distribution tail, improved generation of unbalanced datasets, and faster computation times with fewer backward steps.
Updated: 2024-07-26 09:00:18
Categories: cs.LG,stat.ML
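The core change DLPM makes to DDPM, swapping Gaussian noise for $\alpha$-stable noise, can be sketched in one forward perturbation step; the noise-scaling convention below is an illustrative assumption (it reduces to the usual $\sqrt{1-\bar\alpha_t}$ for $\alpha=2$):

```python
import numpy as np
from scipy.stats import levy_stable

def heavy_tailed_forward_step(x0, t, alphas_cumprod, alpha_stable=1.8):
    """DDPM-style forward perturbation with symmetric alpha-stable noise in
    place of Gaussian; alpha_stable=2 recovers the Gaussian case up to scale."""
    eps = levy_stable.rvs(alpha_stable, beta=0.0, size=x0.shape)
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + (1 - a_bar) ** (1 / alpha_stable) * eps

# Toy usage with a linear beta schedule over 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
a_bar = np.cumprod(1.0 - betas)
x_t = heavy_tailed_forward_step(np.zeros((8, 8)), t=500, alphas_cumprod=a_bar)
```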
Using GPT-4 to guide causal machine learning
Since its introduction to the public, ChatGPT has had an unprecedented impact. While some experts praised AI advancements and highlighted their potential risks, others have been critical about the accuracy and usefulness of Large Language Models (LLMs). In this paper, we are interested in the ability of LLMs to identify causal relationships. We focus on the well-established GPT-4 (Turbo) and evaluate its performance under the most restrictive conditions, by isolating its ability to infer causal relationships based solely on the variable labels without being given any context, demonstrating the minimum level of effectiveness one can expect when it is provided with label-only information. We show that questionnaire participants judge the GPT-4 graphs as the most accurate in the evaluated categories, closely followed by knowledge graphs constructed by domain experts, with causal Machine Learning (ML) far behind. We use these results to highlight the important limitation of causal ML, which often produces causal graphs that violate common sense, affecting trust in them. However, we show that pairing GPT-4 with causal ML overcomes this limitation, resulting in graphical structures learnt from real data that align more closely with those identified by domain experts, compared to structures learnt by causal ML alone. Overall, our findings suggest that despite GPT-4 not being explicitly designed to reason causally, it can still be a valuable tool for causal representation, as it improves the causal discovery process of causal ML algorithms that are designed to do just that.
Updated: 2024-07-26 08:59:26
Categories: cs.AI,cs.LG
A data balancing approach designing of an expert system for Heart Disease Prediction
Heart disease is a major global health concern that results in millions of deaths annually. Prevention and effective treatment of heart-related problems depend heavily on early detection and accurate prediction, and machine learning methods have previously been used to predict heart disease accurately. This innovative development in healthcare has the power to transform preventative care and save a great deal of lives. The study starts with a thorough assessment of the literature that covers a wide range of topics, including pre-processing techniques, performance evaluation measures, datasets used in heart disease research, predictive modeling strategies, diagnostic methodologies, and current issues in the field. Building on these fundamental understandings, the background section describes the particular actions conducted in this investigation, such as the description of the dataset, data pre-processing techniques, label encoding, feature selection methodology, algorithm selection tactics, and stringent performance evaluation techniques. The results indicate that ensemble methods, particularly random forests, outperformed individual classifiers in predicting heart disease. Key predictors identified included hypertension, cholesterol levels, smoking status, and physical inactivity. The Decision Tree and Random Forest models achieved an accuracy of 99.83%. This work demonstrates how machine learning models, particularly ensemble approaches, can increase the precision of heart disease prediction. In comparison to conventional techniques, the models offer a more reliable risk assessment since they integrate a wide range of variables and sophisticated algorithms. The results open the door to tailored healthcare treatments that facilitate early identification and treatment of cardiac disease.
Updated: 2024-07-26 08:56:13
Categories: cs.LG
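The title's data balancing step is not detailed in the abstract; one common choice, assumed here purely for illustration, is SMOTE oversampling before fitting the Random Forest that the results mention:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic imbalanced stand-in for a heart-disease table
# (hypertension, cholesterol, smoking status, activity, ...).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] > 1.5).astype(int)   # minority positive class

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
X_bal, y_bal = SMOTE(random_state=1).fit_resample(X_tr, y_tr)  # balance classes

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_bal, y_bal)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```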
Climbing the Complexity Ladder with Expressive Attention
Attention involves comparing query and key vectors in terms of a scalar product, $\mathbf{Q}^T\mathbf{K}$, together with a subsequent softmax normalization. Classically, parallel/orthogonal/antiparallel queries and keys lead to large/intermediate/small attention weights. Here we study expressive attention (EA), which is based on $(\mathbf{Q}^T\mathbf{K})^2$, the squared dot product. In this case attention is enhanced when query and key are either parallel or antiparallel, and suppressed for orthogonal configurations. For a series of autoregressive prediction tasks, we find that EA performs at least as well as the standard mechanism, dot-product attention (DPA). Increasing task complexity, EA is observed to outperform DPA with increasing margins, which also holds for multi-task settings. For a given model size, EA manages to achieve 100\% performance for a range of complexity levels not accessible to DPA.
Updated: 2024-07-26 08:41:58
Categories: cs.LG,cs.AI
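Because the abstract states the mechanism exactly, squaring the query-key scores before the softmax, expressive attention is easy to contrast with standard dot-product attention; only the $1/\sqrt{d}$ scaling applied before squaring is an assumption here:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, expressive=False):
    """q, k, v: (T, d) tensors. Standard dot-product attention, or expressive
    attention where scores are squared first, so parallel AND antiparallel
    query-key pairs both receive large weights."""
    scores = q @ k.T / q.size(-1) ** 0.5   # pre-squaring scaling is an assumption
    if expressive:
        scores = scores ** 2
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(5, 16)
out_dpa = attention(q, k, v)                   # classical DPA
out_ea = attention(q, k, v, expressive=True)   # expressive attention
```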
Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning
Multimodal contrastive learning (MCL) has shown remarkable advances in zero-shot classification by learning from millions of image-caption pairs crawled from the Internet. However, this reliance poses privacy risks, as hackers may unauthorizedly exploit image-text data for model training, potentially including personal and privacy-sensitive information. Recent works propose generating unlearnable examples by adding imperceptible perturbations to training images to build shortcuts for protection. However, they are designed for unimodal classification and remain largely unexplored in MCL. We first explore this context by evaluating the performance of existing methods on image-caption pairs, and find that they do not generalize effectively to multimodal data and have limited impact in building shortcuts due to the lack of labels and the dispersion of pairs in MCL. In this paper, we propose Multi-step Error Minimization (MEM), a novel optimization process for generating multimodal unlearnable examples. It extends the Error-Minimization (EM) framework to optimize both image noise and an additional text trigger, thereby enlarging the optimized space and effectively misleading the model to learn the shortcut between the noise features and the text trigger. Specifically, we adopt projected gradient descent to solve the noise minimization problem and use HotFlip to approximate the gradient and replace words to find the optimal text trigger. Extensive experiments demonstrate the effectiveness of MEM, with post-protection retrieval results nearly half of random guessing, and its high transferability across different models. Our code is available at https://github.com/thinwayliu/Multimodal-Unlearnable-Examples
Updated: 2024-07-26 08:39:19
Categories: cs.MM,cs.CR
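Of MEM's two optimization moves, the image-noise half is the more standard one; a minimal sketch of a projected gradient descent step for error-minimizing noise follows (the L-infinity ball and step size are assumptions, and the HotFlip text-trigger search is omitted):

```python
import torch

def pgd_noise_step(delta, grad, step_size=1/255, eps=8/255):
    """One projected gradient descent step for error-minimizing noise:
    move against the loss gradient, then clip back into the eps-ball."""
    return (delta - step_size * grad.sign()).clamp(-eps, eps)

# Toy usage: `loss` stands in for the contrastive loss of (image + delta, caption).
delta = torch.zeros(3, 224, 224, requires_grad=True)
loss = ((delta + 0.1) ** 2).sum()
loss.backward()
with torch.no_grad():
    delta_updated = pgd_noise_step(delta, delta.grad)
```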
Reinforcement Learning for Sustainable Energy: A Survey
The transition to sustainable energy is a key challenge of our time, requiring modifications in the entire pipeline of energy production, storage, transmission, and consumption. At every stage, new sequential decision-making challenges emerge, ranging from the operation of wind farms to the management of electrical grids or the scheduling of electric vehicle charging stations. All such problems are well suited for reinforcement learning, the branch of machine learning that learns behavior from data. Therefore, numerous studies have explored the use of reinforcement learning for sustainable energy. This paper surveys this literature with the intention of bridging both the underlying research communities: energy and machine learning. After a brief introduction of both fields, we systematically list relevant sustainability challenges, how they can be modeled as a reinforcement learning problem, and what solution approaches currently exist in the literature. Afterwards, we zoom out and identify overarching reinforcement learning themes that appear throughout sustainability, such as multi-agent, offline, and safe reinforcement learning. Lastly, we also cover standardization of environments, which will be crucial for connecting both research fields, and highlight potential directions for future work. In summary, this survey provides an extensive overview of reinforcement learning methods for sustainable energy, which may play a vital role in the energy transition.
Updated: 2024-07-26 08:37:14
Categories: cs.LG,cs.AI,cs.CY,cs.SY,eess.SY,stat.ML
Using Large Language Models for the Interpretation of Building Regulations
Compliance checking is an essential part of a construction project. The recent rapid uptake of building information models (BIM) in the construction industry has created more opportunities for automated compliance checking (ACC). BIM enables sharing of digital building design data that can be used for compliance checking with legal requirements, which are conventionally conveyed in natural language and not intended for machine processing. Creating a computable representation of legal requirements suitable for ACC is complex, costly, and time-consuming. Large language models (LLMs) such as the generative pre-trained transformers (GPT), GPT-3.5 and GPT-4, powering OpenAI's ChatGPT, can generate logically coherent text and source code responding to user prompts. This capability could be used to automate the conversion of building regulations into a semantic and computable representation. This paper evaluates the performance of LLMs in translating building regulations into LegalRuleML in a few-shot learning setup. By providing GPT-3.5 with only a few example translations, it can learn the basic structure of the format. Using a system prompt, we further specify the LegalRuleML representation and explore the existence of expert domain knowledge in the model. Such domain knowledge might be ingrained in GPT-3.5 through the broad pre-training but needs to be brought forth by careful contextualisation. Finally, we investigate whether strategies such as chain-of-thought reasoning and self-consistency could apply to this use case. As LLMs become more sophisticated, the increased common sense, logical coherence, and means to domain adaptation can significantly support ACC, leading to more efficient and effective checking processes.
Updated: 2024-07-26 08:30:47
Categories: cs.CL,cs.AI
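The few-shot setup described above maps naturally onto a chat-style prompt; the sketch below assumes the openai v1 Python SDK, and the regulation texts, system prompt wording, and elided LegalRuleML fragment are all illustrative placeholders:

```python
from openai import OpenAI  # requires OPENAI_API_KEY in the environment

system_prompt = ("You translate building regulations into LegalRuleML. "
                 "Reply with a LegalRuleML fragment only.")

# (regulation, gold LegalRuleML) demonstration pairs; the gold output is elided.
few_shot = [
    ("Doors must provide a clear width of at least 0.85 m.",
     "<lrml:Statements>...</lrml:Statements>"),
]

messages = [{"role": "system", "content": system_prompt}]
for regulation, legalruleml in few_shot:
    messages.append({"role": "user", "content": regulation})
    messages.append({"role": "assistant", "content": legalruleml})
messages.append({"role": "user",
                 "content": "Stairways must have a handrail on both sides."})

client = OpenAI()
reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(reply.choices[0].message.content)
```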
A Systematic Review of Aspect-based Sentiment Analysis: Domains, Methods, and Trends
Aspect-based Sentiment Analysis (ABSA) is a fine-grained type of sentiment analysis that identifies aspects and their associated opinions from a given text. With the surge of digital opinionated text data, ABSA gained increasing popularity for its ability to mine more detailed and targeted insights. Many review papers on ABSA subtasks and solution methodologies exist, however, few focus on trends over time or systemic issues relating to research application domains, datasets, and solution approaches. To fill the gap, this paper presents a Systematic Literature Review (SLR) of ABSA studies with a focus on trends and high-level relationships among these fundamental components. This review is one of the largest SLRs on ABSA. To our knowledge, it is also the first to systematically examine the interrelations among ABSA research and data distribution across domains, as well as trends in solution paradigms and approaches. Our sample includes 727 primary studies screened from 8550 search results without time constraints via an innovative automatic filtering process. Our quantitative analysis not only identifies trends in nearly two decades of ABSA research development but also unveils a systemic lack of dataset and domain diversity as well as domain mismatch that may hinder the development of future ABSA research. We discuss these findings and their implications and propose suggestions for future research.
Updated: 2024-07-26 08:22:07
Categories: cs.CL,cs.LG
MDS-ED: Multimodal Decision Support in the Emergency Department -- a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine
Background: Benchmarking medical decision support algorithms is often hampered by limited access to datasets, narrow prediction tasks, and restricted input modalities. These limitations affect their clinical relevance and performance in high-stakes areas like emergency care, complicating the replication, validation, and improvement of benchmarks. Methods: We introduce a dataset based on MIMIC-IV, a benchmarking protocol, and initial results for evaluating multimodal decision support in the emergency department (ED). We use diverse data modalities from the first 1.5 hours after patient arrival, including demographics, biometrics, vital signs, lab values, and electrocardiogram waveforms. We analyze 1443 clinical labels across two contexts: predicting diagnoses with ICD-10 codes and forecasting patient deterioration. Results: Our multimodal diagnostic model achieves an AUROC score over 0.8 in a statistically significant manner for 357 out of 1428 conditions, including cardiac issues like myocardial infarction and non-cardiac conditions such as renal disease and diabetes. The deterioration model scores above 0.8 in a statistically significant manner for 13 out of 15 targets, including critical events like cardiac arrest and mechanical ventilation, ICU admission, as well as short- and long-term mortality. Incorporating raw waveform data significantly improves model performance, which represents one of the first robust demonstrations of this effect. Conclusions: This study highlights the uniqueness of our dataset, which encompasses a wide range of clinical tasks and utilizes a comprehensive set of features collected early during the emergency after arriving at the ED. The strong performance, as evidenced by high AUROC scores across diagnostic and deterioration targets, underscores the potential of our approach to revolutionize decision-making in acute and emergency medicine.
Updated: 2024-07-26 08:18:27
Categories: cs.LG,eess.SP
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
The paper presents the performance results of Reactor Mk.1, ARC's flagship large language model, through a benchmarking process analysis. The model utilizes the Lychee AI engine and possesses fewer than 100 billion parameters, resulting in a combination of efficiency and potency. Reactor Mk.1 outperformed models such as GPT-4o, Claude Opus, and Llama 3, achieving scores of 92% on the MMLU dataset, 91% on the HumanEval dataset, and 88% on the BBH dataset. It excels at both handling difficult tasks and reasoning, establishing it as a prominent AI solution among present cutting-edge AI technology.
Updated: 2024-07-26 08:03:32
Categories: cs.AI,cs.CL
Online Test Synthesis From Requirements: Enhancing Reinforcement Learning with Game Theory
We consider the automatic online synthesis of black-box test cases from functional requirements specified as automata for reactive implementations. The goal of the tester is to reach some given state, so as to satisfy a coverage criterion, while monitoring the violation of the requirements. We develop an approach based on Monte Carlo Tree Search, which is a classical technique in reinforcement learning for efficiently selecting promising inputs. Seeing the automata requirements as a game between the implementation and the tester, we develop a heuristic by biasing the search towards inputs that are promising in this game. We experimentally show that our heuristic accelerates the convergence of the Monte Carlo Tree Search algorithm, thus improving the performance of testing.
Updated: 2024-07-26 07:59:59
Categories: cs.AI,cs.GT,cs.LG
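A minimal sketch of the selection step such a biased Monte Carlo Tree Search might use: the standard UCT rule plus an additive heuristic term scoring how promising an input is in the requirements game (the additive form and the weights are assumptions, not the paper's exact rule):

```python
import math

def uct_select(children, exploration=1.4, bias_weight=0.5):
    """Each child: {'n': visits, 'w': total reward, 'bias': heuristic in [0, 1]}.
    Unvisited children are expanded first; otherwise pick max UCT + bias."""
    total = sum(c["n"] for c in children) or 1

    def score(c):
        if c["n"] == 0:
            return float("inf")
        return (c["w"] / c["n"]
                + exploration * math.sqrt(math.log(total) / c["n"])
                + bias_weight * c["bias"])

    return max(children, key=score)

children = [{"n": 10, "w": 4.0, "bias": 0.2},
            {"n": 3, "w": 2.0, "bias": 0.9},
            {"n": 0, "w": 0.0, "bias": 0.5}]
best = uct_select(children)  # the unvisited child is selected here
```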
Characterizing Continual Learning Scenarios and Strategies for Audio Analysis
Audio analysis is useful in many application scenarios. The state-of-the-art audio analysis approaches assume that the data distribution at training and deployment time will be the same. However, due to various real-life challenges, the data may encounter drift in its distribution, or new classes may appear in the future. Thus, a one-time trained model might not perform adequately. Continual learning (CL) approaches are devised to handle such changes in data distribution. There have been a few attempts to use CL approaches for audio analysis. Yet, there is a lack of a systematic evaluation framework. In this paper, we create a comprehensive CL dataset and characterize CL approaches for audio-based monitoring tasks. We have investigated the following CL and non-CL approaches: EWC, LwF, SI, GEM, A-GEM, GDumb, Replay, Naive, Cumulative, and Joint training. The study is very beneficial for researchers and practitioners working in the area of audio analysis for developing adaptive models. We observed that Replay achieved better results than other methods in the DCASE challenge data. It achieved an accuracy of 70.12% for the domain incremental scenario and an accuracy of 96.98% for the class incremental scenario.
Updated: 2024-07-26 07:57:18
Categories: cs.SD,cs.CV,cs.LG,eess.AS
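Among the strategies benchmarked above, EWC has a compact standard form: a quadratic penalty anchoring parameters to the previous task's optimum, weighted by a diagonal Fisher estimate. The sketch below is that textbook regularizer, not the benchmark's exact code:

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=100.0):
    """lam/2 * sum_i F_i * (theta_i - theta*_i)^2, with F a diagonal Fisher
    estimate and theta* the parameters learned on the previous task."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam / 2 * loss

# Toy usage with a fake (uniform) Fisher estimate; add `reg` to the task loss.
model = torch.nn.Linear(4, 2)
old = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}
reg = ewc_penalty(model, fisher, old)
```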
Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks
Speech bandwidth expansion is crucial for expanding the frequency range of low-bandwidth speech signals, thereby improving audio quality, clarity and perceptibility in digital applications. Its applications span telephony, compression, text-to-speech synthesis, and speech recognition. This paper presents a novel approach using a high-fidelity generative adversarial network; unlike cascaded systems, our system is trained end-to-end on paired narrowband and wideband speech signals. Our method integrates various bandwidth upsampling ratios into a single unified model specifically designed for speech bandwidth expansion applications. Our approach exhibits robust performance across various bandwidth expansion factors, including those not encountered during training, demonstrating zero-shot capability. To the best of our knowledge, this is the first work to showcase this capability. The experimental results demonstrate that our method outperforms previous end-to-end approaches, as well as interpolation and traditional techniques, showcasing its effectiveness in practical speech enhancement applications.
Updated: 2024-07-26 07:54:47
Categories: cs.SD,cs.AI,eess.AS
PP-TIL: Personalized Planning for Autonomous Driving with Instance-based Transfer Imitation Learning
Personalized motion planning holds significant importance within urban automated driving, catering to the unique requirements of individual users. Nevertheless, prior endeavors have frequently encountered difficulties in simultaneously addressing two crucial aspects: personalized planning within intricate urban settings and enhancing planning performance through data utilization. The challenge arises from the expensive and limited nature of user data, coupled with the scene state space tending towards infinity. These factors contribute to overfitting and poor generalization problems during model training. Hence, we propose an instance-based transfer imitation learning approach. This method facilitates knowledge transfer from extensive expert domain data to the user domain, presenting a fundamental resolution to these issues. We first pre-train a model using large-scale expert data. Subsequently, during the fine-tuning phase, we feed the batch data, which comprises expert and user data. Employing the inverse reinforcement learning technique, we extract the style feature distribution from user demonstrations, constructing the regularization term for the approximation of user style. In our experiments, we conducted extensive evaluations of the proposed method. Compared to the baseline methods, our approach mitigates the overfitting issue caused by sparse user data. Furthermore, we discovered that integrating the driving model with a differentiable nonlinear optimizer as a safety protection layer for end-to-end personalized fine-tuning results in superior planning performance.
Updated: 2024-07-26 07:51:11
Categories: cs.RO,cs.AI,cs.LG
Unveiling Privacy Vulnerabilities: Investigating the Role of Structure in Graph Data
The public sharing of user information opens the door for adversaries to infer private data, leading to privacy breaches and facilitating malicious activities. While numerous studies have concentrated on privacy leakage via public user attributes, the threats associated with the exposure of user relationships, particularly through network structure, are often neglected. This study aims to fill this critical gap by advancing the understanding and protection against privacy risks emanating from network structure, moving beyond direct connections with neighbors to include the broader implications of indirect network structural patterns. To achieve this, we first investigate the problem of Graph Privacy Leakage via Structure (GPS), and introduce a novel measure, the Generalized Homophily Ratio, to quantify the various mechanisms contributing to privacy breach risks in GPS. Based on this insight, we develop a novel graph private attribute inference attack, which acts as a pivotal tool for evaluating the potential for privacy leakage through network structures under worst-case scenarios. To protect users' private data from such vulnerabilities, we propose a graph data publishing method incorporating a learnable graph sampling technique, effectively transforming the original graph into a privacy-preserving version. Extensive experiments demonstrate that our attack model poses a significant threat to user privacy, and our graph data publishing method successfully achieves the optimal privacy-utility trade-off compared to baselines.
Updated: 2024-07-26 07:40:54
Categories: cs.LG,cs.SI
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics
Bioacoustic research, vital for understanding animal behavior, conservation, and ecology, faces a monumental challenge: analyzing vast datasets where animal vocalizations are rare. While deep learning techniques are becoming standard, adapting them to bioacoustics remains difficult. We address this with animal2vec, an interpretable large transformer model, and a self-supervised training scheme tailored for sparse and unbalanced bioacoustic data. It learns from unlabeled audio and then refines its understanding with labeled data. Furthermore, we introduce and publicly release MeerKAT: Meerkat Kalahari Audio Transcripts, a dataset of meerkat (Suricata suricatta) vocalizations with millisecond-resolution annotations, the largest labeled dataset on non-human terrestrial mammals currently available. Our model outperforms existing methods on MeerKAT and the publicly available NIPS4Bplus birdsong dataset. Moreover, animal2vec performs well even with limited labeled data (few-shot learning). animal2vec and MeerKAT provide a new reference point for bioacoustic research, enabling scientists to analyze large amounts of data even with scarce ground truth information.
Updated: 2024-07-26 07:39:30
Categories: cs.SD,cs.AI,eess.AS,q-bio.QM,stat.AP
Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation
Named entity recognition (NER) models often struggle with noisy inputs, such as those with spelling mistakes or errors generated by Optical Character Recognition processes, and learning a robust NER model is challenging. Existing robust NER models utilize both noisy text and its corresponding gold text for training, which is infeasible in many real-world applications in which gold text is not available. In this paper, we consider a more realistic setting in which only noisy text and its NER labels are available. We propose to retrieve relevant text of the noisy text from a knowledge corpus and use it to enhance the representation of the original noisy input. We design three retrieval methods: sparse retrieval based on lexicon similarity, dense retrieval based on semantic similarity, and self-retrieval based on task-specific text. After retrieving relevant text, we concatenate the retrieved text with the original noisy text and encode them with a transformer network, utilizing self-attention to enhance the contextual token representations of the noisy text using the retrieved text. We further employ a multi-view training framework that improves robust NER without retrieving text during inference. Experiments show that our retrieval-augmented model achieves significant improvements in various noisy NER settings.
Updated: 2024-07-26 07:30:41
Categories: cs.CL,cs.AI
Look Globally and Reason: Two-stage Path Reasoning over Sparse Knowledge Graphs
Sparse Knowledge Graphs (KGs), frequently encountered in real-world applications, contain fewer facts in the form of (head entity, relation, tail entity) compared to more populated KGs. The sparse KG completion task, which reasons answers for given queries in the form of (head entity, relation, ?) for sparse KGs, is particularly challenging due to the necessity of reasoning missing facts based on limited facts. Path-based models, known for excellent explainability, are often employed for this task. However, existing path-based models typically rely on external models to fill in missing facts and subsequently perform path reasoning. This approach introduces unexplainable factors or necessitates meticulous rule design. In light of this, this paper proposes an alternative approach by looking inward instead of seeking external assistance. We introduce a two-stage path reasoning model called LoGRe (Look Globally and Reason) over sparse KGs. LoGRe constructs a relation-path reasoning schema by globally analyzing the training data to alleviate the sparseness problem. Based on this schema, LoGRe then aggregates paths to reason out answers. Experimental results on five benchmark sparse KG datasets demonstrate the effectiveness of the proposed LoGRe model.
Updated: 2024-07-26 07:10:27
标题: 全球视野和推理:稀疏知识图上的两阶段路径推理
摘要: 稀疏知识图谱(KGs)在现实世界应用中经常遇到,与更为密集的KGs相比,它包含较少的事实(头实体,关系,尾实体)。稀疏KG完成任务是一项特别具有挑战性的任务,它在稀疏KG中根据有限事实推理出给定查询的答案(头实体,关系,?)。以路径为基础的模型以其出色的可解释性而闻名,通常被用于这项任务。然而,现有的基于路径的模型通常依赖外部模型来填补缺失的事实,随后进行路径推理。这种方法引入了不可解释的因素,或者需要细致的规则设计。鉴于此,本文提出了一种寻求内部而非外部帮助的替代方法。我们引入了一种名为LoGRe(全局观察和推理)的两阶段路径推理模型,用于稀疏KG。LoGRe通过全局分析训练数据构建关系路径推理模式,以缓解稀疏问题。基于这个模式,LoGRe然后聚合路径来推理出答案。在五个基准稀疏KG数据集上的实验结果表明了所提出的LoGRe模型的有效性。
更新时间: 2024-07-26 07:10:27
领域: cs.LG,cs.AI
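As a hedged illustration of the "look globally" stage, the toy below scans all training triples once and counts which two-hop relation paths connect the same entity pairs as each direct relation, yielding a relation-path schema that could later be used to aggregate paths into answers; the data and the two-hop restriction are simplifying assumptions, not LoGRe's exact procedure.

```python
from collections import Counter, defaultdict

triples = [("a", "born_in", "b"), ("b", "city_of", "c"), ("a", "nationality", "c")]

# Adjacency over the training KG.
out_edges = defaultdict(list)
for h, r, t in triples:
    out_edges[h].append((r, t))

# Globally count which 2-hop relation paths connect the same entity pairs
# as each direct relation -- a toy version of the relation-path schema.
schema = defaultdict(Counter)
for h, r, t in triples:
    for r1, m in out_edges[h]:
        for r2, t2 in out_edges[m]:
            if t2 == t:
                schema[r][(r1, r2)] += 1

print(dict(schema["nationality"]))  # {('born_in', 'city_of'): 1}
```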
Machine learning for structure-guided materials and process design
In recent years, there has been a growing interest in accelerated materials innovation in the context of the process-structure-property chain. In this regard, it is essential to take into account manufacturing processes and tailor materials design approaches to support downstream process design approaches. As a major step into this direction, we present a holistic optimization approach that covers the entire process-structure-property chain in materials engineering. Our approach specifically employs machine learning to address two critical identification problems: a materials design problem, which involves identifying near-optimal material structures that exhibit desired properties, and a process design problem that is to find an optimal processing path to manufacture these structures. Both identification problems are typically ill-posed, which presents a significant challenge for solution approaches. However, the non-unique nature of these problems offers an important advantage for processing: By having several target structures that perform similarly well, processes can be efficiently guided towards manufacturing the best reachable structure. The functionality of the approach will be demonstrated manufacturing crystallographic textures with desired properties in a metal forming process.
Updated: 2024-07-26 07:08:24
标题: 机器学习在结构导向材料和工艺设计中的应用
摘要: 近年来,在过程-结构-性能链的背景下,人们对加速材料创新表现出了日益增长的兴趣。在这方面,考虑到制造过程并调整材料设计方法以支持下游过程设计方法至关重要。作为朝着这个方向迈出的重要一步,我们提出了一个涵盖材料工程中整个过程-结构-性能链的全面优化方法。我们的方法具体采用机器学习来解决两个关键的识别问题:一个是材料设计问题,涉及识别出具有所需性能的近乎最佳材料结构;另一个是找到一个最佳的加工路径来制造这些结构的过程设计问题。这两个识别问题通常是不适定的,这对于解决方法是一个重大挑战。然而,这些问题的非唯一性为加工提供了一个重要的优势:通过拥有几种表现出类似优异性能的目标结构,可以有效地引导过程朝着制造最佳可达结构的方向。该方法的功能将通过在金属成形过程中制造具有所需性能的晶体结构纹理来加以展示。
更新时间: 2024-07-26 07:08:24
领域: cond-mat.mtrl-sci,cs.LG
Enhancing Solutions for Complex PDEs: Introducing Complementary Convolution and Equivariant Attention in Fourier Neural Operators
Neural operators improve conventional neural networks by expanding their capabilities of functional mappings between different function spaces to solve partial differential equations (PDEs). One of the most notable methods is the Fourier Neural Operator (FNO), which draws inspiration from Green's function method and directly approximates operator kernels in the frequency domain. However, after empirical observation followed by theoretical validation, we demonstrate that the FNO approximates kernels primarily in a relatively low-frequency domain. This suggests a limited capability in solving complex PDEs, particularly those characterized by rapid coefficient changes and oscillations in the solution space. Such cases are crucial in specific scenarios, like atmospheric convection and ocean circulation. To address this challenge, inspired by the translation equivariance of the convolution kernel, we propose a novel hierarchical Fourier neural operator along with convolution-residual layers and attention mechanisms to make them complementary in the frequency domain to solve complex PDEs. We perform experiments on forward and inverse problems of multiscale elliptic equations, Navier-Stokes equations, and other physical scenarios, and find that the proposed method achieves superior performance in these PDE benchmarks, especially for equations characterized by rapid coefficient variations.
Updated: 2024-07-26 07:08:19
标题: 加强复杂偏微分方程的解决方案:在傅里叶神经算子中引入互补卷积和等变注意力
摘要: 神经算子将传统神经网络的能力扩展到不同函数空间之间的函数映射,从而求解偏微分方程(PDEs)。其中最显著的方法之一是傅里叶神经算子(FNO),它借鉴了格林函数方法,并直接在频域中逼近算子核。然而,经过经验观察和理论验证,我们发现FNO主要在相对低频的区域中逼近核。这表明其解决复杂PDEs的能力受到限制,特别是那些以系数快速变化和解空间振荡为特征的PDEs。这些情况在特定场景中至关重要,如大气对流和海洋环流。为了解决这一挑战,受卷积核平移等变性的启发,我们提出了一种新颖的分层傅里叶神经算子,并结合卷积残差层和注意机制,使它们在频域中相互补充,以解决复杂PDEs。我们对多尺度椭圆方程、Navier-Stokes方程和其他物理场景的正向和反向问题进行实验,发现所提出的方法在这些PDE基准测试中取得了卓越的性能,特别是对于系数快速变化的方程。
更新时间: 2024-07-26 07:08:19
领域: cs.LG,cs.NA,math.DS,math.NA
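For readers unfamiliar with frequency-domain kernel approximation, here is a minimal PyTorch sketch of a standard 1D Fourier layer of the kind FNO stacks; the truncation to the lowest `modes` frequencies is precisely the low-frequency bias analyzed above, which the paper complements with convolution-residual and attention branches (not shown here).

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """One Fourier layer: FFT -> keep lowest `modes` frequencies -> linear mix -> iFFT."""
    def __init__(self, in_ch, out_ch, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (in_ch * out_ch)
        self.weight = nn.Parameter(
            scale * torch.randn(in_ch, out_ch, modes, dtype=torch.cfloat))

    def forward(self, x):              # x: (batch, in_ch, n)
        x_ft = torch.fft.rfft(x)       # (batch, in_ch, n//2 + 1)
        out_ft = torch.zeros(x.size(0), self.weight.size(1), x_ft.size(-1),
                             dtype=torch.cfloat, device=x.device)
        # Multiply only the retained low-frequency modes; the rest stay zero.
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1))

print(SpectralConv1d(3, 8, modes=16)(torch.randn(2, 3, 64)).shape)  # (2, 8, 64)
```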
How To Segment in 3D Using 2D Models: Automated 3D Segmentation of Prostate Cancer Metastatic Lesions on PET Volumes Using Multi-Angle Maximum Intensity Projections and Diffusion Models
Prostate specific membrane antigen (PSMA) positron emission tomography/computed tomography (PET/CT) imaging provides a tremendously exciting frontier in visualization of prostate cancer (PCa) metastatic lesions. However, accurate segmentation of metastatic lesions is challenging due to low signal-to-noise ratios and variable sizes, shapes, and locations of the lesions. This study proposes a novel approach for automated segmentation of metastatic lesions in PSMA PET/CT 3D volumetric images using 2D denoising diffusion probabilistic models (DDPMs). Instead of 2D trans-axial slices or 3D volumes, the proposed approach segments the lesions on generated multi-angle maximum intensity projections (MA-MIPs) of the PSMA PET images, then obtains the final 3D segmentation masks from 3D ordered subset expectation maximization (OSEM) reconstruction of 2D MA-MIPs segmentations. Our proposed method achieved superior performance compared to state-of-the-art 3D segmentation approaches in terms of accuracy and robustness in detecting and segmenting small metastatic PCa lesions. The proposed method has significant potential as a tool for quantitative analysis of metastatic burden in PCa patients.
Updated: 2024-07-26 07:08:05
标题: 如何使用2D模型在3D中分割:使用多角度最大强度投影和扩散模型自动分割PET体积上的前列腺癌转移病灶
摘要: 前列腺特异性膜抗原(PSMA)正电子发射计算机断层扫描/计算机断层扫描(PET/CT)成像在可视化前列腺癌(PCa)转移性病灶方面提供了一个极具潜力的前沿。然而,由于转移性病灶的信噪比低、大小、形状和位置不同,准确分割转移性病灶具有挑战性。本研究提出了一种新颖的方法,利用2D去噪扩散概率模型(DDPMs)在PSMA PET/CT 3D体积图像中自动分割转移性病灶。该方法不是在2D横断面或3D体积上进行分割,而是在PSMA PET图像的多角度最大强度投影(MA-MIPs)上分割病灶,然后从2D MA-MIPs分割的3D有序子集期望最大化(OSEM)重建中获得最终的3D分割掩模。我们提出的方法在检测和分割小型转移性PCa病灶方面的准确性和稳健性方面优于最先进的3D分割方法。该方法作为一种工具,有显著潜力用于PCa患者的转移负担的定量分析。
更新时间: 2024-07-26 07:08:05
领域: physics.med-ph,cs.AI,cs.CV,I.4.6
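A minimal sketch of MA-MIP generation under the assumption that axis 0 of the volume is the cranio-caudal axis; the angle count, interpolation order, and projection axis are illustrative choices, and the diffusion-based 2D segmentation and OSEM back-reconstruction steps are omitted.

```python
import numpy as np
from scipy.ndimage import rotate

def multi_angle_mips(volume, n_angles=12):
    """Rotate a 3D PET volume in the (y, x) plane (i.e., about the
    cranio-caudal axis 0) and take a maximum intensity projection per angle."""
    mips = []
    for angle in np.linspace(0, 360, n_angles, endpoint=False):
        rot = rotate(volume, angle, axes=(1, 2), reshape=False, order=1)
        mips.append(rot.max(axis=1))   # project along one in-plane axis
    return np.stack(mips)              # (n_angles, depth, width) 2D images

print(multi_angle_mips(np.random.rand(16, 32, 32)).shape)  # (12, 16, 32)
```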
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Understanding emotions is a fundamental aspect of human communication. Integrating audio and video signals offers a more comprehensive understanding of emotional states compared to traditional methods that rely on a single data source, such as speech or facial expressions. Despite its potential, multimodal emotion recognition faces significant challenges, particularly in synchronization, feature extraction, and fusion of diverse data sources. To address these issues, this paper introduces a novel transformer-based model named Audio-Video Transformer Fusion with Cross Attention (AVT-CA). The AVT-CA model employs a transformer fusion approach to effectively capture and synchronize interlinked features from both audio and video inputs, thereby resolving synchronization problems. Additionally, the Cross Attention mechanism within AVT-CA selectively extracts and emphasizes critical features while discarding irrelevant ones from both modalities, addressing feature extraction and fusion challenges. Extensive experimental analysis conducted on the CMU-MOSEI, RAVDESS and CREMA-D datasets demonstrates the efficacy of the proposed model. The results underscore the importance of AVT-CA in developing precise and reliable multimodal emotion recognition systems for practical applications.
Updated: 2024-07-26 07:05:04
标题: 多模态情感识别:使用音视频Transformer融合和交叉注意力
摘要: 理解情绪是人类交流的基本方面。集成音频和视频信号相比于传统方法,比如仅依赖于单一数据源,如语音或面部表情,能够更全面地理解情绪状态。尽管多模态情绪识别具有潜力,但面临着重要挑战,特别是在同步、特征提取和融合不同数据源方面。为解决这些问题,本文介绍了一种名为Audio-Video Transformer Fusion with Cross Attention(AVT-CA)的新型基于Transformer的模型。AVT-CA模型采用了Transformer融合方法,有效地捕捉和同步来自音频和视频输入的相互关联特征,从而解决了同步问题。此外,AVT-CA内的Cross Attention机制从两种模态中选择性地提取和强调关键特征,同时丢弃不相关的特征,解决了特征提取和融合的挑战。在CMU-MOSEI、RAVDESS和CREMA-D数据集上进行的广泛实验分析表明了所提出模型的有效性。结果强调了AVT-CA在为实际应用开发精确可靠的多模态情绪识别系统方面的重要性。
更新时间: 2024-07-26 07:05:04
领域: cs.MM,cs.CL,cs.CV,cs.LG,cs.SD,eess.AS,F.2.2; I.2.7
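A rough PyTorch sketch of bidirectional cross attention between audio and video token sequences, the core mechanism AVT-CA builds on; the dimensions, mean pooling, and final projection are simplifying assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Audio queries attend over video tokens and vice versa; the two
    attended streams are pooled and fused into one clip-level embedding."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, audio, video):          # (B, Ta, D), (B, Tv, D)
        a, _ = self.a2v(audio, video, video)  # audio attends to video
        v, _ = self.v2a(video, audio, audio)  # video attends to audio
        fused = torch.cat([a.mean(1), v.mean(1)], dim=-1)
        return self.proj(fused)               # (B, D) fused embedding

fusion = CrossAttentionFusion()
print(fusion(torch.randn(2, 50, 256), torch.randn(2, 30, 256)).shape)
```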
Multi-Agent Trajectory Prediction with Difficulty-Guided Feature Enhancement Network
Trajectory prediction is crucial for autonomous driving as it aims to forecast the future movements of traffic participants. Traditional methods usually perform holistic inference on the trajectories of agents, neglecting the differences in prediction difficulty among agents. This paper proposes a novel Difficulty-Guided Feature Enhancement Network (DGFNet), which leverages the prediction difficulty differences among agents for multi-agent trajectory prediction. Firstly, we employ spatio-temporal feature encoding and interaction to capture rich spatio-temporal features. Secondly, a difficulty-guided decoder is used to control the flow of future trajectories into subsequent modules, obtaining reliable future trajectories. Then, feature interaction and fusion are performed through the future feature interaction module. Finally, the fused agent features are fed into the final predictor to generate the predicted trajectory distributions for multiple participants. Experimental results demonstrate that our DGFNet achieves state-of-the-art performance on the Argoverse 1\&2 motion forecasting benchmarks. Ablation studies further validate the effectiveness of each module. Moreover, compared with SOTA methods, our method balances trajectory prediction accuracy and real-time inference speed.
Updated: 2024-07-26 07:04:30
标题: 使用困难引导特征增强网络进行多智能体轨迹预测
摘要: 轨迹预测对于自动驾驶至关重要,因为它旨在预测交通参与者未来的移动路径。传统方法通常对代理的轨迹进行整体推断,忽略了不同代理之间预测难度的差异。本文提出了一种新颖的难度引导特征增强网络(DGFNet),利用代理之间的预测难度差异进行多代理轨迹预测。首先,我们采用时空特征编码和交互来捕捉丰富的时空特征。其次,使用难度引导解码器来控制未来轨迹进入后续模块的流动,获取可靠的未来轨迹。然后,通过未来特征交互模块执行特征交互和融合。最后,将融合的代理特征输入到最终预测器中,为多个参与者生成预测的轨迹分布。实验结果表明,我们的DGFNet在Argoverse 1&2运动预测基准上取得了最先进的性能。消融研究进一步验证了每个模块的有效性。此外,与SOTA方法相比,我们的方法平衡了轨迹预测精度和实时推断速度。
更新时间: 2024-07-26 07:04:30
领域: cs.RO,cs.AI
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Simulated virtual environments have been widely used to train robotic agents that perform daily household tasks. These environments have driven substantial research progress, but often provide limited object interactability, visual appearance that differs from real-world environments, or relatively small environment sizes. This prevents models learned in the virtual scenes from being readily deployable. To bridge the gap between these learning environments and deployment (i.e., real) environments, we propose the ReALFRED benchmark, which employs real-world scenes, objects, and room layouts to train agents to complete household tasks by understanding free-form language instructions and interacting with objects in large, multi-room, 3D-captured scenes. Specifically, we extend the ALFRED benchmark with updates for larger environmental spaces with smaller visual domain gaps. With ReALFRED, we analyze previously crafted methods for the ALFRED benchmark and observe that they consistently yield lower performance on all metrics, encouraging the community to develop methods for more realistic environments. Our code and data are publicly available.
Updated: 2024-07-26 07:00:27
标题: ReALFRED: 一个在照片逼真环境中的具身指导跟随基准
摘要: 模拟虚拟环境被广泛用于学习执行日常家务任务的机器人代理。这些环境大大促进了研究进展,但通常提供有限的物体交互性,视觉外观与真实环境不同,或者环境尺寸相对较小。这阻碍了虚拟场景中学习模型的即时部署。为了弥合这些学习环境与部署(即真实)环境之间的差距,我们提出了ReALFRED基准,利用真实世界场景、物体和房间布局来学习代理完成家务任务,通过理解自由形式的语言指令并与大型、多房间和三维捕捉场景中的物体交互。具体来说,我们扩展了ALFRED基准,更新了较大的环境空间,减小了视觉领域的差距。通过ReALFRED,我们分析了先前为ALFRED基准设计的方法,并观察到它们在所有指标上表现一致较低,鼓励社区在更现实的环境中开发方法。我们的代码和数据是公开可用的。
更新时间: 2024-07-26 07:00:27
领域: cs.RO,cs.AI
Utilising Explainable Techniques for Quality Prediction in a Complex Textiles Manufacturing Use Case
This paper develops an approach to classify instances of product failure in a complex textiles manufacturing dataset using explainable techniques. The dataset used in this study was obtained from a New Zealand manufacturer of woollen carpets and rugs. In investigating the trade-off between accuracy and explainability, three different tree-based classification algorithms were evaluated: a Decision Tree and two ensemble methods, Random Forest and XGBoost. Additionally, three feature selection methods were also evaluated: the SelectKBest method, using chi-squared as the scoring function, the Pearson Correlation Coefficient, and the Boruta algorithm. Not surprisingly, the ensemble methods typically produced better results than the Decision Tree model. The Random Forest model yielded the best results overall when combined with the Boruta feature selection technique. Finally, a tree ensemble explaining technique was used to extract rule lists to capture necessary and sufficient conditions for classification by a trained model that could be easily interpreted by a human. Notably, several features that were in the extracted rule lists were statistical features and calculated features that were added to the original dataset. This demonstrates the influence that bringing in additional information during the data preprocessing stages can have on the ultimate model performance.
Updated: 2024-07-26 06:50:17
标题: 利用可解释技术进行复杂纺织品制造用例中的质量预测
摘要: 这篇论文通过可解释的技术,开发了一种分类复杂纺织制造数据集中产品故障实例的方法。本研究使用的数据集来自新西兰羊毛地毯和地毯制造商。在研究准确性和可解释性之间的权衡时,评估了三种不同的基于树的分类算法:决策树和两种集成方法,随机森林和XGBoost。此外,还评估了三种特征选择方法:使用卡方作为评分函数的SelectKBest方法,皮尔逊相关系数和Boruta算法。毫不奇怪,集成方法通常比决策树模型产生更好的结果。随机森林模型在与Boruta特征选择技术结合时获得了最佳结果。最后,使用树集成解释技术提取规则列表,以捕获经过训练的模型对分类的必要和充分条件,这些条件可以轻松地被人类解释。值得注意的是,提取的规则列表中的几个特征是统计特征和计算特征,这些特征被添加到原始数据集中。这显示了在数据预处理阶段引入额外信息可能对最终模型性能产生的影响。
更新时间: 2024-07-26 06:50:17
领域: cs.LG
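A small, self-contained sklearn sketch of the family of pipelines evaluated above (filter-style feature selection feeding an ensemble classifier, scored with Precision/Recall/F1); it substitutes synthetic data for the proprietary carpet dataset and SelectKBest-with-chi2 for the best-performing Boruta step, so treat it as the shape of the experiment, not a reproduction.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=2000, n_features=40, weights=[0.9, 0.1],
                           random_state=0)          # stand-in for the carpet data
X = MinMaxScaler().fit_transform(X)                 # chi2 needs non-negative inputs
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

selector = SelectKBest(chi2, k=15).fit(X_tr, y_tr)  # keep the 15 best features
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(selector.transform(X_tr), y_tr)
pred = clf.predict(selector.transform(X_te))
print(f"P={precision_score(y_te, pred):.3f} "
      f"R={recall_score(y_te, pred):.3f} F1={f1_score(y_te, pred):.3f}")
```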
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models
We propose a novel approach to significantly improve the intelligibility in the Non-Audible Murmur (NAM)-to-speech conversion task, leveraging self-supervision and sequence-to-sequence (Seq2Seq) learning techniques. Unlike conventional methods that explicitly record ground-truth speech, our methodology relies on self-supervision and speech-to-speech synthesis to simulate ground-truth speech. Despite utilizing simulated speech, our method surpasses the current state-of-the-art (SOTA) with a 29.08% improvement in the Mel-Cepstral Distortion (MCD) metric. Additionally, we present error rates and demonstrate our model's proficiency in synthesizing speech in novel voices of interest. Moreover, we present a methodology for augmenting the existing CSTR NAM TIMIT Plus corpus, setting a benchmark with a Word Error Rate (WER) of 42.57% to gauge the intelligibility of the synthesized speech. Speech samples can be found at https://nam2speech.github.io/NAM2Speech/
Updated: 2024-07-26 06:44:01
标题: 朝着利用自监督语音模型提高NAM到语音合成的可懂度
摘要: 我们提出了一种新颖的方法,显著改善了不可听闻的低语音(NAM)转换成语音的可理解性,利用自监督和序列到序列(Seq2Seq)学习技术。与明确记录地面真实语音的传统方法不同,我们的方法依赖于自监督和语音合成来模拟地面真实语音。尽管利用了模拟语音,我们的方法在Mel-Cepstral失真(MCD)度量上超越了当前的最新技术(SOTA),提高了29.08%。此外,我们提出了错误率,并展示了我们的模型在感兴趣的新声音中合成语音的熟练程度。此外,我们提出了一种用于增强现有CSTR NAM TIMIT Plus语料库的方法,建立了一个具有42.57%的字错误率(WER)的基准,以评估合成语音的可理解性。语音样本可以在https://nam2speech.github.io/NAM2Speech/ 找到。
更新时间: 2024-07-26 06:44:01
领域: cs.SD,cs.AI,eess.AS
Socially Integrated Navigation: A Social Acting Robot with Deep Reinforcement Learning
Mobile robots are being used on a large scale in various crowded situations and are becoming part of our society. Socially acceptable navigation behavior that takes individual humans into account is an essential requirement for scalable applications and human acceptance. Deep Reinforcement Learning (DRL) approaches have recently been used to learn a robot's navigation policy and to model the complex interactions between robots and humans. We propose to categorize existing DRL-based navigation approaches by the robot's exhibited social behavior, distinguishing social collision avoidance, which lacks social behavior, from socially aware approaches with explicitly predefined social behavior. In addition, we propose a novel socially integrated navigation approach in which the robot's social behavior is adaptive and emerges from the interaction with humans. The formulation of our approach is derived from a sociological definition, which states that social acting is oriented toward the acting of others. The DRL policy is trained in an environment where other agents interact in a socially integrated manner and reward the robot's behavior individually. The simulation results indicate that the proposed socially integrated navigation approach outperforms a socially aware approach in terms of ego navigation performance while significantly reducing the negative impact on all agents within the environment.
Updated: 2024-07-26 06:41:45
标题: 社会整合导航:一个具有深度强化学习的社交机器人
摘要: 移动机器人正在各种拥挤场合大规模使用,并成为社会的一部分。具有个体人类考虑的社会可接受的导航行为是可扩展应用和人类接受的基本要求。最近使用深度强化学习(DRL)方法来学习机器人的导航策略,并建模机器人与人类之间的复杂互动。我们提出根据机器人表现出的社会行为将现有的基于DRL的导航方法分为社会避撞缺乏社会行为和明确预定义社会行为的社会意识方法。此外,我们提出了一种新颖的社会整合导航方法,其中机器人的社会行为是自适应的,并且是通过与人类的互动而产生的。我们的方法的制定源自社会学定义,即社会行为是朝向他人行为的。DRL策略在一个环境中进行训练,其中其他代理人进行社会整合互动,并单独奖励机器人的行为。模拟结果表明,所提出的社会整合导航方法在自我导航性能方面优于社会意识方法,同时显著降低了对环境中所有代理人的负面影响。
更新时间: 2024-07-26 06:41:45
领域: cs.RO,cs.AI,cs.LG,cs.SY,eess.SY
A Universal Prompting Strategy for Extracting Process Model Information from Natural Language Text using Large Language Models
Over the past decade, extensive research efforts have been dedicated to the extraction of information from textual process descriptions. Despite the remarkable progress witnessed in natural language processing (NLP), information extraction within the Business Process Management domain remains predominantly reliant on rule-based systems and machine learning methodologies. Data scarcity has so far prevented the successful application of deep learning techniques. However, the rapid progress in generative large language models (LLMs) makes it possible to solve many NLP tasks with very high quality without the need for extensive data. Therefore, we systematically investigate the potential of LLMs for extracting information from textual process descriptions, targeting the detection of process elements such as activities and actors, and relations between them. Using a heuristic algorithm, we demonstrate the suitability of the extracted information for process model generation. Based on a novel prompting strategy, we show that LLMs are able to outperform state-of-the-art machine learning approaches with absolute performance improvements of up to 8\% $F_1$ score across three different datasets. We evaluate our prompting strategy on eight different LLMs, showing it is universally applicable, while also analyzing the impact of certain prompt parts on extraction quality. The number of example texts, the specificity of definitions, and the rigour of format instructions are identified as key for improving the accuracy of extracted information. Our code, prompts, and data are publicly available.
Updated: 2024-07-26 06:39:35
标题: 一种使用大型语言模型从自然语言文本中提取过程模型信息的通用提示策略
摘要: 在过去的十年里,人们已经投入了大量的研究工作来从文本过程描述中提取信息。尽管自然语言处理(NLP)领域取得了显著进展,但在业务流程管理领域内,信息提取仍然主要依赖于基于规则的系统和机器学习方法。迄今为止,数据稀缺性阻碍了深度学习技术的成功应用。然而,生成式大型语言模型(LLMs)的快速进展使得在不需要大量数据的情况下解决许多NLP任务成为可能。因此,我们系统地调查了LLMs从文本过程描述中提取信息的潜力,旨在检测过程元素(如活动和参与者)及其之间的关系。通过使用启发式算法,我们展示了提取信息用于过程模型生成的适用性。基于一种新颖的提示策略,我们展示了LLMs能够超越最先进的机器学习方法,在三个不同数据集上实现高达8\%的$F_1$分数的绝对性能提升。我们在八种不同的LLMs上评估了我们的提示策略,表明它是普遍适用的,同时分析了某些提示部分对提取质量的影响。示例文本的数量、定义的特定性以及格式指令的严谨性被确定为提高提取信息准确性的关键因素。我们的代码、提示和数据都是公开可用的。
更新时间: 2024-07-26 06:39:35
领域: cs.CL,cs.AI
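Although the paper's exact prompt is not reproduced here, a sketch of the kind of prompt assembly it ablates (definition specificity, number of examples, rigour of format instructions) might look like the following; every string and name is an illustrative assumption.

```python
def build_extraction_prompt(text, examples, definitions):
    """Assemble the kind of prompt the paper varies: definitions, few-shot
    examples, and strict format instructions (all wording is illustrative)."""
    parts = [
        "Extract all activities and actors from the process description.",
        "Definitions:\n" + "\n".join(f"- {d}" for d in definitions),
        "Return one JSON object per line: "
        '{"type": "activity|actor", "text": "<span>"}',
    ]
    for src, out in examples:           # the number of examples is a key knob
        parts.append(f"Text: {src}\nOutput:\n{out}")
    parts.append(f"Text: {text}\nOutput:")
    return "\n\n".join(parts)

demo = [("The clerk files the claim.",
         '{"type": "actor", "text": "clerk"}\n'
         '{"type": "activity", "text": "files the claim"}')]
print(build_extraction_prompt("The manager approves the order.", demo,
                              ["An actor is a person or role performing work."]))
```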
MLtoGAI: Semantic Web based with Machine Learning for Enhanced Disease Prediction and Personalized Recommendations using Generative AI
In modern healthcare, addressing the complexities of accurate disease prediction and personalized recommendations is both crucial and challenging. This research introduces MLtoGAI, which integrates Semantic Web technology with Machine Learning (ML) to enhance disease prediction and offer user-friendly explanations through ChatGPT. The system comprises three key components: a reusable disease ontology that incorporates detailed knowledge about various diseases, a diagnostic classification model that uses patient symptoms to detect specific diseases accurately, and the integration of Semantic Web Rule Language (SWRL) with ontology and ChatGPT to generate clear, personalized health advice. This approach significantly improves prediction accuracy and ensures results that are easy to understand, addressing the complexity of diseases and diverse symptoms. The MLtoGAI system demonstrates substantial advancements in accuracy and user satisfaction, contributing to developing more intelligent and accessible healthcare solutions. This innovative approach combines the strengths of ML algorithms with the ability to provide transparent, human-understandable explanations through ChatGPT, achieving significant improvements in prediction accuracy and user comprehension. By leveraging semantic technology and explainable AI, the system enhances the accuracy of disease prediction and ensures that the recommendations are relevant and easily understood by individual patients. Our research highlights the potential of integrating advanced technologies to overcome existing challenges in medical diagnostics, paving the way for future developments in intelligent healthcare systems. Additionally, the system is validated using 200 synthetic patient data records, ensuring robust performance and reliability.
Updated: 2024-07-26 06:32:06
标题: MLtoGAI:基于语义网络与机器学习的增强疾病预测和个性化推荐,利用生成式人工智能
摘要: 在现代医疗保健中,解决准确疾病预测和个性化建议的复杂性既至关重要又具有挑战性。本研究介绍了MLtoGAI,它将语义Web技术与机器学习(ML)相结合,通过ChatGPT提供用户友好的解释,以增强疾病预测。该系统包括三个关键组件:一个可重复使用的疾病本体论,它包含有关各种疾病的详细知识;一个诊断分类模型,它使用患者症状准确检测特定疾病;以及将语义Web规则语言(SWRL)与本体和ChatGPT集成,生成清晰、个性化的健康建议。这种方法显著提高了预测准确性,并确保结果易于理解,解决了疾病和不同症状的复杂性。MLtoGAI系统在准确性和用户满意度方面取得了重大进展,有助于开发更智能和可访问的医疗保健解决方案。这种创新方法将ML算法的优势与通过ChatGPT提供透明、人类可理解解释的能力相结合,实现了预测准确性和用户理解度的显著提升。通过利用语义技术和可解释AI,该系统提高了疾病预测的准确性,并确保建议与个体患者相关且易于理解。我们的研究突出了整合先进技术以克服医疗诊断中现有挑战的潜力,为智能医疗保健系统的未来发展铺平了道路。此外,该系统使用了200个合成患者数据记录进行验证,确保了其稳健性和可靠性。
更新时间: 2024-07-26 06:32:06
领域: cs.AI,cs.LG
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded advancements in this field. Previous models primarily focus on assessing different fundamental tasks, such as Automatic Speech Recognition (ASR), and lack an assessment of the open-ended generative capabilities centered around audio. Thus, it is challenging to track the progression in the Large Audio-Language Models (LALMs) domain and to provide guidance for future improvement. In this paper, we introduce AIR-Bench (Audio InstRuction Benchmark), the first benchmark designed to evaluate the ability of LALMs to understand various types of audio signals (including human speech, natural sounds, and music), and furthermore, to interact with humans in the textual format. AIR-Bench encompasses two dimensions: foundation and chat benchmarks. The former consists of 19 tasks with approximately 19k single-choice questions, intending to inspect the basic single-task ability of LALMs. The latter one contains 2k instances of open-ended question-and-answer data, directly assessing the comprehension of the model on complex audio and its capacity to follow instructions. Both benchmarks require the model to generate hypotheses directly. We design a unified framework that leverages advanced language models, such as GPT-4, to evaluate the scores of generated hypotheses given the meta-information of the audio. Experimental results demonstrate a high level of consistency between GPT-4-based evaluation and human evaluation. By revealing the limitations of existing LALMs through evaluation results, AIR-Bench can provide insights into the direction of future research.
Updated: 2024-07-26 06:30:47
标题: AIR-Bench:通过生成式理解对大型音频语言模型进行基准测试
摘要: 最近,指令遵循的音频语言模型引起了广泛关注,用于人类与音频的交互。然而,缺乏能够评估以音频为中心的交互能力的基准已经阻碍了该领域的进展。先前的模型主要关注评估不同的基本任务,如自动语音识别(ASR),缺乏评估围绕音频的开放式生成能力。因此,跟踪大型音频语言模型(LALMs)领域的进展并为未来改进提供指导是具有挑战性的。在本文中,我们介绍了AIR-Bench(Audio InstRuction Benchmark),这是第一个旨在评估LALMs理解各种类型音频信号(包括人类语音、自然声音和音乐),并且与人类以文本格式交互的基准。AIR-Bench包括两个维度:基础基准和聊天基准。前者包含19个任务,大约19k个单项选择问题,旨在检查LALMs的基本单任务能力。后者包含2k个开放式问答数据实例,直接评估模型对复杂音频的理解及其遵循指令的能力。两个基准都要求模型直接生成假设。我们设计了一个统一框架,利用先进的语言模型,如GPT-4,根据音频的元信息评估生成假设的得分。实验结果显示,基于GPT-4的评估与人类评估之间存在高度一致性。通过评估结果揭示现有LALMs的局限性,AIR-Bench可以为未来研究方向提供见解。
更新时间: 2024-07-26 06:30:47
领域: eess.AS,cs.CL,cs.LG,cs.SD
Outer Approximation and Super-modular Cuts for Constrained Assortment Optimization under Mixed-Logit Model
In this paper, we study the assortment optimization problem under the mixed-logit customer choice model. While assortment optimization has been a major topic in revenue management for decades, the mixed-logit model is considered one of the most general and flexible approaches for modeling and predicting customer purchasing behavior. Existing exact methods have primarily relied on mixed-integer linear programming (MILP) or second-order cone (CONIC) reformulations, which allow for exact problem solving using off-the-shelf solvers. However, these approaches often suffer from weak continuous relaxations and are slow when solving large instances. Our work addresses the problem by focusing on components of the objective function that can be proven to be monotonically super-modular and convex. This allows us to derive valid cuts to outer-approximate the nonlinear objective functions. We then demonstrate that these valid cuts can be incorporated into Cutting Plane or Branch-and-Cut methods to solve the problem exactly. Extensive experiments show that our approaches consistently outperform previous methods in terms of both solution quality and computation time.
Updated: 2024-07-26 06:27:11
标题: 外部逼近和超模块切割在混合Logit模型下受限组合优化中的应用
摘要: 在这篇论文中,我们研究了在混合logit客户选择模型下的组合优化问题。尽管组合优化在收入管理中已经是一个主要的话题几十年了,但混合logit模型被认为是建模和预测客户购买行为最一般和灵活的方法之一。现有的精确方法主要依赖于混合整数线性规划(MILP)或二阶锥(CONIC)重构,这些方法允许使用现成的求解器进行精确问题求解。然而,这些方法经常受到连续松弛的不足和解决大规模实例时的速度缓慢的问题。我们的工作通过专注于可以被证明为单调超模和凸的目标函数组件来解决这个问题。这使我们能够推导出有效的切割来外逼近非线性目标函数。然后我们证明这些有效的切割可以被整合到切割平面或分支和切割方法中来精确求解问题。大量实验证明我们的方法在解决方案质量和计算时间方面始终优于以前的方法。
更新时间: 2024-07-26 06:27:11
领域: math.OC,cs.AI
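As a generic, hedged illustration of the outer-approximation mechanism (the paper's super-modular cuts are specialized to the mixed-logit objective and are not reproduced here), a concave component $f$ of a maximization objective is overestimated by each of its tangents, so replacing $f(x)$ with an epigraph variable $\theta$ yields valid cuts that Cutting Plane or Branch-and-Cut methods add at incumbent points $\bar{x}$:

```latex
\max_{x \in \mathcal{X}}\ \theta
\quad \text{s.t.} \quad
\theta \le f(\bar{x}) + \nabla f(\bar{x})^{\top}(x - \bar{x})
\quad \text{for each generated point } \bar{x}
```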
ChangeMamba: Remote Sensing Change Detection With Spatiotemporal State Space Model
Convolutional neural networks (CNN) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings: CNN are constrained by a limited receptive field that may hinder their ability to capture broader spatial contexts, while Transformers are computationally intensive, making them costly to train and deploy on large datasets. Recently, the Mamba architecture, based on state space models, has shown remarkable performance in a series of natural language processing tasks, which can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing CD tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge Visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is available in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its attribute to achieve spatio-temporal interaction of multi-temporal features, thereby obtaining accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex training strategies or tricks, fully demonstrating the potential of the Mamba architecture in CD tasks. Further experiments show that our architecture is quite robust to degraded data. The source code will be available in https://github.com/ChenHongruixuan/MambaCD
Updated: 2024-07-26 06:25:48
标题: ChangeMamba:基于时空状态空间模型的遥感变化检测
摘要: 卷积神经网络(CNN)和变压器在遥感变化检测(CD)领域取得了令人瞩目的进展。然而,这两种架构都存在固有的缺点:CNN受限于有限的感受野,可能阻碍其捕捉更广泛的空间上下文,而变压器计算量大,使其在大型数据集上训练和部署成本高昂。最近,基于状态空间模型的Mamba架构在一系列自然语言处理任务中表现出了卓越的性能,可以有效弥补上述两种架构的缺点。本文首次探索了Mamba架构在遥感CD任务中的潜力。我们为二值变化检测(BCD)、语义变化检测(SCD)和建筑损坏评估(BDA)定制了相应的框架,分别称为MambaBCD、MambaSCD和MambaBDA。这三个框架都采用了最先进的Visual Mamba架构作为编码器,可以从输入图像中完全学习全局空间上下文信息。对于变化解码器,在所有三种架构中都可用,我们提出了三种时空关系建模机制,可以与Mamba架构自然结合,并充分利用其属性实现多时序特征的时空交互,从而获取准确的变化信息。在五个基准数据集上,我们提出的框架优于当前基于CNN和变压器的方法,而无需使用任何复杂的训练策略或技巧,充分展示了Mamba架构在CD任务中的潜力。进一步实验证明我们的架构对降质数据具有相当的鲁棒性。源代码将在https://github.com/ChenHongruixuan/MambaCD 上提供。
更新时间: 2024-07-26 06:25:48
领域: eess.IV,cs.AI,cs.CV
Adaptive Self-training Framework for Fine-grained Scene Graph Generation
Scene graph generation (SGG) models have suffered from inherent problems regarding the benchmark datasets such as the long-tailed predicate distribution and missing annotation problems. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels for unannotated triplets based on which the SGG models are trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging due to its inherent nature such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is a model-agnostic framework that can be applied to any existing SGG models. Furthermore, we devise a graph structure learner (GSL) that is beneficial when adopting our proposed self-training framework to the state-of-the-art message-passing neural network (MPNN)-based SGG models. Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes.
Updated: 2024-07-26 06:17:59
标题: 细粒度场景图生成的自适应自训练框架
摘要: 场景图生成(SGG)模型在基准数据集方面存在固有问题,如长尾谓词分布和缺失注释问题。在这项工作中,我们旨在通过利用未标记的三元组来缓解SGG的长尾问题。为此,我们引入了一种自训练框架用于SGG(ST-SGG),该框架为未标记的三元组分配伪标签,基于这些伪标签对SGG模型进行训练。虽然在图像识别方面自训练取得了显著进展,但设计一个用于SGG任务的自训练框架更具挑战性,因为其固有特性,如语义模糊性和长尾谓词类的分布。因此,我们提出了一种新颖的SGG伪标记技术,称为具有动量的类别自适应阈值(CATM),这是一个与模型无关的框架,可应用于任何现有的SGG模型。此外,我们设计了一个图结构学习器(GSL),在采用我们提出的自训练框架到最先进的基于消息传递神经网络(MPNN)的SGG模型时是有益的。我们的广泛实验证实了ST-SGG在各种SGG模型上的有效性,特别是在提高精细谓词类性能方面。
更新时间: 2024-07-26 06:17:59
领域: cs.CV,cs.AI
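A minimal sketch of class-specific adaptive thresholding with momentum in the spirit of CATM; the EMA update rule, the initial value, and the acceptance test below are plausible assumptions rather than the paper's exact formulation.

```python
import numpy as np

class ClassAdaptiveThresholds:
    """Per-class confidence thresholds, updated by an exponential moving
    average (momentum), used to accept or reject pseudo-labels."""
    def __init__(self, n_classes, init=0.5, momentum=0.99):
        self.tau = np.full(n_classes, init)
        self.m = momentum

    def update(self, probs):                     # probs: (batch, n_classes)
        conf = probs.max(axis=1)
        cls = probs.argmax(axis=1)
        for c in np.unique(cls):
            # EMA of the mean confidence of samples predicted as class c.
            self.tau[c] = self.m * self.tau[c] + (1 - self.m) * conf[cls == c].mean()

    def pseudo_labels(self, probs):
        conf, cls = probs.max(axis=1), probs.argmax(axis=1)
        keep = conf >= self.tau[cls]             # class-specific acceptance
        return cls[keep], np.flatnonzero(keep)

catm = ClassAdaptiveThresholds(n_classes=3)
probs = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
catm.update(probs)
print(catm.pseudo_labels(probs))
```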
Constructing Enhanced Mutual Information for Online Class-Incremental Learning
Online Class-Incremental continual Learning (OCIL) addresses the challenge of continuously learning from a single-channel data stream, adapting to new tasks while mitigating catastrophic forgetting. Recently, Mutual Information (MI)-based methods have shown promising performance in OCIL. However, existing MI-based methods treat various knowledge components in isolation, ignoring the knowledge confusion across tasks. This narrow focus on simple MI knowledge alignment may lead to old tasks being easily forgotten with the introduction of new tasks, risking the loss of common parts between past and present knowledge. To address this, we analyze the MI relationships from the perspectives of diversity, representativeness, and separability, and propose an Enhanced Mutual Information (EMI) method based on knowledge decoupling. EMI consists of Diversity Mutual Information (DMI), Representativeness Mutual Information (RMI) and Separability Mutual Information (SMI). DMI diversifies intra-class sample features by considering the similarity relationships among inter-class sample features to enable the network to learn more general knowledge. RMI summarizes representative features for each category and aligns sample features with these representative features, making the intra-class sample distribution more compact. SMI establishes MI relationships for inter-class representative features, enhancing the stability of representative features while increasing the distinction between inter-class representative features, thus creating clear boundaries between classes. Extensive experimental results on widely used benchmark datasets demonstrate the superior performance of EMI over state-of-the-art baseline methods.
Updated: 2024-07-26 06:16:11
标题: 构建增强的互信息用于在线类增量学习
摘要: 在线课程增量持续学习(OCIL)解决了从单通道数据流中持续学习的挑战,适应新任务同时减轻灾难性遗忘。最近,基于互信息(MI)的方法在OCIL中表现出很好的性能。然而,现有的基于MI的方法将各种知识组件孤立对待,忽略了任务之间的知识混淆。对简单MI知识对齐的狭窄关注可能导致旧任务在引入新任务时容易被遗忘,从而冒着失去过去和现在知识之间共同部分的风险。为了解决这个问题,我们从多样性、代表性和可分离性的视角分析了MI关系,并提出了基于知识解耦的增强互信息(EMI)方法。EMI包括多样性互信息(DMI)、代表性互信息(RMI)和可分离性互信息(SMI)。DMI通过考虑类间样本特征的相似关系来使网络学习更一般的知识,使得类内样本特征多样化。RMI总结了每个类别的代表性特征,并将样本特征与这些代表性特征对齐,使得类内样本分布更加紧凑。SMI建立了类间代表性特征的互信息关系,增强了代表性特征的稳定性,同时增加了类间代表性特征之间的区分度,从而在类别之间创建清晰的边界。广泛使用的基准数据集上的大量实验结果表明,EMI相对于最先进的基准方法具有优越的性能。
更新时间: 2024-07-26 06:16:11
领域: cs.LG
Is larger always better? Evaluating and prompting large language models for non-generative medical tasks
The use of Large Language Models (LLMs) in medicine is growing, but their ability to handle both structured Electronic Health Record (EHR) data and unstructured clinical notes is not well-studied. This study benchmarks various models, including GPT-based LLMs, BERT-based models, and traditional clinical predictive models, for non-generative medical tasks utilizing renowned datasets. We assessed 14 language models (9 GPT-based and 5 BERT-based) and 7 traditional predictive models using the MIMIC dataset (ICU patient records) and the TJH dataset (early COVID-19 EHR data), focusing on tasks such as mortality and readmission prediction, disease hierarchy reconstruction, and biomedical sentence matching, comparing both zero-shot and finetuned performance. Results indicated that LLMs exhibited robust zero-shot predictive capabilities on structured EHR data when using well-designed prompting strategies, frequently surpassing traditional models. However, for unstructured medical texts, LLMs did not outperform finetuned BERT models, which excelled in both supervised and unsupervised tasks. Consequently, while LLMs are effective for zero-shot learning on structured data, finetuned BERT models are more suitable for unstructured texts, underscoring the importance of selecting models based on specific task requirements and data characteristics to optimize the application of NLP technology in healthcare.
Updated: 2024-07-26 06:09:10
标题: 是否总是越大越好?评估和提示大型语言模型在非生成性医学任务中的应用
摘要: 在医学领域中,大型语言模型(LLMs)的应用正在增长,但它们处理结构化电子健康记录(EHR)数据和非结构化临床笔记的能力尚未得到充分研究。本研究对各种模型进行了基准测试,包括基于GPT的LLMs、基于BERT的模型和传统临床预测模型,用于非生成性医学任务,利用著名数据集。我们评估了14个语言模型(9个基于GPT和5个基于BERT的)和7个传统预测模型,使用MIMIC数据集(ICU患者记录)和TJH数据集(早期COVID-19 EHR数据),重点关注死亡率和再入院预测、疾病层次重建和生物医学句子匹配等任务,比较了零样本和微调性能。结果表明,在使用良好设计的提示策略时,LLMs在结构化EHR数据上表现出了强大的零样本预测能力,经常超过传统模型。然而,在非结构化医学文本中,LLMs并未超过微调的BERT模型,在监督和无监督任务中表现出色。因此,虽然LLMs在结构化数据上的零样本学习效果显著,但微调的BERT模型更适合非结构化文本,强调根据具体任务要求和数据特性选择模型以优化NLP技术在医疗保健中的应用的重要性。
更新时间: 2024-07-26 06:09:10
领域: cs.CL,cs.AI,cs.LG
She Works, He Works: A Curious Exploration of Gender Bias in AI-Generated Imagery
This paper examines gender bias in AI-generated imagery of construction workers, highlighting discrepancies in the portrayal of male and female figures. Grounded in Griselda Pollock's theories on visual culture and gender, the analysis reveals that AI models tend to sexualize female figures while portraying male figures as more authoritative and competent. These findings underscore AI's potential to mirror and perpetuate societal biases, emphasizing the need for critical engagement with AI-generated content. The project contributes to discussions on the ethical implications of AI in creative practices and its broader impact on cultural perceptions of gender.
Updated: 2024-07-26 05:56:18
标题: 她工作,他工作:AI生成图像中性别偏见的好奇探索
摘要: 这篇论文研究了人工智能生成的建筑工人形象中存在的性别偏见,突出了男性和女性形象描绘中的差异。基于格里泽尔达·波洛克(Griselda Pollock)关于视觉文化和性别的理论,分析显示人工智能模型倾向于将女性形象性感化,同时将男性形象描绘为更具权威性和能力。这些发现强调了人工智能可能反映和延续社会偏见的潜力,强调了对人工智能生成内容进行批判性参与的必要性。该项目为关于人工智能在创意实践中的道德影响以及对性别文化认知的更广泛影响的讨论做出了贡献。
更新时间: 2024-07-26 05:56:18
领域: cs.CY,cs.AI,cs.CV,I.2.0; J.5
DTFormer: A Transformer-Based Method for Discrete-Time Dynamic Graph Representation Learning
Discrete-Time Dynamic Graphs (DTDGs), which are prevalent in real-world implementations and notable for their ease of data acquisition, have garnered considerable attention from both academic researchers and industry practitioners. The representation learning of DTDGs has been extensively applied to model the dynamics of temporally changing entities and their evolving connections. Currently, DTDG representation learning predominantly relies on GNN+RNN architectures, which manifest the inherent limitations of both Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs). GNNs suffer from the over-smoothing issue as the model architecture goes deeper, while RNNs struggle to capture long-term dependencies effectively. GNN+RNN architectures also grapple with scaling to large graph sizes and long sequences. Additionally, these methods often compute node representations separately and focus solely on individual node characteristics, thereby overlooking the behavior intersections between the two nodes whose link is being predicted, such as instances where the two nodes appear together in the same context or share common neighbors. This paper introduces a novel representation learning method, DTFormer, for DTDGs, pivoting from the traditional GNN+RNN framework to a Transformer-based architecture. Our approach exploits the attention mechanism to concurrently process topological information within the graph at each timestamp and temporal dynamics of graphs along the timestamps, circumventing the aforementioned fundamental weakness of both GNNs and RNNs. Moreover, we enhance the model's expressive capability by incorporating the intersection relationships among nodes and integrating a multi-patching module. Extensive experiments conducted on six public dynamic graph benchmark datasets confirm our model's efficacy, achieving the SOTA performance.
Updated: 2024-07-26 05:46:23
标题: DTFormer:一种基于Transformer的离散时间动态图表示学习方法
摘要: 离散时间动态图(DTDGs)在现实世界中广泛存在,并以其易于数据获取而闻名,引起了学术研究人员和行业从业者的广泛关注。DTDGs的表示学习已被广泛应用于对随时间变化的实体及其演变连接的动态进行建模。目前,DTDG表示学习主要依赖于GNN+RNN架构,这表现出图神经网络(GNNs)和循环神经网络(RNNs)的固有限制。GNNs在模型架构变得更深时会出现过度平滑问题,而RNNs则难以有效捕捉长期依赖关系。GNN+RNN架构还面临着扩展到大型图大小和长序列的困难。此外,这些方法通常分别计算节点表示,并专注于单个节点特征,从而忽视了预测连接的两个节点之间的行为交集,比如这两个节点在相同上下文中出现或共享共同邻居的情况。 本文介绍了一种新颖的表示学习方法DTFormer,针对DTDGs,从传统的GNN+RNN框架转变为基于Transformer的架构。我们的方法利用注意力机制同时处理每个时间戳的图内部拓扑信息和图的时间动态,避开了GNNs和RNNs的上述基本弱点。此外,我们通过整合节点之间的交集关系和集成多补丁模块,增强了模型的表现能力。在六个公共动态图基准数据集上进行的大量实验证实了我们模型的有效性,实现了SOTA性能。
更新时间: 2024-07-26 05:46:23
领域: cs.LG
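To show the architectural shift away from GNN+RNN, here is a toy Transformer that scores a candidate link from a sequence of per-snapshot features; stacking hand-built intersection features (e.g., a common-neighbour count for the node pair) into `seq` mirrors the paper's point about modeling behavior intersections, but the module itself is an illustrative sketch, not DTFormer.

```python
import torch
import torch.nn as nn

class SnapshotLinkTransformer(nn.Module):
    """Toy sketch: per-snapshot features for a node pair (u, v) -- e.g. u's
    features, v's features, and their common-neighbour count -- are encoded
    jointly by self-attention, replacing the usual GNN+RNN pipeline."""
    def __init__(self, feat_dim, d_model=64, heads=4, layers=2):
        super().__init__()
        self.inp = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, seq):            # seq: (batch, n_snapshots, feat_dim)
        h = self.encoder(self.inp(seq))
        return self.head(h[:, -1])     # logit for a link at the latest snapshot

model = SnapshotLinkTransformer(feat_dim=17)
print(model(torch.randn(8, 10, 17)).shape)  # torch.Size([8, 1])
```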
Patched MOA: optimizing inference for diverse software development tasks
This paper introduces Patched MOA (Mixture of Agents), an inference optimization technique that significantly enhances the performance of large language models (LLMs) across diverse software development tasks. We evaluate three inference optimization algorithms - Best of N, Mixture of Agents, and Monte Carlo Tree Search - and demonstrate that Patched MOA can boost the performance of smaller models to surpass that of larger, more expensive models. Notably, our approach improves the gpt-4o-mini model's performance on the Arena-Hard-Auto benchmark by 15.52%, outperforming gpt-4-turbo at a fraction of the cost. We also apply Patched MOA to various software development workflows, showing consistent improvements in task completion rates. Our method is model-agnostic, transparent to end-users, and can be easily integrated into existing LLM pipelines. This work contributes to the growing field of LLM optimization, offering a cost-effective solution for enhancing model performance without the need for fine-tuning or larger models.
Updated: 2024-07-26 05:34:34
标题: Patched MOA: 优化推理以应用于多样化的软件开发任务
摘要: 本文介绍了Patched MOA(Mixture of Agents),这是一种推理优化技术,显著提升了大型语言模型(LLMs)在各种软件开发任务中的性能。我们评估了三种推理优化算法 - 最佳N、代理混合和蒙特卡洛树搜索,并展示Patched MOA可以提升较小模型的性能,使其超越更大更昂贵的模型。值得注意的是,我们的方法将gpt-4o-mini模型在Arena-Hard-Auto基准测试中的表现提高了15.52%,并以一小部分成本胜过了gpt-4-turbo。我们还将Patched MOA应用于各种软件开发工作流程,显示了任务完成率的持续改善。我们的方法是模型无关的,对终端用户透明,并且可以轻松集成到现有的LLM流程中。这项工作为LLM优化领域的发展做出了贡献,提供了一种经济有效的解决方案,可以提升模型性能,而无需进行微调或使用更大的模型。
更新时间: 2024-07-26 05:34:34
领域: cs.SE,cs.AI
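Of the three algorithms evaluated, Best of N is the simplest to sketch; `generate` and `score` below are stand-ins for an LLM call and a quality judge (a reward model, a test suite, or an LLM grader), so the sketch shows only the control flow, not Patched MOA itself.

```python
def best_of_n(prompt, generate, score, n=8):
    """Best-of-N: sample n candidates and return the one the scorer prefers.
    `generate` and `score` are stand-ins for an LLM call and a quality judge."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

# Dummy usage: a fake sampler and a length-based "judge".
import random
answers = ["short", "a longer draft", "the longest candidate answer"]
print(best_of_n("Fix the bug", lambda p: random.choice(answers), len))
```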
TCGPN: Temporal-Correlation Graph Pre-trained Network for Stock Forecasting
Recently, the incorporation of both temporal features and the correlation across time series has become an effective approach in time series prediction. Spatio-Temporal Graph Neural Networks (STGNNs) demonstrate good performance on many temporal-correlation forecasting problems. However, when applied to tasks lacking periodicity, such as stock data prediction, the effectiveness and robustness of STGNNs are found to be unsatisfactory. STGNNs are also limited by memory constraints, so they cannot handle problems with a large number of nodes. In this paper, we propose a novel approach called the Temporal-Correlation Graph Pre-trained Network (TCGPN) to address these limitations. TCGPN utilizes a temporal-correlation fusion encoder to obtain a mixed representation, together with a pre-training method built on carefully designed temporal and correlation pre-training tasks. The entire structure is independent of the number and order of nodes, so better results can be obtained through various data augmentations, and memory consumption during training can be significantly reduced through multiple sampling. Experiments are conducted on real stock market datasets CSI300 and CSI500 that exhibit minimal periodicity. We fine-tune a simple MLP in downstream tasks and achieve state-of-the-art results, validating the capability to capture more robust temporal correlation patterns.
Updated: 2024-07-26 05:27:26
标题: TCGPN:用于股票预测的时序相关图预训练网络
摘要: 最近,将时间特征和时间序列之间的相关性结合起来已成为时间序列预测中的一种有效方法。时空图神经网络(STGNNs)在许多时间相关性预测问题上表现出良好的性能。然而,当应用于缺乏周期性的任务,如股票数据预测时,发现STGNNs的有效性和鲁棒性并不理想。而且STGNNs受到内存节省的限制,无法处理具有大量节点的问题。本文提出了一种名为时间相关性图预训练网络(TCGPN)的新方法,以解决这些限制。TCGPN利用时间相关性融合编码器获取混合表示,并通过精心设计的时间和相关性预训练任务进行预训练。整个结构独立于节点的数量和顺序,通过各种数据增强可以获得更好的结果。并通过多次采样可以显著减少训练过程中的内存消耗。在展现最小周期性的真实股市数据集CSI300和CSI500上进行了实验。我们在下游任务中微调了一个简单的多层感知器(MLP)并取得了最新的结果,验证了捕捉更加稳健的时间相关性模式的能力。
更新时间: 2024-07-26 05:27:26
领域: cs.LG,cs.AI,stat.ML
Credit Card Fraud Detection Using Advanced Transformer Model
With the proliferation of various online and mobile payment systems, credit card fraud has emerged as a significant threat to financial security. This study focuses on innovative applications of the latest Transformer models for more robust and precise fraud detection. To ensure the reliability of the data, we meticulously processed the data sources, balancing the dataset to address the issue of data sparsity significantly. We also selected highly correlated vectors to strengthen the training process. To guarantee the reliability and practicality of the new Transformer model, we conducted performance comparisons with several widely adopted models, including Support Vector Machine (SVM), Random Forest, Neural Network, and Logistic Regression. We rigorously compared these models using metrics such as Precision, Recall, and F1 Score. Through these detailed analyses and comparisons, we present to the readers a highly efficient and powerful anti-fraud mechanism with promising prospects. The results demonstrate that the Transformer model not only excels in traditional applications but also shows great potential in niche areas like fraud detection, offering a substantial advancement in the field.
Updated: 2024-07-26 05:26:40
标题: 使用高级Transformer模型进行信用卡欺诈检测
摘要: 随着各种在线和移动支付系统的蓬勃发展,信用卡欺诈已成为金融安全的重要威胁。本研究专注于最新Transformer模型的创新应用,以实现更强大和精确的欺诈检测。为确保数据的可靠性,我们精心处理了数据来源,平衡数据集以解决数据稀疏性问题。我们还选择了高度相关的向量来加强训练过程。为了确保新Transformer模型的可靠性和实用性,我们与几种广泛采用的模型进行了性能比较,包括支持向量机(SVM)、随机森林、神经网络和逻辑回归。我们严格比较了这些模型,使用Precision、Recall和F1 Score等指标。通过这些详细的分析和比较,我们向读者展示了一种高效和强大的反欺诈机制,展现了良好的前景。结果表明,Transformer模型不仅在传统应用中表现出色,而且在欺诈检测等小众领域显示出巨大潜力,为该领域的发展提供了实质性进展。
更新时间: 2024-07-26 05:26:40
领域: cs.LG,cs.AI
WorkR: Occupation Inference for Intelligent Task Assistance
Occupation information can be utilized by digital assistants to provide occupation-specific personalized task support, including interruption management, task planning, and recommendations. Prior research in the digital workplace assistant domain requires users to input their occupation information for effective support. However, as many individuals switch between multiple occupations daily, current solutions falter without continuous user input. To address this, this study introduces WorkR, a framework that leverages passive sensing to capture pervasive signals from various task activities, addressing three challenges: the lack of a passive sensing architecture, personalization of occupation characteristics, and discovering latent relationships among occupation variables. We argue that signals from application usage, movements, social interactions, and the environment can inform a user's occupation. WorkR uses a Variational Autoencoder (VAE) to derive latent features for training models to infer occupations. Our experiments with an anonymized, context-rich activity and task log dataset demonstrate that our models can accurately infer occupations with more than 91% accuracy across six ISO occupation categories.
Updated: 2024-07-26 05:23:55
标题: WorkR: 智能任务辅助的职业推断
摘要: 职业信息可以被数字助理利用来提供针对特定职业的个性化任务支持,包括打断管理、任务规划和建议。先前在数字助理领域的研究需要用户输入他们的职业信息以获得有效支持。然而,由于许多人每天在多个职业之间切换,当前的解决方案在没有持续用户输入的情况下会失败。为了解决这个问题,本研究介绍了WorkR,一个利用被动感知来捕捉来自各种任务活动的普遍信号的框架,解决了三个挑战:缺乏被动感知架构、职业特征的个性化以及发现职业变量之间的潜在关系。我们认为应用使用、动作、社交互动和环境中的信号可以告知用户的职业。WorkR利用变分自动编码器(VAE)推导潜在特征来训练模型推断职业。我们对一个匿名的、上下文丰富的活动和任务日志数据集进行的实验表明,我们的模型可以在六个ISO职业类别中以超过91%的准确率准确推断出职业。
更新时间: 2024-07-26 05:23:55
领域: cs.LG
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
Audio deepfake detection (ADD) is crucial to combat the misuse of speech synthesized from generative AI models. Existing ADD models suffer from generalization issues, with a large performance discrepancy between in-domain and out-of-domain data. Moreover, the black-box nature of existing models limits their use in real-world scenarios, where explanations are required for model decisions. To alleviate these issues, we introduce a new ADD model that explicitly uses the StyleLInguistics Mismatch (SLIM) in fake speech to separate them from real speech. SLIM first employs self-supervised pretraining on only real samples to learn the style-linguistics dependency in the real class. The learned features are then used in complement with standard pretrained acoustic features (e.g., Wav2vec) to learn a classifier on the real and fake classes. When the feature encoders are frozen, SLIM outperforms benchmark methods on out-of-domain datasets while achieving competitive results on in-domain data. The features learned by SLIM allow us to quantify the (mis)match between style and linguistic content in a sample, hence facilitating an explanation of the model decision.
Updated: 2024-07-26 05:23:41
标题: SLIM:用于广义音频深度伪造检测的风格-语言不匹配模型
摘要: 音频深度伪造检测(ADD)对于打击使用生成式AI模型合成的语音的滥用至关重要。现有的ADD模型存在泛化问题,在域内和域外数据之间存在较大的性能差距。此外,现有模型的黑匣子特性限制了它们在需要对模型决策进行解释的实际场景中的使用。为了缓解这些问题,我们引入了一种新的ADD模型,该模型明确使用StyleLInguistics Mismatch(SLIM)在伪造语音中将其与真实语音分开。SLIM首先仅对真实样本进行自监督预训练,以学习真实类别中的风格-语言学依赖关系。然后,学习到的特征与标准预训练的声学特征(例如,Wav2vec)结合使用,以学习真实和伪造类别的分类器。当特征编码器被冻结时,SLIM在域外数据集上优于基准方法,同时在域内数据上取得竞争性结果。SLIM学习到的特征使我们能够量化样本中风格和语言内容之间的(不)匹配,从而便于解释模型决策。
更新时间: 2024-07-26 05:23:41
领域: cs.SD,cs.AI,eess.AS
Advanced Payment Security System: XGBoost, LightGBM and SMOTE Integrated
With the rise of various online and mobile payment systems, transaction fraud has become a significant threat to financial security. This study explores the application of advanced machine learning models, specifically based on XGBoost and LightGBM, for developing a more accurate and robust Payment Security Protection Model. To enhance data reliability, we meticulously processed the data sources and applied SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance and improve data representation. By selecting highly correlated features, we aimed to strengthen the training process and boost model performance. We conducted thorough performance evaluations of our proposed models, comparing them against traditional methods including Random Forest, Neural Network, and Logistic Regression. Using metrics such as Precision, Recall, and F1 Score, we rigorously assessed their effectiveness. Our detailed analyses and comparisons reveal that the combination of SMOTE with XGBoost and LightGBM offers a highly efficient and powerful mechanism for payment security protection. Moreover, the integration of XGBoost and LightGBM in a Local Ensemble model further demonstrated outstanding performance. After incorporating SMOTE, the new combined model achieved a significant improvement of nearly 6\% over traditional models and around 5\% over its sub-models, showcasing remarkable results.
Updated: 2024-07-26 05:07:22
标题: 先进的支付安全系统:XGBoost、LightGBM和SMOTE集成
摘要: 随着各种在线和移动支付系统的兴起,交易欺诈已成为金融安全的重要威胁。本研究探讨了基于XGBoost和LightGBM等先进机器学习模型的应用,以开发更准确和稳健的支付安全保护模型。为了增强数据可靠性,我们精心处理了数据来源,并应用了SMOTE(合成少数样本过采样技术)来解决类别不平衡问题并提高数据表征。通过选择高度相关的特征,我们旨在加强训练过程并提高模型性能。我们对我们提出的模型进行了彻底的性能评估,将它们与传统方法(包括随机森林、神经网络和逻辑回归)进行了比较。通过Precision、Recall和F1 Score等指标,我们严格评估了它们的有效性。我们的详细分析和比较显示,SMOTE与XGBoost和LightGBM的结合为支付安全保护提供了高效和强大的机制。此外,将XGBoost和LightGBM集成到本地集成模型中进一步展示了出色的性能。在整合SMOTE后,新的组合模型在传统模型上取得了近6\%的显著改进,比其子模型提高了约5\%,展示了卓越的结果。
更新时间: 2024-07-26 05:07:22
领域: cs.CR,cs.AI,cs.LG
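A compact sketch of the overall recipe on synthetic data: SMOTE applied to the training split only, then a soft-voting combination of XGBoost and LightGBM standing in for the paper's Local Ensemble (the exact ensembling scheme is an assumption), scored with the usual imbalanced-classification report.

```python
from imblearn.over_sampling import SMOTE
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample minority-class fraud only in the training split to avoid leakage.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
ensemble = VotingClassifier(
    [("xgb", XGBClassifier(n_estimators=300, random_state=0)),
     ("lgbm", LGBMClassifier(n_estimators=300, random_state=0))],
    voting="soft")
ensemble.fit(X_bal, y_bal)
print(classification_report(y_te, ensemble.predict(X_te), digits=3))
```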
YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention
Accurate prediction of drug molecule solubility is crucial for therapeutic effectiveness and safety. Traditional methods often miss complex molecular structures, leading to inaccuracies. We introduce the YZS-Model, a deep learning framework integrating Graph Convolutional Networks (GCN), Transformer architectures, and Long Short-Term Memory (LSTM) networks to enhance prediction precision. GCNs excel at capturing intricate molecular topologies by modeling the relationships between atoms and bonds. Transformers, with their self-attention mechanisms, effectively identify long-range dependencies within molecules, capturing global interactions. LSTMs process sequential data, preserving long-term dependencies and integrating temporal information within molecular sequences. This multifaceted approach leverages the strengths of each component, resulting in a model that comprehensively understands and predicts molecular properties. Trained on 9,943 compounds and tested on an anticancer dataset, the YZS-Model achieved an $R^2$ of 0.59 and an RMSE of 0.57, outperforming benchmark models ($R^2$ of 0.52 and RMSE of 0.61). In an independent test, it demonstrated an RMSE of 1.05, improving accuracy by 45.9%. The integration of these deep learning techniques allows the YZS-Model to learn valuable features from complex data without predefined parameters, handle large datasets efficiently, and adapt to various molecular types. This comprehensive capability significantly improves predictive accuracy and model generalizability. Its precision in solubility predictions can expedite drug development by optimizing candidate selection, reducing costs, and enhancing efficiency. Our research underscores deep learning's transformative potential in pharmaceutical science, particularly for solubility prediction and drug design.
Updated: 2024-07-26 04:47:15
标题: YZS模型:基于图卷积网络和Transformer-Attention的有机药物溶解度预测模型
摘要: 准确预测药物分子的溶解度对于治疗的有效性和安全性至关重要。传统方法往往会忽略复杂的分子结构,导致不准确性。我们引入了YZS-Model,这是一个深度学习框架,集成了图卷积网络(GCN)、Transformer架构和长短期记忆(LSTM)网络,以提高预测精度。GCNs擅长捕捉复杂的分子拓扑结构,通过建模原子和键之间的关系。Transformers通过其自注意力机制有效地识别分子内的远程依赖关系,捕获全局交互作用。LSTMs处理顺序数据,保留长期依赖关系,并在分子序列内集成时间信息。这种多方面的方法利用了每个组件的优势,导致一个全面理解和预测分子性质的模型。在对9,943种化合物进行训练并在一个抗癌数据集上进行测试后,YZS-Model实现了0.59的$R^2$和0.57的RMSE,优于基准模型($R^2$为0.52,RMSE为0.61)。在独立测试中,它表现出1.05的RMSE,精度提高了45.9%。这些深度学习技术的整合使YZS-Model能够从复杂数据中学习有价值的特征,无需预定义参数,高效处理大型数据集,并适应各种分子类型。这种全面的能力显著提高了预测精度和模型的泛化能力。其在溶解度预测方面的精度可以通过优化候选选择、降低成本和提高效率来加快药物开发。我们的研究强调了深度学习在制药科学中的变革潜力,特别是在溶解度预测和药物设计方面。
更新时间: 2024-07-26 04:47:15
领域: cs.LG,cs.AI
The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks
The rapid advancements of large language models (LLMs) have raised public concerns about the privacy leakage of personally identifiable information (PII) within their extensive training datasets. Recent studies have demonstrated that an adversary could extract highly sensitive privacy data from the training data of LLMs with carefully designed prompts. However, these attacks suffer from the model's tendency to hallucinate and catastrophic forgetting (CF) in the pre-training stage, rendering the veracity of divulged PIIs negligible. In our research, we propose a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PIIs from the pre-training data in LLMs. We formalize the privacy leakage problem in LLMs and explain why forgotten PIIs can be recovered through empirical analysis on open-source language models. Based upon these insights, we evaluate the performance of Janus on both open-source language models and two latest LLMs, i.e., GPT-3.5-Turbo and LLaMA-2-7b. Our experiment results show that Janus amplifies the privacy risks by over 10 times in comparison with the baseline and significantly outperforms the state-of-the-art privacy extraction attacks including prefix attacks and in-context learning (ICL). Furthermore, our analysis validates that existing fine-tuning APIs provided by OpenAI and Azure AI Studio are susceptible to our Janus attack, allowing an adversary to conduct such an attack at a low cost.
Updated: 2024-07-26 04:43:19
标题: “雅努斯界面:大型语言模型中的微调如何放大隐私风险”
摘要: 大型语言模型(LLMs)的快速发展引起了公众对其广泛训练数据集中个人可识别信息(PII)隐私泄露的担忧。最近的研究表明,通过精心设计的提示,对手方可以从LLMs的训练数据中提取高度敏感的隐私数据。然而,这些攻击在预训练阶段受到模型幻觉和灾难性遗忘(CF)的影响,使得披露的PIIs的真实性可以忽略不计。在我们的研究中,我们提出了一种新型攻击,Janus,利用微调接口从LLMs的预训练数据中恢复遗忘的PIIs。我们在LLMs中形式化了隐私泄露问题,并通过对开源语言模型的实证分析解释了为什么遗忘的PIIs可以被恢复。基于这些见解,我们评估了Janus在开源语言模型和两个最新的LLMs,即GPT-3.5-Turbo和LLaMA-2-7b上的性能。我们的实验结果显示,与基线相比,Janus将隐私风险提高了10倍以上,并且明显优于包括前缀攻击和上下文学习(ICL)在内的最先进的隐私提取攻击。此外,我们的分析验证了OpenAI和Azure AI Studio提供的现有微调API易受我们的Janus攻击的影响,允许对手以较低成本进行此类攻击。
更新时间: 2024-07-26 04:43:19
领域: cs.CR,cs.CL
Homomorphic Encryption-Enabled Federated Learning for Privacy-Preserving Intrusion Detection in Resource-Constrained IoV Networks
This paper aims to propose a novel framework to address the data privacy issue for Federated Learning (FL)-based Intrusion Detection Systems (IDSs) in Internet-of-Vehicles (IoVs) with limited computational resources. In particular, in conventional FL systems, it is usually assumed that the computing nodes have sufficient computational resources to process the training tasks. However, in practical IoV systems, vehicles usually have limited computational resources to process intensive training tasks, compromising the effectiveness of deploying FL in IDSs. While offloading data from vehicles to the cloud can mitigate this issue, it introduces significant privacy concerns for vehicle users (VUs). To resolve this issue, we first propose a highly-effective framework using homomorphic encryption to secure data that requires offloading to a centralized server for processing. Furthermore, we develop an effective training algorithm tailored to handle the challenges of FL-based systems with encrypted data. This algorithm allows the centralized server to directly compute on quantum-secure encrypted ciphertexts without needing decryption. This approach not only safeguards data privacy during the offloading process from VUs to the centralized server but also enhances the efficiency of utilizing FL for IDSs in IoV systems. Our simulation results show that our proposed approach can achieve performance close to that of the solution without encryption, with a gap of less than 0.8%.
Updated: 2024-07-26 04:19:37
Categories: cs.CR
The formation of perceptual space in early phonetic acquisition: a cross-linguistic modeling approach
This study investigates how learners organize perceptual space in early phonetic acquisition by advancing previous studies in two key aspects. Firstly, it examines the shape of the learned hidden representation as well as its ability to categorize phonetic categories. Secondly, it explores the impact of training models on context-free acoustic information, without involving contextual cues, on phonetic acquisition, closely mimicking the early language learning stage. Using a cross-linguistic modeling approach, autoencoder models are trained on English and Mandarin and evaluated in both native and non-native conditions, following experimental conditions used in infant language perception studies. The results demonstrate that unsupervised bottom-up training on context-free acoustic information leads to comparable learned representations of perceptual space between native and non-native conditions for both English and Mandarin, resembling the early stage of universal listening in infants. These findings provide insights into the organization of perceptual space during early phonetic acquisition and contribute to our understanding of the formation and representation of phonetic categories.
Updated: 2024-07-26 04:18:36
Categories: cs.CL,cs.LG,cs.SD,eess.AS,I.2.7
Non-Overlapping Placement of Macro Cells based on Reinforcement Learning in Chip Design
Due to the increasing complexity of chip design, existing placement methods still have many shortcomings in dealing with macro cell coverage and optimization efficiency. Aiming at the problems of layout overlap, inferior performance, and low optimization efficiency in existing chip design methods, this paper proposes an end-to-end placement method, SRLPlacer, based on reinforcement learning. First, the placement problem is transformed into a Markov decision process by establishing a coupling-relationship graph model between macro cells to learn a strategy for optimizing layouts. Second, the whole placement process is optimized after integrating the standard cell layout. Evaluated on the public ISPD2005 benchmark, the proposed SRLPlacer can effectively solve the overlap problem between macro cells while considering routing congestion and shortening the total wire length to ensure routability.
Updated: 2024-07-26 04:15:54
Categories: cs.AR,cs.AI
A Reliable Common-Sense Reasoning Socialbot Built Using LLMs and Goal-Directed ASP
The development of large language models (LLMs), such as GPT, has enabled the construction of several socialbots, like ChatGPT, that are receiving a lot of attention for their ability to simulate a human conversation. However, the conversation is not guided by a goal and is hard to control. In addition, because LLMs rely more on pattern recognition than deductive reasoning, they can give confusing answers and have difficulty integrating multiple topics into a cohesive response. These limitations often lead the LLM to deviate from the main topic to keep the conversation interesting. We propose AutoCompanion, a socialbot that uses an LLM model to translate natural language into predicates (and vice versa) and employs commonsense reasoning based on Answer Set Programming (ASP) to hold a social conversation with a human. In particular, we rely on s(CASP), a goal-directed implementation of ASP as the backend. This paper presents the framework design and how an LLM is used to parse user messages and generate a response from the s(CASP) engine output. To validate our proposal, we describe (real) conversations in which the chatbot's goal is to keep the user entertained by talking about movies and books, and s(CASP) ensures (i) correctness of answers, (ii) coherence (and precision) during the conversation, which it dynamically regulates to achieve its specific purpose, and (iii) no deviation from the main topic.
Updated: 2024-07-26 04:13:43
Categories: cs.CL,cs.AI,cs.LO
AutoRE: Document-Level Relation Extraction with Large Language Models
Large Language Models (LLMs) have demonstrated exceptional abilities in comprehending and generating text, motivating numerous researchers to utilize them for Information Extraction (IE) purposes, including Relation Extraction (RE). Nonetheless, most existing methods are predominantly designed for Sentence-level Relation Extraction (SentRE) tasks, which typically encompass a restricted set of relations and triplet facts within a single sentence. Furthermore, certain approaches resort to treating relations as candidate choices integrated into prompt templates, leading to inefficient processing and suboptimal performance when tackling Document-Level Relation Extraction (DocRE) tasks, which entail handling multiple relations and triplet facts distributed across a given document, posing distinct challenges. To overcome these limitations, we introduce AutoRE, an end-to-end DocRE model that adopts a novel RE extraction paradigm named RHF (Relation-Head-Facts). Unlike existing approaches, AutoRE does not rely on the assumption of known relation options, making it more reflective of real-world scenarios. Additionally, we have developed an easily extensible RE framework using a Parameter-Efficient Fine-Tuning (PEFT) algorithm (QLoRA). Our experiments on the RE-DocRED dataset showcase AutoRE's best performance, achieving state-of-the-art results and surpassing TAG by 10.03% and 9.03% on the dev and test sets, respectively. The code is available at https://github.com/THUDM/AutoRE and the demonstration video is provided at https://www.youtube.com/watch?v=IhKRsZUAxKk.
Updated: 2024-07-26 04:12:16
Categories: cs.CL,cs.AI
SoftMAC: Differentiable Soft Body Simulation with Forecast-based Contact Model and Two-way Coupling with Articulated Rigid Bodies and Clothes
Differentiable physics simulation provides an avenue to tackle previously intractable challenges through gradient-based optimization, thereby greatly improving the efficiency of solving robotics-related problems. To apply differentiable simulation in diverse robotic manipulation scenarios, a key challenge is to integrate various materials in a unified framework. We present SoftMAC, a differentiable simulation framework that couples soft bodies with articulated rigid bodies and clothes. SoftMAC simulates soft bodies with the continuum-mechanics-based Material Point Method (MPM). We provide a novel forecast-based contact model for MPM, which effectively reduces penetration without introducing other artifacts like unnatural rebound. To couple MPM particles with deformable and non-volumetric clothes meshes, we also propose a penetration tracing algorithm that reconstructs the signed distance field in local area. Diverging from previous works, SoftMAC simulates the complete dynamics of each modality and incorporates them into a cohesive system with an explicit and differentiable coupling mechanism. The feature empowers SoftMAC to handle a broader spectrum of interactions, such as soft bodies serving as manipulators and engaging with underactuated systems. We conducted comprehensive experiments to validate the effectiveness and accuracy of the proposed differentiable pipeline in downstream robotic manipulation applications. Supplementary materials and videos are available on our project website at https://damianliumin.github.io/SoftMAC.
Updated: 2024-07-26 04:02:13
Categories: cs.RO,cs.AI,cs.GR
Towards More Accurate Prediction of Human Empathy and Emotion in Text and Multi-turn Conversations by Combining Advanced NLP, Transformers-based Networks, and Linguistic Methodologies
Based on the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification, we predict the level of empathic concern and personal distress displayed in essays. For the first stage of this project we implemented a Feed-Forward Neural Network using sentence-level embeddings as features. We experimented with four different embedding models for generating the inputs to the neural network. The subsequent stage builds upon the previous work and we have implemented three types of revisions. The first revision focuses on the enhancements to the model architecture and the training approach. The second revision focuses on handling class imbalance using stratified data sampling. The third revision focuses on leveraging lexical resources, where we apply four different resources to enrich the features associated with the dataset. During the final stage of this project, we have created the final end-to-end system for the primary task using an ensemble of models to revise primary task performance. Additionally, as part of the final stage, these approaches have been adapted to the WASSA 2023 Shared Task on Empathy Emotion and Personality Detection in Interactions, in which the empathic concern, emotion polarity, and emotion intensity in dyadic text conversations are predicted.
Updated: 2024-07-26 04:01:27
Categories: cs.CL,cs.LG
A Correlation-induced Finite Difference Estimator
Estimating stochastic gradients is pivotal in fields like service systems within operations research. The classical method for this estimation is the finite difference approximation, which entails generating samples at perturbed inputs. Nonetheless, practical challenges persist in determining the perturbation and obtaining an optimal finite difference estimator in the sense of possessing the smallest mean squared error (MSE). To tackle this problem, we propose a double sample-recycling approach in this paper. Firstly, pilot samples are recycled to estimate the optimal perturbation. Secondly, recycling these pilot samples again and generating new samples at the estimated perturbation, lead to an efficient finite difference estimator. We analyze its bias, variance and MSE. Our analyses demonstrate a reduction in asymptotic variance, and in some cases, a decrease in asymptotic bias, compared to the optimal finite difference estimator. Therefore, our proposed estimator consistently coincides with, or even outperforms the optimal finite difference estimator. In numerical experiments, we apply the estimator in several examples, and numerical results demonstrate its robustness, as well as coincidence with the theory presented, especially in the case of small sample sizes.
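As a worked illustration of the two-step idea, the sketch below implements a textbook central-difference gradient estimator whose step size is first estimated from pilot samples. The plug-in formula for the MSE-optimal step, $h^* = (3\sigma/(\sqrt{n}\,|f'''(x)|))^{1/3}$, is the standard one balancing squared bias $(h^2 f'''/6)^2$ against variance $\sigma^2/(2nh^2)$ for a central difference; it is an assumption here, and the paper's estimator additionally recycles the pilot samples into the final estimate, which is omitted for brevity.
    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.1                      # noise level of the stochastic simulator
    f = lambda x: np.sin(x)          # true function; we only see noisy evaluations
    noisy = lambda x, n: f(x) + sigma * rng.standard_normal(n)

    def fd_gradient(x, n_pilot=50, n_final=200):
        # Step 1: pilot samples at a coarse step estimate the third derivative,
        # which controls the bias of a central difference.
        h0 = 0.5
        p = [noisy(x + k * h0, n_pilot).mean() for k in (-2, -1, 1, 2)]
        d3 = (p[3] - 2 * p[2] + 2 * p[1] - p[0]) / (2 * h0 ** 3)  # ~ f'''(x)
        # Plug-in MSE-optimal perturbation for the final central difference.
        h = (3 * sigma / (np.sqrt(n_final) * max(abs(d3), 1e-8))) ** (1 / 3)
        # Step 2: generate new samples at the estimated perturbation.
        return (noisy(x + h, n_final).mean() - noisy(x - h, n_final).mean()) / (2 * h)

    print(fd_gradient(1.0), np.cos(1.0))  # estimate vs. true derivative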
Updated: 2024-07-26 03:59:24
Categories: stat.ME,cs.LG,cs.NA,math.NA,math.OC
Conversational Dueling Bandits in Generalized Linear Models
Conversational recommendation systems elicit user preferences by interacting with users to obtain their feedback on recommended commodities. Such systems utilize a multi-armed bandit framework to learn user preferences in an online manner and have received great success in recent years. However, existing conversational bandit methods have several limitations. First, they only enable users to provide explicit binary feedback on the recommended items or categories, leading to ambiguity in interpretation. In practice, users are usually faced with more than one choice. Relative feedback, known for its informativeness, has gained increasing popularity in recommendation system design. Moreover, current contextual bandit methods mainly work under linear reward assumptions, ignoring practical non-linear reward structures in generalized linear models. Therefore, in this paper, we introduce relative feedback-based conversations into conversational recommendation systems through the integration of dueling bandits in generalized linear models (GLM) and propose a novel conversational dueling bandit algorithm called ConDuel. Theoretical analyses of regret upper bounds and empirical validations on synthetic and real-world data underscore ConDuel's efficacy. We also demonstrate the potential to extend our algorithm to multinomial logit bandits with theoretical and experimental guarantees, which further proves the applicability of the proposed framework.
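In the generalized-linear dueling setup, when arms $i$ and $j$ are shown together, the user prefers $i$ with probability $\mu(\theta^\top(x_i - x_j))$ for a link function $\mu$ such as the sigmoid. The sketch below simulates that relative feedback and fits $\theta$ by online logistic regression; the exploration strategy and conversational mechanism of ConDuel are omitted, so this is only an illustration of the feedback model.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_arms = 5, 20
    theta_true = rng.standard_normal(d)
    X = rng.standard_normal((n_arms, d))              # contextual arm features
    sigmoid = lambda z: 1 / (1 + np.exp(-z))

    theta = np.zeros(d)
    for t in range(5000):
        i, j = rng.choice(n_arms, size=2, replace=False)
        z = X[i] - X[j]                               # duel between two arms
        pref = rng.random() < sigmoid(theta_true @ z) # relative (dueling) feedback
        # Online logistic-regression step on the preference observation.
        theta += 0.1 * (pref - sigmoid(theta @ z)) * z

    cos = theta @ theta_true / (np.linalg.norm(theta) * np.linalg.norm(theta_true))
    print(f"cosine(theta, theta_true) = {cos:.3f}")   # approaches 1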
Updated: 2024-07-26 03:43:10
Categories: cs.LG,cs.IT,math.IT,stat.ML
Enhancing Peak Assignment in 13C NMR Spectroscopy: A Novel Approach Using Multimodal Alignment
Nuclear magnetic resonance (NMR) spectroscopy plays an essential role in deciphering molecular structure and dynamic behaviors. While AI-enhanced NMR prediction models hold promise, challenges still persist in tasks such as molecular retrieval, isomer recognition, and peak assignment. In response, this paper introduces a novel solution, Multi-Level Multimodal Alignment with Knowledge-Guided Instance-Wise Discrimination (K-M3AID), which establishes correspondences between two heterogeneous modalities: molecular graphs and NMR spectra. K-M3AID employs a dual-coordinated contrastive learning architecture with three key modules: a graph-level alignment module, a node-level alignment module, and a communication channel. Notably, K-M3AID introduces knowledge-guided instance-wise discrimination into contrastive learning within the node-level alignment module. In addition, K-M3AID demonstrates that skills acquired during node-level alignment have a positive impact on graph-level alignment, acknowledging meta-learning as an inherent property. Empirical validation underscores K-M3AID's effectiveness in multiple zero-shot tasks.
Updated: 2024-07-26 03:39:50
Categories: cs.LG,physics.chem-ph,q-bio.QM
Synthetic Data, Similarity-based Privacy Metrics, and Regulatory (Non-)Compliance
In this paper, we argue that similarity-based privacy metrics cannot ensure regulatory compliance of synthetic data. Our analysis and counter-examples show that they do not protect against singling out and linkability and, among other fundamental issues, completely ignore the motivated intruder test.
Updated: 2024-07-26 03:30:05
Categories: cs.CR,cs.AI,cs.CY
Vision language models are blind
While large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering various image-text applications and scoring high on many vision-understanding benchmarks, we find that they are surprisingly still struggling with low-level vision tasks that are easy to humans. Specifically, on BlindTest, our suite of 7 very simple tasks such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting circles in an Olympic-like logo, four state-of-the-art VLMs are only 58.57% accurate on average. Claude 3.5 Sonnet performs the best at 74.94% accuracy, but this is still far from the human expected accuracy of 100%. Across different image resolutions and line widths, VLMs consistently struggle with tasks that require precise spatial information and recognizing geometric primitives that overlap or are close together. Code and data are available at: https://vlmsareblind.github.io
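Tasks of this kind are easy to reproduce. The snippet below is a guess at the flavor of BlindTest rather than its actual generator: it draws two random circles with Pillow and records the ground-truth overlap label that a VLM would then be asked to predict.
    import random
    from PIL import Image, ImageDraw

    def two_circles(size=224, seed=0):
        rng = random.Random(seed)
        img = Image.new("RGB", (size, size), "white")
        draw = ImageDraw.Draw(img)
        circles = []
        for _ in range(2):
            r = rng.randint(20, 45)
            cx, cy = rng.randint(r, size - r), rng.randint(r, size - r)
            draw.ellipse([cx - r, cy - r, cx + r, cy + r], outline="black", width=3)
            circles.append((cx, cy, r))
        (x1, y1, r1), (x2, y2, r2) = circles
        # Ground truth: circles overlap iff center distance < sum of radii.
        overlap = (x1 - x2) ** 2 + (y1 - y2) ** 2 < (r1 + r2) ** 2
        return img, overlap

    img, label = two_circles(seed=42)
    img.save("two_circles.png")
    print("Do the circles overlap?", label)  # compare against the VLM's answer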
Updated: 2024-07-26 03:27:58
Categories: cs.AI,cs.CV
A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation
Ophthalmology consultations are crucial for diagnosing, treating, and preventing eye diseases. However, the growing demand for consultations exceeds the availability of ophthalmologists. By leveraging large pre-trained language models, we can design effective dialogues for specific scenarios, aiding in consultations. Traditional fine-tuning strategies for question-answering tasks are impractical due to increasing model size, and they often ignore patient-doctor role functions during consultations. In this paper, we propose EyeDoctor, an ophthalmic medical question-answering large language model that enhances accuracy through doctor-patient role-aware guidance and a knowledge base augmented with external disease information. Experimental results show EyeDoctor achieves higher question-answering precision in ophthalmology consultations. Notably, EyeDoctor demonstrated a 7.25% improvement in Rouge-1 scores and a 10.16% improvement in F1 scores on multi-round datasets compared to the second-best model, ChatGPT, highlighting the importance of doctor-patient role differentiation and dynamic knowledge base expansion for intelligent medical consultations. EyeDoc also serves as a freely available web-based service, and the source code is available at https://github.com/sperfu/EyeDoc.
Updated: 2024-07-26 03:23:31
Categories: cs.CL,cs.AI
Practical Attribution Guidance for Rashomon Sets
Different prediction models might perform equally well (a Rashomon set) on the same task, yet offer conflicting interpretations and conclusions about the data. The Rashomon effect in the context of Explainable AI (XAI) has been recognized as a critical factor. Although the Rashomon set has been introduced and studied in various contexts, its practical application is in its infancy and lacks adequate guidance and evaluation. We study the problem of Rashomon set sampling from a practical viewpoint and identify two fundamental axioms - generalizability and implementation sparsity - that exploration methods ought to satisfy in practical usage. These two axioms are not satisfied by most known attribution methods, which we consider to be a fundamental weakness. We use these axioms to guide the design of an $\epsilon$-subgradient-based sampling method. We apply this method to a fundamental mathematical problem as a proof of concept and to a set of practical datasets to demonstrate its ability compared with existing sampling methods.
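For intuition only, a Rashomon set can be probed with a naive rejection sampler: keep any parameter vector whose loss is within a factor $(1+\epsilon)$ of the best found. The sketch below does exactly that for a small logistic regression; it is a baseline illustration, not the paper's $\epsilon$-subgradient method, which explores the set far more efficiently.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))
    y = (X @ rng.standard_normal(5) + 0.5 * rng.standard_normal(200)) > 0

    def logistic_loss(w):
        margin = np.where(y, X @ w, -(X @ w))
        return np.mean(np.logaddexp(0.0, -margin))

    # Reference model: crude gradient descent to a near-optimal loss L*.
    w = np.zeros(5)
    for _ in range(2000):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= 0.5 * X.T @ (p - y) / len(y)
    L_star, eps = logistic_loss(w), 0.05

    # Rejection sampling: perturbed models that still perform (almost) equally well.
    rashomon = [w + d for d in 0.3 * rng.standard_normal((5000, 5))
                if logistic_loss(w + d) <= (1 + eps) * L_star]
    print(f"kept {len(rashomon)} of 5000 perturbed models")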
Updated: 2024-07-26 03:17:41
Categories: cs.LG
Scalable Graph Compressed Convolutions
Designing effective graph neural networks (GNNs) with message passing has two fundamental challenges, i.e., determining optimal message-passing pathways and designing local aggregators. Previous methods of designing optimal pathways are limited with information loss on the input features. On the other hand, existing local aggregators generally fail to extract multi-scale features and approximate diverse operators under limited parameter scales. In contrast to these methods, Euclidean convolution has been proven as an expressive aggregator, making it a perfect candidate for GNN construction. However, the challenges of generalizing Euclidean convolution to graphs arise from the irregular structure of graphs. To bridge the gap between Euclidean space and graph topology, we propose a differentiable method that applies permutations to calibrate input graphs for Euclidean convolution. The permutations constrain all nodes in a row regardless of their input order and therefore enable the flexible generalization of Euclidean convolution to graphs. Based on the graph calibration, we propose the Compressed Convolution Network (CoCN) for hierarchical graph representation learning. CoCN follows local feature-learning and global parameter-sharing mechanisms of convolution neural networks. The whole model can be trained end-to-end, with compressed convolution applied to learn individual node features and their corresponding structure features. CoCN can further borrow successful practices from Euclidean convolution, including residual connection and inception mechanism. We validate CoCN on both node-level and graph-level benchmarks. CoCN achieves superior performance over competitive GNN baselines. Codes are available at https://github.com/sunjss/CoCN.
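A minimal sketch of the calibration idea follows, with several assumed details: a learned score per node, a softmax-relaxed permutation built against evenly spaced anchor values, and a plain 1-D convolution standing in for the paper's compressed convolution.
    import torch
    import torch.nn as nn

    class SoftPermuteConv(nn.Module):
        def __init__(self, in_dim, out_dim, kernel=3, tau=0.1):
            super().__init__()
            self.score = nn.Linear(in_dim, 1)   # learned ordering score per node
            self.conv = nn.Conv1d(in_dim, out_dim, kernel, padding=kernel // 2)
            self.tau = tau

        def forward(self, x):                   # x: (num_nodes, in_dim)
            n = x.size(0)
            s = self.score(x).squeeze(-1)       # (n,) node scores
            # Soft permutation: row i attends to the node whose score is closest
            # to the i-th of n evenly spaced anchor values, regardless of input order.
            anchors = torch.linspace(s.min().item(), s.max().item(), n, device=x.device)
            P = torch.softmax(-(anchors[:, None] - s[None, :]) ** 2 / self.tau, dim=-1)
            x_sorted = P @ x                    # (n, in_dim), order-calibrated features
            return self.conv(x_sorted.t()[None]).squeeze(0).t()  # Euclidean convolution

    layer = SoftPermuteConv(16, 32)
    out = layer(torch.randn(10, 16))
    print(out.shape)  # torch.Size([10, 32])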
Updated: 2024-07-26 03:14:13
Categories: cs.LG
Online Differentially Private Synthetic Data Generation
We present a polynomial-time algorithm for online differentially private synthetic data generation. For a data stream within the hypercube $[0,1]^d$ and an infinite time horizon, we develop an online algorithm that generates a differentially private synthetic dataset at each time $t$. This algorithm achieves a near-optimal accuracy bound of $O(\log(t)t^{-1/d})$ for $d\geq 2$ and $O(\log^{4.5}(t)t^{-1})$ for $d=1$ in the 1-Wasserstein distance. This result extends the previous work on the continual release model for counting queries to Lipschitz queries. Compared to the offline case, where the entire dataset is available at once, our approach requires only an extra polylog factor in the accuracy bound.
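A one-shot (offline) version of the underlying idea, a differentially private histogram on a grid over $[0,1]^d$ from which synthetic points are sampled, can be sketched as follows; the online algorithm's continual-release machinery and the stated accuracy bounds are beyond this toy.
    import numpy as np

    def dp_synthetic(data, bins=16, epsilon=1.0, n_out=1000, seed=0):
        """data: (n, d) array in [0,1]^d. Returns DP synthetic points."""
        rng = np.random.default_rng(seed)
        d = data.shape[1]
        counts, _ = np.histogramdd(data, bins=[bins] * d, range=[(0, 1)] * d)
        # Laplace mechanism: one point changes one cell count by 1 (sensitivity 1).
        noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
        probs = np.clip(noisy, 0, None).ravel()
        probs /= probs.sum()
        # Sample cells, then a uniform point inside each sampled cell.
        cells = rng.choice(len(probs), size=n_out, p=probs)
        idx = np.array(np.unravel_index(cells, counts.shape)).T   # (n_out, d)
        return (idx + rng.random((n_out, d))) / bins

    real = np.random.default_rng(1).beta(2, 5, size=(5000, 2))
    synth = dp_synthetic(real, epsilon=0.5)
    print(synth.shape, synth.mean(axis=0), real.mean(axis=0))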
Updated: 2024-07-26 03:12:42
Categories: math.ST,cs.DS,cs.LG,math.PR,stat.TH
Point-DAE: Denoising Autoencoders for Self-supervised Point Cloud Learning
Masked autoencoder has demonstrated its effectiveness in self-supervised point cloud learning. Considering that masking is a kind of corruption, in this work we explore a more general denoising autoencoder for point cloud learning (Point-DAE) by investigating more types of corruptions beyond masking. Specifically, we degrade the point cloud with certain corruptions as input, and learn an encoder-decoder model to reconstruct the original point cloud from its corrupted version. Three corruption families (i.e., density/masking, noise, and affine transformation) and a total of fourteen corruption types are investigated with traditional non-Transformer encoders. Besides the popular masking corruption, we identify another effective corruption family, i.e., affine transformation. The affine transformation disturbs all points globally, which is complementary to the masking corruption where some local regions are dropped. We also validate the effectiveness of affine transformation corruption with the Transformer backbones, where we decompose the reconstruction of the complete point cloud into the reconstructions of detailed local patches and rough global shape, alleviating the position leakage problem in the reconstruction. Extensive experiments on tasks of object classification, few-shot learning, robustness testing, part segmentation, and 3D object detection validate the effectiveness of the proposed method. The codes are available at https://github.com/YBZh/Point-DAE.
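The two corruption families singled out as most effective are easy to write down. A minimal sketch follows, with assumed ranges for the random transforms and an assumed seed-point masking policy:
    import numpy as np

    rng = np.random.default_rng(0)

    def affine_corrupt(points):
        """Global affine corruption: random rotation, anisotropic scale, translation."""
        theta = rng.uniform(0, 2 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # rotation about z
        scale = np.diag(rng.uniform(0.6, 1.4, size=3))       # anisotropic scaling
        shift = rng.uniform(-0.2, 0.2, size=3)
        return points @ (rot @ scale).T + shift              # disturbs every point

    def mask_corrupt(points, ratio=0.6):
        """Local masking corruption: drop the points nearest to a random seed point."""
        seed = points[rng.integers(len(points))]
        dist = np.linalg.norm(points - seed, axis=1)
        keep = np.argsort(dist)[int(ratio * len(points)):]   # keep the farthest points
        return points[keep]

    cloud = rng.standard_normal((1024, 3))
    print(affine_corrupt(cloud).shape, mask_corrupt(cloud).shape)  # (1024, 3) (410, 3)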
Updated: 2024-07-26 02:51:08
Categories: cs.CV,cs.AI
FedUD: Exploiting Unaligned Data for Cross-Platform Federated Click-Through Rate Prediction
Click-through rate (CTR) prediction plays an important role in online advertising platforms. Most existing methods use data from the advertising platform itself for CTR prediction. As user behaviors also exist on many other platforms, e.g., media platforms, it is beneficial to further exploit such complementary information for better modeling user interest and for improving CTR prediction performance. However, due to privacy concerns, data from different platforms cannot be uploaded to a server for centralized model training. Vertical federated learning (VFL) provides a possible solution which is able to keep the raw data on respective participating parties and learn a collaborative model in a privacy-preserving way. However, traditional VFL methods only utilize aligned data with common keys across parties, which strongly restricts their application scope. In this paper, we propose FedUD, which is able to exploit unaligned data, in addition to aligned data, for more accurate federated CTR prediction. FedUD contains two steps. In the first step, FedUD utilizes aligned data across parties like traditional VFL, but it additionally includes a knowledge distillation module. This module distills useful knowledge from the guest party's high-level representations and guides the learning of a representation transfer network. In the second step, FedUD applies the learned knowledge to enrich the representations of the host party's unaligned data such that both aligned and unaligned data can contribute to federated model training. Experiments on two real-world datasets demonstrate the superior performance of FedUD for federated CTR prediction.
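Schematically, the two steps might look like the torch sketch below, where a transfer network learns to predict the guest party's representation from host features on aligned rows, then fills in pseudo-representations for unaligned rows. All shapes, the MSE distillation loss, and the random tensors are assumptions for illustration, not the paper's exact architecture.
    import torch
    import torch.nn as nn

    host_dim, guest_dim = 32, 16
    transfer = nn.Sequential(nn.Linear(host_dim, 64), nn.ReLU(), nn.Linear(64, guest_dim))
    opt = torch.optim.Adam(transfer.parameters(), lr=1e-3)

    # Step 1: knowledge distillation on aligned data with common keys.
    host_aligned = torch.randn(512, host_dim)     # host-side features
    guest_repr = torch.randn(512, guest_dim)      # guest's high-level representations
    for _ in range(200):
        loss = nn.functional.mse_loss(transfer(host_aligned), guest_repr)
        opt.zero_grad(); loss.backward(); opt.step()

    # Step 2: enrich unaligned host rows with pseudo guest representations,
    # so both aligned and unaligned data can feed the federated CTR model.
    host_unaligned = torch.randn(2048, host_dim)
    with torch.no_grad():
        pseudo_guest = transfer(host_unaligned)
    ctr_input = torch.cat([host_unaligned, pseudo_guest], dim=1)
    print(ctr_input.shape)  # torch.Size([2048, 48])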
Updated: 2024-07-26 02:48:32
Categories: cs.IR,cs.LG
Constructing the CORD-19 Vaccine Dataset
We introduce a new dataset, 'CORD-19-Vaccination', to cater to scientists specifically looking into COVID-19 vaccine-related research. This dataset is extracted from the CORD-19 dataset [Wang et al., 2020] and augmented with new columns for language detail, author demography, keywords, and topic per paper. Facebook's fastText model is used to identify languages [Joulin et al., 2016]. To establish author demography (author affiliation, lab/institution location, and lab/institution country columns) we processed the JSON file for each paper and then further enhanced the data using Google's search API to determine country values. 'Yake' was used to extract keywords from the title, abstract, and body of each paper, and the LDA (Latent Dirichlet Allocation) algorithm was used to add topic information [Campos et al., 2020, 2018a,b]. To evaluate the dataset, we demonstrate a question-answering task like the one used in the CORD-19 Kaggle challenge [Goldbloom et al., 2022]. For further evaluation, sequential sentence classification was performed on each paper's abstract using the model from Dernoncourt et al. [2016]. We partially hand-annotated the training dataset and used a pre-trained BERT-PubMed layer. 'CORD-19-Vaccination' contains 30k research papers and can be immensely valuable for NLP research such as text mining, information extraction, and question answering, specific to the domain of COVID-19 vaccine research.
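The augmentation pipeline can be approximated with off-the-shelf tools. A sketch using fastText language ID, YAKE keyword extraction, and scikit-learn's LDA is below; the model file path, parameter choices, and toy inputs are assumptions, not the authors' exact configuration.
    import fasttext                      # pip install fasttext
    import yake                          # pip install yake
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    papers = ["Safety and immunogenicity of a COVID-19 vaccine candidate ...",
              "Cold-chain logistics for global vaccine distribution ..."]

    # Language detection (requires the pretrained lid.176.bin model file).
    lid = fasttext.load_model("lid.176.bin")
    langs = [lid.predict(p.replace("\n", " "))[0][0] for p in papers]  # '__label__en'

    # Keyword extraction from each paper's text.
    kw = yake.KeywordExtractor(lan="en", n=2, top=5)
    keywords = [[k for k, _ in kw.extract_keywords(p)] for p in papers]

    # Topic assignment via LDA over a bag-of-words representation.
    vec = CountVectorizer(stop_words="english")
    bow = vec.fit_transform(papers)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(bow)
    topics = lda.transform(bow).argmax(axis=1)
    print(langs, keywords, topics)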
Updated: 2024-07-26 02:44:55
Categories: cs.CL,cs.IR,cs.LG
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization
Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies directly without access to any known normal or abnormal samples within the target item categories. Existing approaches typically rely on the robust generalization capabilities of multimodal pretrained models, computing similarities between manually crafted textual features representing "normal" or "abnormal" semantics and image features to detect anomalies and localize anomalous patches. However, the generic descriptions of "abnormal" often fail to precisely match diverse types of anomalies across different object categories. Additionally, computing feature similarities for single patches struggles to pinpoint specific locations of anomalies with various sizes and scales. To address these issues, we propose a novel ZSAD method called FiLo, comprising two components: adaptively learned Fine-Grained Description (FG-Des) and position-enhanced High-Quality Localization (HQ-Loc). FG-Des introduces fine-grained anomaly descriptions for each category using Large Language Models (LLMs) and employs adaptively learned textual templates to enhance the accuracy and interpretability of anomaly detection. HQ-Loc, utilizing Grounding DINO for preliminary localization, position-enhanced text prompts, and Multi-scale Multi-shape Cross-modal Interaction (MMCI) module, facilitates more accurate localization of anomalies of different sizes and shapes. Experimental results on datasets like MVTec and VisA demonstrate that FiLo significantly improves the performance of ZSAD in both detection and localization, achieving state-of-the-art performance with an image-level AUC of 83.9% and a pixel-level AUC of 95.9% on the VisA dataset. Code is available at https://github.com/CASIA-IVA-Lab/FiLo.
Updated: 2024-07-26 02:42:21
Categories: cs.CV,cs.LG
Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints
Diffusion models have been extensively utilized in AI-generated content (AIGC) in recent years, thanks to their superior generation capabilities. Combined with semantic communications, diffusion models are used for tasks such as denoising, data reconstruction, and content generation. However, existing diffusion-based generative models do not consider the stringent bandwidth limitation, which limits their application in wireless communication. This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our designed architecture utilizes the diffusion model, where the signal transmission process through the wireless channel acts as the forward process in diffusion. To reduce bandwidth requirements, we incorporate a downsampling module and a paired upsampling module based on a variational auto-encoder with reparameterization at the receiver, ensuring that the recovered features conform to a Gaussian distribution. Furthermore, we derive the loss function for our proposed system and evaluate its performance through comprehensive experiments. Our experimental results demonstrate significant improvements in pixel-level metrics such as peak signal-to-noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS). Compared to deep joint source-channel coding (DJSCC), these enhancements are more pronounced across compression rates and SNRs.
Updated: 2024-07-26 02:34:25
Categories: cs.LG,cs.AI
PersLLM: A Personified Training Approach for Large Language Models
Large language models exhibit aspects of human-level intelligence that catalyze their application as human-like agents in domains such as social simulations, human-machine interactions, and collaborative multi-agent systems. However, the absence of distinct personalities, reflected in ingratiating behaviors, inconsistent opinions, and uniform response patterns, diminishes LLMs' utility in practical applications. Addressing this, the development of personality traits in LLMs emerges as a crucial area of research to unlock their latent potential. Existing methods to personify LLMs generally involve strategies like employing stylized training data for instruction tuning or using prompt engineering to simulate different personalities. These methods only capture superficial linguistic styles instead of the core of personalities and are therefore not stable. In this study, we propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development, into a comprehensive training methodology. We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality. Single-agent evaluation validates our method's superiority, as it produces responses more aligned with reference personalities compared to other approaches. Case studies for multi-agent communication highlight its benefits in enhancing opinion consistency within individual agents and fostering collaborative creativity among multiple agents in dialogue contexts, potentially benefiting human simulation and multi-agent cooperation. Additionally, human-agent interaction evaluations indicate that our personified models significantly enhance interactive experiences, underscoring the practical implications of our research.
Updated: 2024-07-26 02:34:14
Categories: cs.CL,cs.AI,cs.CY
Machine Unlearning using a Multi-GAN based Model
This article presents a new machine unlearning approach that utilizes multiple Generative Adversarial Network (GAN)-based models. The proposed method comprises two phases: i) data reorganization, in which GAN-generated synthetic data is introduced with inverted class labels for the forget dataset, and ii) fine-tuning of the pre-trained model. The GAN models consist of two pairs of generators and discriminators. The generator-discriminator pairs generate synthetic data for the retain and forget datasets. Then, a pre-trained model is utilized to obtain the class labels of the synthetic datasets. The class labels of the synthetic and original forget datasets are inverted. Finally, all combined datasets are used to fine-tune the pre-trained model to obtain the unlearned model. We performed experiments on the CIFAR-10 dataset and tested the unlearned models using Membership Inference Attacks (MIA). The inverted class labels procedure and synthetically generated data help to acquire valuable information that enables the model to outperform state-of-the-art models and other standard unlearning classifiers.
Updated: 2024-07-26 02:28:32
Categories: cs.LG
Quantum Key Distribution Routing Protocol in Quantum Networks: Overview and Challenges
The use of quantum cryptography in everyday applications has gained attention in both industrial and academic fields. Due to advancements in quantum electronics, practical quantum devices are already available in the market and ready for wider use. Quantum Key Distribution (QKD) is a crucial aspect of quantum cryptography, which involves generating and distributing symmetric cryptographic keys between geographically separated users using principles of quantum physics. Many successful QKD networks have been established to test different solutions. The objective of this paper is to delve into the potential of utilizing established routing design techniques in the context of quantum key distribution, a field distinguished by its unique properties rooted in the principles of quantum mechanics. However, the implementation of these techniques poses substantial challenges, including quantum memory decoherence, key generation rates, latency, inherent noise in quantum systems, limited communication ranges, and the necessity for highly specialized hardware. This paper conducts an in-depth examination of essential research pertaining to design methodologies for quantum key distribution. It also explores the fundamental aspects of quantum routing and the associated properties inherent to QKD. This paper elucidates the necessary steps for constructing efficient and resilient QKD networks. In summarizing the techniques relevant to QKD networking and routing, including their underlying principles, protocols, and challenges, this paper sheds light on potential applications and delineates future research directions in this burgeoning field.
Updated: 2024-07-26 02:18:08
Categories: cs.CR
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
With the development of diffusion-based customization methods like DreamBooth, individuals now have access to train models that can generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against customization. The adversarial examples are trained to distort the customization model's outputs and thus block the misuse. In this paper, we propose DisDiff (Disrupting Diffusion), a novel adversarial attack method to disrupt the diffusion model outputs. We first delve into the intrinsic image-text relationships, well-known as cross-attention, and empirically find that the subject-identifier token plays an important role in guiding image generation. Thus, we propose the Cross-Attention Erasure module to explicitly "erase" the indicated attention maps and disrupt the text guidance. Besides, we analyze the influence of the sampling process of the diffusion model on the Projected Gradient Descent (PGD) attack and introduce a novel Merit Sampling Scheduler to adaptively modulate the perturbation-updating amplitude in a step-aware manner. Our DisDiff outperforms the state-of-the-art methods by 12.75% in FDFR scores and 7.25% in ISM scores across two facial benchmarks and two commonly used prompts on average.
Updated: 2024-07-26 02:10:04
Categories: cs.CV,cs.AI,I.2.10
MistralBSM: Leveraging Mistral-7B for Vehicular Networks Misbehavior Detection
Vehicular networks are exposed to various threats resulting from malicious attacks. These threats compromise the security and reliability of communications among road users, thereby jeopardizing road and traffic safety. One of the main vectors of these attacks within vehicular networks is misbehaving vehicles. To address this challenge, we propose deploying a pretrained Large Language Model (LLM)-empowered Misbehavior Detection System (MDS) within an edge-cloud detection framework. Specifically, we fine-tune Mistral-7B, a state-of-the-art LLM, as the edge component to enable real-time detection, whereas a larger LLM deployed in the cloud can conduct a more comprehensive analysis. Our experiments conducted on the extended VeReMi dataset demonstrate Mistral-7B's superior performance, achieving 98% accuracy compared to other LLMs such as LLAMA2-7B and RoBERTa. Additionally, we investigate the impact of window size on computational costs to optimize deployment efficiency. Leveraging LLMs in MDS shows interesting results in improving the detection of vehicle misbehavior, consequently strengthening vehicular network security to ensure the safety of road users.
Updated: 2024-07-26 02:09:32
Categories: cs.LG,cs.CR
Longhorn: State Space Models are Amortized Online Learners
The most fundamental capability of modern AI methods such as Large Language Models (LLMs) is the ability to predict the next token in a long sequence of tokens, known as "sequence modeling." Although the Transformers model is the current dominant approach to sequence modeling, its quadratic computational cost with respect to sequence length is a significant drawback. State-space models (SSMs) offer a promising alternative due to their linear decoding efficiency and high parallelizability during training. However, existing SSMs often rely on seemingly ad hoc linear recurrence designs. In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems. This approach links SSM design to formulating precise online learning objectives, with state transition rules derived from optimizing these objectives. Based on this insight, we introduce a novel deep SSM architecture based on the implicit update for optimizing an online regression objective. Our experimental results show that our models outperform state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks and language modeling tasks.
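The flavor of the "SSM as online learner" view can be shown with the implicit (proximal) update for online linear regression: choose $S_t = \arg\min_S \|S - S_{t-1}\|_F^2 + \beta_t \|S k_t - v_t\|^2$, whose closed form is $S_t = S_{t-1} + \frac{\beta_t}{1 + \beta_t k_t^\top k_t}(v_t - S_{t-1} k_t) k_t^\top$. The sketch below implements that recurrence on a toy key/value stream; it illustrates the principle, not Longhorn's actual deep parameterization.
    import numpy as np

    def implicit_online_regression(keys, values, beta=1.0):
        """keys: (T, d_k), values: (T, d_v). Returns per-step readouts S_t k_t."""
        d_k, d_v = keys.shape[1], values.shape[1]
        S = np.zeros((d_v, d_k))                   # the recurrent state is a matrix
        out = []
        for k, v in zip(keys, values):
            gain = beta / (1.0 + beta * k @ k)     # closed-form implicit step size
            S = S + gain * np.outer(v - S @ k, k)  # proximal step toward S k = v
            out.append(S @ k)
        return np.stack(out)

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8))                # ground-truth mapping to recover
    keys = rng.standard_normal((200, 8))
    values = keys @ W.T
    preds = implicit_online_regression(keys, values, beta=2.0)
    print(np.abs(preds[-20:] - values[-20:]).mean())  # error shrinks as S learns W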
Updated: 2024-07-26 02:03:00
Categories: cs.LG
Leveraging AI Planning For Detecting Cloud Security Vulnerabilities
Cloud computing services provide scalable and cost-effective solutions for data storage, processing, and collaboration. Alongside their growing popularity, concerns related to their security vulnerabilities leading to data breaches and sophisticated attacks such as ransomware are growing. To address these, first, we propose a generic framework to express relations between different cloud objects such as users, datastores, security roles, to model access control policies in cloud systems. Access control misconfigurations are often the primary driver for cloud attacks. Second, we develop a PDDL model for detecting security vulnerabilities which can for example lead to widespread attacks such as ransomware, sensitive data exfiltration among others. A planner can then generate attacks to identify such vulnerabilities in the cloud. Finally, we test our approach on 14 real Amazon AWS cloud configurations of different commercial organizations. Our system can identify a broad range of security vulnerabilities, which state-of-the-art industry tools cannot detect.
Updated: 2024-07-26 01:37:38
Categories: cs.CR,cs.AI
Fast System Technology Co-Optimization Framework for Emerging Technology Based on Graph Neural Networks
This paper proposes a fast system technology co-optimization (STCO) framework that optimizes power, performance, and area (PPA) for next-generation IC design, addressing the challenges and opportunities presented by novel materials and device architectures. We focus on accelerating the technology level of STCO using AI techniques, by employing graph neural network (GNN)-based approaches for both TCAD simulation and cell library characterization, which are interconnected through a unified compact model, collectively achieving over a 100X speedup over traditional methods. These advancements enable comprehensive STCO iterations with runtime speedups ranging from 1.9X to 14.1X and supports both emerging and traditional technologies.
Updated: 2024-07-26 01:34:49
Categories: cs.ET,cs.AI
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn these tags using auxiliary data and domain knowledge. By explicitly disentangling task domains from task functions, our method enables zero-shot generalization to unseen problems through diverse combinations of the input tags. It also boosts LLM's performance in various specialized domains, such as predicting protein or chemical properties and modeling drug-target interactions, outperforming expert models tailored to these tasks.
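Mechanically, the tags are just extra learned rows in embedding space. A minimal sketch follows; the tag counts, the prepend placement, and all dimensions are assumptions for illustration.
    import torch
    import torch.nn as nn

    class TaggedEmbedding(nn.Module):
        def __init__(self, vocab_size=32000, dim=512, n_domains=4, n_functions=8):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, dim)   # pretrained token embeddings
            self.domain_tags = nn.Parameter(torch.randn(n_domains, dim) * 0.02)
            self.function_tags = nn.Parameter(torch.randn(n_functions, dim) * 0.02)

        def forward(self, token_ids, domain_id, function_id):
            # Prepend continuous tag vectors to the token embeddings; in Tag-LLM
            # only the tag vectors would be trained, conditioning the frozen LLM.
            e = self.tok(token_ids)                              # (B, T, dim)
            d = self.domain_tags[domain_id].expand(e.size(0), 1, -1)
            f = self.function_tags[function_id].expand(e.size(0), 1, -1)
            return torch.cat([d, f, e], dim=1)                   # (B, T+2, dim)

    emb = TaggedEmbedding()
    x = emb(torch.randint(0, 32000, (2, 10)), domain_id=1, function_id=3)
    print(x.shape)  # torch.Size([2, 12, 512])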
Updated: 2024-07-26 01:28:16
Categories: cs.LG,cs.AI,cs.CL
Towards Automated Solution Recipe Generation for Industrial Asset Management with LLM
This study introduces a novel approach to Industrial Asset Management (IAM) by incorporating Conditional-Based Management (CBM) principles with the latest advancements in Large Language Models (LLMs). Our research introduces an automated model-building process, traditionally reliant on intensive collaboration between data scientists and domain experts. We present two primary innovations: a taxonomy-guided prompting generation that facilitates the automatic creation of AI solution recipes and a set of LLM pipelines designed to produce a solution recipe containing a set of artifacts composed of documents, sample data, and models for IAM. These pipelines, guided by standardized principles, enable the generation of initial solution templates for heterogeneous asset classes without direct human input, reducing reliance on extensive domain knowledge and enhancing automation. We evaluate our methodology by assessing asset health and sustainability across a spectrum of ten asset classes. Our findings illustrate the potential of LLMs and taxonomy-based LLM prompting pipelines in transforming asset management, offering a blueprint for subsequent research and development initiatives to be integrated into a rapid client solution.
Updated: 2024-07-26 01:24:52
Categories: cs.AI
Fairness Definitions in Language Models Explained
Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real-world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairness notions. However, the lack of clear agreement on which fairness definition to apply in specific contexts (e.g., medium-sized LMs versus large-sized LMs) and the complexity of understanding the distinctions between these definitions can create confusion and impede further progress. To this end, this paper proposes a systematic survey that clarifies the definitions of fairness as they apply to LMs. Specifically, we begin with a brief introduction to LMs and fairness in LMs, followed by a comprehensive, up-to-date overview of existing fairness notions in LMs and the introduction of a novel taxonomy that categorizes these concepts based on their foundational principles and operational distinctions. We further illustrate each definition through experiments, showcasing their practical implications and outcomes. Finally, we discuss current research challenges and open questions, aiming to foster innovative ideas and advance the field. The implementation and additional resources are publicly available at https://github.com/LavinWong/Fairness-in-Large-Language-Models/tree/main/definitions.
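As one concrete example of why the choice of definition matters, two common group-fairness metrics (demographic parity and equal opportunity) can disagree on the same predictions. The toy data below is invented for illustration; these classifier-style metrics are among those adapted to LM outputs.
    import numpy as np

    y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # sensitive attribute, e.g. gender

    def demographic_parity_gap(y_pred, group):
        # |P(Y_hat=1 | A=0) - P(Y_hat=1 | A=1)|: equal positive-prediction rates.
        return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

    def equal_opportunity_gap(y_true, y_pred, group):
        # |P(Y_hat=1 | Y=1, A=0) - P(Y_hat=1 | Y=1, A=1)|: equal true-positive rates.
        tpr = [y_pred[(group == g) & (y_true == 1)].mean() for g in (0, 1)]
        return abs(tpr[0] - tpr[1])

    print(demographic_parity_gap(y_pred, group))         # 0.25
    print(equal_opportunity_gap(y_true, y_pred, group))  # 0.5, a different verdict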
Updated: 2024-07-26 01:21:25
Categories: cs.CL,cs.AI,cs.LG
Improving Representation of High-frequency Components for Medical Foundation Models
Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models have been shown to exhibit substantial limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomical structures, sub-visual features, and complex boundaries involved. Consequently, the limited representation of prevalent foundation models can result in significant performance degradation or even failure in these tasks. To address these challenges, we propose a novel pretraining strategy, named Frequency-advanced Representation Autoencoder (Frepa). Through high-frequency masking and low-frequency perturbation combined with adversarial learning, Frepa encourages the encoder to effectively represent and preserve high-frequency components in the image embeddings. Additionally, we introduce an innovative histogram-equalized image masking strategy, extending the Masked Autoencoder approach beyond ViT to other architectures such as Swin Transformer and convolutional networks. We develop Frepa across nine medical modalities and validate it on 32 downstream tasks for both 2D images and 3D volume data. Without fine-tuning, Frepa can outperform other self-supervised pretraining methods and, in some cases, even surpasses task-specific trained models. This improvement is particularly significant for tasks involving fine-grained details, such as achieving up to a +15% increase in DSC for retina vessel segmentation and a +7% increase in IoU for lung nodule detection. Further experiments quantitatively reveal that Frepa enables superior high-frequency representations and preservation in the embeddings, underscoring its potential for developing more generalized and universal medical image foundation models.
Updated: 2024-07-26 01:19:27
Categories: eess.IV,cs.AI,cs.CV
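Below is a minimal NumPy sketch of the high-frequency masking idea: zero out all frequencies outside a centered radius in the Fourier domain, leaving the residual detail that the encoder is encouraged to represent. The cutoff radius and circular mask are illustrative assumptions.

```python
import numpy as np

def mask_high_frequencies(img: np.ndarray, radius_frac: float = 0.15) -> np.ndarray:
    """Keep only low frequencies inside a centered circle of the 2D spectrum."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    r = radius_frac * min(h, w)
    low_pass = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= r ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * low_pass)))

img = np.random.rand(64, 64)            # stand-in for a medical image
degraded = mask_high_frequencies(img)   # high-frequency components removed
residual = img - degraded               # the detail the embeddings must preserve
```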
Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation
Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to their ability to substitute for multi-layer perceptrons (MLPs) in artificial intelligence (AI) with higher accuracy and interpretability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific domain. Furthermore, no study has been conducted on the implementation of KANs in hardware design, which would directly demonstrate whether KANs are truly superior to MLPs in practical applications. As a result, in this paper, we focus on verifying KANs for classification problems, a common but significant topic in AI, using four different types of datasets. Furthermore, the corresponding hardware implementation is considered using the Vitis high-level synthesis (HLS) tool. To the best of our knowledge, this is the first article to implement hardware for KANs. The results indicate that KANs cannot achieve higher accuracy than MLPs on highly complex datasets, while utilizing substantially more hardware resources. Therefore, MLPs remain an effective approach for achieving accuracy and efficiency in both software and hardware implementation.
Updated: 2024-07-26 01:14:52
Categories: cs.LG,cs.AR
Textile Anomaly Detection: Evaluation of the State-of-the-Art for Automated Quality Inspection of Carpet
In this study, state-of-the-art unsupervised detection models were evaluated for the purpose of automated anomaly inspection of wool carpets. A custom dataset of four unique types of carpet textures was created to thoroughly test the models and their robustness in detecting subtle anomalies in complex textures. Because a manufacturing use case requires an inline inspection system, the metrics of importance in this study were accuracy in detecting anomalous areas, the number of false detections, and the inference time of each model for real-time performance. Of the evaluated models, the student-teacher network-based methods were found, on average, to yield the highest detection accuracy and lowest false detection rates. When trained on a multi-class dataset, the models were found to yield comparable, if not better, results than with single-class training. Finally, in terms of detection speed, with the exception of the generative model, all evaluated models were found to have comparable inference times on a GPU, averaging 0.16 s per image. On a CPU, most of these models typically took between 1.5 and 2 times their respective GPU inference times.
Updated: 2024-07-26 01:13:59
Categories: cs.CV,cs.LG
Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation
Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited number and variety of tasks, leaving their generalization ability and overall performance unclear. To address this gap, we established the most comprehensive benchmark to date to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 39 specific tasks. Our findings reveal that existing foundation models excel at certain task types but struggle to effectively handle the full breadth of clinical tasks. To improve the generalization of pathology foundation models, we propose a unified knowledge distillation framework consisting of both expert and self knowledge distillation, where the former allows the model to learn from the knowledge of multiple expert models, while the latter leverages self-distillation to enable image representation learning via local-global alignment. Based on this framework, a Generalizable Pathology Foundation Model (GPFM) is pretrained on a large-scale dataset consisting of 190 million images from around 86,000 public H&E whole slides across 34 major tissue types. Evaluated on the established benchmark, GPFM achieves an impressive average rank of 1.36, with 29 tasks ranked 1st, while the second-best model, UNI, attains an average rank of 2.96, with only 4 tasks ranked 1st. The superior generalization of GPFM demonstrates its exceptional modeling capabilities across a wide range of clinical tasks, positioning it as a new cornerstone for feature representation in CPath.
Updated: 2024-07-26 01:12:54
Categories: eess.IV,cs.CV,cs.LG
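As a schematic sketch of the unified framework, the loss below combines an expert-distillation term over several frozen expert encoders with a self-distillation term that aligns local crops to a global view; the specific loss functions and weights are assumptions, not GPFM's exact recipe.

```python
import torch
import torch.nn.functional as F

def unified_kd_loss(student_feats, expert_feats, global_emb, local_embs,
                    alpha=1.0, beta=1.0):
    """Expert KD: regress the features of multiple expert models.
    Self KD: align local-crop embeddings with the global-view embedding."""
    expert_term = sum(F.mse_loss(student_feats, e.detach())
                      for e in expert_feats) / len(expert_feats)
    target = F.normalize(global_emb.detach(), dim=-1)
    self_term = sum(1.0 - (F.normalize(z, dim=-1) * target).sum(-1).mean()
                    for z in local_embs) / len(local_embs)
    return alpha * expert_term + beta * self_term

# Illustrative shapes: batch of 8, feature dim 32, two experts, three local crops.
loss = unified_kd_loss(torch.randn(8, 32),
                       [torch.randn(8, 32), torch.randn(8, 32)],
                       torch.randn(8, 32),
                       [torch.randn(8, 32) for _ in range(3)])
```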
HMM for Discovering Decision-Making Dynamics Using Reinforcement Learning Experiments
Major depressive disorder (MDD) presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimuli that are associated with different outcomes. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of learning strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task (PRT) within the EMBARC study, we propose a novel RL-HMM framework for analyzing reward-based decision-making. Our model accommodates learning strategy switching between two distinct approaches under a hidden Markov model (HMM): subjects making decisions based on the RL model or opting for random choices. We account for a continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient EM algorithm for parameter estimation and employ a nonparametric bootstrap for inference. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL compared to the healthy controls, and engagement is associated with brain activities in the negative affect circuitry during an emotional conflict task.
Updated: 2024-07-26 01:12:39
Categories: cs.LG,stat.AP,stat.ME,stat.ML
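Below is a minimal NumPy sketch of the likelihood computation for such a two-strategy HMM: state 0 emits choices through a softmax over RL Q-values, and state 1 emits uniformly random choices. The scaled forward recursion is standard; the parameterization (fixed transition matrix, fixed inverse temperature) is an illustrative simplification of the time-varying model.

```python
import numpy as np

def forward_loglik(choices, q_values, beta, trans, p0):
    """choices: (T,) chosen options; q_values: (T, K) per-trial Q-values;
    beta: inverse temperature; trans: (2, 2) row-stochastic transition matrix;
    p0: (2,) initial state distribution."""
    T, K = q_values.shape
    logits = beta * q_values
    soft = np.exp(logits - logits.max(axis=1, keepdims=True))
    soft /= soft.sum(axis=1, keepdims=True)
    emit = np.column_stack([soft[np.arange(T), choices],  # state 0: RL softmax
                            np.full(T, 1.0 / K)])         # state 1: random
    alpha = p0 * emit[0]
    c = alpha.sum(); loglik = np.log(c); alpha /= c       # scaled forward pass
    for t in range(1, T):
        alpha = (alpha @ trans) * emit[t]
        c = alpha.sum(); loglik += np.log(c); alpha /= c
    return loglik

rng = np.random.default_rng(0)
ll = forward_loglik(rng.integers(0, 2, size=50), rng.normal(size=(50, 2)),
                    beta=2.0, trans=np.array([[0.9, 0.1], [0.2, 0.8]]),
                    p0=np.array([0.5, 0.5]))
```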
Capturing the security expert knowledge in feature selection for web application attack detection
This article puts forward the use of mutual information values to replicate the expertise of security professionals in selecting features for detecting web attacks. The goal is to enhance the effectiveness of web application firewalls (WAFs). Web applications are frequently vulnerable to various security threats, making WAFs essential for their protection. WAFs analyze HTTP traffic using rule-based approaches to identify known attack patterns and to detect and block potential malicious requests. However, a major challenge is the occurrence of false positives, which can lead to blocking legitimate traffic and impact the normal functioning of the application. We address this problem with an approach that combines supervised learning for feature selection with a semi-supervised learning scenario for training a One-Class SVM model. The experimental findings show that the model trained with features selected by the proposed algorithm outperformed the expert-based selection approach in terms of performance. Additionally, it also improved upon the results obtained by the traditional rule-based WAF ModSecurity configured with a vanilla set of OWASP CRS rules.
Updated: 2024-07-26 00:56:11
Categories: cs.CR,cs.AI
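Below is a minimal scikit-learn sketch of the described pipeline on synthetic data: mutual-information feature ranking as the supervised step, then a One-Class SVM trained on benign traffic as the semi-supervised step. The data, feature count, and nu value are fabricated for illustration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))             # stand-in HTTP-request features
y = (X[:, 3] + X[:, 7] > 1).astype(int)    # stand-in attack labels

mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:10]            # keep the 10 most informative features

benign = X[y == 0][:, top]                 # train only on benign traffic
detector = OneClassSVM(kernel="rbf", nu=0.05).fit(benign)
flagged = detector.predict(X[:, top]) == -1  # -1 marks suspected attacks
print(f"flagged {flagged.mean():.1%} of requests")
```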
Impact of Recurrent Neural Networks and Deep Learning Frameworks on Real-time Lightweight Time Series Anomaly Detection
Real-time lightweight time series anomaly detection has become increasingly crucial in cybersecurity and many other domains. Its ability to adapt to unforeseen pattern changes and swiftly identify anomalies enables prompt responses and critical decision-making. While several such anomaly detection approaches have been introduced in recent years, they primarily utilize a single type of recurrent neural networks (RNNs) and have been implemented in only one deep learning framework. In the absence of comprehensive evaluations, it is unclear how the use of different types of RNNs available in various deep learning frameworks affects the performance of these anomaly detection approaches. Arbitrarily choosing an RNN variant and a deep learning framework to implement an anomaly detection approach may not reflect its true performance and could potentially mislead users into favoring one approach over another. In this paper, we aim to study the influence of various types of RNNs available in popular deep learning frameworks on real-time lightweight time series anomaly detection. We reviewed several state-of-the-art approaches and implemented a representative anomaly detection approach using well-known RNN variants supported by three widely recognized deep learning frameworks. A comprehensive evaluation is then conducted to analyze the performance of each implementation across real-world, open-source time series datasets. The evaluation results provide valuable guidance for selecting the appropriate RNN variant and deep learning framework for real-time, lightweight time series anomaly detection.
Updated: 2024-07-26 00:38:51
Categories: cs.LG
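The comparison can be made concrete with a minimal PyTorch sketch of forecast-based detection in which the RNN variant is a swappable constructor argument; the cell registry and error-based scoring rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

CELLS = {"rnn": nn.RNN, "gru": nn.GRU, "lstm": nn.LSTM}

class Forecaster(nn.Module):
    """One-step-ahead forecaster whose recurrent cell is configurable."""
    def __init__(self, variant: str = "lstm", hidden: int = 32):
        super().__init__()
        self.rnn = CELLS[variant](input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, 1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1]).squeeze(-1)  # predicted next value

def anomaly_scores(model, windows, targets):
    """Large one-step forecast errors flag anomalous points."""
    with torch.no_grad():
        return (model(windows) - targets).abs()

model = Forecaster("gru")                 # swap "gru" for "lstm" or "rnn"
scores = anomaly_scores(model, torch.randn(5, 20, 1), torch.randn(5))
```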
LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments
In the rapidly evolving digital content landscape, media firms and news publishers require automated and efficient methods to enhance user engagement. This paper introduces the LLM-Assisted Online Learning Algorithm (LOLA), a novel framework that integrates Large Language Models (LLMs) with adaptive experimentation to optimize content delivery. Leveraging a large-scale dataset from Upworthy, which includes 17,681 headline A/B tests, we first investigate three pure-LLM approaches: prompt-based methods, embedding-based classification models, and fine-tuned open-source LLMs. We find that prompt-based approaches perform poorly, achieving no more than 65% accuracy in identifying the catchier headline. In contrast, both OpenAI-embedding-based classification models and fine-tuned Llama-3 with 8 billion parameters achieve an accuracy of around 82-84%. We then introduce LOLA, which combines the best pure-LLM approach with the Upper Confidence Bound algorithm to allocate traffic and maximize clicks adaptively. Our numerical experiments on Upworthy data show that LOLA outperforms the standard A/B test method (the current status quo at Upworthy), pure bandit algorithms, and pure-LLM approaches, particularly in scenarios with limited experimental traffic. Our approach is scalable and applicable to content experiments across various settings where firms seek to optimize user engagement, including digital advertising and social media recommendations.
Updated: 2024-07-26 00:26:10
Categories: cs.LG,stat.ML
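For reference, here is a minimal sketch of the Upper Confidence Bound step that LOLA builds on, written as plain UCB1 over candidate headlines with simulated click-through rates; seeding the bandit with an LLM-derived prior, as the paper does, is not shown.

```python
import math
import random

def ucb_select(clicks, views, t):
    """Pick the headline maximizing empirical CTR plus an exploration bonus."""
    def score(i):
        if views[i] == 0:
            return float("inf")          # show every headline at least once
        return clicks[i] / views[i] + math.sqrt(2 * math.log(t) / views[i])
    return max(range(len(views)), key=score)

true_ctr = [0.02, 0.05, 0.03]            # hidden, simulated click rates
clicks, views = [0, 0, 0], [0, 0, 0]
for t in range(1, 10_001):
    arm = ucb_select(clicks, views, t)
    views[arm] += 1
    clicks[arm] += random.random() < true_ctr[arm]
print(views)                             # traffic concentrates on the best headline
```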
Mixed Non-linear Quantization for Vision Transformers
The majority of quantization methods have been proposed to reduce the model size of Vision Transformers, yet most of them have overlooked the quantization of non-linear operations. Only a few works have addressed quantization for non-linear operations, but they applied a single quantization method across all non-linear operations. We believe that this can be further improved by employing a different quantization method for each non-linear operation. Therefore, to assign the most error-minimizing quantization method from the known methods to each non-linear layer, we propose a mixed non-linear quantization that considers layer-wise quantization sensitivity measured by SQNR difference metric. The results show that our method outperforms I-BERT, FQ-ViT, and I-ViT in both 8-bit and 6-bit settings for ViT, DeiT, and Swin models by an average of 0.6%p and 19.6%p, respectively. Our method outperforms I-BERT and I-ViT by 0.6%p and 20.8%p, respectively, when training time is limited. We plan to release our code at https://gitlab.com/ones-ai/mixed-non-linear-quantization.
Updated: 2024-07-26 00:19:01
Categories: cs.CV,cs.AI
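Below is a minimal NumPy sketch of SQNR-based sensitivity measurement for assigning a quantizer to a non-linear layer; the two candidate quantizers and the softmax stand-in are illustrative, not the paper's actual method set.

```python
import numpy as np

def sqnr_db(x, x_q):
    """Signal-to-quantization-noise ratio in dB; higher means less error."""
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_q) ** 2))

def uniform_quant(x, bits=8):
    scale = (x.max() - x.min()) / (2 ** bits - 1)
    return np.round((x - x.min()) / scale) * scale + x.min()

def log2_quant(x, bits=8):
    e = np.clip(np.round(np.log2(np.abs(x) + 1e-12)),
                -2 ** (bits - 1), 2 ** (bits - 1) - 1)
    return np.sign(x) * 2.0 ** e

rng = np.random.default_rng(0)
z = rng.normal(size=10_000)
softmax_out = np.exp(z) / np.exp(z).sum()        # output of one non-linear op
candidates = {"uniform": uniform_quant, "log2": log2_quant}
best = max(candidates, key=lambda k: sqnr_db(softmax_out, candidates[k](softmax_out)))
print(best)  # the error-minimizing quantizer for this layer
```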
On Convergence Analysis of Policy Iteration Algorithms for Entropy-Regularized Stochastic Control Problems
In this paper we investigate issues regarding the convergence of the Policy Iteration Algorithm (PIA) for a class of general continuous-time entropy-regularized stochastic control problems. In particular, instead of employing sophisticated PDE estimates for the iterative PDEs involved in the PIA (see, e.g., Huang-Wang-Zhou (2023)), we provide a simple proof from scratch for the convergence of the PIA. Our approach builds on probabilistic representation formulae for solutions of PDEs and their derivatives. Moreover, in the infinite-horizon model with a large discount factor and in the finite-horizon model, similar arguments lead to an exponential rate of convergence of the PIA without tears. Finally, with some extra effort, we show that our approach can also be extended to the case when the diffusion contains the control, in the one-dimensional setting but without many extra constraints on the coefficients. We believe that these results are new in the literature.
Updated: 2024-07-26 00:15:47
Categories: math.OC, cs.LG (MSC: 93E35, 60H30, 35Q93)
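For orientation, here is a generic discrete-time sketch of the entropy-regularized objective and the Gibbs-form improvement step that the PIA iterates; the notation is standard and illustrative, not the paper's continuous-time formulation.

```latex
% Entropy-regularized value and the softmax (Gibbs) improvement step,
% sketched in discrete time with discount $\gamma$ and temperature $\lambda$:
V^{\pi}(x) = \mathbb{E}\Big[\sum_{t \ge 0} \gamma^{t}\big(r(x_t, a_t)
  + \lambda\,\mathcal{H}(\pi(\cdot \mid x_t))\big) \,\Big|\, x_0 = x\Big],
\qquad
\pi_{n+1}(a \mid x) \propto \exp\!\big(Q^{\pi_n}(x, a)/\lambda\big).
```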
A Model for Combinatorial Dictionary Learning and Inference
We are often interested in decomposing complex, structured data into simple components that explain the data. The linear version of this problem is well-studied as dictionary learning and factor analysis. In this work, we propose a combinatorial model in which to study this question, motivated by the way objects occlude each other in a scene to form an image. First, we identify a property we call "well-structuredness" of a set of low-dimensional components which ensures that no two components in the set are too similar. We show how well-structuredness is sufficient for learning the set of latent components comprising a set of sample instances. We then consider the problem: given a set of components and an instance generated from some unknown subset of them, identify which parts of the instance arise from which components. We consider two variants: (1) determine the minimal number of components required to explain the instance; (2) determine the correct explanation for as many locations as possible. For the latter goal, we also devise a version that is robust to adversarial corruptions, with just a slightly stronger assumption on the components. Finally, we show that the learning problem is computationally infeasible in the absence of any assumptions.
Updated: 2024-07-26 00:13:30
Categories: cs.LG,cs.DS
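A minimal sketch of variant (1) as a greedy set-cover heuristic appears below: repeatedly pick the component that explains the most still-unexplained locations. Treating each component as the set of locations it can explain is an illustrative simplification of the occlusion model.

```python
def greedy_min_components(instance, components):
    """instance: dict location -> value; components: list of such dicts.
    Returns chosen component indices and any locations left unexplained."""
    unexplained = set(instance)
    chosen = []
    while unexplained:
        def coverage(i):
            return sum(1 for loc in components[i]
                       if loc in unexplained and components[i][loc] == instance[loc])
        best = max(range(len(components)), key=coverage)
        covered = {loc for loc in components[best]
                   if loc in unexplained and components[best][loc] == instance[loc]}
        if not covered:
            break                         # nothing explains the remainder
        chosen.append(best)
        unexplained -= covered
    return chosen, unexplained

instance = {(0, 0): 1, (0, 1): 2, (1, 0): 3}
components = [{(0, 0): 1, (0, 1): 2}, {(1, 0): 3}, {(0, 1): 9}]
print(greedy_min_components(instance, components))   # ([0, 1], set())
```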
Investigating the Privacy Risk of Using Robot Vacuum Cleaners in Smart Environments
Robot vacuum cleaners have become increasingly popular and are widely used in various smart environments. To improve user convenience, manufacturers also introduced smartphone applications that enable users to customize cleaning settings or access information about their robot vacuum cleaners. While this integration enhances the interaction between users and their robot vacuum cleaners, it results in potential privacy concerns because users' personal information may be exposed. To address these concerns, end-to-end encryption is implemented between the application, cloud service, and robot vacuum cleaners to secure the exchanged information. Nevertheless, network header metadata remains unencrypted and it is still vulnerable to network eavesdropping. In this paper, we investigate the potential risk of private information exposure through such metadata. A popular robot vacuum cleaner was deployed in a real smart environment where passive network eavesdropping was conducted during several selected cleaning events. Our extensive analysis, based on Association Rule Learning, demonstrates that it is feasible to identify certain events using only the captured Internet traffic metadata, thereby potentially exposing private user information and raising privacy concerns.
Updated: 2024-07-26 00:00:53
Categories: cs.LG,cs.AI
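As an illustration of the analysis technique, below is a minimal mlxtend sketch of association-rule mining over fabricated traffic-metadata "transactions"; the feature names, windows, and thresholds are invented for illustration.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Each transaction is the set of metadata features seen in one traffic window.
windows = [
    ["pkt_burst_large", "dst_cloud_api", "event_clean_start"],
    ["pkt_burst_large", "dst_cloud_api", "event_clean_start"],
    ["pkt_small_keepalive", "dst_cloud_api"],
    ["pkt_burst_large", "dst_map_upload", "event_clean_end"],
]
te = TransactionEncoder()
df = pd.DataFrame(te.fit(windows).transform(windows), columns=te.columns_)
itemsets = apriori(df, min_support=0.25, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "confidence"]])
```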