Arxiv Day: Article

UnPaSt: unsupervised patient stratification by differentially expressed biclusters in omics data

Most complex diseases, including cancer and non-malignant diseases like asthma, have distinct molecular subtypes that require distinct clinical approaches. However, existing computational patient stratification methods have been benchmarked almost exclusively on cancer omics data and only perform well when mutually exclusive subtypes can be characterized by many biomarkers. Here, we contribute a large-scale evaluation, quantitatively exploring the power of 22 unsupervised patient stratification methods using both simulated and real transcriptome data. Building on this experience, we developed UnPaSt (https://apps.cosy.bio/unpast/), which optimizes unsupervised patient stratification and works even with only a limited number of subtype-predictive biomarkers. We evaluated all 23 methods on real-world breast cancer and asthma transcriptomics data. Although many methods reliably detected major breast cancer subtypes, only a few identified Th2-high asthma, and UnPaSt significantly outperformed its closest competitors on both test datasets. Overall, we showed that UnPaSt can detect many biologically insightful and reproducible patterns in omics datasets.

Updated: 2024-07-31 23:50:27

Categories: cs.LG,q-bio.GN

Download: http://arxiv.org/abs/2408.00200v1

Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models

Generative Pre-Trained Transformer models have been shown to be surprisingly effective at a variety of natural language processing tasks -- including generating computer code. We evaluate the effectiveness of open source GPT models for the task of automatic identification of the presence of vulnerable code syntax (specifically targeting C and C++ source code). This task is evaluated on a selection of 36 source code examples from the NIST SARD dataset, which are specifically curated to not contain natural English that indicates the presence, or lack thereof, of a particular vulnerability. The NIST SARD source code dataset contains identified vulnerable lines of source code that are examples of one out of the 839 distinct Common Weakness Enumerations (CWE), allowing for exact quantification of the GPT output classification error rate. A total of 5 GPT models are evaluated, using 10 different inference temperatures and 100 repetitions at each setting, resulting in 5,000 GPT queries per vulnerable source code analyzed. Ultimately, we find that the GPT models that we evaluated are not suitable for fully automated vulnerability scanning because the false positive and false negative rates are too high to likely be useful in practice. However, we do find that the GPT models perform surprisingly well at automated vulnerability detection for some of the test cases, in particular surpassing random sampling, and being able to identify the exact lines of code that are vulnerable albeit at a low success rate. The best performing GPT model result found was Llama-2-70b-chat-hf with inference temperature of 0.1 applied to NIST SARD test case 149165 (which is an example of a buffer overflow vulnerability), which had a binary classification recall score of 1.0 and a precision of 1.0 for correctly and uniquely identifying the vulnerable line of code and the correct CWE number.

Updated: 2024-07-31 23:33:26

Categories: cs.CR,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2408.00197v1
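
The query budget and the line-level scores quoted in the vulnerability-detection abstract above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the example line numbers are hypothetical and are not taken from the paper or from NIST SARD.

```python
from __future__ import annotations


def query_count(models: int, temperatures: int, repetitions: int) -> int:
    """Number of model queries issued for one analyzed source file."""
    return models * temperatures * repetitions


def precision_recall(predicted: set[int], vulnerable: set[int]) -> tuple[float, float]:
    """Precision and recall of predicted vulnerable line numbers."""
    true_positives = len(predicted & vulnerable)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(vulnerable) if vulnerable else 0.0
    return precision, recall


# 5 models x 10 temperatures x 100 repetitions, as stated in the abstract.
print(query_count(models=5, temperatures=10, repetitions=100))  # 5000

# Hypothetical line numbers, used only to show the two metrics.
print(precision_recall(predicted={42}, vulnerable={42}))      # (1.0, 1.0)
print(precision_recall(predicted={42, 57}, vulnerable={42}))  # (0.5, 1.0)
```

A precision and recall of 1.0 together mean the model flagged exactly the vulnerable line and nothing else, which is the situation the abstract describes for its best case.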

Combining audio control and style transfer using latent diffusion

Deep generative models are now able to synthesize high-quality audio signals, shifting the critical aspect in their development from audio quality to control capabilities. Although text-to-music generation is being widely adopted by the general public, explicit control and example-based style transfer are more suitable modalities for capturing the intents of artists and musicians. In this paper, we aim to unify explicit control and style transfer within a single model by separating local and global information to capture musical structure and timbre respectively. To do so, we leverage the capabilities of diffusion autoencoders to extract semantic features, in order to build two representation spaces. We enforce disentanglement between those spaces using an adversarial criterion and a two-stage training strategy. Our resulting model can generate audio matching a timbre target, while specifying structure either with explicit controls or through another audio example. We evaluate our model on one-shot timbre transfer and MIDI-to-audio tasks on instrumental recordings and show that we outperform existing baselines in terms of audio quality and target fidelity. Furthermore, we show that our method can generate cover versions of complete musical pieces by transferring rhythmic and melodic content to the style of a target audio in a different genre.

Updated: 2024-07-31 23:27:27

Categories: cs.SD,cs.LG,eess.AS,stat.ML

Download: http://arxiv.org/abs/2408.00196v1

Resilience and Security of Deep Neural Networks Against Intentional and Unintentional Perturbations: Survey and Research Challenges

In order to deploy deep neural networks (DNNs) in high-stakes scenarios, it is imperative that DNNs provide inference robust to external perturbations - both intentional and unintentional. Although the resilience of DNNs to intentional and unintentional perturbations has been widely investigated, a unified vision of these inherently intertwined problem domains is still missing. In this work, we fill this gap by providing a survey of the state of the art and highlighting the similarities of the proposed approaches. We also analyze the research challenges that need to be addressed to deploy resilient and secure DNNs. As there has not been any such survey connecting the resilience of DNNs to intentional and unintentional perturbations, we believe this work can help advance the frontier in both domains by enabling the exchange of ideas between the two communities.

Updated: 2024-07-31 23:20:46

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2408.00193v1

S-SYNTH: Knowledge-Based, Synthetic Generation of Skin Images

Development of artificial intelligence (AI) techniques in medical imaging requires access to large-scale and diverse datasets for training and evaluation. In dermatology, obtaining such datasets remains challenging due to significant variations in patient populations, illumination conditions, and acquisition system characteristics. In this work, we propose S-SYNTH, the first knowledge-based, adaptable open-source skin simulation framework to rapidly generate synthetic skin, 3D models and digitally rendered images, using an anatomically inspired multi-layer, multi-component skin and growing lesion model. The skin model allows for controlled variation in skin appearance, such as skin color, presence of hair, lesion shape, and blood fraction among other parameters. We use this framework to study the effect of possible variations on the development and evaluation of AI models for skin lesion segmentation, and show that results obtained using synthetic data follow similar comparative trends as real dermatologic images, while mitigating biases and limitations from existing datasets including small dataset size, lack of diversity, and underrepresentation.

Updated: 2024-07-31 23:16:29

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.00191v1

Asymmetric Contrastive Multimodal Learning for Advancing Chemical Understanding

The versatility of multimodal deep learning holds tremendous promise for advancing scientific research and practical applications. As this field continues to evolve, the collective power of cross-modal analysis promises to drive transformative innovations, leading us to new frontiers in chemical understanding and discovery. Hence, we introduce Asymmetric Contrastive Multimodal Learning (ACML) as a novel approach tailored for molecules, showcasing its potential to advance the field of chemistry. ACML harnesses the power of effective asymmetric contrastive learning to seamlessly transfer information from various chemical modalities to molecular graph representations. By combining pre-trained chemical unimodal encoders and a shallow-designed graph encoder, ACML facilitates the assimilation of coordinated chemical semantics from different modalities, leading to comprehensive representation learning with efficient training. We demonstrate the effectiveness of this framework through large-scale cross-modality retrieval and isomer discrimination tasks. Additionally, ACML enhances interpretability by revealing chemical semantics in graph presentations and bolsters the expressive power of graph neural networks, as evidenced by improved performance in molecular property prediction tasks from MoleculeNet. ACML exhibits its capability to revolutionize chemical research and applications, providing a deeper understanding of the chemical semantics of different modalities.

Updated: 2024-07-31 22:32:34

Categories: cs.LG

Download: http://arxiv.org/abs/2311.06456v3

Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties

We introduce the elEmBERT model for chemical classification tasks. It is based on deep learning techniques, such as a multilayer encoder architecture. We demonstrate the opportunities offered by our approach on sets of organic, inorganic and crystalline compounds. In particular, we developed and tested the model using the Matbench and MoleculeNet benchmarks, which include crystal properties and drug design-related benchmarks. We also conduct an analysis of vector representations of chemical compounds, shedding light on the underlying patterns in structural data. Our model exhibits exceptional predictive capabilities and proves universally applicable to molecular and material datasets. For instance, on the Tox21 dataset, we achieved an average precision of 96%, surpassing the previously best result by 10%.

Updated: 2024-07-31 22:27:28

Categories: physics.chem-ph,cond-mat.mtrl-sci,cs.LG,physics.atm-clus,q-bio.QM

Download: http://arxiv.org/abs/2309.09355v2

Adapting Skills to Novel Grasps: A Self-Supervised Approach

In this paper, we study the problem of adapting manipulation trajectories involving grasped objects (e.g. tools) defined for a single grasp pose to novel grasp poses. A common approach to address this is to define a new trajectory for each possible grasp explicitly, but this is highly inefficient. Instead, we propose a method to adapt such trajectories directly while only requiring a period of self-supervised data collection, during which a camera observes the robot's end-effector moving with the object rigidly grasped. Importantly, our method requires no prior knowledge of the grasped object (such as a 3D CAD model), it can work with RGB images, depth images, or both, and it requires no camera calibration. Through a series of real-world experiments involving 1360 evaluations, we find that self-supervised RGB data consistently outperforms alternatives that rely on depth images including several state-of-the-art pose estimation methods. Compared to the best-performing baseline, our method results in an average of 28.5% higher success rate when adapting manipulation trajectories to novel grasps on several everyday tasks. Videos of the experiments are available on our webpage at https://www.robot-learning.uk/adapting-skills

Updated: 2024-07-31 22:18:09

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2408.00178v1

Longhorn: State Space Models are Amortized Online Learners

The most fundamental capability of modern AI methods such as Large Language Models (LLMs) is the ability to predict the next token in a long sequence of tokens, known as "sequence modeling." Although the Transformer model is currently the dominant approach to sequence modeling, its quadratic computational cost with respect to sequence length is a significant drawback. State-space models (SSMs) offer a promising alternative due to their linear decoding efficiency and high parallelizability during training. However, existing SSMs often rely on seemingly ad hoc linear recurrence designs. In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems. This approach links SSM design to formulating precise online learning objectives, with state transition rules derived from optimizing these objectives. Based on this insight, we introduce a novel deep SSM architecture based on the implicit update for optimizing an online regression objective. Our experimental results show that our models outperform state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks and language modeling tasks.

Updated: 2024-07-31 22:09:50

Categories: cs.LG

Download: http://arxiv.org/abs/2407.14207v4
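
The Longhorn abstract above mentions an implicit update for an online regression objective. One common reading of that phrase, assumed here and not confirmed by the abstract, is a proximal step that balances staying close to the previous state against fitting the newest (key, target) pair. The sketch uses a vector state and hypothetical names (`implicit_regression_step`, `beta`); the paper's actual architecture may differ.

```python
def implicit_regression_step(w, k, x, beta):
    """Closed-form minimizer of ||w_new - w||^2 + beta * (w_new . k - x)^2."""
    residual = sum(wi * ki for wi, ki in zip(w, k)) - x
    # The effective step is always below 1 / ||k||^2, so it stays stable
    # however large beta is chosen.
    step = beta / (1.0 + beta * sum(ki * ki for ki in k))
    return [wi - step * residual * ki for wi, ki in zip(w, k)]


def objective_gradient(w_new, w, k, x, beta):
    """Gradient of the proximal objective at w_new; it vanishes at the minimizer."""
    residual = sum(wi * ki for wi, ki in zip(w_new, k)) - x
    return [2.0 * (a - b) + 2.0 * beta * residual * ki
            for a, b, ki in zip(w_new, w, k)]


# Toy stream of (key, target) pairs, made up for illustration.
stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = [0.0, 0.0]
for k, x in stream:
    w_next = implicit_regression_step(w, k, x, beta=4.0)
    grad = objective_gradient(w_next, w, k, x, beta=4.0)
    assert all(abs(g) < 1e-9 for g in grad)  # the closed form is stationary
    w = w_next
print(w)
```

The state transition here is linear in the previous state, which is what lets such an update be written as a linear recurrence.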

Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge

A common practice in deep learning involves training large neural networks on massive datasets to achieve high accuracy across various domains and tasks. While this approach works well in many application areas, it often fails drastically when processing data from a new modality with a significant distribution shift from the data used to pre-train the model. This paper focuses on adapting a large object detection model trained on RGB images to new data extracted from IR images with a substantial modality shift. We propose Modality Translator (ModTr) as an alternative to the common approach of fine-tuning a large model to the new modality. ModTr adapts the IR input image with a small transformation network trained to directly minimize the detection loss. The original RGB model can then work on the translated inputs without any further changes or fine-tuning to its parameters. Experimental results on translating from IR to RGB images on two well-known datasets show that our simple approach provides detectors that perform comparably or better than standard fine-tuning, without forgetting the knowledge of the original model. This opens the door to a more flexible and efficient service-based detection pipeline, where a single unaltered server, such as an RGB detector, runs constantly while being queried by different modalities, such as IR with the corresponding translation model. Our code is available at: https://github.com/heitorrapela/ModTr.

Updated: 2024-07-31 21:50:57

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.01492v3

CREW: Facilitating Human-AI Teaming Research

With the increasing deployment of artificial intelligence (AI) technologies, the potential of humans working with AI agents has been growing rapidly. Human-AI teaming is an important paradigm for studying various aspects of humans and AI agents working together. The unique aspect of Human-AI teaming research is the need to jointly study humans and AI agents, demanding multidisciplinary research efforts from machine learning to human-computer interaction, robotics, cognitive science, neuroscience, psychology, social science, and complex systems. However, existing platforms for Human-AI teaming research are limited, often supporting oversimplified scenarios and a single task, or specifically focusing on either human-teaming research or multi-agent AI algorithms. We introduce CREW, a platform to facilitate Human-AI teaming research and encourage collaboration across multiple scientific disciplines, with a strong emphasis on human involvement. It includes pre-built tasks for cognitive studies and Human-AI teaming, and its modular design makes it extensible. Following conventional cognitive neuroscience research, CREW also supports multimodal human physiological signal recording for behavior analysis. Moreover, CREW benchmarks real-time human-guided reinforcement learning agents using state-of-the-art algorithms and well-tuned baselines. With CREW, we were able to conduct 50 human subject studies within a week to verify the effectiveness of our benchmark.

Updated: 2024-07-31 21:43:55

Categories: cs.HC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.00170v1

Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation

In this paper, we introduce a variant of video object segmentation (VOS) that bridges interactive and semi-automatic approaches, termed Lazy Video Object Segmentation (ziVOS). In contrast to both tasks, which handle video object segmentation in an off-line manner (i.e., pre-recorded sequences), we propose through ziVOS to target online recorded sequences. Here, we strive to strike a balance between performance and robustness for long-term scenarios by soliciting user feedback on the fly during the segmentation process. Hence, we aim to maximize the tracking duration of an object of interest, while requiring minimal user corrections to maintain tracking over an extended period. We propose a competitive baseline, i.e., Lazy-XMem, as a reference for future works in ziVOS. Our proposed approach uses an uncertainty estimation of the tracking state to determine whether a user interaction is necessary to refine the model's prediction. To quantitatively assess the performance of our method and the user's workload, we introduce complementary metrics alongside those already established in the field. We evaluate our approach using the recently introduced LVOS dataset, which offers numerous long-term videos. Our code is publicly available at https://github.com/Vujas-Eteph/LazyXMem.

Updated: 2024-07-31 21:42:42

Categories: cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2408.00169v1
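
The ziVOS abstract above describes using an uncertainty estimate of the tracking state to decide whether to ask the user for a correction. A minimal sketch of that decision rule follows. It assumes, purely for illustration, that uncertainty is the mean binary entropy of per-pixel foreground probabilities and that a fixed threshold triggers the request; the abstract does not say which estimate or threshold Lazy-XMem uses.

```python
import math


def mean_pixel_entropy(foreground_probs):
    """Average binary entropy (in bits) of per-pixel foreground probabilities."""
    def binary_entropy(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0  # a certain pixel carries no uncertainty
        return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

    return sum(binary_entropy(p) for p in foreground_probs) / len(foreground_probs)


def needs_user_correction(foreground_probs, threshold=0.5):
    """Request an interaction only when the mask is uncertain on average."""
    return mean_pixel_entropy(foreground_probs) > threshold


# Hypothetical flattened masks for two frames.
confident_frame = [0.99, 0.98, 0.01, 0.02]
ambiguous_frame = [0.55, 0.45, 0.60, 0.50]

print(needs_user_correction(confident_frame))  # False
print(needs_user_correction(ambiguous_frame))  # True
```

Raising the threshold lowers the user's workload at the cost of letting more uncertain frames through, which is the balance the abstract refers to.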

Finch: Prompt-guided Key-Value Cache Compression

Recent large language model applications, such as Retrieval-Augmented Generation and chatbots, have led to an increased need to process longer input contexts. However, this requirement is hampered by inherent limitations. Architecturally, models are constrained by a context window defined during training. Additionally, processing extensive texts requires substantial GPU memory. We propose a novel approach, Finch, to compress the input context by leveraging the pre-trained model weights of the self-attention. Given a prompt and a long text, Finch iteratively identifies the most relevant Key (K) and Value (V) pairs over chunks of the text conditioned on the prompt. Only such pairs are stored in the KV cache, which, within the space constrained by the context window, ultimately contains a compressed version of the long text. Our proposal enables models to consume large inputs even with high compression (up to 93x) while preserving semantic integrity without the need for fine-tuning.

Updated: 2024-07-31 21:33:56

Categories: cs.AI

Download: http://arxiv.org/abs/2408.00167v1
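
The Finch abstract above describes keeping only the Key/Value pairs that are most relevant to the prompt, chunk by chunk. The sketch below illustrates that selection idea in a heavily simplified form. The relevance score (summed dot products between prompt queries and keys), the per-chunk budget, and all function names are assumptions made for this example, not the method's actual scoring or cache policy.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def compress_chunk(prompt_queries, keys, values, keep):
    """Keep the `keep` KV pairs whose keys score highest against the prompt."""
    scores = [sum(dot(q, k) for q in prompt_queries) for k in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore the original token order
    return [keys[i] for i in kept], [values[i] for i in kept]


def compress_text(prompt_queries, chunks, keep_per_chunk):
    """Process the long text chunk by chunk and accumulate a compressed cache."""
    cache_keys, cache_values = [], []
    for keys, values in chunks:
        k, v = compress_chunk(prompt_queries, keys, values, keep_per_chunk)
        cache_keys.extend(k)
        cache_values.extend(v)
    return cache_keys, cache_values


# Toy data: one prompt query and two chunks of three tokens each.
prompt_queries = [[1.0, 0.0]]
chunks = [
    ([[0.9, 0.0], [0.1, 0.0], [0.5, 0.0]], ["a", "b", "c"]),
    ([[0.2, 0.0], [0.8, 0.0], [0.3, 0.0]], ["d", "e", "f"]),
]
cache_keys, cache_values = compress_text(prompt_queries, chunks, keep_per_chunk=2)
print(cache_values)  # ['a', 'c', 'e', 'f']
```

Here six tokens are reduced to four cache entries; the compression ratio is set by the ratio of chunk length to the per-chunk budget.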

Review of Explainable Graph-Based Recommender Systems

Explainability of recommender systems has become essential to ensure users' trust and satisfaction. Various types of explainable recommender systems have been proposed including explainable graph-based recommender systems. This review paper discusses state-of-the-art approaches of these systems and categorizes them based on three aspects: learning methods, explaining methods, and explanation types. It also explores the commonly used datasets, explainability evaluation methods, and future directions of this research area. Compared with the existing review papers, this paper focuses on explainability based on graphs and covers the topics required for developing novel explainable graph-based recommender systems.

Updated: 2024-07-31 21:30:36

Categories: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.00166v1

Non-convolutional Graph Neural Networks

Rethink convolution-based graph neural networks (GNN) -- they characteristically suffer from limited expressiveness, over-smoothing, and over-squashing, and require specialized sparse kernels for efficient computation. Here, we design a simple graph learning module entirely free of convolution operators, coined the random walk with unifying memory (RUM) neural network, where an RNN merges the topological and semantic graph features along the random walks terminating at each node. Relating the rich literature on RNN behavior and graph topology, we theoretically show and experimentally verify that RUM attenuates the aforementioned symptoms and is more expressive than the Weisfeiler-Lehman (WL) isomorphism test. On a variety of node- and graph-level classification and regression tasks, RUM not only achieves competitive performance, but is also robust, memory-efficient, scalable, and faster than the simplest convolutional GNNs.

Updated: 2024-07-31 21:29:26

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.00165v1
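
The RUM abstract above describes an RNN that merges topological and semantic features along random walks terminating at each node. The sketch below shows one way such a module could be wired together on a tiny undirected graph. Several choices are assumptions made for illustration only: the topological feature is the order of first appearance of each node in the walk, the recurrent cell is a scalar tanh update with fixed weights, and walk outputs are averaged.

```python
import math
import random


def sample_walk_ending_at(adj, node, length, rng):
    """Sample a walk from `node` and reverse it so that it terminates there.

    Reversal yields a valid walk only because the graph is undirected.
    """
    walk = [node]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    walk.reverse()
    return walk


def anonymize(walk):
    """Topological feature: replace node ids by order of first appearance."""
    first_seen = {}
    return [first_seen.setdefault(v, len(first_seen)) for v in walk]


def rnn_readout(walk, features, w_h=0.5, w_x=1.0, w_a=0.1):
    """Scalar tanh recurrence over (semantic, topological) inputs along the walk."""
    h = 0.0
    for anon_id, v in zip(anonymize(walk), walk):
        h = math.tanh(w_h * h + w_x * features[v] + w_a * anon_id)
    return h


def node_embedding(adj, features, node, walks=4, length=3, seed=0):
    """Average the recurrent readout over several sampled walks."""
    rng = random.Random(seed)
    outputs = [rnn_readout(sample_walk_ending_at(adj, node, length, rng), features)
               for _ in range(walks)]
    return sum(outputs) / len(outputs)


# Toy undirected star graph with one scalar feature per node.
adj = {0: [1, 2], 1: [0], 2: [0]}
features = {0: 0.2, 1: -0.4, 2: 0.7}
print(node_embedding(adj, features, node=0))
```

No neighborhood aggregation (convolution) appears anywhere: all mixing of information happens inside the recurrence along each walk.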

A Culturally-Aware Tool for Crowdworkers: Leveraging Chronemics to Support Diverse Work Styles

Crowdsourcing markets are expanding worldwide, but often feature standardized interfaces that ignore the cultural diversity of their workers, negatively impacting their well-being and productivity. To transform these workplace dynamics, this paper proposes creating culturally-aware workplace tools, specifically designed to adapt to the cultural dimensions of monochronic and polychronic work styles. We illustrate this approach with "CultureFit," a tool that we engineered based on extensive research in Chronemics and culture theories. To study and evaluate our tool in the real world, we conducted a field experiment with 55 workers from 24 different countries. Our field experiment revealed that CultureFit significantly improved the earnings of workers from cultural backgrounds often overlooked in design. Our study is among the pioneering efforts to examine culturally aware digital labor interventions. It also provides access to a dataset with over two million data points on culture and digital work, which can be leveraged for future research in this emerging field. The paper concludes by discussing the importance and future possibilities of incorporating cultural insights into the design of tools for digital labor.

Updated: 2024-07-31 21:22:41

Categories: cs.HC,cs.AI,68U35,H.5.2

Download: http://arxiv.org/abs/2408.07838v1

Beyond Size and Class Balance: Alpha as a New Dataset Quality Metric for Deep Learning

In deep learning, achieving high performance on image classification tasks requires diverse training sets. However, the current best practice, maximizing dataset size and class balance, does not guarantee dataset diversity. We hypothesized that, for a given model architecture, model performance can be improved by maximizing diversity more directly. To test this hypothesis, we introduce a comprehensive framework of diversity measures from ecology that generalizes familiar quantities like Shannon entropy by accounting for similarities among images. (Size and class balance emerge as special cases.) Analyzing thousands of subsets from seven medical datasets showed that the best correlates of performance were not size or class balance but $A$ ("big alpha"), a set of generalized entropy measures interpreted as the effective number of image-class pairs in the dataset, after accounting for image similarities. One of these, $A_0$, explained 67% of the variance in balanced accuracy, vs. 54% for class balance and just 39% for size. The best pair of measures was size-plus-$A_1$ (79%), which outperformed size-plus-class-balance (74%). Subsets with the largest $A_0$ performed up to 16% better than those with the largest size (median improvement, 8%). We propose maximizing $A$ as a way to improve deep learning performance in medical imaging.

Updated: 2024-07-31 21:20:53

Categories: cs.CV,cs.LG,J.3; I.2.6

Download: http://arxiv.org/abs/2407.15724v2
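
The dataset-diversity abstract above refers to ecological diversity measures that generalize Shannon entropy by accounting for similarity between items. A well-known family of that kind is the similarity-sensitive Hill number, sketched below for a small example. Whether this formula coincides exactly with the paper's $A$ measures is not stated in the abstract, so treat it as an illustration of the general idea; the function name and the toy similarity matrices are assumptions.

```python
import math


def similarity_sensitive_diversity(p, Z, q):
    """Effective number of distinct items of order q.

    p: relative abundances (sum to 1); Z: pairwise similarity matrix with
    ones on the diagonal. With Z equal to the identity this reduces to the
    ordinary Hill number (q=1 gives the exponential of Shannon entropy).
    """
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    if q == 1:
        return math.exp(-sum(pi * math.log(zi) for pi, zi in zip(p, zp) if pi > 0))
    total = sum(pi * zi ** (q - 1) for pi, zi in zip(p, zp) if pi > 0)
    return total ** (1.0 / (1.0 - q))


uniform = [0.25, 0.25, 0.25, 0.25]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
all_same = [[1.0] * 4 for _ in range(4)]
two_pairs = [[1.0, 0.9, 0.0, 0.0],
             [0.9, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.9],
             [0.0, 0.0, 0.9, 1.0]]

print(similarity_sensitive_diversity(uniform, identity, q=0))   # 4 distinct items
print(similarity_sensitive_diversity(uniform, all_same, q=0))   # 1 effective item
print(similarity_sensitive_diversity(uniform, two_pairs, q=0))  # a bit over 2
```

The third case shows why size alone can mislead: four images that form two near-duplicate pairs count as only slightly more than two effective items.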

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space Models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.

Updated: 2024-07-31 21:17:43

Categories: cs.LG,cs.RO,eess.SP

Download: http://arxiv.org/abs/2402.10211v3
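The HiSS abstract above describes stacking state-space models to form a temporal hierarchy. The idea can be sketched with scalar linear state-space models, under stated assumptions: real S4 or Mamba layers are learned and multi-dimensional, whereas the fixed parameters, the chunking scheme, and the choice of the last low-level output as a chunk summary are all hypothetical simplifications, not the authors' implementation.

```python
def run_ssm(inputs, a, b, c, x0=0.0):
    """Scalar linear state-space model: x <- a*x + b*u, y = c*x."""
    x = x0
    outputs = []
    for u in inputs:
        x = a * x + b * u
        outputs.append(c * x)
    return outputs


def hierarchical_ssm(sequence, chunk_size, low, high):
    """Two-level hierarchy: a low-level SSM per chunk, a high-level SSM over chunks.

    low and high are (a, b, c) tuples; both are illustrative fixed values here.
    """
    chunks = [sequence[i:i + chunk_size]
              for i in range(0, len(sequence), chunk_size)]
    # Low level: process each chunk independently, keep its final output.
    summaries = [run_ssm(chunk, *low)[-1] for chunk in chunks]
    # High level: model the slower dynamics across chunk summaries.
    return run_ssm(summaries, *high)


predictions = hierarchical_ssm([1.0, 1.0, 1.0, 1.0], chunk_size=2,
                               low=(0.5, 1.0, 1.0), high=(0.0, 1.0, 2.0))
```

The output has one prediction per chunk, so the high level operates at a coarser time resolution than the raw sensor stream.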

A Taxonomy of Stereotype Content in Large Language Models

This study introduces a taxonomy of stereotype content in contemporary large language models (LLMs). We prompt ChatGPT 3.5, Llama 3, and Mixtral 8x7B, three powerful and widely used LLMs, for the characteristics associated with 87 social categories (e.g., gender, race, occupations). We identify 14 stereotype dimensions (e.g., Morality, Ability, Health, Beliefs, Emotions), accounting for ~90% of LLM stereotype associations. Warmth and Competence facets were the most frequent content, but all other dimensions were significantly prevalent. Stereotypes were more positive in LLMs (vs. humans), but there was significant variability across categories and dimensions. Finally, the taxonomy predicted the LLMs' internal evaluations of social categories (e.g., how positively/negatively the categories were represented), supporting the relevance of a multidimensional taxonomy for characterizing LLM stereotypes. Our findings suggest that high-dimensional human stereotypes are reflected in LLMs and must be considered in AI auditing and debiasing to minimize unidentified harms from reliance on low-dimensional views of bias in LLMs.

Updated: 2024-07-31 21:14:41

Categories: cs.CY,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2408.00162v1

Automatic Generation of Behavioral Test Cases For Natural Language Processing Using Clustering and Prompting

Recent work in behavioral testing for natural language processing (NLP) models, such as Checklist, is inspired by related paradigms in software engineering testing. These tests allow evaluation of general linguistic capabilities and domain understanding, and hence can help evaluate conceptual soundness and identify model weaknesses. However, a major challenge is the creation of test cases. The current packages rely on a semi-automated approach using manual development, which requires domain expertise and can be time-consuming. This paper introduces an automated approach to develop test cases by exploiting the power of large language models and statistical techniques. It clusters the text representations to carefully construct meaningful groups and then applies prompting techniques to automatically generate Minimal Functionality Tests (MFT). The well-known Amazon Reviews corpus is used to demonstrate our approach. We analyze the behavioral test profiles across four different classification algorithms and discuss the limitations and strengths of those models.

Updated: 2024-07-31 21:12:21

Categories: cs.CL,cs.AI,cs.ET,cs.LG

Download: http://arxiv.org/abs/2408.00161v1

Generalization in Neural Networks: A Broad Survey

This paper reviews concepts, modeling approaches, and recent findings along a spectrum of different levels of abstraction of neural network models including generalization across (1) Samples, (2) Distributions, (3) Domains, (4) Tasks, (5) Modalities, and (6) Scopes. Strategies for (1) sample generalization from training to test data are discussed, with suggestive evidence presented that, at least for the ImageNet dataset, popular classification models show substantial overfitting. An empirical example and perspectives from statistics highlight how models' (2) distribution generalization can benefit from consideration of causal relationships and counterfactual scenarios. Transfer learning approaches and results for (3) domain generalization are summarized, as is the wealth of domain generalization benchmark datasets available. Recent breakthroughs surveyed in (4) task generalization include few-shot meta-learning approaches and the emergence of transformer-based foundation models such as those used for language processing. Studies performing (5) modality generalization are reviewed, including those that integrate image and text data and that apply a biologically-inspired network across olfactory, visual, and auditory modalities. Higher-level (6) scope generalization results are surveyed, including graph-based approaches to represent symbolic knowledge in networks and attribution strategies for improving networks' explainability. Additionally, concepts from neuroscience are discussed on the modular architecture of brains and the steps by which dopamine-driven conditioning leads to abstract thinking.

Updated: 2024-07-31 21:06:23

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2209.01610v3

Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution

A central problem in biology is to understand how organisms evolve and adapt to their environment by acquiring variations in the observable characteristics or traits of species across the tree of life. With the growing availability of large-scale image repositories in biology and recent advances in generative modeling, there is an opportunity to accelerate the discovery of evolutionary traits automatically from images. Toward this goal, we introduce Phylo-Diffusion, a novel framework for conditioning diffusion models with phylogenetic knowledge represented in the form of HIERarchical Embeddings (HIER-Embeds). We also propose two new experiments for perturbing the embedding space of Phylo-Diffusion: trait masking and trait swapping, inspired by counterpart experiments of gene knockout and gene editing/swapping. Our work represents a novel methodological advance in generative modeling to structure the embedding space of diffusion models using tree-based knowledge. Our work also opens a new chapter of research in evolutionary biology by using generative models to visualize evolutionary changes directly from images. We empirically demonstrate the usefulness of Phylo-Diffusion in capturing meaningful trait variations for fishes and birds, revealing novel insights about the biological mechanisms of their evolution.

Updated: 2024-07-31 21:06:14

Categories: q-bio.PE,cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.00160v1

Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning

Vision-language models (VLMs) mainly rely on contrastive training to learn general-purpose representations of images and captions. We focus on the situation when one image is associated with several captions, each caption containing both information shared among all captions and unique information per caption about the scene depicted in the image. In such cases, it is unclear whether contrastive losses are sufficient for learning task-optimal representations that contain all the information provided by the captions or whether the contrastive learning setup encourages the learning of a simple shortcut that minimizes contrastive loss. We introduce synthetic shortcuts for vision-language: a training and evaluation framework where we inject synthetic shortcuts into image-text data. We show that contrastive VLMs trained from scratch or fine-tuned with data containing these synthetic shortcuts mainly learn features that represent the shortcut. Hence, contrastive losses are not sufficient to learn task-optimal representations, i.e., representations that contain all task-relevant information shared between the image and associated captions. We examine two methods to reduce shortcut learning in our training and evaluation framework: (i) latent target decoding and (ii) implicit feature modification. We show empirically that both methods improve performance on the evaluation task, but only partly reduce shortcut learning when training and evaluating with our shortcut learning framework. Hence, we show the difficulty and challenge of our shortcut learning framework for contrastive vision-language representation learning.

Updated: 2024-07-31 21:02:12

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2402.17510v2

On Pre-training of Multimodal Language Models Customized for Chart Understanding

Recent studies customizing Multimodal Large Language Models (MLLMs) for domain-specific tasks have yielded promising results, especially in the field of scientific chart comprehension. These studies generally utilize visual instruction tuning with specialized datasets to enhance question and answer (QA) accuracy within the chart domain. However, they often neglect the fundamental discrepancy between natural image-caption pre-training data and digital chart image-QA data, particularly in the models' capacity to extract underlying numeric values from charts. This paper tackles this oversight by exploring the training processes necessary to improve MLLMs' comprehension of charts. We present three key findings: (1) Incorporating raw data values in alignment pre-training markedly improves comprehension of chart data. (2) Replacing images with their textual representation randomly during end-to-end fine-tuning transfers the language reasoning capability to chart interpretation skills. (3) Requiring the model to first extract the underlying chart data and then answer the question during fine-tuning can further improve accuracy. Consequently, we introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension. CHOPINLLM effectively interprets various types of charts, including unannotated ones, while maintaining robust reasoning abilities. Furthermore, we establish a new benchmark to evaluate MLLMs' understanding of different chart types across various comprehension levels. Experimental results show that CHOPINLLM exhibits strong performance in understanding both annotated and unannotated charts across a wide range of types.

Updated: 2024-07-31 21:01:16

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.14506v2

A Finite Sample Complexity Bound for Distributionally Robust Q-learning

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision process formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust $Q$-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.

Updated: 2024-07-31 20:59:45

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2302.13203v3
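To get a feel for how the sample complexity bound in the abstract above scales, the leading term $|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4}$ can be evaluated directly. The sketch below ignores the constants and logarithmic factors hidden by $\tilde O$, so it indicates relative scaling only; the function name `robust_q_sample_bound` is an assumption made for illustration.

```python
def robust_q_sample_bound(num_states, num_actions, gamma, epsilon, p_min, delta):
    """Leading term |S||A|(1-gamma)^-5 eps^-2 p_min^-6 delta^-4 of the bound.

    Constants and log factors hidden by the tilde-O notation are omitted,
    so only ratios between two settings are meaningful.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if min(epsilon, p_min, delta) <= 0.0:
        raise ValueError("epsilon, p_min and delta must be positive")
    return (num_states * num_actions
            * (1.0 - gamma) ** -5
            * epsilon ** -2
            * p_min ** -6
            * delta ** -4)


base = robust_q_sample_bound(2, 3, gamma=0.5, epsilon=0.5, p_min=0.5, delta=1.0)
# Halving the target error multiplies the leading term by four.
tighter = robust_q_sample_bound(2, 3, gamma=0.5, epsilon=0.25, p_min=0.5, delta=1.0)
```

The exponents show where the cost concentrates: the minimal support probability enters with power six, so rare transitions dominate the bound far more than the accuracy target does.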

CogNarr Ecosystem: Facilitating Group Cognition at Scale

Human groups of all sizes and kinds engage in deliberation, problem solving, strategizing, decision making, and more generally, cognition. Some groups are large, and that setting presents unique challenges. The small-group setting often involves face-to-face dialogue, but group cognition in the large-group setting typically requires some form of online interaction. New approaches are needed to facilitate the kind of rich communication and information processing that are required for effective, functional cognition in the online setting, especially for groups characterized by thousands to millions of participants who wish to share potentially complex, nuanced, and dynamic perspectives. This concept paper proposes the CogNarr (Cognitive Narrative) ecosystem, which is designed to facilitate functional cognition in the large-group setting. The paper's contribution is a novel vision as to how recent developments in cognitive science, artificial intelligence, natural language processing, and related fields might be scaled and applied to large-group cognition, using an approach that itself promotes further scientific advancement. A key perspective is to view a group as an organism that uses some form of cognitive architecture to sense the world, process information, remember, learn, predict, make decisions, and adapt to changing conditions. The CogNarr ecosystem is designed to serve as a component within that architecture.

Updated: 2024-07-31 20:58:18

Categories: cs.HC,cs.AI,F.4.1, I.2.7

Download: http://arxiv.org/abs/2407.18945v2

Generative Learning of the Solution of Parametric Partial Differential Equations Using Guided Diffusion Models and Virtual Observations

We introduce a generative learning framework to model high-dimensional parametric systems using gradient guidance and virtual observations. We consider systems described by Partial Differential Equations (PDEs) discretized with structured or unstructured grids. The framework integrates multi-level information to generate high-fidelity time sequences of the system dynamics. We demonstrate the effectiveness and versatility of our framework with two case studies: incompressible, two-dimensional, low-Reynolds-number cylinder flow on an unstructured mesh and incompressible turbulent channel flow on a structured mesh, both parameterized by the Reynolds number. Our results illustrate the framework's robustness and ability to generate accurate flow sequences across various parameter settings, significantly reducing computational costs and allowing for efficient forecasting and reconstruction of flow dynamics.

Updated: 2024-07-31 20:52:33

Categories: cs.LG,physics.comp-ph,physics.flu-dyn

Download: http://arxiv.org/abs/2408.00157v1

Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks

We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings where model retraining may be induced. An adversary first adds a few carefully crafted points to the training dataset such that the impact on the model's predictions is minimal. The adversary subsequently triggers a request to remove a subset of the introduced points, at which point the attack is unleashed and the model's predictions are negatively affected. In particular, we consider clean-label targeted attacks (in which the goal is to cause the model to misclassify a specific test point) on datasets including CIFAR-10, Imagenette, and Imagewoof. This attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset.

Updated: 2024-07-31 20:42:03

Categories: cs.LG,cs.AI,cs.CR,cs.CY

Download: http://arxiv.org/abs/2212.10717v2

Debiased Distribution Compression

Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kernel thinning (SKT) returns $\sqrt{n}$ equal-weighted points with $\widetilde{O}(n^{-1/2})$ maximum mean discrepancy (MMD) to $\mathbb{P}$. For larger-scale compression tasks, low-rank SKT achieves the same feat in sub-quadratic time using an adaptive low-rank debiasing procedure that may be of independent interest. For downstream tasks that support simplex or constant-preserving weights, Stein recombination and Stein Cholesky achieve even greater parsimony, matching the guarantees of SKT with as few as $\text{poly-log}(n)$ weighted points. Underlying these advances are new guarantees for the quality of simplex-weighted coresets, the spectral decay of kernel matrices, and the covering numbers of Stein kernel Hilbert spaces. In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering.

Updated: 2024-07-31 20:32:46

Categories: stat.ML,cs.LG,stat.CO,stat.ME

Download: http://arxiv.org/abs/2404.12290v3
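The compression guarantees in the abstract above are stated in terms of maximum mean discrepancy (MMD). As a rough illustration of the quantity being controlled, the sketch below estimates the MMD between two equal-weighted point sets. It assumes a Gaussian kernel and a reference sample, whereas the paper uses Stein kernels that measure discrepancy to $\mathbb{P}$ without such a sample; the function names and toy data are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) MMD estimate between two equal-weighted point sets."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))


reference = [(0.0,), (1.0,)]
close_summary = [(0.1,), (1.1,)]
biased_summary = [(5.0,), (6.0,)]

mmd_close = mmd(reference, close_summary)
mmd_biased = mmd(reference, biased_summary)
```

A summary drawn from the wrong region has a visibly larger MMD, which is the kind of bias the debiasing methods are designed to remove.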

Moderating Group Conversation Dynamics with Social Robots

This research investigates the impact of social robot participation in group conversations and assesses the effectiveness of various addressing policies. The study involved 300 participants, divided into groups of four, interacting with a humanoid robot serving as the moderator. The robot utilized conversation data to determine the most appropriate speaker to address. The findings indicate that the robot's addressing policy significantly influenced conversation dynamics, resulting in more balanced attention to each participant and a reduction in subgroup formation.

Updated: 2024-07-31 20:29:20

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2408.00151v1

StyleRF-VolVis: Style Transfer of Neural Radiance Fields for Expressive Volume Visualization

In volume visualization, visualization synthesis has attracted much attention due to its ability to generate novel visualizations without following the conventional rendering pipeline. However, existing solutions based on generative adversarial networks often require many training images and take significant training time. Still, issues such as low quality, inconsistency, and inflexibility persist. This paper introduces StyleRF-VolVis, an innovative style transfer framework for expressive volume visualization (VolVis) via neural radiance field (NeRF). The expressiveness of StyleRF-VolVis is upheld by its ability to accurately separate the underlying scene geometry (i.e., content) and color appearance (i.e., style), conveniently modify color, opacity, and lighting of the original rendering while maintaining visual content consistency across the views, and effectively transfer arbitrary styles from reference images to the reconstructed 3D scene. To achieve these, we design a base NeRF model for scene geometry extraction, a palette color network to classify regions of the radiance field for photorealistic editing, and an unrestricted color network to lift the color palette constraint via knowledge distillation for non-photorealistic editing. We demonstrate the superior quality, consistency, and flexibility of StyleRF-VolVis by experimenting with various volume rendering scenes and reference images and comparing StyleRF-VolVis against other image-based (AdaIN), video-based (ReReVST), and NeRF-based (ARF and SNeRF) style rendering solutions.

Updated: 2024-07-31 20:26:30

Categories: cs.GR,cs.AI,cs.CV

Download: http://arxiv.org/abs/2408.00150v1

Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates

When designing agents for operation in uncertain environments, designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict. These obligations include ethical and social obligations, permissions and prohibitions, which constrain how the agent achieves its mission and executes its policy. We propose a new deontic logic, Expected Act Utilitarian deontic logic, for enabling this reasoning at design time: for specifying and verifying the agent's strategic obligations, then modifying its policy from a reference policy to meet those obligations. Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs. We introduce two algorithms: one for model-checking whether an RL agent has the right strategic obligations, and one for modifying a reference decision policy to make it meet obligations expressed in our logic. We illustrate our algorithms on DAC-MDPs which accurately abstract neural decision policies, and on toy gridworld environments.

Updated: 2024-07-31 20:21:15

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2408.00147v1

Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors

Domain adaptation methods for object detection (OD) strive to mitigate the impact of distribution shifts by promoting feature alignment across source and target domains. Multi-source domain adaptation (MSDA) allows leveraging multiple annotated source datasets and unlabeled target data to improve the accuracy and robustness of the detection model. Most state-of-the-art MSDA methods for OD perform feature alignment in a class-agnostic manner. This is challenging since the objects have unique modal information due to variations in object appearance across domains. A recent prototype-based approach proposed a class-wise alignment, yet it suffers from error accumulation due to noisy pseudo-labels that can negatively affect adaptation with imbalanced data. To overcome these limitations, we propose an attention-based class-conditioned alignment method for MSDA that aligns instances of each object category across domains. In particular, an attention module coupled with an adversarial domain classifier allows learning domain-invariant and class-specific instance representations. Experimental results on multiple benchmarking MSDA datasets indicate that our method outperforms the state-of-the-art methods and is robust to class imbalance using a conceptually simple class-conditioning method. Our code is available at https://github.com/imatif17/ACIA.

Updated: 2024-07-31 20:13:40

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2403.09918v3

Distributed In-Context Learning under Non-IID Among Clients

Advancements in large language models (LLMs) have shown their effectiveness in multiple complicated natural language reasoning tasks. A key challenge remains in adapting these models efficiently to new or unfamiliar tasks. In-context learning (ICL) provides a promising solution for few-shot adaptation by retrieving a set of data points relevant to a query, called in-context examples (ICE), from a training dataset and providing them during inference as context. Most existing studies utilize a centralized training dataset, yet many real-world datasets may be distributed among multiple clients, and remote data retrieval can be associated with costs. Especially when the client data are not independent and identically distributed (non-IID), retrieving from clients a proper set of ICEs needed for a test query presents critical challenges. In this paper, we first show that in this challenging setting, test queries will have different preferences among clients because of non-IIDness, and equal contribution often leads to suboptimal performance. We then introduce a novel approach to tackle the distributed non-IID ICL problem when a data usage budget is present. The principle is that each client's proper contribution (budget) should be designed according to the preference of each query for that client. Our approach allocates a budget for each client in a data-driven manner, tailored to each test query. Through extensive empirical studies on diverse datasets, our framework demonstrates superior performance relative to competing baselines.

Updated: 2024-07-31 20:06:25

Categories: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2408.00144v1
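The principle stated in the abstract above, that each client's budget should follow the query's preference for that client, can be illustrated with a simple proportional split. This is a sketch under assumptions: the paper learns the allocation in a data-driven way, while the rule below just divides an integer budget in proportion to given preference scores using largest-remainder rounding. The function name `allocate_budget` and the scores are hypothetical.

```python
def allocate_budget(preferences, total_budget):
    """Split an integer example budget across clients by preference score.

    preferences: dict mapping client id to a non-negative score for one query.
    total_budget: total number of in-context examples that may be retrieved.
    Uses largest-remainder rounding so the allocations sum to total_budget.
    """
    if total_budget < 0 or any(s < 0 for s in preferences.values()):
        raise ValueError("budget and scores must be non-negative")
    total = sum(preferences.values())
    if total == 0:
        # No preference signal: fall back to treating all clients equally.
        preferences = {client: 1.0 for client in preferences}
        total = float(len(preferences))
    quotas = {c: total_budget * s / total for c, s in preferences.items()}
    allocation = {c: int(q) for c, q in quotas.items()}
    leftover = total_budget - sum(allocation.values())
    # Hand remaining units to the clients with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - allocation[c],
                          reverse=True)
    for client in by_remainder[:leftover]:
        allocation[client] += 1
    return allocation


skewed = allocate_budget({"a": 3, "b": 1, "c": 0}, total_budget=8)
even = allocate_budget({"a": 1, "b": 1, "c": 1}, total_budget=4)
```

An equal split corresponds to the uniform-contribution baseline that the abstract reports as often suboptimal under non-IID client data.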

Multi-Source Domain Adaptation for Object Detection with Prototype-based Mean-teacher

Adapting visual object detectors to operational target domains is a challenging task, commonly achieved using unsupervised domain adaptation (UDA) methods. Recent studies have shown that when the labeled dataset comes from multiple source domains, treating them as separate domains and performing a multi-source domain adaptation (MSDA) improves the accuracy and robustness over blending these source domains and performing a UDA. For adaptation, existing MSDA methods learn domain-invariant and domain-specific parameters (for each source domain). However, unlike single-source UDA methods, learning domain-specific parameters makes them grow significantly in proportion to the number of source domains. This paper proposes a novel MSDA method called Prototype-based Mean Teacher (PMT), which uses class prototypes instead of domain-specific subnets to encode domain-specific information. These prototypes are learned using a contrastive loss, aligning the same categories across domains and pushing different categories far apart. Given the use of prototypes, the number of parameters required for our PMT method does not increase significantly with the number of source domains, thus reducing memory issues and possible overfitting. Empirical studies indicate that PMT outperforms state-of-the-art MSDA methods on several challenging object detection datasets. Our code is available at https://github.com/imatif17/Prototype-Mean-Teacher.

Updated: 2024-07-31 20:04:53

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2309.14950v3

Reputational Algorithm Aversion

People are often reluctant to incorporate information produced by algorithms into their decisions, a phenomenon called "algorithm aversion". This paper shows how algorithm aversion arises when the choice to follow an algorithm conveys information about a human's ability. I develop a model in which workers make forecasts of an uncertain outcome based on their own private information and an algorithm's signal. Low-skill workers receive worse information than the algorithm and hence should always follow the algorithm's signal, while high-skill workers receive better information than the algorithm and should sometimes override it. However, due to reputational concerns, low-skill workers inefficiently override the algorithm to increase the likelihood they are perceived as high-skill. The model provides a fully rational microfoundation for algorithm aversion that aligns with the broad concern that AI systems will displace many types of workers.

Updated: 2024-07-31 20:01:52

Categories: econ.TH,cs.AI,cs.GT,cs.HC

Download: http://arxiv.org/abs/2402.15418v3

Blackout Mitigation via Physics-guided RL

This paper considers the sequential design of remedial control actions in response to system anomalies for the ultimate objective of preventing blackouts. A physics-guided reinforcement learning (RL) framework is designed to identify effective sequences of real-time remedial look-ahead decisions accounting for the long-term impact on the system's stability. The paper considers a space of control actions that involve both discrete-valued transmission line-switching decisions (line reconnections and removals) and continuous-valued generator adjustments. To identify an effective blackout mitigation policy, a physics-guided approach is designed that uses power-flow sensitivity factors associated with the power transmission network to guide the RL exploration during agent training. Comprehensive empirical evaluations using the open-source Grid2Op platform demonstrate the notable advantages of incorporating physical signals into RL decisions, establishing the gains of the proposed physics-guided approach compared to its black-box counterparts. One important observation is that strategically removing transmission lines, in conjunction with multiple real-time generator adjustments, often renders effective long-term decisions that are likely to prevent or delay blackouts.

Updated: 2024-07-31 19:57:23

Categories: eess.SY,cs.AI,cs.SY

Download: http://arxiv.org/abs/2401.09640v2

Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment

A binary decision task, such as a yes-no question or answer verification, reflects an important real-world scenario in which users seek confirmation that their decisions on specific issues are correct. In this work, we observe that language models exhibit a negative bias in the binary decisions of complex reasoning tasks. Based on our observations and the rationale about attention-based model dynamics, we propose a negative attention score (NAS) to systematically and quantitatively formulate negative bias. Based on NAS, we identify attention heads that attend to negative tokens provided in the instructions as answer candidates for binary decisions, regardless of the question in the prompt, and validate their association with the negative bias. Additionally, we propose the negative attention score alignment (NASA) method, which is a parameter-efficient fine-tuning technique to address the extracted negatively biased attention heads. Experimental results from various domains of reasoning tasks and a large model search space demonstrate that NASA significantly reduces the gap between precision and recall caused by negative bias while preserving the models' generalization abilities. Our code is available at https://github.com/ysw1021/NASA.

Updated: 2024-07-31 19:50:57

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2408.00137v1

Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions

The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics. EVT supports using semi-parametric models called max-stable distributions built from spatial Poisson point processes. While powerful, these models are only asymptotically valid for large samples. However, since extreme data is by definition scarce, the potential for model misspecification error is inherent to these applications, so DRO estimators are a natural choice. In order to mitigate over-conservative estimates while enhancing out-of-sample performance, we study DRO estimators informed by semi-parametric max-stable constraints in the space of point processes. We study both tractable convex formulations for some problems of interest (e.g., CVaR) and more general neural-network-based estimators. Both approaches are validated using synthetically generated data, recovering prescribed characteristics and verifying the efficacy of the proposed techniques. Additionally, the proposed method is applied to a real data set of financial returns for comparison to a previous analysis. We establish the proposed model as a novel formulation in the multivariate EVT domain, and as innovative with respect to performance when compared to relevant alternative proposals.

Updated: 2024-07-31 19:45:27

Categories: stat.ML,cs.AI,cs.LG,q-fin.RM

Download: http://arxiv.org/abs/2408.00131v1
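
The abstract above names CVaR as one problem with a tractable convex formulation. As a point of reference only, the following minimal Python sketch computes the plain empirical CVaR of a loss sample in two ways: as a tail mean, and through the Rockafellar-Uryasev convex program. It does not implement the paper's DRO estimator or its max-stable constraints; the function names and the toy sample are assumptions made for illustration.

```python
import math


def cvar_tail_mean(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of the losses (requires alpha < 1)."""
    ordered = sorted(losses)
    k = math.ceil(alpha * len(ordered))  # number of samples below the tail
    tail = ordered[k:]
    return sum(tail) / len(tail)


def cvar_rockafellar_uryasev(losses, alpha):
    """Minimum over t of  t + E[max(L - t, 0)] / (1 - alpha).

    The objective is convex and piecewise linear in t with kinks at the
    sample values, so scanning the samples is enough to find the minimum.
    """
    n = len(losses)

    def objective(t):
        excess = sum(max(x - t, 0.0) for x in losses) / n
        return t + excess / (1.0 - alpha)

    return min(objective(t) for t in losses)


# Toy sample (illustrative values only): losses 1, 2, ..., 10.
losses = [float(x) for x in range(1, 11)]
print(cvar_tail_mean(losses, 0.5), cvar_rockafellar_uryasev(losses, 0.5))
```

The two functions agree when alpha times the sample size is an integer, as in this toy case; otherwise the convex program weights the boundary sample fractionally and the results can differ slightly.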

Vera Verto: Multimodal Hijacking Attack

The increasing cost of training machine learning (ML) models has led to the inclusion of new parties to the training pipeline, such as users who contribute training data and companies that provide computing resources. This involvement of such new parties in the ML training process has introduced new attack surfaces for an adversary to exploit. A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own -- possibly malicious -- hijacking tasks. However, the scope of the model hijacking attack is so far limited to the homogeneous-modality tasks. In this paper, we transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities. Specifically, we focus on the setting where an adversary implements a natural language processing (NLP) hijacking task into an image classification model. To mount the attack, we propose a novel encoder-decoder based framework, namely the Blender, which relies on advanced image and language models. Experimental results show that our modal hijacking attack achieves strong performances in different settings. For instance, our attack achieves 94%, 94%, and 95% attack success rate when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNIST classifiers.

Updated: 2024-07-31 19:37:06

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2408.00129v1

Revisiting Monte Carlo Strength Evaluation

The Monte Carlo method, proposed by Dell'Amico and Filippone, estimates a password's rank within a probabilistic model for password generation, i.e., it determines the password's strength according to this model. We propose several ideas to improve the precision or speed of the estimation. Through experimental tests, we demonstrate that improved sampling can yield slightly better precision. Moreover, additional precomputation results in faster estimations with a modest increase in memory usage.

Updated: 2024-07-31 19:26:15

Categories: cs.CR

Download: http://arxiv.org/abs/2408.00124v1
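
To make the estimator in the abstract above concrete, here is a minimal Python sketch of the commonly cited form of the Monte Carlo method: draw n passwords from the model, then estimate the rank of a password with probability p as the sum of 1 / (n * p_i) over the sampled passwords whose probability p_i is strictly greater than p. The Zipf-like toy model, the sample size, and the function names are assumptions for illustration. The sorted array with suffix sums is one simple kind of precomputation and is not claimed to be the improvement the paper proposes.

```python
import bisect
import random


def build_rank_estimator(sample_probs):
    """Precompute sorted probabilities and suffix sums of 1 / (n * p_i)."""
    n = len(sample_probs)
    probs = sorted(sample_probs)  # ascending order
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + 1.0 / (n * probs[i])

    def estimate_rank(p):
        # Index of the first sampled probability strictly greater than p.
        i = bisect.bisect_right(probs, p)
        return suffix[i]

    return estimate_rank


# Hypothetical toy model: 1000 passwords, probability proportional to 1 / (k + 1).
weights = [1.0 / (k + 1) for k in range(1000)]
total = sum(weights)
model_probs = [w / total for w in weights]

rng = random.Random(0)
sample = rng.choices(range(1000), weights=model_probs, k=20000)
estimate_rank = build_rank_estimator([model_probs[k] for k in sample])

# In this toy model, password k has exactly k more-probable passwords,
# so its true rank is k and the estimate should land close to it.
print(estimate_rank(model_probs[50]))
```

Each lookup is a binary search over the precomputed arrays, so the cost per password is logarithmic in the sample size once the sample has been drawn.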

Semantic Codebook Learning for Dynamic Recommendation Models

Dynamic sequential recommendation (DSR) can generate model parameters based on user behavior to improve the personalization of sequential recommendation under various user preferences. However, it faces the challenges of a large parameter search space and sparse and noisy user-item interactions, which reduce the applicability of the generated model parameters. The Semantic Codebook Learning for Dynamic Recommendation Models (SOLID) framework presents a significant advancement in DSR by effectively tackling these challenges. By transforming item sequences into semantic sequences and employing a dual parameter model, SOLID compresses the parameter generation search space and leverages homogeneity within the recommendation system. The introduction of the semantic metacode and semantic codebook, which stores disentangled item representations, ensures robust and accurate parameter generation. Extensive experiments demonstrate that SOLID consistently outperforms existing DSR methods, delivering more accurate, stable, and robust recommendations.

Updated: 2024-07-31 19:25:25

Categories: cs.IR,cs.AI,cs.MM,cs.SI

Download: http://arxiv.org/abs/2408.00123v1

Gemma 2: Improving Open Language Models at a Practical Size

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.

Updated: 2024-07-31 19:13:07

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2408.00118v1
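
The contrast the abstract above draws between next token prediction and knowledge distillation can be illustrated with a small Python sketch. It assumes the common textbook form of distillation, in which the student minimizes the cross-entropy against the teacher's full next-token distribution instead of a one-hot target. The abstract does not state the exact objective, temperature, or teacher used for Gemma 2, so the logits and function names below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def next_token_loss(student_logits, target_index):
    """Cross-entropy against a one-hot target (standard next token prediction)."""
    return -math.log(softmax(student_logits)[target_index])


def distillation_loss(student_logits, teacher_logits):
    """Cross-entropy against the teacher's full distribution over the vocabulary."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Hypothetical logits over a three-token vocabulary.
teacher = [2.0, 1.0, 0.1]
student = [0.5, 0.4, 0.3]
print(next_token_loss(student, 0), distillation_loss(student, teacher))
```

The distillation loss is smallest when the student's distribution matches the teacher's, so it carries information about every token in the vocabulary instead of only the observed one.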

Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods

This work addresses the certification of the local robustness of vision-based two-stage 6D object pose estimation. The two-stage method for object pose estimation achieves superior accuracy by first employing deep neural network-driven keypoint regression and then applying a Perspective-n-Point (PnP) technique. Despite advancements, the certification of these methods' robustness remains scarce. This research aims to fill this gap with a focus on their local robustness on the system level -- the capacity to maintain robust estimations amidst semantic input perturbations. The core idea is to transform the certification of local robustness into neural network verification for classification tasks. The challenge is to develop model, input, and output specifications that align with off-the-shelf verification tools. To facilitate verification, we modify the keypoint detection model by substituting nonlinear operations with those more amenable to the verification processes. Instead of injecting random noise into images, as is common, we employ a convex hull representation of images as input specifications to more accurately depict semantic perturbations. Furthermore, by conducting a sensitivity analysis, we propagate the robustness criteria from pose to keypoint accuracy and then formulate an optimal error threshold allocation problem that allows maximally permissible keypoint deviation thresholds to be set. Viewing each pixel as an individual class, these thresholds result in linear, classification-akin output specifications. Under certain conditions, we demonstrate that the main components of our certification framework are both sound and complete, and validate its effects through extensive evaluations on realistic perturbations. To our knowledge, this is the first study to certify the robustness of large-scale, keypoint-based pose estimation given images in real-world scenarios.

Updated: 2024-07-31 19:02:54

Categories: cs.CV,cs.LG,cs.RO,cs.SY,eess.SY

Download: http://arxiv.org/abs/2408.00117v1

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

Many reinforcement learning algorithms rely on value estimation; however, the most widely used algorithms -- namely temporal difference algorithms -- can diverge under both off-policy sampling and nonlinear function approximation. Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation. Extending these methods to the nonlinear case has been largely unsuccessful. Recently, several methods have been introduced that approximate a different objective -- the mean-squared Bellman error (MSBE) -- which naturally facilitates nonlinear approximation. In this work, we build on these insights and introduce a new generalized MSPBE that extends the linear MSPBE to the nonlinear setting. We show how this generalized objective unifies previous work and obtain new bounds for the value error of the solutions of the generalized objective. We derive an easy-to-use, but sound, algorithm to minimize the generalized objective, and show that it is more stable across runs, is less sensitive to hyperparameters, and performs favorably across four control domains with neural network function approximation.

Updated: 2024-07-31 18:50:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2104.13844v3
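
For readers unfamiliar with the two objectives named in the abstract above, their standard textbook forms under linear function approximation are recalled below. The notation (state weighting d, feature matrix \Phi, projection \Pi) is a conventional choice and an assumption here; the generalized objective introduced in the paper is not reproduced.

```latex
% Bellman operator for policy \pi with discount factor \gamma
(T^{\pi} v)(s) = \mathbb{E}_{\pi}\!\left[ R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s \right]

% Mean-squared Bellman error
\mathrm{MSBE}(w) = \lVert T^{\pi} v_w - v_w \rVert_{d}^{2}
                 = \sum_{s} d(s) \left( (T^{\pi} v_w)(s) - v_w(s) \right)^{2}

% Linear mean-squared projected Bellman error, with v_w = \Phi w
\mathrm{MSPBE}(w) = \lVert \Pi \left( T^{\pi} v_w - v_w \right) \rVert_{d}^{2},
\qquad \Pi = \Phi \left( \Phi^{\top} D \Phi \right)^{-1} \Phi^{\top} D,
\quad D = \operatorname{diag}(d)
```

The projection \Pi maps the Bellman residual back onto the span of the features, which is why the linear MSPBE is tied to linear function approximation.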

Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs

Reasoning encompasses two typical types: deductive reasoning and inductive reasoning. Despite extensive research into the reasoning capabilities of Large Language Models (LLMs), most studies have failed to rigorously differentiate between inductive and deductive reasoning, leading to a blending of the two. This raises an essential question: in LLM reasoning, which poses a greater challenge, deductive or inductive reasoning? While the deductive reasoning capabilities of LLMs (i.e., their capacity to follow instructions in reasoning tasks) have received considerable attention, their abilities in true inductive reasoning remain largely unexplored. To delve into the true inductive reasoning capabilities of LLMs, we propose a novel framework, SolverLearner. This framework enables LLMs to learn the underlying function (i.e., $y = f_w(x)$) that maps input data points $x$ to their corresponding output values $y$, using only in-context examples. By focusing on inductive reasoning and separating it from LLM-based deductive reasoning, we can isolate and investigate inductive reasoning of LLMs in its pure form via SolverLearner. Our observations reveal that LLMs demonstrate remarkable inductive reasoning capabilities through SolverLearner, achieving near-perfect performance with ACC of 1 in most cases. Surprisingly, despite their strong inductive reasoning abilities, LLMs tend to be relatively weak at deductive reasoning, particularly in tasks involving "counterfactual" reasoning.

Updated: 2024-07-31 18:47:11

Categories: cs.AI

Download: http://arxiv.org/abs/2408.00114v1

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

What latent features are encoded in language model (LM) representations? Recent work on training sparse autoencoders (SAEs) to disentangle interpretable features in LM representations has shown significant promise. However, evaluating the quality of these SAEs is difficult because we lack a ground-truth collection of interpretable features that we expect good SAEs to recover. We thus propose to measure progress in interpretable dictionary learning by working in the setting of LMs trained on chess and Othello transcripts. These settings carry natural collections of interpretable features -- for example, "there is a knight on F3" -- which we leverage into supervised metrics for SAE quality. To guide progress in interpretable dictionary learning, we introduce a new SAE training technique, p-annealing, which improves performance on prior unsupervised metrics as well as our new metrics.

Updated: 2024-07-31 18:45:13

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2408.00113v1

Preference-Based Abstract Argumentation for Case-Based Reasoning (with-Appendix)

In the pursuit of enhancing the efficacy and flexibility of interpretable, data-driven classification models, this work introduces a novel incorporation of user-defined preferences with Abstract Argumentation and Case-Based Reasoning (CBR). Specifically, we introduce Preference-Based Abstract Argumentation for Case-Based Reasoning (which we call AA-CBR-P), allowing users to define multiple approaches to compare cases with an ordering that specifies their preference over these comparison approaches. We prove that the model inherently follows these preferences when making predictions and show that previous abstract argumentation for case-based reasoning approaches are insufficient at expressing preferences over constituents of an argument. We then demonstrate how this can be applied to a real-world medical dataset sourced from a clinical trial evaluating differing assessment methods of patients with a primary brain tumour. We show empirically that our approach outperforms other interpretable machine learning models on this dataset.

Updated: 2024-07-31 18:31:04

Categories: cs.AI

Download: http://arxiv.org/abs/2408.00108v1

WAS: Dataset and Methods for Artistic Text Segmentation

Accurate text segmentation results are crucial for text-related generative tasks, such as text image generation, text editing, text removal, and text style transfer. Recently, some scene text segmentation methods have made significant progress in segmenting regular text. However, these methods perform poorly in scenarios containing artistic text. Therefore, this paper focuses on the more challenging task of artistic text segmentation and constructs a real artistic text segmentation dataset. One challenge of the task is that the local stroke shapes of artistic text are diverse and complex. We propose a decoder with the layer-wise momentum query to prevent the model from ignoring stroke regions of special shapes. Another challenge is the complexity of the global topological structure. We further design a skeleton-assisted head to guide the model to focus on the global structure. Additionally, to enhance the generalization performance of the text segmentation model, we propose a strategy for training data synthesis, based on a large multi-modal model and a diffusion model. Experimental results show that our proposed method and synthetic dataset can significantly enhance the performance of artistic text segmentation and achieve state-of-the-art results on other public datasets.

Updated: 2024-07-31 18:29:36

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.00106v1

ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget

Entity Linking (EL) and Relation Extraction (RE) are fundamental tasks in Natural Language Processing, serving as critical components in a wide range of applications. In this paper, we propose ReLiK, a Retriever-Reader architecture for both EL and RE, where, given an input text, the Retriever module identifies candidate entities or relations that could potentially appear within the text. Subsequently, the Reader module discerns the pertinent retrieved entities or relations and aligns them with the corresponding textual spans. Notably, we put forward an innovative input representation that incorporates the candidate entities or relations alongside the text, making it possible to link entities or extract relations in a single forward pass and to fully leverage pre-trained language models' contextualization capabilities, in contrast with previous Retriever-Reader-based methods, which require a forward pass for each candidate. Our formulation of EL and RE achieves state-of-the-art performance in both in-domain and out-of-domain benchmarks while using academic-budget training and with up to 40x the inference speed of competitors. Finally, we show how our architecture can be used seamlessly for Information Extraction (cIE), i.e., EL + RE, setting a new state of the art by employing a shared Reader that simultaneously extracts entities and relations.

Updated: 2024-07-31 18:25:49

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2408.00103v1

Adaptive Transit Signal Priority based on Deep Reinforcement Learning and Connected Vehicles in a Traffic Microsimulation Environment

Model-free reinforcement learning (RL) provides a potential alternative to earlier formulations of adaptive transit signal priority (TSP) algorithms based on mathematical programming that require complex and nonlinear objective functions. This study extends RL-based traffic control to include TSP. Using a microscopic simulation environment and connected vehicle data, the study develops and tests a TSP event-based RL agent that assumes control from another developed RL-based general traffic signal controller. The TSP agent assumes control when transit buses enter the dedicated short-range communication (DSRC) zone of the intersection. This agent is shown to reduce the bus travel time by about 21%, with marginal impacts on general traffic at a saturation rate of 0.95. The TSP agent also shows slightly better bus travel time compared to actuated signal control with TSP. The architecture of the agent and simulation is selected considering the need to improve simulation run time efficiency.

Updated: 2024-07-31 18:17:22

Categories: cs.LG

Download: http://arxiv.org/abs/2408.00098v1

From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification

Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodal analysis; its ultimate aim is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions. Despite the wide range of applicable areas such as security surveillance, video retrieval, person tracking, and social media analytics, there is a notable absence of comprehensive reviews dedicated to summarizing text-based person Re-ID from a technical perspective. To address this gap, we introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task. We start by laying the groundwork for text-based person Re-ID, elucidating fundamental concepts related to attribute/natural language-based identification. Then a thorough examination of existing benchmark datasets and metrics is presented. Subsequently, we delve into prevalent feature extraction strategies employed in text-based person Re-ID research, followed by a concise summary of common network architectures within the domain. Prevalent loss functions utilized for model optimization and modality alignment in text-based person Re-ID are also scrutinized. To conclude, we offer a concise summary of our findings, pinpointing challenges in text-based person Re-ID. In response to these challenges, we outline potential avenues for future open-set text-based person Re-ID and present a baseline architecture for text-based pedestrian image generation-guided re-identification (TBPGR).

Updated: 2024-07-31 18:16:18

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.00096v1

Execution Semantics of Behavior Trees in Robotic Applications

This document aims at describing, in a suitably precise and unambiguous though informal way, the execution semantics of Behavior Trees as used in Robotics applications, with particular attention to the Halt semantics.

Updated: 2024-07-31 18:08:59

Categories: cs.RO,cs.AI,68T30,I.2.4

Download: http://arxiv.org/abs/2408.00090v1
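
The abstract above does not spell out the semantics it describes, so the following Python sketch only illustrates one common convention for ticking and halting in a behavior tree. It is not the document's definition. A reactive sequence re-ticks its children from the first one on every tick and halts a previously running child when an earlier child stops succeeding. All class names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3


class Condition:
    """Leaf that succeeds or fails immediately; halting it is a no-op."""

    def __init__(self, predicate):
        self.predicate = predicate

    def tick(self):
        return Status.SUCCESS if self.predicate() else Status.FAILURE

    def halt(self):
        pass


class LongAction:
    """Leaf that needs several ticks to finish and can be halted midway."""

    def __init__(self, ticks_needed):
        self.ticks_needed = ticks_needed
        self.progress = 0
        self.halt_count = 0

    def tick(self):
        self.progress += 1
        if self.progress >= self.ticks_needed:
            self.progress = 0
            return Status.SUCCESS
        return Status.RUNNING

    def halt(self):
        self.progress = 0  # discard partial work
        self.halt_count += 1


class ReactiveSequence:
    """Ticks children in order from the first child on every tick."""

    def __init__(self, children):
        self.children = children
        self.running = None  # child that returned RUNNING on the last tick

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                # A different child was running before: halt it.
                if self.running is not None and self.running is not child:
                    self.running.halt()
                self.running = child if status == Status.RUNNING else None
                return status
        self.running = None
        return Status.SUCCESS

    def halt(self):
        if self.running is not None:
            self.running.halt()
            self.running = None


battery = {"ok": True}
move = LongAction(ticks_needed=3)
tree = ReactiveSequence([Condition(lambda: battery["ok"]), move])
print(tree.tick())   # the condition holds, so the action starts
battery["ok"] = False
print(tree.tick(), move.halt_count)  # the condition fails, so the action is halted
```

The design point is that halt is a separate signal from tick: it lets a parent tell a running child to stop and clean up, which is where the subtle cases in any halt semantics arise.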

Approximating Rayleigh Scattering in Exoplanetary Atmospheres using Physics-informed Neural Networks (PINNs)

This research introduces an innovative application of physics-informed neural networks (PINNs) to tackle the intricate challenges of radiative transfer (RT) modeling in exoplanetary atmospheres, with a special focus on efficiently handling scattering phenomena. Traditional RT models often simplify scattering as absorption, leading to inaccuracies. Our approach utilizes PINNs, noted for their ability to incorporate the governing differential equations of RT directly into their loss function, thus offering a more precise yet potentially fast modeling technique. The core of our method involves the development of a parameterized PINN tailored for a modified RT equation, enhancing its adaptability to various atmospheric scenarios. We focus on RT in transiting exoplanet atmospheres using a simplified 1D isothermal model with pressure-dependent coefficients for absorption and Rayleigh scattering. In scenarios of pure absorption, the PINN demonstrates its effectiveness in predicting transmission spectra for diverse absorption profiles. For Rayleigh scattering, the network successfully computes the RT equation, addressing both direct and diffuse stellar light components. While our preliminary results with simplified models are promising, indicating the potential of PINNs in improving RT calculations, we acknowledge the errors stemming from our approximations as well as the challenges in applying this technique to more complex atmospheric conditions. Specifically, extending our approach to atmospheres with intricate temperature-pressure profiles and varying scattering properties, such as those introduced by clouds and hazes, remains a significant area for future development.

Updated: 2024-07-31 18:00:55

Categories: astro-ph.EP,astro-ph.IM,cs.LG,cs.NE

Download: http://arxiv.org/abs/2408.00084v1

TASI Lectures on Physics for Machine Learning

These notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning. The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics. I present classic results such as the universal approximation theorem and neural network / Gaussian process correspondence, and also more recent results such as the neural tangent kernel, feature learning with the maximal update parameterization, and Kolmogorov-Arnold networks. The exposition on neural network theory emphasizes a field theoretic perspective familiar to theoretical physicists. I elaborate on connections between the two, including a neural network approach to field theory.

Updated: 2024-07-31 18:00:22

Categories: hep-th,cs.LG,hep-ph

Download: http://arxiv.org/abs/2408.00082v1

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed, taxonomically categorizing these five problems. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm and blurred the boundaries between these fields, again confusing researchers. In this survey, we first present a generalized OOD detection v2, encapsulating the evolution of AD, ND, OSR, OOD detection, and OD in the VLM era. Our framework reveals that, with some field inactivity and integration, the demanding challenges have become OOD detection and AD. In addition, we also highlight the significant shift in the definition, problem settings, and benchmarks; we thus feature a comprehensive review of the methodology for OOD detection, including the discussion over other related tasks to clarify their relationship to OOD detection. Finally, we explore the advancements in the emerging Large Vision Language Model (LVLM) era, such as GPT-4V. We conclude this survey with open challenges and future directions.

Updated: 2024-07-31 17:59:58

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.21794v1

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

As artificial intelligence systems grow more powerful, there has been increasing interest in "AI safety" research to address emerging and future risks. However, the field of AI safety remains poorly defined and inconsistently measured, leading to confusion about how researchers can contribute. This lack of clarity is compounded by the unclear relationship between AI safety benchmarks and upstream general capabilities (e.g., general knowledge and reasoning). To address these issues, we conduct a comprehensive meta-analysis of AI safety benchmarks, empirically analyzing their correlation with general capabilities across dozens of models and providing a survey of existing directions in AI safety. Our findings reveal that many safety benchmarks highly correlate with upstream model capabilities, potentially enabling "safetywashing" -- where capability improvements are misrepresented as safety advancements. Based on these findings, we propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context as a set of clearly delineated research goals that are empirically separable from generic capabilities advancements. In doing so, we aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.

Updated: 2024-07-31 17:59:24

Categories: cs.LG,cs.AI,cs.CL,cs.CY

Download: http://arxiv.org/abs/2407.21792v1
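The correlation analysis described in the Safetywashing abstract can be illustrated with a small sketch. It assumes Spearman rank correlation as the measure (the abstract does not name one), and the scores below are made-up numbers for five hypothetical models, not results from the paper.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation for 1-D inputs without tied values."""
    # argsort of argsort turns values into 0-based ranks
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical scores for five models (illustrative only).
capability = np.array([0.31, 0.45, 0.52, 0.68, 0.80])
safety_benchmark = np.array([0.40, 0.48, 0.61, 0.70, 0.77])

rho = spearman(capability, safety_benchmark)
print(f"rank correlation with capabilities: {rho:.2f}")
```

A value close to 1 would suggest that the benchmark mostly tracks general capabilities, which is the situation the abstract calls "safetywashing".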

Deep Learning for Options Trading: An End-To-End Approach

We introduce a novel approach to options trading strategies using a highly scalable and data-driven machine learning algorithm. In contrast to traditional approaches that often require specifications of underlying market dynamics or assumptions on an option pricing model, our models depart fundamentally from the need for these prerequisites, directly learning non-trivial mappings from market data to optimal trading signals. Backtesting on more than a decade of option contracts for equities listed on the S&P 100, we demonstrate that deep learning models trained according to our end-to-end approach exhibit significant improvements in risk-adjusted performance over existing rules-based trading strategies. We find that incorporating turnover regularization into the models leads to further performance enhancements at prohibitively high levels of transaction costs.

Updated: 2024-07-31 17:59:09

Categories: q-fin.PM,cs.LG,q-fin.CP,q-fin.TR

Download: http://arxiv.org/abs/2407.21791v1

PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks

Bimanual manipulation is challenging due to the precise spatial and temporal coordination required between two arms. While there exist several real-world bimanual systems, there is a lack of simulated benchmarks with a large task diversity for systematically studying bimanual capabilities across a wide range of tabletop tasks. This paper addresses the gap by extending RLBench to bimanual manipulation. We open-source our code and benchmark comprising 13 new tasks with 23 unique task variations, each requiring a high degree of coordination and adaptability. To kickstart the benchmark, we extended several state-of-the-art methods to bimanual manipulation and also present a language-conditioned behavioral cloning agent -- PerAct2, which enables the learning and execution of bimanual 6-DoF manipulation tasks. Our novel network architecture efficiently integrates language processing with action prediction, allowing robots to understand and perform complex bimanual tasks in response to user-specified goals. Project website with code is available at: http://bimanual.github.io

Updated: 2024-07-31 17:57:37

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.00278v2

Occam Gradient Descent

Deep learning neural network models must be large enough to adapt to their problem domain, while small enough to avoid overfitting training data during gradient descent. To balance these competing demands, overprovisioned deep learning models such as transformers are trained for a single epoch on large data sets, and are hence inefficient with both computing resources and training data. In response to these inefficiencies, we exploit learning theory to derive Occam Gradient Descent, an algorithm that interleaves adaptive reduction of model size to minimize generalization error, with gradient descent on model weights to minimize fitting error. In contrast, traditional gradient descent greedily minimizes fitting error without regard to generalization error. Our algorithm simultaneously descends the space of weights and topological size of any neural network without modification. With respect to loss, compute and model size, our experiments show (a) on image classification benchmarks, linear and convolutional neural networks trained with Occam Gradient Descent outperform traditional gradient descent with or without post-train pruning; (b) on a range of tabular data classification tasks, neural networks trained with Occam Gradient Descent outperform traditional gradient descent, as well as Random Forests; (c) on natural language transformers, Occam Gradient Descent outperforms traditional gradient descent.

Updated: 2024-07-31 17:57:33

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20194v4

Vision-Language Model Based Handwriting Verification

Handwriting Verification is a critical task in document forensics. Deep learning based approaches often face skepticism from forensic document examiners due to their lack of explainability and reliance on extensive training data and handcrafted features. This paper explores using Vision Language Models (VLMs), such as OpenAI's GPT-4o and Google's PaliGemma, to address these challenges. By leveraging their Visual Question Answering capabilities and 0-shot Chain-of-Thought (CoT) reasoning, our goal is to provide clear, human-understandable explanations for model decisions. Our experiments on the CEDAR handwriting dataset demonstrate that VLMs offer enhanced interpretability, reduce the need for large training datasets, and adapt better to diverse handwriting styles. However, results show that the CNN-based ResNet-18 architecture outperforms the 0-shot CoT prompt engineering approach with GPT-4o (Accuracy: 70%) and supervised fine-tuned PaliGemma (Accuracy: 71%), achieving an accuracy of 84% on the CEDAR AND dataset. These findings highlight the potential of VLMs in generating human-interpretable decisions while underscoring the need for further advancements to match the performance of specialized deep learning models.

Updated: 2024-07-31 17:57:32

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.21788v1

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit the amount of compute to only one attempt per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. Across multiple tasks and models, we observe that coverage - the fraction of problems solved by any attempt - scales with the number of samples over four orders of magnitude. In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance. When we apply repeated sampling to SWE-bench Lite, the fraction of issues solved with DeepSeek-V2-Coder-Instruct increases from 15.9% with one sample to 56% with 250 samples, outperforming the single-attempt state-of-the-art of 43% which uses more capable frontier models. Moreover, using current API pricing, amplifying the cheaper DeepSeek model with five samples is more cost-effective and solves more issues than paying a premium for one sample from GPT-4o or Claude 3.5 Sonnet. Interestingly, the relationship between coverage and the number of samples is often log-linear and can be modelled with an exponentiated power law, suggesting the existence of inference-time scaling laws. Finally, we find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers. When solving math word problems from GSM8K and MATH, coverage with Llama-3 models grows to over 95% with 10,000 samples. However, common methods to pick correct solutions from a sample collection, such as majority voting or reward models, plateau beyond several hundred samples and fail to fully scale with the sample budget.

Updated: 2024-07-31 17:57:25

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.21787v1
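Coverage, as defined in the Large Language Monkeys abstract (the fraction of problems solved by any attempt), can be estimated from per-problem sample counts. The sketch below uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k); that the authors compute coverage this way is an assumption, and the function names are illustrative.

```python
from math import comb

def pass_at_k(n, c, k):
    """Probability that at least one of k samples, drawn without replacement
    from n generated samples of which c are correct, is correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def coverage(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical counts: (samples generated, samples correct) per problem.
results = [(250, 0), (250, 3), (250, 40)]
for k in (1, 10, 100):
    print(k, round(coverage(results, k), 3))
```

Coverage is only an upper bound on what a deployed system achieves: as the abstract notes, without an automatic verifier a correct sample still has to be picked out of the collection.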

MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning

Recently, large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks. Typically, an LLM is pre-trained on large corpora and subsequently fine-tuned on task-specific datasets. However, during fine-tuning, LLMs may forget the knowledge acquired in the pre-training stage, leading to a decline in general capabilities. To address this issue, we propose a new fine-tuning algorithm termed Momentum-Filtered Optimizer (MoFO). The key idea of MoFO is to iteratively select and update the model parameters with the largest momentum magnitudes. Compared to full-parameter training, MoFO achieves similar fine-tuning performance while keeping parameters closer to the pre-trained model, thereby mitigating knowledge forgetting. Unlike most existing methods for forgetting mitigation, MoFO combines the following two advantages. First, MoFO does not require access to pre-training data. This makes MoFO particularly suitable for fine-tuning scenarios where pre-training data is unavailable, such as fine-tuning checkpoint-only open-source LLMs. Second, MoFO does not alter the original loss function. This could avoid impairing the model performance on the fine-tuning tasks. We validate MoFO through rigorous convergence analysis and extensive experiments, demonstrating its superiority over existing methods in mitigating forgetting and enhancing fine-tuning performance.

Updated: 2024-07-31 17:56:03

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.20999v2
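The key idea stated in the MoFO abstract, updating only the parameters with the largest momentum magnitudes, can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes plain momentum SGD on one flat NumPy array, and the function name and the `keep_frac` parameter are hypothetical.

```python
import numpy as np

def momentum_filtered_step(params, grads, momentum, lr=0.1, beta=0.9, keep_frac=0.1):
    """One update that moves only the entries with the largest |momentum|."""
    # Exponential moving average of the gradients (the momentum buffer).
    momentum = beta * momentum + (1.0 - beta) * grads
    # Number of entries allowed to move in this step (at least one).
    k = max(1, int(keep_frac * params.size))
    # Indices of the k largest momentum magnitudes.
    top = np.argpartition(np.abs(momentum).ravel(), -k)[-k:]
    mask = np.zeros(params.size, dtype=bool)
    mask[top] = True
    mask = mask.reshape(params.shape)
    # Filtered update: all other entries keep their current (pre-trained) values.
    new_params = params - lr * momentum * mask
    return new_params, momentum

params = np.zeros(4)
grads = np.array([1.0, -5.0, 0.1, 2.0])
new_params, new_momentum = momentum_filtered_step(
    params, grads, np.zeros(4), keep_frac=0.25
)
print(new_params)
```

Because most entries are left untouched at every step, the parameters stay close to their starting point, which is the mechanism the abstract credits for reduced forgetting.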

FedADMM-InSa: An Inexact and Self-Adaptive ADMM for Federated Learning

Federated learning (FL) is a promising framework for learning from distributed data while maintaining privacy. The development of efficient FL algorithms encounters various challenges, including heterogeneous data and systems, limited communication capacities, and constrained local computational resources. Recently developed FedADMM methods show great resilience to both data and system heterogeneity. However, they still suffer from performance deterioration if the hyperparameters are not carefully tuned. To address this issue, we propose an inexact and self-adaptive FedADMM algorithm, termed FedADMM-InSa. First, we design an inexactness criterion for the clients' local updates to eliminate the need for empirically setting the local training accuracy. This inexactness criterion can be assessed by each client independently based on its unique condition, thereby reducing the local computational cost and mitigating the undesirable straggle effect. The convergence of the resulting inexact ADMM is proved under the assumption of strongly convex loss functions. Additionally, we present a self-adaptive scheme that dynamically adjusts each client's penalty parameter, enhancing algorithm robustness by mitigating the need for empirical penalty parameter choices for each client. Extensive numerical experiments on both synthetic and real-world datasets are conducted. As validated by some numerical tests, our proposed algorithm can reduce the clients' local computational load significantly and also accelerate the learning process compared to the vanilla FedADMM.

Updated: 2024-07-31 17:55:00

Categories: cs.LG,cs.CR,cs.DC,math.OC

Download: http://arxiv.org/abs/2402.13989v3

The Llama 3 Herd of Models

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Updated: 2024-07-31 17:54:27

Categories: cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2407.21783v1

GOProteinGNN: Leveraging Protein Knowledge Graphs for Protein Representation Learning

Proteins play a vital role in biological processes and are indispensable for living organisms. Accurate representation of proteins is crucial, especially in drug development. Recently, there has been a notable increase in interest in utilizing machine learning and deep learning techniques for unsupervised learning of protein representations. However, these approaches often focus solely on the amino acid sequence of proteins and lack factual knowledge about proteins and their interactions, thus limiting their performance. In this study, we present GOProteinGNN, a novel architecture that enhances protein language models by integrating protein knowledge graph information during the creation of amino acid level representations. Our approach allows for the integration of information at both the individual amino acid level and the entire protein level, enabling a comprehensive and effective learning process through graph-based learning. By doing so, we can capture complex relationships and dependencies between proteins and their functional annotations, resulting in more robust and contextually enriched protein representations. Unlike previous fusion methods, GOProteinGNN uniquely learns the entire protein knowledge graph during training, which allows it to capture broader relational nuances and dependencies beyond mere triplets as done in previous work. We perform a comprehensive evaluation on several downstream tasks demonstrating that GOProteinGNN consistently outperforms previous methods, showcasing its effectiveness and establishing it as a state-of-the-art solution for protein representation learning.

Updated: 2024-07-31 17:54:22

Categories: q-bio.BM,cs.LG,I.2

Download: http://arxiv.org/abs/2408.00057v1

Tulip Agent -- Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries

We introduce tulip agent, an architecture for autonomous LLM-based agents with Create, Read, Update, and Delete access to a tool library containing a potentially large number of tools. In contrast to state-of-the-art implementations, tulip agent does not encode the descriptions of all available tools in the system prompt, which counts against the model's context window, or embed the entire prompt for retrieving suitable tools. Instead, the tulip agent can recursively search for suitable tools in its extensible tool library, implemented exemplarily as a vector store. The tulip agent architecture significantly reduces inference costs, allows using even large tool libraries, and enables the agent to adapt and extend its set of tools. We evaluate the architecture with several ablation studies in a mathematics context and demonstrate its generalizability with an application to robotics. A reference implementation and the benchmark are available at github.com/HRI-EU/tulip_agent.

Updated: 2024-07-31 17:50:54

Categories: cs.AI,cs.RO,H.3.3; I.2.6; I.2.8; I.2.9

Download: http://arxiv.org/abs/2407.21778v1

ShieldGemma: Generative AI Content Moderation Based on Gemma

We present ShieldGemma, a comprehensive suite of LLM-based safety content moderation models built upon Gemma2. These models provide robust, state-of-the-art predictions of safety risks across key harm types (sexually explicit, dangerous content, harassment, hate speech) in both user input and LLM-generated output. By evaluating on both public and internal benchmarks, we demonstrate superior performance compared to existing models, such as Llama Guard (+10.8% AU-PRC on public benchmarks) and WildCard (+4.3%). Additionally, we present a novel LLM-based data curation pipeline, adaptable to a variety of safety-related tasks and beyond. We have shown strong generalization performance for models trained mainly on synthetic data. By releasing ShieldGemma, we provide a valuable resource to the research community, advancing LLM safety and enabling the creation of more effective content moderation solutions for developers.

Updated: 2024-07-31 17:48:14

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.21772v1

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adaptivity. Our empirical results reveal substantial pre-training efficiency gains through this modality-specific parameter allocation. Under a 1-trillion-token training budget, the MoMa 1.4B model, featuring 4 text experts and 4 image experts, achieves impressive FLOPs savings: 3.7x overall, with 2.6x for text and 5.2x for image processing compared to a compute-equivalent dense baseline, measured by pre-training loss. This outperforms the standard expert-choice MoE with 8 mixed-modal experts, which achieves 3x overall FLOPs savings (3x for text, 2.8x for image). Combining MoMa with mixture-of-depths (MoD) further improves pre-training FLOPs savings to 4.2x overall (text: 3.4x, image: 5.3x), although this combination hurts performance in causal inference due to increased sensitivity to router accuracy. These results demonstrate MoMa's potential to significantly advance the efficiency of mixed-modal, early-fusion language model pre-training, paving the way for more resource-efficient and capable multimodal AI systems.

Updated: 2024-07-31 17:46:51

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.21770v1

Process Mining Embeddings: Learning Vector Representations for Petri Nets

Process Mining offers a powerful framework for uncovering, analyzing, and optimizing real-world business processes. Petri nets provide a versatile means of modeling process behavior. However, traditional methods often struggle to effectively compare complex Petri nets, hindering their potential for process enhancement. To address this challenge, we introduce PetriNet2Vec, an unsupervised methodology inspired by Doc2Vec. This approach converts Petri nets into embedding vectors, facilitating the comparison, clustering, and classification of process models. We validated our approach using the PDC Dataset, comprising 96 diverse Petri net models. The results demonstrate that PetriNet2Vec effectively captures the structural properties of process models, enabling accurate process classification and efficient process retrieval. Specifically, our findings highlight the utility of the learned embeddings in two key downstream tasks: process classification and process retrieval. In process classification, the embeddings allowed for accurate categorization of process models based on their structural properties. In process retrieval, the embeddings enabled efficient retrieval of similar process models using cosine distance. These results demonstrate the potential of PetriNet2Vec to significantly enhance process mining capabilities.

Updated: 2024-07-31 17:19:34

Categories: cs.AI

Download: http://arxiv.org/abs/2404.17129v3
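The retrieval step mentioned in the PetriNet2Vec abstract, finding similar process models by cosine distance between embeddings, can be sketched as below. The embedding vectors here are toy values and the function name is hypothetical; how the embeddings are actually learned is not shown.

```python
import numpy as np

def retrieve(query, embeddings, top_k=3):
    """Return indices and cosine distances of the top_k nearest embeddings."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Cosine distance is 1 minus cosine similarity.
    dist = 1.0 - e @ q
    order = np.argsort(dist)[:top_k]
    return order, dist[order]

# Toy 2-D embeddings standing in for three process models.
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
order, dist = retrieve(np.array([1.0, 0.1]), embeddings)
print(order, dist)
```

Normalizing once and using a matrix-vector product keeps the lookup cheap for a collection of this size (the abstract mentions 96 models); larger collections would typically use an approximate nearest-neighbor index instead.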

Characterizing User Archetypes and Discussions on Scored.co

In recent years, the proliferation of social platforms has drastically transformed the way individuals interact, organize, and share information. In this scenario, we experience an unprecedented increase in the scale and complexity of interactions and, at the same time, little to no research about some fringe social platforms. In this paper, we present a multi-dimensional framework for characterizing nodes and hyperedges in social hypernetworks, with a focus on the understudied alt-right platform Scored.co. Our approach integrates the possibility of studying higher-order interactions, thanks to the hypernetwork representation, and various node features such as user activity, sentiment, and toxicity, with the aim to define distinct user archetypes and understand their roles within the network. Utilizing a comprehensive dataset from Scored.co, we analyze the dynamics of these archetypes over time and explore their interactions and influence within the community. The framework's versatility allows for detailed analysis of both individual user behaviors and broader social structures. Our findings highlight the importance of higher-order interactions in understanding social dynamics, offering new insights into the roles and behaviors that emerge in complex online environments.

Updated: 2024-07-31 17:18:25

Fields: cs.SI,cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.21753v1

Temporal Subspace Clustering for Molecular Dynamics Data

We introduce MOSCITO (MOlecular Dynamics Subspace Clustering with Temporal Observance), a subspace clustering method for molecular dynamics data. MOSCITO groups the timesteps of a molecular dynamics trajectory into clusters in which the molecule has similar conformations. In contrast to state-of-the-art methods, MOSCITO takes advantage of sequential relationships found in time series data. Unlike existing work, MOSCITO does not need a two-step procedure with tedious post-processing, but directly models essential properties of the data. Interpreting clusters as Markov states allows us to evaluate the clustering performance based on the resulting Markov state models. In experiments on 60 trajectories and 4 different proteins, we show that MOSCITO achieves state-of-the-art performance as a novel single-step method. Moreover, by modeling temporal aspects, MOSCITO obtains better segmentation of trajectories, especially for small numbers of clusters.
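
To make the idea of interpreting clusters as Markov states concrete, the sketch below estimates a transition matrix from a sequence of per-timestep cluster labels. It is a generic maximum-likelihood count estimate under assumed inputs (the label sequence and the lag of one step are hypothetical), not the evaluation pipeline used by the authors.

```python
def transition_matrix(labels, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from a sequence of cluster labels."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Count transitions between the state at time t and the state at time t + lag.
    for src, dst in zip(labels, labels[lag:]):
        counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States with no outgoing transitions are left as an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical cluster assignment for ten consecutive trajectory frames.
labels = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
T = transition_matrix(labels, n_states=2)
print(T)
```

A clustering that respects temporal order tends to produce a matrix with large diagonal entries, because consecutive frames mostly stay in the same state.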

Updated: 2024-07-31 17:13:34

Fields: cs.LG,cs.IR,physics.chem-ph,I.5.3; H.3.3; J.2

Download: http://arxiv.org/abs/2408.00056v1

Diagnostic Runtime Monitoring with Martingales

Machine learning systems deployed in safety-critical robotics settings must be robust to distribution shifts. However, system designers must understand the cause of a distribution shift in order to implement the appropriate intervention or mitigation strategy and prevent system failure. In this paper, we present a novel framework for diagnosing distribution shifts in a streaming fashion by deploying multiple stochastic martingales simultaneously. We show that knowledge of the underlying cause of a distribution shift can lead to proper interventions over the lifecycle of a deployed system. Our experimental framework can easily be adapted to different types of distribution shifts, models, and datasets. We find that our method outperforms existing work on diagnosing distribution shifts in terms of speed, accuracy, and flexibility, and validate the efficiency of our model in both simulated and live hardware settings.
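
The abstract does not specify which martingales are deployed, so the following is only a generic sketch of one streaming shift detector: a conformal power martingale over nonconformity scores. The betting parameter `epsilon`, the alarm threshold of 20, and the synthetic score streams are all assumptions made for illustration. A diagnostic setup in the spirit of the paper would run several such monitors in parallel, each fed a score sensitive to a different cause of shift.

```python
import math

def conformal_p_value(history, score):
    """Fraction of scores seen so far (including the new one) that are at least as large."""
    at_least = sum(1 for s in history if s >= score) + 1
    return at_least / (len(history) + 1)

def run_power_martingale(scores, epsilon=0.5, threshold=20.0):
    """Return the first step at which the martingale reaches the threshold, or None."""
    log_m = 0.0  # work in log space to avoid overflow
    history = []
    for step, score in enumerate(scores):
        p = conformal_p_value(history, score)
        # Power martingale update: M <- M * epsilon * p ** (epsilon - 1).
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p)
        history.append(score)
        if log_m >= math.log(threshold):
            return step
    return None

# Synthetic streams (assumption): 50 in-distribution scores, then 50 shifted scores.
in_dist = [((i * 37) % 50) / 50 for i in range(50)]
shifted = [1.0 + 0.01 * i for i in range(50)]
alarm_step = run_power_martingale(in_dist + shifted)
print(alarm_step)
```

Small p-values, meaning unusually large scores, make the martingale grow, while exchangeable data keeps it small in expectation. By Ville's inequality, a threshold of 20 corresponds to a false-alarm probability of at most 5% under exchangeability.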

Updated: 2024-07-31 17:05:10

Fields: cs.RO,cs.LG

Download: http://arxiv.org/abs/2407.21748v1

HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection

With the progressive advancements in deep graph learning, out-of-distribution (OOD) detection for graph data has emerged as a critical challenge. While the efficacy of auxiliary datasets in enhancing OOD detection has been extensively studied for image and text data, such approaches have not yet been explored for graph data. Unlike Euclidean data, graph data exhibits greater diversity but lower robustness to perturbations, complicating the integration of outliers. To tackle these challenges, we propose the introduction of Hybrid External and Internal Graph Outlier Exposure (HGOE) to improve graph OOD detection performance. Our framework involves using realistic external graph data from various domains and synthesizing internal outliers within ID subgroups to address the poor robustness and presence of OOD samples within the ID class. Furthermore, we develop a boundary-aware OE loss that adaptively assigns weights to outliers, maximizing the use of high-quality OOD samples while minimizing the impact of low-quality ones. Our proposed HGOE framework is model-agnostic and designed to enhance the effectiveness of existing graph OOD detection models. Experimental results demonstrate that our HGOE framework can significantly improve the performance of existing OOD detection models across all 8 real datasets.

Updated: 2024-07-31 16:55:18

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.21742v1

Contrastive Factor Analysis

Factor analysis, often regarded as a Bayesian variant of matrix factorization, offers superior capabilities in capturing uncertainty, modeling complex dependencies, and ensuring robustness. As the deep learning era arrives, factor analysis is receiving less and less attention due to its limited expressive ability. In contrast, contrastive learning has emerged as a potent technique with demonstrated efficacy in unsupervised representational learning. While the two methods are different paradigms, recent theoretical analysis has revealed the mathematical equivalence between contrastive learning and matrix factorization, suggesting that factor analysis can be combined with contrastive learning. Motivated by the interconnectedness of contrastive learning, matrix factorization, and factor analysis, this paper introduces a novel Contrastive Factor Analysis framework, aiming to leverage factor analysis's advantageous properties within the realm of contrastive learning. To further leverage the interpretability properties of non-negative factor analysis, which can learn disentangled representations, contrastive factor analysis is extended to a non-negative version. Finally, extensive experimental validation showcases the efficacy of the proposed contrastive (non-negative) factor analysis methodology across multiple key properties, including expressiveness, robustness, interpretability, and accurate uncertainty estimation.

Updated: 2024-07-31 16:52:00

Fields: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.21740v1

Post-Quantum Cryptography (PQC) Network Instrument: Measuring PQC Adoption Rates and Identifying Migration Pathways

The problem of adopting quantum-resistant cryptographic network protocols or post-quantum cryptography (PQC) is critically important to democratizing quantum computing. The problem is urgent because practical quantum computers will break classical encryption in the next few decades. Past encrypted data has already been collected and can be decrypted in the near future. The main challenges of adopting post-quantum cryptography lie in algorithmic complexity and hardware/software/network implementation. The grand question of how existing cyberinfrastructure will support post-quantum cryptography remains unanswered. This paper describes: i) the design of a novel Post-Quantum Cryptography (PQC) network instrument placed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and a part of the FABRIC testbed; ii) the latest results on PQC adoption rate across a wide spectrum of network protocols (Secure Shell -- SSH, Transport Layer Security -- TLS, etc.); iii) the current state of PQC implementation in key scientific applications (e.g., OpenSSH or SciTokens); iv) the challenges of being quantum-resistant; and v) discussion of potential novel attacks. This is the first large-scale measurement of PQC adoption at national-scale supercomputing centers and FABRIC testbeds. Our results show that only OpenSSH and Google Chrome have successfully implemented PQC and achieved an initial adoption rate of 0.029\% (6,044 out of 20,556,816) for OpenSSH connections at NCSA coming from major Internet Service Providers or Autonomous Systems (ASes) such as OARNET, GTT, Google Fiber Webpass (U.S.) and Uppsala Lans Landsting (Sweden), with an overall increasing adoption rate year-over-year for 2023-2024. Our analyses identify pathways to migrate current applications to be quantum-resistant.
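
The reported adoption rate follows directly from the two connection counts quoted above. The short check below reproduces the arithmetic; only the variable names are introduced here.

```python
# Counts quoted in the abstract for OpenSSH connections at NCSA.
pqc_connections = 6_044
total_connections = 20_556_816

adoption_rate_percent = 100 * pqc_connections / total_connections
print(f"{adoption_rate_percent:.3f}%")  # prints 0.029%
```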

Updated: 2024-07-31 16:48:40

Fields: cs.NI,cs.CR,quant-ph

Download: http://arxiv.org/abs/2408.00054v1

A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation

Adapting foundation models for medical image analysis requires finetuning them on a considerable amount of data because of extreme distribution shifts between natural (source) data used for pretraining and medical (target) data. However, collecting task-specific medical data for such finetuning at a central location raises many privacy concerns. Although Federated learning (FL) provides an effective means for training on private decentralized data, communication costs in federating large foundation models can quickly become a significant bottleneck, impacting the solution's scalability. In this work, we address this problem of efficient communication while ensuring effective learning in FL by combining the strengths of Parameter-Efficient Fine-tuning (PEFT) with FL. Specifically, we study plug-and-play Low-Rank Adapters (LoRA) in a federated manner to adapt the Segment Anything Model (SAM) for 3D medical image segmentation. Unlike prior works that utilize LoRA and finetune the entire decoder, we critically analyze the contribution of each granular component of SAM on finetuning performance. Thus, we identify specific layers to be federated that are very efficient in terms of communication cost while producing on-par accuracy. Our experiments show that retaining the parameters of the SAM model (including most of the decoder) in their original state during adaptation is beneficial because fine-tuning on small datasets tends to distort the inherent capabilities of the underlying foundation model. On Fed-KiTS, our approach decreases communication cost (~48x) compared to full fine-tuning while increasing performance (~6% Dice score) in 3D segmentation tasks. Our approach performs similar to SAMed while achieving ~2.8x reduction in communication and parameters to be finetuned. We further validate our approach with experiments on Fed-IXI and Prostate MRI datasets.
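
Why low-rank adapters cut federated communication can be seen from a parameter count: a frozen weight matrix of shape d_out x d_in stays on each client, and only the two low-rank factors are trained and exchanged. The sketch below illustrates this under assumed layer sizes (768 x 768, rank 4), which are not taken from the paper; the resulting ratio is therefore not the ~48x figure reported above, which also depends on which layers are federated.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in the factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W is frozen, only A and B would be communicated."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

# Illustrative layer sizes (assumption, not from the paper).
d_in, d_out, rank = 768, 768, 4
full = d_in * d_out
low_rank = lora_param_count(d_in, d_out, rank)
ratio = full / low_rank
print(full, low_rank, ratio)

# Tiny forward pass: identity base weight plus a rank-1 update.
y = lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [2]], [1, 2])
print(y)
```

For a single square layer the saving grows linearly with the layer width at fixed rank, which is why restricting the update to a few well-chosen layers can reduce traffic further.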

Updated: 2024-07-31 16:48:06

Fields: cs.CV,cs.AI,cs.LG,eess.IV

Download: http://arxiv.org/abs/2407.21739v1

Leveraging Self-Supervised Learning for Fetal Cardiac Planes Classification using Ultrasound Scan Videos

Self-supervised learning (SSL) methods are popular since they can address situations with limited annotated data by directly utilising the underlying data distribution. However, the adoption of such methods is not explored enough in ultrasound (US) imaging, especially for fetal assessment. We investigate the potential of dual-encoder SSL in utilizing unlabelled US video data to improve the performance of challenging downstream Standard Fetal Cardiac Planes (SFCP) classification using limited labelled 2D US images. We study 7 SSL approaches based on reconstruction, contrastive loss, distillation, and information theory and evaluate them extensively on a large private US dataset. Our observations and findings are consolidated from more than 500 downstream training experiments under different settings. Our primary observation shows that for SSL training, the variance of the dataset is more crucial than its size because it allows the model to learn generalisable representations, which improve the performance of downstream tasks. Overall, the BarlowTwins method shows robust performance, irrespective of the training settings and data variations, when used as an initialisation for downstream tasks. Notably, full fine-tuning with 1% of labelled data outperforms ImageNet initialisation by 12% in F1-score and outperforms other SSL initialisations by at least 4% in F1-score, thus making it a promising candidate for transfer learning from US video to image data.

Updated: 2024-07-31 16:47:21

Fields: eess.IV,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.21738v1

Areas of Improvement for Autonomous Vehicles: A Machine Learning Analysis of Disengagement Reports

Since 2014, the California Department of Motor Vehicles (CDMV) has compiled information from manufacturers of autonomous vehicles (AVs) regarding factors that lead to the disengagement from autonomous driving mode in these vehicles. These disengagement reports (DRs) contain information detailing whether the AV disengaged from autonomous mode due to technology failure, manual override, or other factors during driving tests. This paper presents a machine learning (ML) based analysis of the information from the 2023 DRs. We use a natural language processing (NLP) approach to extract important information from the description of a disengagement, and use the k-Means clustering algorithm to group report entries together. The cluster frequency is then analyzed, and each cluster is manually categorized based on the factors leading to disengagement. We discuss findings from previous years' DRs, and provide our own analysis to identify areas of improvement for AVs.
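
The pipeline described above, turning free-text disengagement descriptions into vectors and grouping them with k-Means, can be sketched in a few lines. The bag-of-words vectorizer, the deterministic initialization, and the four sample descriptions are assumptions made for illustration; the paper's actual NLP preprocessing is not specified in the abstract.

```python
def vectorize(texts):
    """Turn each text into a bag-of-words count vector over a shared vocabulary."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    return [[text.lower().split().count(word) for word in vocab] for text in texts]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Plain k-Means; the first k points serve as initial centroids for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical report entries (not taken from the 2023 DRs).
reports = [
    "software planning error",
    "driver manual takeover",
    "planning software fault",
    "manual takeover by driver",
]
labels = kmeans(vectorize(reports), k=2)
print(labels)
```

Each resulting cluster would then be inspected by hand and given a category, as the abstract describes.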

Updated: 2024-07-31 16:36:10

Fields: cs.AI

Download: http://arxiv.org/abs/2408.00051v1

Algorithms for Collaborative Machine Learning under Statistical Heterogeneity

Learning from distributed data without accessing them is undoubtedly a challenging and non-trivial task. Nevertheless, the necessity for distributed training of a statistical model has been increasing, due to the privacy concerns of local data owners and the cost of centralizing massively distributed data. Federated learning (FL) is currently the de facto standard for training a machine learning model across heterogeneous data owners without the raw data leaving local silos. Nevertheless, several challenges must be addressed in order for FL to be more practical in reality. Among these challenges, the statistical heterogeneity problem is the most significant and requires immediate attention. From the main objective of FL, three major factors can be considered as starting points -- \textit{parameter}, \textit{mixing coefficient}, and \textit{local data distributions}. In alignment with these components, this dissertation is organized into three parts. In Chapter II, a novel personalization method, \texttt{SuPerFed}, inspired by mode connectivity, is introduced. In Chapter III, an adaptive decision-making algorithm, \texttt{AAggFF}, is introduced for inducing uniform performance distributions in participating clients, realized through an online convex optimization framework. Finally, in Chapter IV, a collaborative synthetic data generation method, \texttt{FedEvg}, is introduced, leveraging the flexibility and compositionality of an energy-based modeling approach. Taken together, all of these approaches provide practical solutions to mitigate the statistical heterogeneity problem in data-decentralized settings, paving the way for distributed systems and applications using collaborative machine learning methods.

Updated: 2024-07-31 16:32:34

Fields: stat.ML,cs.DC,cs.LG

Download: http://arxiv.org/abs/2408.00050v1

ParLS-PBO: A Parallel Local Search Solver for Pseudo Boolean Optimization

Local search, a technique applied broadly across optimization problems, has recently been employed to solve the Pseudo-Boolean Optimization (PBO) problem. A representative local search solver for PBO is LSPBO. In this paper, we first improve LSPBO with a dynamic scoring mechanism, which strikes a balance between the score on hard constraints and the score on the objective function. Moreover, on top of this improved LSPBO, we develop the first parallel local search PBO solver. The main idea is to share good solutions among different threads to guide the search, by maintaining a pool of feasible solutions. For evaluating solutions when updating the pool, we propose a function that considers both the solution quality and the diversity of the pool. Furthermore, we calculate the polarity density in the pool to enhance the scoring function of local search. Our empirical experiments show clear benefits of the proposed parallel approach, making it competitive with the parallel version of the famous commercial solver Gurobi.
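
The abstract does not give the pool evaluation function, so the sketch below is a hypothetical stand-in that shows one way to trade off solution quality against pool diversity. It assumes a minimization objective, measures diversity as the minimum Hamming distance to the other pool members, and uses an arbitrary `diversity_weight`; none of these choices is confirmed by the source.

```python
def hamming(a, b):
    """Number of variables on which two 0/1 assignments differ."""
    return sum(x != y for x, y in zip(a, b))

def pool_score(candidate, objective, others, diversity_weight=1.0):
    """Lower is better: objective value minus a bonus for being far from the others."""
    if not others:
        return objective
    min_dist = min(hamming(candidate, sol) for sol, _ in others)
    return objective - diversity_weight * min_dist

def update_pool(pool, candidate, objective, capacity=3, diversity_weight=1.0):
    """Add a feasible candidate, then evict the worst-scoring member if over capacity."""
    if any(sol == candidate for sol, _ in pool):
        return pool  # identical solutions add no diversity
    pool = pool + [(candidate, objective)]
    if len(pool) > capacity:
        def score(entry):
            others = [e for e in pool if e is not entry]
            return pool_score(entry[0], entry[1], others, diversity_weight)
        pool.remove(max(pool, key=score))
    return pool

# Hypothetical feasible assignments with their objective values.
pool = []
for solution, value in [((0, 0, 0, 0), 5), ((0, 0, 0, 1), 5), ((1, 1, 1, 1), 6)]:
    pool = update_pool(pool, solution, value)
pool = update_pool(pool, (1, 1, 1, 0), 9)  # poor objective and close to an existing member
print(pool)
```

In a parallel solver each thread would call something like `update_pool` under a lock and periodically restart from a pool member. That part is omitted here.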

Updated: 2024-07-31 16:30:04

Fields: cs.AI

Download: http://arxiv.org/abs/2407.21729v1

Artificial Intelligence Approaches for Energy Efficiency: A Review

The United Nations set the Sustainable Development Goals, and this paper focuses on the 7th (Affordable and Clean Energy), 9th (Industries, Innovation and Infrastructure), and 13th (Climate Action) goals. Climate change is a major concern in our society; for this reason, a current global objective is to reduce energy waste. This work summarizes the main approaches towards energy efficiency using Artificial Intelligence, with a particular focus on multi-agent systems to create smart buildings. It mentions the tight relationship between AI, especially IoT, and Big Data. It explains the application of AI to anomaly detection in smart buildings and a possible classification of Intelligent Energy Management Systems: Direct and Indirect. Finally, it discusses some drawbacks of AI approaches and proposes possible directions for future research.

Updated: 2024-07-31 16:24:52

Fields: cs.AI

Download: http://arxiv.org/abs/2407.21726v1

Stable Audio Open

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.

Updated: 2024-07-31 16:22:42

Fields: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2407.14358v2

Optimal Decision Tree and Adaptive Submodular Ranking with Noisy Outcomes

In pool-based active learning, the learner is given an unlabeled data set and aims to efficiently learn the unknown hypothesis by querying the labels of the data points. This can be formulated as the classical Optimal Decision Tree (ODT) problem: Given a set of tests, a set of hypotheses, and an outcome for each pair of test and hypothesis, our objective is to find a low-cost testing procedure (i.e., decision tree) that identifies the true hypothesis. This optimization problem has been extensively studied under the assumption that each test generates a deterministic outcome. However, in numerous applications, for example, clinical trials, the outcomes may be uncertain, which renders the ideas from the deterministic setting invalid. In this work, we study a fundamental variant of the ODT problem in which some test outcomes are noisy, even in the more general case where the noise is persistent, i.e., repeating a test gives the same noisy output. Our approximation algorithms provide guarantees that are nearly best possible and hold for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades continuously with this number. We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers, and observed that our algorithms have costs very close to the information-theoretic minimum.
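
For orientation, the classical noise-free version of the problem can be sketched as a greedy adaptive procedure: repeatedly run the test that splits the remaining hypotheses most evenly, then discard every hypothesis inconsistent with the observed outcome. This is a textbook baseline under the deterministic assumption, not the noise-tolerant approximation algorithm of the paper, and the small outcome table is hypothetical.

```python
def identify(outcomes, true_hypothesis):
    """Adaptively run tests until the consistent set cannot shrink further.

    outcomes[t][h] is the deterministic outcome of test t under hypothesis h.
    Returns (set of consistent hypotheses, list of tests run in order).
    """
    candidates = set(range(len(outcomes[0])))
    unused = set(range(len(outcomes)))
    used = []
    while len(candidates) > 1 and unused:
        def worst_case(t):
            # Size of the largest group of candidates sharing an outcome on test t.
            groups = {}
            for h in candidates:
                groups[outcomes[t][h]] = groups.get(outcomes[t][h], 0) + 1
            return max(groups.values())
        test = min(sorted(unused), key=worst_case)
        unused.remove(test)
        used.append(test)
        observed = outcomes[test][true_hypothesis]
        candidates = {h for h in candidates if outcomes[test][h] == observed}
    return candidates, used

# Hypothetical table: 3 tests (rows) by 4 hypotheses (columns).
outcomes = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
result = identify(outcomes, true_hypothesis=2)
print(result)
```

With persistent noise, an observed outcome can no longer be used to discard hypotheses outright, which is the difficulty the abstract points to.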

Updated: 2024-07-31 16:20:51

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2312.15357v2

A Survey on Self-Supervised Graph Foundation Models: Knowledge-Based Perspective

Graph self-supervised learning (SSL) is now a go-to method for pre-training graph foundation models (GFMs). There is a wide variety of knowledge patterns embedded in the graph data, such as node properties and clusters, which are crucial to learning generalized representations for GFMs. However, existing surveys of GFMs have several shortcomings: they lack comprehensiveness regarding the most recent progress, have unclear categorization of self-supervised methods, and take a limited architecture-based perspective that is restricted to only certain types of graph models. As the ultimate goal of GFMs is to learn generalized graph knowledge, we provide a comprehensive survey of self-supervised GFMs from a novel knowledge-based perspective. We propose a knowledge-based taxonomy, which categorizes self-supervised graph models by the specific graph knowledge utilized. Our taxonomy consists of microscopic (nodes, links, etc.), mesoscopic (context, clusters, etc.), and macroscopic knowledge (global structure, manifolds, etc.). It covers a total of 9 knowledge categories and more than 25 pretext tasks for pre-training GFMs, as well as various downstream task generalization strategies. Such a knowledge-based taxonomy allows us to re-examine graph models based on new architectures more clearly, such as graph language models, as well as provide more in-depth insights for constructing GFMs.

Updated: 2024-07-31 16:16:12

Fields: cs.LG,cs.SI

Download: http://arxiv.org/abs/2403.16137v2

Open-Vocabulary Audio-Visual Semantic Segmentation

Audio-visual semantic segmentation (AVSS) aims to segment and classify sounding objects in videos with acoustic cues. However, most approaches operate on the close-set assumption and only identify pre-defined categories from training data, lacking the generalization ability to detect novel categories in practical applications. In this paper, we introduce a new task: open-vocabulary audio-visual semantic segmentation, extending AVSS task to open-world scenarios beyond the annotated label space. This is a more challenging task that requires recognizing all categories, even those that have never been seen nor heard during training. Moreover, we propose the first open-vocabulary AVSS framework, OV-AVSS, which mainly consists of two parts: 1) a universal sound source localization module to perform audio-visual fusion and locate all potential sounding objects and 2) an open-vocabulary classification module to predict categories with the help of the prior knowledge from large-scale pre-trained vision-language models. To properly evaluate the open-vocabulary AVSS, we split zero-shot training and testing subsets based on the AVSBench-semantic benchmark, namely AVSBench-OV. Extensive experiments demonstrate the strong segmentation and zero-shot generalization ability of our model on all categories. On the AVSBench-OV dataset, OV-AVSS achieves 55.43% mIoU on base categories and 29.14% mIoU on novel categories, exceeding the state-of-the-art zero-shot method by 41.88%/20.61% and open-vocabulary method by 10.2%/11.6%. The code is available at https://github.com/ruohaoguo/ovavss.
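
The mIoU figures quoted above average the per-class intersection over union. The sketch below shows the standard computation on flattened label maps. It is a generic definition, and the exact averaging protocol of AVSBench-OV (for example, per frame versus per dataset) is an assumption not confirmed by the abstract.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Hypothetical flattened label maps with three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)
```

Here the per-class values are 1/3, 2/3 and 1/2, so the mean is 0.5. Reporting base and novel categories separately, as the abstract does, means averaging over the two class subsets independently.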

Updated: 2024-07-31 16:14:09

Fields: cs.MM,cs.AI

Download: http://arxiv.org/abs/2407.21721v1

Assessing the State of AI Policy

The deployment of artificial intelligence (AI) applications has accelerated rapidly. AI-enabled technologies reach the public in many ways, including infrastructure, consumer products and home applications. Because many of these technologies present risks, either in the form of physical injury or of bias that may yield unfair outcomes, policymakers must consider the need for oversight. Most policymakers, however, lack the technical knowledge to judge whether an emerging AI technology is safe, effective, and requires oversight, so they must depend on expert opinion. But policymakers are better served when, in addition to expert opinion, they have some general understanding of existing guidelines and regulations. This work provides an overview of the landscape of AI legislation and directives at the international, U.S. state, city and federal levels. It also reviews relevant business standards and technical society initiatives. An overlap and gap analysis is then performed, resulting in a reference guide that includes recommendations and guidance for future policy making.

Updated: 2024-07-31 16:09:25

Fields: cs.AI,cs.CY

Download: http://arxiv.org/abs/2407.21717v1

UMMAN: Unsupervised Multi-graph Merge Adversarial Network for Disease Prediction Based on Intestinal Flora

The abundance of intestinal flora is closely related to human diseases, but diseases are not caused by a single gut microbe. Instead, they result from the complex interplay of numerous microbial entities. This intricate and implicit connection among gut microbes poses a significant challenge for disease prediction using abundance information from OTU data. Recently, several methods have shown potential in predicting corresponding diseases. However, these methods fail to learn the inner association among gut microbes from different hosts, leading to unsatisfactory performance. In this paper, we present a novel architecture, Unsupervised Multi-graph Merge Adversarial Network (UMMAN). UMMAN can obtain the embeddings of nodes in the Multi-Graph in an unsupervised scenario, so that it helps learn the multiplex association. Our method is the first to combine Graph Neural Network with the task of intestinal flora disease prediction. We employ complex relation-types to construct the Original-Graph and disrupt the relationships among nodes to generate corresponding Shuffled-Graph. We introduce the Node Feature Global Integration (NFGI) module to represent the global features of the graph. Furthermore, we design a joint loss comprising adversarial loss and hybrid attention loss to ensure that the real graph embedding aligns closely with the Original-Graph and diverges from the Shuffled-Graph. Comprehensive experiments on five classical OTU gut microbiome datasets demonstrate the effectiveness and stability of our method. (We will release our code soon.)

Updated: 2024-07-31 16:06:43

Categories: cs.AI, q-bio.QM

Download: http://arxiv.org/abs/2407.21714v1

Social Learning through Interactions with Other Agents: A Survey

Social learning plays an important role in the development of human intelligence. As children, we imitate our parents' speech patterns until we are able to produce sounds; we learn from them praising us and scolding us; and as adults, we learn by working with others. In this work, we survey the degree to which this paradigm -- social learning -- has been mirrored in machine learning. In particular, since learning socially requires interacting with others, we are interested in how embodied agents can utilise, and have utilised, these techniques. This is especially relevant in light of the degree to which recent advances in natural language processing (NLP) enable us to perform new forms of social learning. We look at how behavioural cloning and next-token prediction mirror human imitation, how learning from human feedback mirrors human education, and how we can go further to enable fully communicative agents that learn from each other. We find that while individual social learning techniques have been used successfully, there has been little unifying work showing how to bring them together into socially embodied agents.

Updated: 2024-07-31 16:06:34

Categories: cs.LG, cs.AI, I.2.7; I.2.0

Download: http://arxiv.org/abs/2407.21713v1

Exact Fractional Inference via Re-Parametrization & Interpolation between Tree-Re-Weighted- and Belief Propagation- Algorithms

The computational complexity of inference -- required to compute the partition function, $Z$, of an Ising model over a graph of $N$ "spins" -- is most likely exponential in $N$. Efficient variational methods, such as Belief Propagation (BP) and Tree Re-Weighted (TRW) algorithms, compute $Z$ approximately by minimizing the respective (BP- or TRW-) free energy. We generalize the variational scheme by building a $\lambda$-fractional interpolation, $Z^{(\lambda)}$, where $\lambda=0$ and $\lambda=1$ correspond to TRW- and BP-approximations, respectively. This fractional scheme -- coined Fractional Belief Propagation (FBP) -- guarantees that in the attractive (ferromagnetic) case $Z^{(TRW)} \geq Z^{(\lambda)} \geq Z^{(BP)}$, and there exists a unique ("exact") $\lambda_*$ such that $Z=Z^{(\lambda_*)}$. Generalizing the re-parametrization approach of \citep{wainwright_tree-based_2002} and the loop series approach of \citep{chertkov_loop_2006}, we show how to express $Z$ as a product, $\forall \lambda:\ Z=Z^{(\lambda)}{\tilde Z}^{(\lambda)}$, where the multiplicative correction, ${\tilde Z}^{(\lambda)}$, is an expectation over a node-independent probability distribution built from node-wise fractional marginals. Our theoretical analysis is complemented by extensive experiments with models from Ising ensembles over planar and random graphs of medium and large sizes. The empirical study yields a number of interesting observations, such as the ability to estimate ${\tilde Z}^{(\lambda)}$ with $O(N^{2::4})$ fractional samples and suppression of $\lambda_*$ fluctuations with an increase in $N$ for instances from a particular random Ising ensemble. We also verify and discuss the applicability of this approach to the problem of image de-noising.

Updated: 2024-07-31 16:00:23

Categories: cs.LG, cond-mat.stat-mech

Download: http://arxiv.org/abs/2301.10369v3
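
As a small illustration of why the partition function $Z$ in the abstract above is expensive to compute, the following sketch evaluates it exactly by enumerating all $2^N$ spin configurations. It is not the paper's FBP method. The weight convention $\exp(\sum_{ij} J_{ij} s_i s_j + \sum_i h_i s_i)$ and the function name `ising_partition_function` are assumptions made here for illustration.

```python
import itertools
import math


def ising_partition_function(n_spins, couplings, fields):
    """Exact Z by brute force; the loop visits 2**n_spins configurations.

    couplings: dict mapping an edge (i, j) to its coupling J_ij
               (J_ij >= 0 is the attractive, ferromagnetic case).
    fields:    list of local fields h_i, one per spin.
    Assumed convention: weight(s) = exp(sum J_ij*s_i*s_j + sum h_i*s_i).
    """
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        log_weight = sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
        log_weight += sum(h * s for h, s in zip(fields, spins))
        total += math.exp(log_weight)
    return total


# Two coupled spins without fields: Z = 2*e + 2/e = 4*cosh(1).
z_pair = ising_partition_function(2, {(0, 1): 1.0}, [0.0, 0.0])
```

Because the loop grows as $2^N$, this is only usable for very small graphs, which is the regime where it can serve as a ground truth for checking approximations such as BP or TRW.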

CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature

Ontologies are formal representations of knowledge in specific domains that provide a structured framework for organizing and understanding complex information. Creating ontologies, however, is a complex and time-consuming endeavor. ChEBI is a well-known ontology in the field of chemistry, which provides a comprehensive resource for defining chemical entities and their properties. However, it covers only a small fraction of the rapidly growing knowledge in chemistry and does not provide references to the scientific literature. To address this, we propose a methodology that involves augmenting existing annotated text corpora with knowledge from ChEBI and fine-tuning a large language model (LLM) to recognize chemical entities and their roles in scientific text. Our experiments demonstrate the effectiveness of our approach. By combining ontological knowledge and the language understanding capabilities of LLMs, we achieve high precision and recall rates in identifying both the chemical entities and roles in scientific literature. Furthermore, we extract them from a set of 8,000 ChemRxiv articles, and apply a second LLM to create a knowledge graph (KG) of chemical entities and roles (CEAR), which provides complementary information to ChEBI and can help to extend it.

Updated: 2024-07-31 15:56:06

Categories: cs.AI

Download: http://arxiv.org/abs/2407.21708v1

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information gathering. How to utilize TOD accurately, efficiently and effectively for information gathering has always been a critical and challenging task. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can significantly enhance the performance of TOD through fine-tuning. However, current datasets primarily cater to user-led systems and are limited to predefined specific scenarios and slots, thereby necessitating improvements in the proactiveness, diversity, and capabilities of TOD. In this study, we present a detailed multi-domain task-oriented data construction process for conversations, and a Chinese dialogue dataset generated based on this process, \textbf{TransferTOD}, which authentically simulates human-machine dialogues in 30 popular life service scenarios. Leveraging this dataset, we trained a \textbf{TransferTOD-7B} model using full-parameter fine-tuning, showcasing notable abilities in slot filling and questioning. Our work has demonstrated its strong generalization capabilities in various downstream scenarios, significantly enhancing both data utilization efficiency and system performance. The data is released at https://github.com/KongLongGeFDU/TransferTOD.

Updated: 2024-07-31 15:38:15

Categories: cs.AI

Download: http://arxiv.org/abs/2407.21693v1

Dynamic Object Queries for Transformer-based Incremental Object Detection

Incremental object detection (IOD) aims to sequentially learn new classes, while maintaining the capability to locate and identify old ones. As the training data arrives with annotations only for new classes, IOD suffers from catastrophic forgetting. Prior methodologies mainly tackle the forgetting issue through knowledge distillation and exemplar replay, ignoring the conflict between limited model capacity and increasing knowledge. In this paper, we explore \textit{dynamic object queries} for incremental object detection built on the Transformer architecture. We propose the \textbf{Dy}namic object \textbf{Q}uery-based \textbf{DE}tection \textbf{TR}ansformer (DyQ-DETR), which incrementally expands the model representation ability to achieve a stability-plasticity tradeoff. First, a new set of learnable object queries is fed into the decoder to represent new classes. These new object queries are aggregated with those from previous phases to adapt to both old and new knowledge well. Second, we propose isolated bipartite matching for object queries in different phases, based on disentangled self-attention. The interaction among the object queries at different phases is eliminated to reduce inter-class confusion. Thanks to the separate supervision and computation over object queries, we further present risk-balanced partial calibration for effective exemplar replay. Extensive experiments demonstrate that DyQ-DETR significantly surpasses the state-of-the-art methods, with limited parameter overhead. Code will be made publicly available.

Updated: 2024-07-31 15:29:34

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2407.21687v1

Green Edge AI: A Contemporary Survey

Artificial intelligence (AI) technologies have emerged as pivotal enablers across a multitude of industries largely due to their significant resurgence over the past decade. The transformative power of AI is primarily derived from the utilization of deep neural networks (DNNs), which require extensive data for training and substantial computational resources for processing. Consequently, DNN models are typically trained and deployed on resource-rich cloud servers. However, due to potential latency issues associated with cloud communications, deep learning (DL) workflows are increasingly being transitioned to wireless edge networks in proximity to end-user devices (EUDs). This shift is designed to support latency-sensitive applications and has given rise to a new paradigm of edge AI, which will play a critical role in upcoming sixth-generation (6G) networks to support ubiquitous AI applications. Despite its considerable potential, edge AI faces substantial challenges, mostly due to the dichotomy between the resource limitations of wireless edge networks and the resource-intensive nature of DL. Specifically, the acquisition of large-scale data, as well as the training and inference processes of DNNs, can rapidly deplete the battery energy of EUDs. This necessitates an energy-conscious approach to edge AI to ensure both optimal and sustainable performance. In this paper, we present a contemporary survey on green edge AI. We commence by analyzing the principal energy consumption components of edge AI systems to identify the fundamental design principles of green edge AI. Guided by these principles, we then explore energy-efficient design methodologies for the three critical tasks in edge AI systems, including training data acquisition, edge training, and edge inference. Finally, we underscore potential future research directions to further enhance the energy efficiency of edge AI.

Updated: 2024-07-31 15:17:58

Categories: cs.AI, cs.IT, cs.NI, math.IT

Download: http://arxiv.org/abs/2312.00333v2

Synthetic Simplicity: Unveiling Bias in Medical Data Augmentation

Synthetic data is becoming increasingly integral in data-scarce fields such as medical imaging, serving as a substitute for real data. However, its inherent statistical characteristics can significantly impact downstream tasks, potentially compromising deployment performance. In this study, we empirically investigate this issue and uncover a critical phenomenon: downstream neural networks often exploit spurious distinctions between real and synthetic data when there is a strong correlation between the data source and the task label. This exploitation manifests as \textit{simplicity bias}, where models overly rely on superficial features rather than genuine task-related complexities. Through principled experiments, we demonstrate that the source of data (real vs. synthetic) can introduce spurious correlating factors leading to poor performance during deployment when the correlation is absent. We first demonstrate this vulnerability on a digit classification task, where the model spuriously utilizes the source of data instead of the digit to provide an inference. We provide further evidence of this phenomenon in a medical imaging problem related to cardiac view classification in echocardiograms, particularly distinguishing between 2-chamber and 4-chamber views. Given the increasing role of utilizing synthetic datasets, we hope that our experiments serve as effective guidelines for the utilization of synthetic datasets in model training.

Updated: 2024-07-31 15:14:17

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2407.21674v1

Universal Approximation Theory: Foundations for Parallelism in Neural Networks

Neural networks are increasingly evolving towards training large models with big data, a method that has demonstrated superior performance across many tasks. However, this approach introduces an urgent problem: current deep learning models are predominantly serial, meaning that as the number of network layers increases, so do the training and inference times. This is unacceptable if deep learning is to continue advancing. Therefore, this paper proposes a deep learning parallelization strategy based on the Universal Approximation Theorem (UAT). From this foundation, we designed a parallel network called Para-Former to test our theory. Unlike traditional serial models, the inference time of Para-Former does not increase with the number of layers, significantly accelerating the inference speed of multi-layer networks. Experimental results validate the effectiveness of this network.

Updated: 2024-07-31 15:13:39

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2407.21670v1

Synth-Empathy: Towards High-Quality Synthetic Empathy Data

In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capabilities has become a crucial prerequisite. Consequently, managing and understanding empathetic datasets have gained increasing significance. However, empathetic data are typically human-labeled, leading to insufficient datasets and wasted human labor. In this work, we present Synth-Empathy, an LLM-based data generation and quality and diversity selection pipeline that automatically generates high-quality empathetic data while discarding low-quality data. With the data generated from a low empathetic model, we are able to further improve empathetic response performance and achieve state-of-the-art (SoTA) results across multiple benchmarks. Moreover, our model achieves SoTA performance on various human evaluation benchmarks, demonstrating its effectiveness and robustness in real-world applications. Furthermore, we show the trade-off between data quantity and quality, providing insights into empathetic data generation and selection.

Updated: 2024-07-31 15:12:24

Categories: cs.CL, cs.LG

Download: http://arxiv.org/abs/2407.21669v1

Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning

In-context learning refers to the learning ability of a model during inference time without adapting its parameters. The input (i.e., prompt) to the model (e.g., transformers) consists of both a context (i.e., instance-label pairs) and a query instance. The model is then able to output a label for the query instance according to the context during inference. A possible explanation for in-context learning is that the forward pass of (linear) transformers implements iterations of gradient descent on the instance-label pairs in the context. In this paper, we prove by construction that transformers can also implement temporal difference (TD) learning in the forward pass, a phenomenon we refer to as in-context TD. We demonstrate the emergence of in-context TD after training the transformer with a multi-task TD algorithm, accompanied by theoretical analysis. Furthermore, we prove that transformers are expressive enough to implement many other policy evaluation algorithms in the forward pass, including residual gradient, TD with eligibility trace, and average-reward TD.

Updated: 2024-07-31 15:10:28

Categories: cs.LG

Download: http://arxiv.org/abs/2405.13861v3
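
For readers unfamiliar with temporal difference learning, the policy evaluation method named in the abstract above, here is a minimal tabular TD(0) sketch. It is the textbook update rule only and makes no attempt to reproduce the paper's transformer construction. The function name `td0_evaluate`, the `(state, reward, next_state)` transition format, and the default step size and discount are assumptions made for illustration.

```python
def td0_evaluate(transitions, n_states, alpha=0.1, gamma=0.9, sweeps=200):
    """Tabular TD(0) policy evaluation.

    transitions: list of (state, reward, next_state) tuples observed
                 while following a fixed policy.
    Repeatedly applies V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    """
    values = [0.0] * n_states
    for _ in range(sweeps):
        for state, reward, next_state in transitions:
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
    return values


# Toy chain: state 0 moves to state 1 with reward 1; state 1 loops on
# itself with reward 0, so the true values are V(0) = 1 and V(1) = 0.
chain = [(0, 1.0, 1), (1, 0.0, 1)]
estimated_values = td0_evaluate(chain, n_states=2)
```

Each update uses only the current value estimates and one observed transition, which is what distinguishes TD methods from Monte Carlo evaluation on full returns.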

An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification

Early detection of drought stress is critical for taking timely measures to reduce crop loss before the drought impact becomes irreversible. The subtle phenotypical and physiological changes in response to drought stress are captured by non-invasive imaging techniques, and these imaging data serve as a valuable resource for machine learning methods to identify drought stress. While convolutional neural networks (CNNs) are in wide use, vision transformers (ViTs) present a promising alternative in capturing long-range dependencies and intricate spatial relationships, thereby enhancing the detection of subtle indicators of drought stress. We propose an explainable deep learning pipeline that leverages the power of ViTs for drought stress detection in potato crops using aerial imagery. We applied two distinct approaches: a synergistic combination of ViT and support vector machine (SVM), where ViT extracts intricate spatial features from aerial images and SVM classifies the crops as stressed or healthy; and an end-to-end approach using a dedicated classification layer within ViT to directly detect drought stress. Our key findings explain the ViT model's decision-making process by visualizing attention maps. These maps highlight the specific spatial features within the aerial images that the ViT model focuses on as the drought stress signature. Our findings demonstrate that the proposed methods not only achieve high accuracy in drought stress identification but also shed light on the diverse subtle plant features associated with drought stress. This offers farmers a robust and interpretable solution for drought stress monitoring, enabling informed decisions for improved crop management.

Updated: 2024-07-31 15:08:26

Categories: cs.CV, cs.AI, cs.ET, cs.LG

Download: http://arxiv.org/abs/2407.21666v1

A State-of-the-Art Review of Computational Models for Analyzing Longitudinal Wearable Sensor Data in Healthcare

Wearable devices are increasingly used as tools for biomedical research, as the continuous stream of behavioral and physiological data they collect can provide insights about our health in everyday contexts. Long-term tracking, defined on the timescale of months or years, can provide insights into patterns and changes as indicators of health changes. These insights can make medicine and healthcare more predictive, preventive, personalized, and participative (the 4P's). However, the challenges in modeling, understanding and processing longitudinal data are a significant barrier to their adoption in research studies and clinical settings. In this paper, we review and discuss three models used to make sense of longitudinal data: routines, rhythms and stability metrics. We present the challenges associated with the processing and analysis of longitudinal wearable sensor data, with a special focus on how to handle the different temporal dynamics at various granularities. We then discuss current limitations and identify directions for future work. This review is essential to the advancement of computational modeling and analysis of longitudinal sensor data for pervasive healthcare.

Updated: 2024-07-31 15:08:15

Categories: cs.HC, cs.LG

Download: http://arxiv.org/abs/2407.21665v1

Is $F_1$ Score Suboptimal for Cybersecurity Models? Introducing $C_{score}$, a Cost-Aware Alternative for Model Assessment

The costs of the errors made by machine learning classifiers, namely false positives and false negatives, are not equal and are application dependent. For example, in cybersecurity applications, the cost of not detecting an attack is very different from that of marking a benign activity as an attack. Various design choices during machine learning model building, such as hyperparameter tuning and model selection, allow a data scientist to trade off between these two errors. However, most of the commonly used metrics to evaluate model quality, such as $F_1$ score, which is defined in terms of model precision and recall, treat both these errors equally, making it difficult for users to optimize for the actual cost of these errors. In this paper, we propose a new cost-aware metric, $C_{score}$, based on precision and recall, that can replace $F_1$ score for model evaluation and selection. It includes a cost ratio that takes into account the differing costs of handling false positives and false negatives. We derive and characterize the new cost metric, and compare it to $F_1$ score. Further, we use this metric for model thresholding for five cybersecurity related datasets for multiple cost ratios. The results show an average cost savings of 49%.

Updated: 2024-07-31 15:03:57

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2407.14664v2
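
The point made in the abstract above, that $F_1$ weighs false positives and false negatives equally, can be seen on a toy example. The sketch below does not implement the paper's $C_{score}$, whose formula is not given in the abstract. Instead it uses a plain weighted error cost, `cost_fp * FP + cost_fn * FN`, as an assumed stand-in; the scores, labels, thresholds and cost values are made up for illustration.

```python
def confusion_counts(scores, labels, threshold):
    """Return (TP, FP, FN) when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); FP and FN enter with equal weight."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def pick_threshold_by_f1(scores, labels, thresholds):
    return max(thresholds, key=lambda t: f1_score(*confusion_counts(scores, labels, t)))


def pick_threshold_by_cost(scores, labels, thresholds, cost_fp, cost_fn):
    """Assumed stand-in for a cost-aware criterion: minimize weighted errors."""
    def cost_at(threshold):
        _, fp, fn = confusion_counts(scores, labels, threshold)
        return cost_fp * fp + cost_fn * fn
    return min(thresholds, key=cost_at)


# Hypothetical detector scores and ground-truth labels (1 = attack).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 1, 0, 0, 1, 1]
thresholds = [0.2, 0.5, 0.65]

f1_choice = pick_threshold_by_f1(scores, labels, thresholds)
cost_choice = pick_threshold_by_cost(scores, labels, thresholds, cost_fp=1, cost_fn=10)
```

On this data the $F_1$ criterion selects the strictest threshold, which misses one attack, while the weighted cost with a missed attack priced at ten times a false alarm selects the most permissive threshold. The two criteria disagree exactly because of the unequal error costs.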

GPT-3 Powered Information Extraction for Building Robust Knowledge Bases

This work uses the state-of-the-art language model GPT-3 to offer a novel method of information extraction for knowledge base development. The suggested method attempts to solve the difficulties associated with obtaining relevant entities and relationships from unstructured text in order to extract structured information. We conduct experiments on a huge corpus of text from diverse fields to assess the performance of our suggested technique. The evaluation measures, which are frequently employed in information extraction tasks, include precision, recall, and F1-score. The findings demonstrate that GPT-3 can be used to efficiently and accurately extract pertinent and correct information from text, hence increasing the precision and productivity of knowledge base creation. We also assess how well our suggested approach performs in comparison to the most advanced information extraction techniques already in use. The findings show that by utilizing only a small number of instances in in-context learning, our suggested strategy yields competitive outcomes with notable savings in terms of data annotation and engineering expense. Additionally, we use our proposed method to retrieve biomedical information, demonstrating its practicality in a real-world setting. All things considered, our suggested method offers a viable way to overcome the difficulties involved in obtaining structured data from unstructured text in order to create knowledge bases. It can greatly increase the precision and effectiveness of information extraction, which is necessary for many applications including chatbots, recommendation engines, and question-answering systems.

Updated: 2024-07-31 14:59:29

Categories: cs.CL, cs.AI, cs.LG

Download: http://arxiv.org/abs/2408.04641v1

Beat this! Accurate beat tracking without DBN postprocessing

We propose a system for tracking beats and downbeats with two objectives: generality across a diverse music range, and high accuracy. We achieve generality by training on multiple datasets -- including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations -- and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, which introduces constraints on the meter and tempo. For high accuracy, among other improvements, we develop a loss function tolerant to small time shifts of annotations, and an architecture alternating convolutions with transformers either over frequency or time. Our system surpasses the current state of the art in F1 score despite using no DBN. However, it can still fail, especially for difficult and underrepresented genres, and performs worse on continuity metrics, so we publish our model, code, and preprocessed datasets, and invite others to beat this.

Updated: 2024-07-31 14:59:17

Categories: cs.SD, cs.LG, eess.AS

Download: http://arxiv.org/abs/2407.21658v1
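
The abstract above reports results as an F1 score over detected beats. As a rough sketch of how such a score can be computed, the code below counts an estimated beat as a hit when it falls within a tolerance window of an unmatched reference beat. The function name `beat_f1`, the greedy one-to-one matching, and the 70 ms default window are assumptions made for illustration and are not taken from the paper's evaluation code.

```python
def beat_f1(reference, estimated, tolerance=0.07):
    """F1 over beat times (seconds) with a +/- tolerance matching window.

    Each reference beat can be matched by at most one estimated beat.
    Both lists are sorted and then scanned with two pointers.
    """
    reference = sorted(reference)
    estimated = sorted(estimated)
    i = j = hits = 0
    while i < len(reference) and j < len(estimated):
        difference = estimated[j] - reference[i]
        if abs(difference) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif difference < 0:
            j += 1  # estimated beat is too early for this reference beat
        else:
            i += 1  # reference beat was missed
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


# Three of five estimates land within 70 ms of one of four reference beats.
example_f1 = beat_f1([0.5, 1.0, 1.5, 2.0], [0.52, 1.0, 1.62, 2.03, 2.5])
```

A score of this kind only checks each beat in isolation, which is one reason continuity metrics, which the abstract reports separately, can tell a different story.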

Comgra: A Tool for Analyzing and Debugging Neural Networks

Neural networks are notoriously difficult to inspect. We introduce comgra, an open-source Python library for use with PyTorch. Comgra extracts data about the internal activations of a model and organizes it in a GUI (graphical user interface). It can show both summary statistics and individual data points, compare early and late stages of training, focus on individual samples of interest, and visualize the flow of the gradient through the network. This makes it possible to inspect the model's behavior from many different angles and save time by rapidly testing different hypotheses without having to rerun it. Comgra has applications for debugging, neural architecture design, and mechanistic interpretability. We publish our library through the Python Package Index (PyPI) and provide code, documentation, and tutorials at https://github.com/FlorianDietz/comgra.

Updated: 2024-07-31 14:57:23

Categories: cs.LG

Download: http://arxiv.org/abs/2407.21656v1

Early detection of inflammatory arthritis to improve referrals using multimodal machine learning from blood testing, semi-structured and unstructured patient records

Early detection of inflammatory arthritis (IA) is critical to efficient and accurate hospital referral triage for timely treatment and preventing the deterioration of the IA disease course, especially under limited healthcare resources. The manual assessment process is the most common approach in practice for the early detection of IA, but it is extremely labor-intensive and inefficient. A large amount of clinical information needs to be assessed for every referral from General Practice (GP) to the hospitals. Machine learning shows great potential in automating repetitive assessment tasks and providing decision support for the early detection of IA. However, most machine learning-based methods for IA detection rely on blood testing results, and in practice blood testing data is not always available at the point of referral, so methods are needed that leverage multimodal data, such as semi-structured and unstructured data, for early detection of IA. In this research, we present fusion and ensemble learning-based methods using multimodal data to assist decision-making in the early detection of IA, and a conformal prediction-based method to quantify the uncertainty of the prediction and detect any unreliable predictions. To the best of our knowledge, our study is the first attempt to utilize multimodal data to support the early detection of IA from GP referrals.

Updated: 2024-07-31 14:54:25

Categories: cs.LG

Download: http://arxiv.org/abs/2310.19967v3
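
The abstract above mentions a conformal prediction-based method for flagging unreliable predictions but gives no details. As a rough illustration only, the following Python sketch shows standard split conformal prediction for a classifier. The function names, the nonconformity score (one minus the probability of the true class), and the rule that any prediction set without exactly one class counts as unreliable are all assumptions made for this example, not the authors' method.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute the score threshold from a held-out calibration set.

    cal_probs: list of per-class probability lists.
    cal_labels: list of true class indices.
    The nonconformity score is 1 - p(true class) (an assumed choice).
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    # It is clipped to n for simplicity; strictly, a rank above n
    # would mean an infinite threshold (every class is included).
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]


def prediction_set(probs, threshold):
    """Return all classes whose nonconformity score is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]


def is_unreliable(pred_set):
    """Hypothetical rule: flag empty or ambiguous (multi-class) sets."""
    return len(pred_set) != 1
```

In such a setup, a referral whose prediction set is empty or contains both classes could be routed to manual assessment rather than decided automatically.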

Spatial Transformer Network YOLO Model for Agricultural Object Detection

Object detection plays a crucial role in the field of computer vision by autonomously identifying and locating objects of interest. The You Only Look Once (YOLO) model is an effective single-shot detector. However, YOLO faces challenges in cluttered or partially occluded scenes and can struggle with small, low-contrast objects. We propose a new method that integrates spatial transformer networks (STNs) into YOLO to improve performance. The proposed STN-YOLO aims to enhance the model's effectiveness by focusing on important areas of the image and improving the spatial invariance of the model before the detection process. Our proposed method improves object detection performance both qualitatively and quantitatively. We explore the impact of different localization networks within the STN module as well as the robustness of the model across different spatial transformations. We apply STN-YOLO to benchmark datasets for agricultural object detection as well as to a new dataset from a state-of-the-art plant phenotyping greenhouse facility. Our code and dataset are publicly available.

Updated: 2024-07-31 14:53:41

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.21652v1

PP-TIL: Personalized Planning for Autonomous Driving with Instance-based Transfer Imitation Learning

Personalized motion planning is important in urban automated driving because it caters to the unique requirements of individual users. However, prior work has often struggled to address two crucial aspects simultaneously: personalized planning within intricate urban settings and enhancing planning performance through data utilization. The challenge arises from the expensive and limited nature of user data, coupled with a scene state space that tends towards infinity. These factors lead to overfitting and poor generalization during model training. Hence, we propose an instance-based transfer imitation learning approach. This method facilitates knowledge transfer from extensive expert domain data to the user domain, presenting a fundamental resolution to these issues. We first pre-train a model using large-scale expert data. Subsequently, during the fine-tuning phase, we feed the model batches that comprise both expert and user data. Employing the inverse reinforcement learning technique, we extract the style feature distribution from user demonstrations and use it to construct a regularization term for approximating the user's style. In our experiments, we conducted extensive evaluations of the proposed method. Compared to the baseline methods, our approach mitigates the overfitting issue caused by sparse user data. Furthermore, we found that integrating the driving model with a differentiable nonlinear optimizer as a safety protection layer for end-to-end personalized fine-tuning results in superior planning performance.

Updated: 2024-07-31 14:53:23

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.18569v2

Noise Level Adaptive Diffusion Model for Robust Reconstruction of Accelerated MRI

In general, diffusion model-based MRI reconstruction methods incrementally remove artificially added noise while imposing data consistency to reconstruct the underlying images. However, real-world MRI acquisitions already contain inherent noise due to thermal fluctuations. This phenomenon is particularly notable when using ultra-fast, high-resolution imaging sequences for advanced research, or using low-field systems favored by low- and middle-income countries. These common scenarios can lead to sub-optimal performance or complete failure of existing diffusion model-based reconstruction techniques. Specifically, as the artificially added noise is gradually removed, the inherent MRI noise becomes increasingly pronounced, making the actual noise level inconsistent with the predefined denoising schedule and consequently leading to inaccurate image reconstruction. To tackle this problem, we propose a posterior sampling strategy with a novel NoIse Level Adaptive Data Consistency (Nila-DC) operation. Extensive experiments are conducted on two public datasets and an in-house clinical dataset with field strength ranging from 0.3T to 3T, showing that our method surpasses the state-of-the-art MRI reconstruction methods and is highly robust against various noise levels. The code for Nila is available at https://github.com/Solor-pikachu/Nila.

Updated: 2024-07-31 14:53:08

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2403.05245v2

Human interaction classifier for LLM based chatbot

This study investigates different approaches to classify human interactions in an artificial intelligence-based environment, specifically for Applus+ IDIADA's intelligent agent AIDA. The main objective is to develop a classifier that accurately identifies the type of interaction received (Conversation, Services, or Document Translation) to direct requests to the appropriate channel and provide a more specialized and efficient service. Various models are compared, including LLM-based classifiers, KNN using Titan and Cohere embeddings, SVM, and artificial neural networks. Results show that SVM and ANN models with Cohere embeddings achieve the best overall performance, with superior F1 scores and faster execution times compared to LLM-based approaches. The study concludes that the SVM model with Cohere embeddings is the most suitable option for classifying human interactions in the AIDA environment, offering an optimal balance between accuracy and computational efficiency.

Updated: 2024-07-31 14:50:11

Categories: cs.AI

Download: http://arxiv.org/abs/2407.21647v1

Navigating Fairness: Practitioners' Understanding, Challenges, and Strategies in AI/ML Development

The rise in the use of AI/ML applications across industries has sparked more discussions about the fairness of AI/ML in recent times. While prior research on the fairness of AI/ML exists, there is a lack of empirical studies focused on understanding the perspectives and experiences of AI practitioners in developing a fair AI/ML system. Understanding AI practitioners' perspectives and experiences on the fairness of AI/ML systems is important because they are directly involved in the development and deployment of these systems, and their insights can offer valuable real-world perspectives on the challenges associated with ensuring fairness in AI/ML systems. We conducted semi-structured interviews with 22 AI practitioners to investigate their understanding of what a 'fair AI/ML' is, the challenges they face in developing a fair AI/ML system, the consequences of developing an unfair AI/ML system, and the strategies they employ to ensure AI/ML system fairness. We developed a framework showcasing the relationship between AI practitioners' understanding of a 'fair AI/ML' system and (i) their challenges in its development, (ii) the consequences of developing an unfair AI/ML system, and (iii) strategies used to ensure AI/ML system fairness. By exploring AI practitioners' perspectives and experiences, this study provides actionable insights to enhance AI/ML fairness, which may promote fairer systems, reduce bias, and foster public trust in AI technologies. We also identify areas for further investigation and offer recommendations to aid AI practitioners and AI companies in navigating fairness.

Updated: 2024-07-31 14:47:24

Categories: cs.CY,cs.AI,cs.SE

Download: http://arxiv.org/abs/2403.15481v2

Fairness Concerns in App Reviews: A Study on AI-based Mobile Apps

Fairness is one of the socio-technical concerns that must be addressed in software systems. Considering the popularity of mobile software applications (apps) among a wide range of individuals worldwide, mobile apps with unfair behaviors and outcomes can affect a significant proportion of the global population, potentially more than any other type of software system. Users express a wide range of socio-technical concerns in mobile app reviews. This research aims to investigate fairness concerns raised in mobile app reviews. Our research focuses on AI-based mobile app reviews as the chance of unfair behaviors and outcomes in AI-based mobile apps may be higher than in non-AI-based apps. To this end, we first manually constructed a ground-truth dataset, including 1,132 fairness and 1,473 non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning models that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing model can detect fairness reviews with a precision of 94%. We then applied the best-performing model on approximately 9.5M reviews collected from 108 AI-based apps and identified around 92K fairness reviews. Next, applying the K-means clustering technique to the 92K fairness reviews, followed by manual analysis, led to the identification of six distinct types of fairness concerns (e.g., 'receiving different quality of features and services in different platforms and devices' and 'lack of transparency and fairness in dealing with user-generated content'). Finally, the manual analysis of 2,248 app owners' responses to the fairness reviews identified six root causes (e.g., 'copyright issues') that app owners report to justify fairness concerns.

Updated: 2024-07-31 14:45:52

Categories: cs.SE,cs.AI,cs.CY

Download: http://arxiv.org/abs/2401.08097v4

Lyapunov weights to convey the meaning of time in physics-informed neural networks

Time is not a dimension like the others. In Physics-Informed Neural Networks (PINNs), several proposals have attempted to adapt the time sampling or time weighting to account for the specifics of this special dimension. However, these proposals are not principled and need guidance to be used. We explain theoretically why the Lyapunov exponents give actionable insights and propose a weighting scheme that automatically adapts to chaotic, periodic, or stable dynamics. We theoretically characterize the best weighting scheme under computational constraints as a cumulative exponential integral of the local Lyapunov exponent estimators and show that it performs well in practice under the regimes mentioned above.

Updated: 2024-07-31 14:41:40

Categories: cs.LG,cs.AI,cs.NA,math.NA

Download: http://arxiv.org/abs/2407.21642v1
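
The abstract describes the weighting scheme only as a cumulative exponential integral of local Lyapunov exponent estimators. One plausible reading, sketched below purely as an assumption, is a time weight of the form w(t_k) = exp(-sum over j < k of lambda_j * dt), normalised to mean 1. The sign convention, the normalisation, and the function name `lyapunov_weights` are hypothetical choices for illustration and may differ from the paper's actual scheme.

```python
import math


def lyapunov_weights(local_exponents, dt):
    """Assumed time weights from local Lyapunov exponent estimates.

    local_exponents: estimates lambda_0, ..., lambda_{K-1}, one per time step.
    dt: time step used in the discrete cumulative integral.
    Returns weights w_k = exp(-sum_{j<k} lambda_j * dt), rescaled to mean 1.
    """
    weights = []
    cumulative = 0.0  # running integral of the local exponent
    for lam in local_exponents:
        weights.append(math.exp(-cumulative))
        cumulative += lam * dt
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]
```

Under this assumed convention, positive exponents (chaotic dynamics) give weights that decay over time, zero exponents give uniform weights, and negative exponents (stable dynamics) give weights that grow over time.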

Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models

The rapid development of Large Language Models (LLMs) has brought remarkable generative capabilities across diverse tasks. However, despite the impressive achievements, these models still have numerous security vulnerabilities, particularly when faced with jailbreak attacks. Investigating jailbreak attacks can therefore uncover hidden weaknesses in LLMs and guide the development of more robust defense mechanisms to fortify their security. In this paper, we further explore the boundary of jailbreak attacks on LLMs and propose Analyzing-based Jailbreak (ABJ). This effective jailbreak attack method takes advantage of LLMs' growing analyzing and reasoning capability and reveals their underlying vulnerabilities when facing analysis-based tasks. We conduct a detailed evaluation of ABJ across various open-source and closed-source LLMs, which achieves 94.8% Attack Success Rate (ASR) and 1.06 Attack Efficiency (AE) on GPT-4-turbo-0409, demonstrating state-of-the-art attack effectiveness and efficiency. Our research highlights the importance of prioritizing and enhancing the safety of LLMs to mitigate the risks of misuse. The code is publicly available at https://github.com/theshi-1128/ABJ-Attack.

Updated: 2024-07-31 14:37:05

Categories: cs.CR,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.16205v2

Quality Control for Radiology Report Generation Models via Auxiliary Auditing Components

Automation of medical image interpretation could alleviate bottlenecks in diagnostic workflows, and has become of particular interest in recent years due to advancements in natural language processing. Great strides have been made towards automated radiology report generation via AI, yet ensuring clinical accuracy in generated reports is a significant challenge, hindering deployment of such methods in clinical practice. In this work we propose a quality control framework for assessing the reliability of AI-generated radiology reports with respect to semantics of diagnostic importance using modular auxiliary auditing components (AC). Evaluating our pipeline on the MIMIC-CXR dataset, our findings show that incorporating ACs in the form of disease-classifiers can enable auditing that identifies more reliable reports, resulting in higher F1 scores compared to unfiltered generated reports. Additionally, leveraging the confidence of the AC labels further improves the audit's effectiveness.

Updated: 2024-07-31 14:37:00

Categories: cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.21638v1

Practical aspects for the creation of an audio dataset from field recordings with optimized labeling budget with AI-assisted strategy

Machine Listening focuses on developing technologies to extract relevant information from audio signals. A critical aspect of these projects is the acquisition and labeling of contextualized data, which is inherently complex and requires specific resources and strategies. Despite the availability of some audio datasets, many are unsuitable for commercial applications. The paper emphasizes the importance of Active Learning (AL) using expert labelers over crowdsourcing, which often lacks detailed insights into dataset structures. AL is an iterative process combining human labelers and AI models to optimize the labeling budget by intelligently selecting samples for human review. This approach addresses the challenge of handling large, constantly growing datasets that exceed available computational resources and memory. The paper presents a comprehensive data-centric framework for Machine Listening projects, detailing the configuration of recording nodes, database structure, and labeling budget optimization in resource-constrained scenarios. Applied to an industrial port in Valencia, Spain, the framework successfully labeled 6540 ten-second audio samples over five months with a small team, demonstrating its effectiveness and adaptability to various resource availability situations. Acknowledgments: The participation of Javier Naranjo-Alcazar, Jordi Grau-Haro and Pedro Zuccarello in this research was funded by the Valencian Institute for Business Competitiveness (IVACE) and the FEDER funds by means of project Soroll-IA2 (IMDEEA/2023/91).

Updated: 2024-07-31 14:34:43

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2405.18153v2
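
The abstract describes Active Learning as an iterative loop in which a model selects the samples that most deserve human review, but it does not name the selection rule. The sketch below uses entropy-based uncertainty sampling, which is a common choice and is assumed here only for illustration. The function names and the dictionary layout of `predictions` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class probability list."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Pick the `budget` most uncertain samples for expert review.

    predictions: dict mapping a sample id to the model's class
    probabilities for that sample (assumed layout).
    Returns the sample ids ordered from most to least uncertain.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:budget]
```

In each round, the selected clips would be labeled by the experts, added to the training set, and the model retrained before the next selection.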

MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction

Multi-agent trajectory prediction is crucial to autonomous driving and understanding the surrounding environment. Learning-based approaches for multi-agent trajectory prediction, which rely primarily on graph neural networks, graph transformers, and hypergraph neural networks, have demonstrated outstanding performance on real-world datasets in recent years. However, hypergraph transformer-based methods for trajectory prediction are yet to be explored. Therefore, we present a MultiscAle Relational Transformer (MART) network for multi-agent trajectory prediction. MART is a hypergraph transformer architecture that considers individual and group behaviors in the transformer machinery. The core module of MART is the encoder, which comprises a Pair-wise Relational Transformer (PRT) and a Hyper Relational Transformer (HRT). The encoder extends the capabilities of a relational transformer by introducing HRT, which integrates hyperedge features into the transformer mechanism, encouraging attention weights to focus on group-wise relations. In addition, we propose an Adaptive Group Estimator (AGE) designed to infer complex group relations in real-world environments. Extensive experiments on three real-world datasets (NBA, SDD, and ETH-UCY) demonstrate that our method achieves state-of-the-art performance, improving ADE/FDE by 3.9%/11.8% on the NBA dataset. Code is available at https://github.com/gist-ailab/MART.

Updated: 2024-07-31 14:31:49

Categories: cs.LG

Download: http://arxiv.org/abs/2407.21635v1
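
The ADE/FDE figures quoted above refer to average and final displacement error, the usual trajectory prediction metrics. As a minimal sketch, the Python function below computes both for a single agent and a single predicted trajectory. The function name and the (x, y) tuple layout are assumptions, and benchmark protocols often report the minimum over several sampled predictions, which is not shown here.

```python
import math


def ade_fde(predicted, ground_truth):
    """Average and final displacement error for one agent.

    predicted, ground_truth: equal-length lists of (x, y) positions
    over the prediction horizon.
    Returns (ADE, FDE): the mean Euclidean distance over all time
    steps and the Euclidean distance at the last time step.
    """
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances), distances[-1]
```

For example, a prediction that matches the first point and drifts 1 and 2 units away over the next two steps has an ADE of 1.0 and an FDE of 2.0.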

Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations

General intelligence requires quick adaptation across tasks. While existing reinforcement learning (RL) methods have made progress in generalization, they typically assume only distribution changes between source and target domains. In this paper, we explore a wider range of scenarios where both the distribution and environment spaces may change. For example, in Atari games, we train agents to generalize to tasks with different levels of mode and difficulty, where there could be new state or action variables that never occurred in previous environments. To address this challenging setting, we introduce a causality-guided self-adaptive representation-based approach, called CSR, that equips the agent to generalize effectively and efficiently across a sequence of tasks with evolving dynamics. Specifically, we employ causal representation learning to characterize the latent causal variables and world models within the RL system. Such compact causal representations uncover the structural relationships among variables, enabling the agent to autonomously determine whether changes in the environment stem from distribution shifts or variations in space, and to precisely locate these changes. We then devise a three-step strategy to fine-tune the model under different scenarios accordingly. Empirical experiments show that CSR efficiently adapts to the target domains with only a few samples and outperforms state-of-the-art baselines on a wide range of scenarios, including our simulated environments, Cartpole, and Atari games.

Updated: 2024-07-31 14:24:20

Categories: cs.LG

Download: http://arxiv.org/abs/2407.20651v2

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and finely controlled content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective perception in practical applications such as companion robots for children and marketing bots. The core issue lies in the inconsistency between high-quality audio generation and the ultimate human subjective experience. Therefore, this challenge aims to enhance the persuasiveness and acceptability of synthesized audio, focusing on human-aligned, convincing, and inspirational audio generation. A total of 19 teams have registered for the challenge, and the competition and its results are described in this paper.

Updated: 2024-07-31 14:23:00

Categories: eess.AS,cs.AI

Download: http://arxiv.org/abs/2407.12038v2

Grid-Based Decompositions for Spatial Data under Local Differential Privacy

Local differential privacy (LDP) has recently emerged as a popular privacy standard. With the growing popularity of LDP, several recent works have applied LDP to spatial data, and grid-based decompositions have been a common building block in the collection and analysis of spatial data under DP and LDP. In this paper, we study three grid-based decomposition methods for spatial data under LDP: Uniform Grid (UG), PrivAG, and AAG. UG is a static approach that consists of equal-sized cells. To enable data-dependent decomposition, PrivAG was proposed by Yang et al. as the most recent adaptive grid method. To advance the state-of-the-art in adaptive grids, in this paper we propose the Advanced Adaptive Grid (AAG) method. For each grid cell, following the intuition that the cell's intra-cell density distribution will be affected by its neighbors, AAG performs uneven cell divisions depending on the neighboring cells' densities. We experimentally compare UG, PrivAG, and AAG using three real-world location datasets, varying privacy budgets, and query sizes. Results show that AAG provides higher utility than PrivAG, demonstrating the superiority of our proposed approach. Furthermore, UG's performance is heavily dependent on the choice of grid size. When the grid size is chosen optimally in UG, AAG still beats UG for small queries, but UG beats AAG for large (coarse-grained) queries.

Updated: 2024-07-31 14:17:44

Categories: cs.CR

Download: http://arxiv.org/abs/2407.21624v1
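
To make the Uniform Grid (UG) baseline more concrete, the sketch below maps points in the unit square to equal-sized cells, perturbs each user's cell on the client side, and debiases the aggregated counts on the server side. The abstract does not say which LDP protocol the paper uses; generalized randomized response is assumed here only because it is a standard and simple choice. All function names are hypothetical.

```python
import math
import random


def cell_index(x, y, grid_size):
    """Map a point in the unit square [0, 1) x [0, 1) to a cell index."""
    col = min(int(x * grid_size), grid_size - 1)
    row = min(int(y * grid_size), grid_size - 1)
    return row * grid_size + col


def perturb(cell, num_cells, epsilon, rng):
    """Generalized randomized response (assumed protocol).

    Reports the true cell with probability e^eps / (e^eps + k - 1),
    otherwise a uniformly random different cell.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + num_cells - 1)
    if rng.random() < p:
        return cell
    other = rng.randrange(num_cells - 1)  # uniform over the other cells
    return other if other < cell else other + 1


def estimate_counts(reports, num_cells, epsilon):
    """Unbiased estimate of the true per-cell counts from noisy reports."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + num_cells - 1)    # probability of reporting the true cell
    q = 1.0 / (e + num_cells - 1)  # probability of reporting a given other cell
    observed = [0] * num_cells
    for r in reports:
        observed[r] += 1
    return [(c - n * q) / (p - q) for c in observed]
```

A range query would then sum the estimated counts of the cells it covers. With a fixed grid, the choice of `grid_size` trades off noise per cell against spatial resolution, which is consistent with the abstract's remark that UG depends heavily on grid size.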

Extended Fiducial Inference: Toward an Automated Process of Statistical Inference

While fiducial inference was widely considered a big blunder by R.A. Fisher, the goal he initially set, 'inferring the uncertainty of model parameters on the basis of observations', has been continually pursued by many statisticians. To this end, we develop a new statistical inference method called extended fiducial inference (EFI). The new method achieves the goal of fiducial inference by leveraging advanced statistical computing techniques while remaining scalable for big data. EFI involves jointly imputing random errors realized in observations using stochastic gradient Markov chain Monte Carlo and estimating the inverse function using a sparse deep neural network (DNN). The consistency of the sparse DNN estimator ensures that the uncertainty embedded in observations is properly propagated to model parameters through the estimated inverse function, thereby validating downstream statistical inference. Compared to frequentist and Bayesian methods, EFI offers significant advantages in parameter estimation and hypothesis testing. Specifically, EFI provides higher fidelity in parameter estimation, especially when outliers are present in the observations, and eliminates the need for theoretical reference distributions in hypothesis testing, thereby automating the statistical inference process. EFI also provides an innovative framework for semi-supervised learning.

Updated: 2024-07-31 14:15:42

Categories: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2407.21622v1

Optimizing Disease Prediction with Artificial Intelligence Driven Feature Selection and Attention Networks

The rapid integration of machine learning methodologies in healthcare has ignited innovative strategies for disease prediction, particularly with the vast repositories of Electronic Health Records (EHR) data. This article delves into the realm of multi-disease prediction, presenting a comprehensive study that introduces a pioneering ensemble feature selection model. This model, designed to optimize learning systems, combines statistical, deep, and optimally selected features through the innovative Stabilized Energy Valley Optimization with Enhanced Bounds (SEV-EB) algorithm. The objective is to achieve unparalleled accuracy and stability in predicting various disorders. This work proposes an advanced ensemble model that synergistically integrates statistical, deep, and optimally selected features. This combination aims to enhance the predictive power of the model by capturing diverse aspects of the health data. At the heart of the proposed model lies the SEV-EB algorithm, a novel approach to optimal feature selection. The algorithm introduces enhanced bounds and stabilization techniques, contributing to the robustness and accuracy of the overall prediction model. To further elevate the predictive capabilities, an HSC-AttentionNet is introduced. This network architecture combines deep temporal convolution capabilities with LSTM, allowing the model to capture both short-term patterns and long-term dependencies in health data. Rigorous evaluations showcase the remarkable performance of the proposed model. Achieving a 95% accuracy and 94% F1-score in predicting various disorders, the model surpasses traditional methods, signifying a significant advancement in disease prediction accuracy. The implications of this research extend beyond the confines of academia.

Updated: 2024-07-31 14:12:27

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.03151v1

Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification

Time Series Classification (TSC) encompasses two settings: classifying entire sequences or classifying segmented subsequences. The raw time series for segmented TSC usually contain Multiple classes with Varying Duration of each class (MVD). The characteristics of MVD therefore pose unique challenges for segmented TSC, yet they have been largely overlooked by existing works. Specifically, there exists a natural temporal dependency between consecutive instances (segments) to be classified within MVD. However, mainstream TSC models assume that instances are independent and identically distributed (i.i.d.) and focus on modeling each segment independently. Additionally, annotators with varying expertise may provide inconsistent boundary labels, leading to unstable performance of noise-free TSC models. To address these challenges, we first formally demonstrate that valuable contextual information enhances the discriminative power of classification instances. Leveraging the contextual priors of MVD at both the data and label levels, we propose a novel consistency learning framework Con4m, which effectively utilizes contextual information more conducive to discriminating consecutive segments in segmented TSC tasks, while harmonizing inconsistent boundary labels for training. Extensive experiments across multiple datasets validate the effectiveness of Con4m in handling segmented TSC tasks on MVD.

Updated: 2024-07-31 14:06:55

Categories: cs.AI

Download: http://arxiv.org/abs/2408.00041v1

Barlow Twins Deep Neural Network for Advanced 1D Drug-Target Interaction Prediction

Accurate prediction of drug-target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this discovery process. Our approach utilises the powerful Barlow Twins architecture for feature extraction while considering the structure of the target protein, achieving state-of-the-art predictive performance on multiple established benchmarks. The use of a gradient boosting machine as the underlying predictor ensures fast and efficient predictions without the need for large computational resources. In addition, we benchmarked new baselines against existing methods. Together, these innovations improve the efficiency and effectiveness of drug-target interaction predictions, providing robust tools for accelerating drug development and deepening the understanding of molecular interactions.

Updated: 2024-07-31 14:06:18

Categories: q-bio.BM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.00040v1
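
The Barlow Twins objective named in the entry above can be sketched in a few lines. This is the generic Barlow Twins loss (cross-correlation between two batches of embeddings pushed towards the identity matrix), not necessarily the exact variant the authors train; the function names, the toy batches, and the weight `lam=5e-3` are assumptions for illustration.

```python
import math


def standardize(batch):
    """Normalise each embedding dimension to zero mean and unit variance across the batch.

    Returns a list of columns (one per dimension). Assumes no dimension is constant.
    """
    n, d = len(batch), len(batch[0])
    columns = []
    for k in range(d):
        col = [row[k] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        columns.append([(v - mean) / std for v in col])
    return columns


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Sum of (1 - C_ii)^2 on the diagonal plus lam * C_ij^2 off the diagonal."""
    a, b = standardize(z_a), standardize(z_b)
    n, d = len(z_a), len(a)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view A and dimension j of view B.
            c_ij = sum(x * y for x, y in zip(a[i], b[j])) / n
            loss += (1.0 - c_ij) ** 2 if i == j else lam * c_ij ** 2
    return loss


# Identical views with decorrelated dimensions: the loss vanishes.
z_clean = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_clean = barlow_twins_loss(z_clean, z_clean)

# Identical views whose two dimensions are copies of each other: the
# off-diagonal (redundancy) term is penalised.
z_redundant = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]
loss_redundant = barlow_twins_loss(z_redundant, z_redundant)
```

The diagonal term rewards agreement between the two views, while the off-diagonal term discourages redundant embedding dimensions; in a pipeline like the one described, the learned embeddings would then be handed to a separate predictor such as a gradient boosting machine.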

Between the AI and Me: Analysing Listeners' Perspectives on AI- and Human-Composed Progressive Metal Music

Generative AI models have recently blossomed, significantly impacting artistic and musical traditions. Research investigating how humans interact with and judge these models is therefore crucial. Through a listening and reflection study, we explore participants' perspectives on AI- vs human-generated progressive metal, in symbolic format, using rock music as a control group. AI-generated examples were produced by ProgGP, a Transformer-based model. We propose a mixed methods approach to assess the effects of generation type (human vs. AI), genre (progressive metal vs. rock), and curation process (random vs. cherry-picked). This combines quantitative feedback on genre congruence, preference, creativity, consistency, playability, humanness, and repeatability, and qualitative feedback to provide insights into listeners' experiences. A total of 32 progressive metal fans completed the study. Our findings validate the use of fine-tuning to achieve genre-specific specialization in AI music generation, as listeners could distinguish between AI-generated rock and progressive metal. Despite some AI-generated excerpts receiving similar ratings to human music, listeners exhibited a preference for human compositions. Thematic analysis identified key features for genre and AI vs. human distinctions. Finally, we consider the ethical implications of our work in promoting musical data diversity within MIR research by focusing on an under-explored genre.

Updated: 2024-07-31 14:03:45

Categories: cs.SD,cs.AI,cs.HC,eess.AS

Download: http://arxiv.org/abs/2407.21615v1

Analysis of Total Variation Minimization for Clustered Federated Learning

A key challenge in federated learning applications is the statistical heterogeneity of local datasets. Clustered federated learning addresses this challenge by identifying clusters of local datasets that are approximately homogeneous. One recent approach to clustered federated learning is generalized total variation minimization (GTVMin). This approach requires a similarity graph, which can be obtained from domain expertise or in a data-driven fashion via graph learning techniques. Under a widely applicable clustering assumption, we derive an upper bound on the deviation between GTVMin solutions and their cluster-wise averages. This bound provides valuable insights into the effectiveness and robustness of GTVMin in addressing statistical heterogeneity within federated learning environments.

Updated: 2024-07-31 13:57:36

Categories: cs.LG,I.2.11; I.5.3

Download: http://arxiv.org/abs/2403.06298v2
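
To make the GTVMin entry above concrete, here is a minimal sketch under strong simplifying assumptions: each node holds a scalar model with local loss `(w_i - y_i)^2`, the total-variation penalty is the squared difference across weighted edges, and plain gradient descent is used as the solver. The objective form, the toy graph, and all names are illustrative choices and are not taken from the paper.

```python
def gtvmin_scalar(y, edges, alpha, steps=2000, lr=0.01):
    """Gradient descent on
        sum_i (w_i - y_i)^2  +  alpha * sum_{(i, j, a)} a * (w_i - w_j)^2
    where `edges` lists (i, j, a) with edge weight a of the similarity graph.
    """
    w = list(y)
    for _ in range(steps):
        grad = [2.0 * (wi - yi) for wi, yi in zip(w, y)]
        for i, j, a in edges:
            diff = w[i] - w[j]
            grad[i] += 2.0 * alpha * a * diff
            grad[j] -= 2.0 * alpha * a * diff
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Two clusters of three nodes each: strong edges inside a cluster and one
# weak edge (weight 0.001) crossing the cluster boundary.
y = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 0.001),
]
w = gtvmin_scalar(y, edges, alpha=10.0)

cluster_means = [sum(y[:3]) / 3, sum(y[3:]) / 3]
# Largest gap between a learned local model and its cluster-wise average.
max_deviation = max(
    abs(w[i] - cluster_means[0 if i < 3 else 1]) for i in range(len(y))
)
```

In this toy setting the learned local models end up close to the cluster-wise averages of the local optima, which is the kind of deviation the abstract says the paper bounds; how the gap scales with the boundary weight and `alpha` is the subject of the paper's analysis, not of this sketch.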

Diversifying AI: Towards Creative Chess with AlphaZero

In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones. We study this question in the game of chess, the so-called drosophila of AI. We build on AlphaZero (AZ) and extend it to represent a league of agents via a latent-conditioned architecture, which we call AZ_db. We train AZ_db to generate a wider range of ideas using behavioral diversity techniques and select the most promising ones with sub-additive planning. Our experiments suggest that AZ_db plays chess in diverse ways, solves more puzzles as a group and outperforms a more homogeneous team. Notably, AZ_db solves twice as many challenging puzzles as AZ, including the challenging Penrose positions. When playing chess from different openings, we notice that players in AZ_db specialize in different openings, and that selecting a player for each opening using sub-additive planning results in a 50 Elo improvement over AZ. Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans and that diversity is a valuable asset in solving computationally hard problems.

Updated: 2024-07-31 13:55:49

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2308.09175v3

Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism

The task of partially spoofed audio localization aims to accurately determine audio authenticity at a frame level. Although some works have achieved encouraging results, utilizing boundary information within a single model remains an unexplored research topic. In this work, we propose a novel method called Boundary-aware Attention Mechanism (BAM). Specifically, it consists of two core modules: Boundary Enhancement and Boundary Frame-wise Attention. The former assembles the intra-frame and inter-frame information to extract discriminative boundary features that are subsequently used for boundary position detection and authenticity decision, while the latter leverages boundary prediction results to explicitly control the feature interaction between frames, which achieves effective discrimination between real and fake frames. Experimental results on the PartialSpoof database demonstrate that our proposed method achieves the best performance. The code is available at https://github.com/media-sec-lab/BAM.

Updated: 2024-07-31 13:49:17

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2407.21611v1

Ironing the Graphs: Toward a Correct Geometric Analysis of Large-Scale Graphs

Graph embedding approaches attempt to project graphs into geometric entities, i.e., manifolds. The idea is that the geometric properties of the projected manifolds are helpful in the inference of graph properties. However, an incorrect choice of embedding manifold can lead to incorrect geometric inference. In this paper, we argue that classical embedding techniques cannot lead to correct geometric interpretation because they miss the curvature at each point of the manifold. We advocate that, for correct geometric interpretation, the graph should be embedded over regular constant curvature manifolds. To this end, we present an embedding approach, the discrete Ricci flow graph embedding (dRfge), based on the discrete Ricci flow, which adapts the distances between nodes in a graph so that the graph can be embedded onto a constant curvature manifold that is homogeneous and isotropic, i.e., all directions are equivalent and distances comparable, resulting in correct geometric interpretations. A major contribution of this paper is that, for the first time, we prove the convergence of discrete Ricci flow to a constant curvature and stable distance metrics over the edges. A drawback of the discrete Ricci flow is its high computational complexity, which has prevented its use in large-scale graph analysis. Another contribution of this paper is a new algorithmic solution that makes it feasible to calculate the Ricci flow for graphs of up to 50k nodes, and beyond. The intuitions behind the discrete Ricci flow make it possible to obtain new insights into the structure of large-scale graphs. We demonstrate this through a case study on analyzing the internet connectivity structure between countries at the BGP level.

Updated: 2024-07-31 13:47:53

Categories: cs.CG,cs.LG

Download: http://arxiv.org/abs/2407.21609v1

U-Net-based Lung Thickness Map for Pixel-level Lung Volume Estimation of Chest X-rays

Purpose: We aimed to estimate the total lung volume (TLV) from real and synthetic frontal X-ray radiographs on a pixel level using lung thickness maps generated by a U-Net. Methods: 5,959 thorax X-ray computed tomography (CT) scans were retrieved from two publicly available datasets: the lung nodule analysis 2016 (n=656) and the RSNA pulmonary embolism detection challenge 2020 (n=5,303). Additionally, thorax CT scans from 72 subjects (33 healthy: 20 men, mean age [range] = 62.4 [34, 80]; 39 suffering from chronic obstructive pulmonary disease: 25 men, mean age [range] = 69.0 [47, 91]) were retrospectively selected (10.2018-12.2019) from our in-house dataset such that for each subject, a frontal chest X-ray radiograph no older than seven days was available. All CT scans and their corresponding lung segmentation were forward projected using a simulated X-ray spectrum to generate synthetic radiographs and lung thickness maps, respectively. A U-Net model was trained and tested on synthetic radiographs from the public datasets to predict lung thickness maps and consequently estimate TLV. Model performance was further assessed by evaluating the TLV estimations for the in-house synthetic and real radiograph pairs using Pearson correlation coefficient (r) and significance testing. Results: Strong correlations were measured between the predicted and CT-derived ground truth TLV values for test data from synthetic ($n_{Public}$=1,191, r=0.987, P < 0.001; $n_{In-house}$=72, r=0.973, P < 0.001) and real radiographs (n=72, r=0.908, P < 0.001). Conclusion: TLV was successfully estimated from U-Net-generated pixel-level lung thickness maps for synthetic and real radiographs.

Updated: 2024-07-31 13:41:24

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2110.12509v5
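
The pixel-level estimate in the entry above can be illustrated with a short sketch. It assumes an idealised parallel-beam geometry, in which each pixel of the thickness map holds the lung depth in millimetres along its ray, so that volume is thickness times pixel area summed over the image. The function names, pixel spacing, and all numbers are made up for illustration; the paper's actual projection geometry and post-processing are not described in the abstract.

```python
import math


def total_lung_volume_ml(thickness_mm, pixel_spacing_mm=(1.0, 1.0)):
    """Sum per-pixel lung thickness times pixel area and convert mm^3 to mL."""
    pixel_area_mm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1]
    volume_mm3 = sum(sum(row) for row in thickness_mm) * pixel_area_mm2
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy 2x3 thickness map in mm with 2 mm x 2 mm pixels (illustrative values).
toy_map = [
    [0.0, 100.0, 120.0],
    [0.0, 110.0, 130.0],
]
toy_volume_ml = total_lung_volume_ml(toy_map, pixel_spacing_mm=(2.0, 2.0))

# Agreement between predicted and reference volumes (litres, made-up numbers),
# mirroring the correlation-based evaluation described in the abstract.
predicted = [4.1, 5.0, 6.2, 3.8]
reference = [4.0, 5.2, 6.0, 3.9]
r = pearson_r(predicted, reference)
```

A real radiograph has a divergent beam, so a faithful implementation would need a geometry-dependent correction rather than a single constant pixel area.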

A comparison between black-, grey- and white-box modeling for the bidirectional Raman amplifier optimization

Designing and optimizing optical amplifiers to maximize system performance is becoming increasingly important as optical communication systems strive to increase throughput. Offline optimization of optical amplifiers relies on models ranging from white-box models deeply rooted in physics to black-box data-driven and physics-agnostic models. Here, we compare the capabilities of white-, grey- and black-box models on the challenging test case of optimizing a bidirectional distributed Raman amplifier to achieve a target frequency-distance signal power profile. We show that any of the studied methods can achieve similar frequency and distance flatness of between 1 and 3.6 dB (depending on the definition of flatness) over the C-band in an 80-km span. Then, we discuss the models' applicability, advantages, and drawbacks based on the target application scenario, in particular in terms of flexibility, optimization speed, and access to training data.

Updated: 2024-07-31 13:41:15

Categories: physics.app-ph,cs.CE,cs.LG,physics.optics

Download: http://arxiv.org/abs/2310.05954v2
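
The entry above notes that the reported flatness depends on how flatness is defined. The sketch below shows two plausible definitions on a frequency-by-distance power grid. Both definitions, the function names, and the toy values are assumptions for illustration; the abstract does not say which definitions the paper uses.

```python
def peak_to_peak_flatness_db(profile_db):
    """Maximum minus minimum signal power (dB) over the whole grid."""
    values = [p for row in profile_db for p in row]
    return max(values) - min(values)


def max_deviation_from_target_db(profile_db, target_db):
    """Largest absolute deviation (dB) from a constant target power level."""
    return max(abs(p - target_db) for row in profile_db for p in row)


# Rows are frequency channels, columns are positions along the span.
# Toy power values in dBm, not measurements.
toy_profile = [
    [0.0, -1.0, -0.5, 0.5],
    [0.2, -1.4, -0.8, 0.4],
    [0.1, -1.2, -0.6, 1.0],
]
p2p = peak_to_peak_flatness_db(toy_profile)
dev = max_deviation_from_target_db(toy_profile, target_db=0.0)
```

The same profile yields different numbers under the two definitions, which is why a range such as the one quoted in the abstract needs the definition stated alongside it.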

Higher order quantum reservoir computing for non-intrusive reduced-order models

Forecasting dynamical systems is of importance to numerous real-world applications. When possible, dynamical systems forecasts are constructed from first-principles-based models, for example differential equations. When these equations are unknown, non-intrusive techniques must be utilized to build predictive models from data alone. Machine learning (ML) methods have recently been used for such tasks. Moreover, ML methods provide the added advantage of significant reductions in time-to-solution for predictions compared with first-principles-based models. However, many state-of-the-art ML-based methods for forecasting rely on neural networks, which may be expensive to train and require large amounts of memory. In this work, we propose a quantum-mechanics-inspired ML modeling strategy for learning nonlinear dynamical systems that provides data-driven forecasts for complex dynamical systems with reduced training time and memory costs. This approach, denoted the quantum reservoir computing technique (QRC), is a hybrid quantum-classical framework employing an ensemble of small quantum systems interconnected via classical linear feedback connections. By mapping the dynamical state to a suitable quantum representation amenable to unitary operations, QRC is able to predict complex nonlinear dynamical systems in a stable and accurate manner. We demonstrate the efficacy of this framework through benchmark forecasts of the NOAA Optimal Interpolation Sea Surface Temperature dataset and compare the performance of QRC to other ML methods.

Updated: 2024-07-31 13:37:04

Categories: cs.LG,math.DS,physics.comp-ph,physics.flu-dyn

Download: http://arxiv.org/abs/2407.21602v1

Robust Simultaneous Multislice MRI Reconstruction Using Deep Generative Priors

Simultaneous multislice (SMS) imaging is a powerful technique for accelerating magnetic resonance imaging (MRI) acquisitions. However, SMS reconstruction remains challenging due to the complex signal interactions between and within the excited slices. This study presents a robust SMS MRI reconstruction method using deep generative priors. Starting from Gaussian noise, we leverage denoising diffusion probabilistic models (DDPM) to gradually recover the individual slices through reverse diffusion iterations while imposing data consistency from the measured k-space under the readout concatenation framework. The posterior sampling procedure is designed such that the DDPM training can be performed on single-slice images without special adjustments for SMS tasks. Additionally, our method integrates a low-frequency enhancement (LFE) module to address a practical issue: SMS-accelerated fast spin echo (FSE) and echo-planar imaging (EPI) sequences cannot easily embed autocalibration signals. Extensive experiments demonstrate that our approach consistently outperforms existing methods and generalizes well to unseen datasets. The code will be available at https://github.com/Solor-pikachu/ROGER after the review process.

Updated: 2024-07-31 13:34:14

Categories: eess.IV,cs.AI,cs.CV,eess.SP,physics.med-ph

Download: http://arxiv.org/abs/2407.21600v1

Naeural AI OS -- Decentralized ubiquitous computing MLOps execution engine

Over the past few years, ubiquitous, or pervasive, computing has gained popularity as the primary approach for a wide range of applications, including enterprise-grade systems, consumer applications, and gaming systems. Ubiquitous computing refers to the integration of computing technologies into everyday objects and environments, creating a network of interconnected devices that can communicate with each other and with humans. By using ubiquitous computing technologies, communities can become more connected and efficient, with members able to communicate and collaborate more easily. The resulting interconnectedness and collaboration can lead to a more successful and sustainable community. The spread of ubiquitous computing, however, has emphasized the importance of automated learning and smart applications in general. Even though there have been significant strides in Artificial Intelligence and Deep Learning, large-scale adoption has been hesitant due to mounting pressure on expensive and highly complex cloud numerical-compute infrastructures. Adopting, and even developing, practical machine learning systems can come with prohibitive costs, not only in terms of complex infrastructures but also in terms of the solid expertise required in Data Science and Machine Learning. In this paper we present an innovative approach for low-code development and deployment of end-to-end AI cooperative application pipelines. We address infrastructure allocation, costs, and secure job distribution in a fully decentralized global cooperative community based on tokenized economics.

Updated: 2024-07-31 13:31:40

Categories: cs.AI,cs.DC,cs.NI,I.2.5; I.2.11

Download: http://arxiv.org/abs/2306.08708v3

Measuring What Matters: Intrinsic Distance Preservation as a Robust Metric for Embedding Quality

Unsupervised embeddings are fundamental to numerous machine learning applications, yet their evaluation remains a challenging task. Traditional assessment methods often rely on extrinsic variables, such as performance in downstream tasks, which can introduce confounding factors and mask the true quality of embeddings. This paper introduces the Intrinsic Distance Preservation Evaluation (IDPE) method, a novel approach for assessing embedding quality based on the preservation of Mahalanobis distances between data points in the original and embedded spaces. We demonstrate the limitations of extrinsic evaluation methods through a simple example, highlighting how they can lead to misleading conclusions about embedding quality. IDPE addresses these issues by providing a task-independent measure of how well embeddings preserve the intrinsic structure of the original data. Our method leverages efficient similarity search techniques to make it applicable to large-scale datasets. We compare IDPE with established intrinsic metrics like trustworthiness and continuity, as well as extrinsic metrics such as Average Rank and Mean Reciprocal Rank. Our results show that IDPE offers a more comprehensive and reliable assessment of embedding quality across various scenarios. We evaluate PCA and t-SNE embeddings using IDPE, revealing insights into their performance that are not captured by traditional metrics. This work contributes to the field by providing a robust, efficient, and interpretable method for embedding evaluation. IDPE's focus on intrinsic properties offers a valuable tool for researchers and practitioners seeking to develop and assess high-quality embeddings for diverse machine learning applications.

Updated: 2024-07-31 13:26:09

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.21590v1
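
The entry above describes scoring an embedding by how well it preserves Mahalanobis distances between data points. The abstract does not give the exact IDPE score, so the following is only one possible reading: compute pairwise Mahalanobis distances in both spaces, each with its own sample covariance, and average the absolute differences. The sketch is restricted to 2-D points to keep the covariance inverse in closed form, it omits the similarity-search acceleration the abstract mentions, and all names and data are hypothetical.

```python
import math
from itertools import combinations


def pairwise_mahalanobis_2d(points):
    """Pairwise Mahalanobis distances for 2-D points, using their sample covariance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate data)
    dists = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x1 - x2, y1 - y2
        # d^2 = [dx dy] S^{-1} [dx dy]^T with the closed-form 2x2 inverse of S.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(max(d2, 0.0)))
    return dists


def distance_preservation_error(original, embedded):
    """Mean absolute difference between corresponding pairwise distances."""
    d_orig = pairwise_mahalanobis_2d(original)
    d_emb = pairwise_mahalanobis_2d(embedded)
    return sum(abs(a - b) for a, b in zip(d_orig, d_emb)) / len(d_orig)


data = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 1.0), (4.0, 3.0)]

# An invertible linear map leaves Mahalanobis distances unchanged.
linear = [(2.0 * x + y, 3.0 * y) for x, y in data]
# A non-linear warp distorts them.
warped = [(x, y ** 3) for x, y in data]

linear_error = distance_preservation_error(data, linear)
warped_error = distance_preservation_error(data, warped)
```

Because the score depends only on the geometry of the two point clouds, it needs no downstream task, which is the task-independence property the abstract emphasises.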

Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience

Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question its usefulness to advance the broader goals of AI. However, it has been overlooked that these issues resemble those that have been grappled with in another field: Cognitive Neuroscience. Here we draw the relevant connections and highlight lessons that can be transferred productively between fields. Based on these, we propose a general conceptual framework and give concrete methodological strategies for building mechanistic explanations in AI inner interpretability research. With this conceptual framework, Inner Interpretability can fend off critiques and position itself on a productive path to explain AI systems.

Updated: 2024-07-31 13:18:13

Categories: cs.AI,cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2406.01352v2

Voxel Scene Graph for Intracranial Hemorrhage

Patients with Intracranial Hemorrhage (ICH) face a potentially life-threatening condition, and patient-centered individualized treatment remains challenging due to possible clinical complications. Deep-Learning-based methods can efficiently analyze the routinely acquired head CTs to support clinical decision-making. The majority of early work focuses on the detection and segmentation of ICH, but does not model the complex relations between ICH and adjacent brain structures. In this work, we design a tailored object detection method for ICH, which we unite with segmentation-grounded Scene Graph Generation (SGG) methods to learn a holistic representation of the clinical cerebral scene. To the best of our knowledge, this is the first application of SGG for 3D voxel images. We evaluate our method on two head-CT datasets and demonstrate that our model can recall up to 74% of clinically relevant relations. This work lays the foundation for SGG on 3D voxel data. The generated Scene Graphs can already provide insights for the clinician, and are also valuable for downstream tasks as a compact and interpretable representation.

Updated: 2024-07-31 13:10:59

Categories: cs.CV,cs.AI,68T07,I.2.10

Download: http://arxiv.org/abs/2407.21580v1

A Performance Study of LLM-Generated Code on Leetcode

This study evaluates the efficiency of code generation by Large Language Models (LLMs) and measures their performance against human-crafted solutions using a dataset from Leetcode. We compare 18 LLMs, considering factors such as model temperature and success rate, and their impact on code performance. This research introduces a novel method for measuring and comparing the speed of LLM-generated code, revealing that LLMs produce code with comparable performance, irrespective of the adopted LLM. We also find that LLMs are capable of generating code that is, on average, more efficient than the code written by humans. The paper further discusses the use of Leetcode as a benchmarking dataset, the limitations imposed by potential data contamination, and the platform's measurement reliability. We believe that our findings contribute to a better understanding of LLM capabilities in code generation and set the stage for future optimizations in the field.

Updated: 2024-07-31 13:10:03

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2407.21579v1

Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, will LVLMs be misled and respond incorrectly, even though the ground visual information exists? To answer this, we propose a framework called MMHalSnowball to evaluate LVLMs' behaviors when encountering generated hallucinations, where LVLMs are required to answer specific visual questions within a curated hallucinatory conversation. Crucially, our experiment shows that the performance of open-source LVLMs drops by at least $31\%$, indicating that LVLMs are prone to accept the generated hallucinations and make false claims that they would not have supported without distractions. We term this phenomenon Multimodal Hallucination Snowballing. To mitigate this, we further propose a training-free method called Residual Visual Decoding, where we revise the output distribution of LVLMs with the one derived from the residual visual input, providing models with direct access to the visual information. Experiments show that our method can mitigate more than $24\%$ of the snowballed multimodal hallucination while maintaining capabilities.

Updated: 2024-07-31 13:08:22

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.00569v3

Neural Retrievers are Biased Towards LLM-Generated Content

Recently, the emergence of large language models (LLMs) has revolutionized the paradigm of information retrieval (IR) applications, especially in web search, by generating vast amounts of human-like texts on the Internet. As a result, IR systems in the LLM era are facing a new challenge: the indexed documents are now not only written by human beings but also automatically generated by LLMs. How these LLM-generated documents influence IR systems is a pressing and still unexplored question. In this work, we conduct a quantitative evaluation of IR models in scenarios where both human-written and LLM-generated texts are involved. Surprisingly, our findings indicate that neural retrieval models tend to rank LLM-generated documents higher. We refer to this category of biases in neural retrievers towards LLM-generated content as the source bias. Moreover, we discover that this bias is not confined to first-stage neural retrievers, but extends to second-stage neural re-rankers. In-depth analyses from the perspective of text compression then indicate that LLM-generated texts exhibit more focused semantics with less noise, making it easier for neural retrieval models to match them semantically. To mitigate the source bias, we also propose a plug-and-play debiased constraint for the optimization objective, and experimental results show its effectiveness. Finally, we discuss the potential severe concerns stemming from the observed source bias and hope our findings can serve as a critical wake-up call to the IR community and beyond. To facilitate future explorations of IR in the LLM era, the two newly constructed benchmarks are available at https://github.com/KID-22/Source-Bias.

Updated: 2024-07-31 13:08:08

Categories: cs.IR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2310.20501v3

Multi-Site Class-Incremental Learning with Weighted Experts in Echocardiography

Building an echocardiography view classifier that maintains performance in real-life cases requires diverse multi-site data, and frequent updates with newly available data to mitigate model drift. Simply fine-tuning on new datasets results in "catastrophic forgetting", and cannot adapt to variations of view labels between sites. Alternatively, collecting all data on a single server and re-training may not be feasible, as data sharing agreements may restrict image transfer, or datasets may only become available at different times. Furthermore, the time and cost associated with re-training grow with every new dataset. We propose a class-incremental learning method which learns an expert network for each dataset, and combines all expert networks with a score fusion model. The influence of "unqualified experts" is minimised by weighting each contribution with a learnt in-distribution score. These weights promote transparency, as the contribution of each expert is known during inference. Instead of using the original images, we use learned features from each dataset, which are easier to share and raise fewer licensing and privacy concerns. We validate our work on six datasets from multiple sites, demonstrating significant reductions in training time while improving view classification performance.
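
To make the weighting idea concrete, here is a minimal sketch of combining per-expert predictions with normalized in-distribution scores. It is not the paper's score fusion model, which is learned; a plain weighted average is swapped in for illustration. The function name `fuse_expert_scores`, the shared label set across experts, and the example numbers are all assumptions.

```python
def fuse_expert_scores(expert_probs, in_dist_scores):
    """Weighted average of per-expert class probabilities.

    expert_probs:   one probability vector per expert (assumed to share one label set)
    in_dist_scores: one non-negative score per expert; a higher score means the
                    input looks more like that expert's training data (assumption)
    Returns the fused probability vector and the normalized weights.
    """
    total = sum(in_dist_scores)
    if total <= 0:
        raise ValueError("at least one in-distribution score must be positive")
    # Normalize the scores so the weights sum to one and stay inspectable.
    weights = [s / total for s in in_dist_scores]
    n_classes = len(expert_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]
    return fused, weights


# Hypothetical example: expert 0 is judged three times more "in distribution".
fused, weights = fuse_expert_scores([[0.9, 0.1], [0.2, 0.8]], [3.0, 1.0])
print(weights, fused)
```

Returning the weights alongside the fused scores mirrors the transparency point in the abstract: the contribution of each expert can be read off at inference time.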

Updated: 2024-07-31 13:05:32

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.21577v1

XMeCap: Meme Caption Generation with Sub-Image Adaptability

Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines. While advances have been made in natural language processing, real-world humor often thrives in a multi-modal context, encapsulated distinctively by memes. This paper places particular emphasis on the impact of multiple images on meme captioning. We then introduce the XMeCap framework, a novel approach that adopts supervised fine-tuning and reinforcement learning based on an innovative reward model, which factors in both global and local similarities between visuals and text. Our results, benchmarked against contemporary models, show a marked improvement in caption generation for both single-image and multi-image memes, as well as for different meme categories. XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively. This research not only establishes a new frontier in meme-related studies but also underscores the potential of machines in understanding and generating humor in a multi-modal setting.

Updated: 2024-07-31 12:56:22

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.17152v2

PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning

Large Language Models (LLMs) encounter significant challenges in continual learning due to catastrophic forgetting, where new information overwrites previously acquired knowledge. This limitation leads to substantial environmental and economic waste. In this study, we introduce PMoE, Progressive Mixture of Experts with Asymmetric Transformer, which aims to minimize forgetting by utilizing an asymmetric design with shallow layers dedicated to general knowledge and deep layers for new knowledge. PMoE incorporates progressively added experts in the deep layers and a router that allocates new knowledge to the appropriate experts. The router, positioned adjacent to the deep layers, utilizes deep features that aggregate consolidated information, which enables it to allocate new knowledge efficiently as the number of experts in the deep layers grows. Extensive experiments on TRACE datasets and general language understanding datasets demonstrate that the proposed PMoE outperforms previous state-of-the-art approaches.

Updated: 2024-07-31 12:56:14

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.21571v1

Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation

Accurately predicting watch time is crucial for optimizing recommendations and user experience in short video platforms. However, existing methods that estimate a single average watch time often fail to capture the inherent uncertainty and diversity in user engagement patterns. In this paper, we propose the Conditional Quantile Estimation (CQE) framework to model the entire conditional distribution of watch time. Using quantile regression, CQE characterizes the complex watch-time distribution for each user-video pair, providing a flexible and comprehensive approach to understanding user behavior. We further design multiple strategies to combine the quantile estimates, adapting to different recommendation scenarios and user preferences. Extensive offline experiments and online A/B tests demonstrate the superiority of CQE in watch time prediction and user engagement modeling. In particular, the online deployment of CQE in KuaiShow has led to significant improvements in key evaluation metrics, including active days, active users, engagement duration, and video view counts. These results highlight the practical impact of our proposed approach in enhancing the user experience and overall performance of the short video recommendation system. The code will be released after publication.
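
Quantile regression is usually trained with the pinball (quantile) loss, so a short sketch of that loss may help in reading the abstract. This is the standard textbook loss, not code from the CQE authors; the function name `pinball_loss` and the watch-time numbers are illustrative assumptions.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss for quantile level tau in (0, 1).

    Under-prediction (y > q) costs tau per unit of error and
    over-prediction costs (1 - tau) per unit, so minimizing the
    loss pushes q towards the tau-quantile of y.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# Hypothetical watch times in seconds, scored at the 0.9 quantile.
watch_times = [10.0, 10.0]
print(pinball_loss(watch_times, [8.0, 8.0], tau=0.9))    # under-prediction
print(pinball_loss(watch_times, [12.0, 12.0], tau=0.9))  # over-prediction
```

For a high quantile such as 0.9, under-predicting by two seconds is penalized far more than over-predicting by the same amount. Fitting one such loss per quantile level yields a set of estimates that together approximate the conditional distribution.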

Updated: 2024-07-31 12:49:17

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.12223v3

TRGR: Transmissive RIS-aided Gait Recognition Through Walls

Gait recognition with radio frequency (RF) signals enables many potential applications requiring accurate identification. However, current systems require individuals to be within a line-of-sight (LOS) environment and struggle with low signal-to-noise ratio (SNR) when signals traverse concrete and thick walls. To address these challenges, we present TRGR, a novel transmissive reconfigurable intelligent surface (RIS)-aided gait recognition system. TRGR can recognize human identities through walls using only the magnitude measurements of channel state information (CSI) from a pair of transceivers. Specifically, by leveraging transmissive RIS alongside a configuration alternating optimization algorithm, TRGR enhances wall penetration and signal quality, enabling accurate gait recognition. Furthermore, a residual convolution network (RCNN) is proposed as the backbone network to learn robust human information. Experimental results confirm the efficacy of transmissive RIS, highlighting its significant potential for enhancing RF-based gait recognition systems. Extensive experimental results show that TRGR achieves an average accuracy of 97.88% in identifying persons when signals traverse concrete walls, demonstrating the effectiveness and robustness of TRGR.

Updated: 2024-07-31 12:42:25

Categories: cs.AI

Download: http://arxiv.org/abs/2407.21566v1

Multi-agent reinforcement learning for the control of three-dimensional Rayleigh-Bénard convection

Deep reinforcement learning (DRL) has found application in numerous use-cases pertaining to flow control. Multi-agent RL (MARL), a variant of DRL, has been shown to be more effective than single-agent RL in controlling flows exhibiting locality and translational invariance. We present, for the first time, an implementation of MARL-based control of three-dimensional Rayleigh-Bénard convection (RBC). Control is executed by modifying the temperature distribution along the bottom wall divided into multiple control segments, each of which acts as an independent agent. Two regimes of RBC are considered at Rayleigh numbers $\mathrm{Ra}=500$ and $750$. Evaluation of the learned control policy reveals a reduction in convection intensity by $23.5\%$ and $8.7\%$ at $\mathrm{Ra}=500$ and $750$, respectively. The MARL controller converts irregularly shaped convective patterns to regular straight rolls with lower convection that resemble flow in a relatively more stable regime. We draw comparisons with proportional control at both $\mathrm{Ra}$ and show that MARL is able to outperform the proportional controller. The learned control strategy is complex, featuring different non-linear segment-wise actuator delays and actuation magnitudes. We also perform successful evaluations on a larger domain than used for training, demonstrating that the invariant property of MARL allows direct transfer of the learnt policy.

Updated: 2024-07-31 12:41:20

Categories: physics.flu-dyn,cs.LG

Download: http://arxiv.org/abs/2407.21565v1

SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution

Advanced text-to-image models such as DALL-E 2 and Midjourney possess the capacity to generate highly realistic images, raising significant concerns regarding the potential proliferation of unsafe content. This includes adult, violent, or deceptive imagery of political figures. Despite claims of rigorous safety mechanisms implemented in these models to restrict the generation of not-safe-for-work (NSFW) content, we successfully devise and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant photorealistic NSFW images. We reveal the fundamental principles of such prompt attacks and suggest strategically substituting high-risk sections within a suspect prompt to evade closed-source safety measures. Our novel framework, SurrogatePrompt, systematically generates attack prompts, utilizing large language models, image-to-text, and image-to-image modules to automate attack prompt creation at scale. Evaluation results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts, leading to the generation of counterfeit images depicting political figures in violent scenarios. Both subjective and objective assessments validate that the images generated from our attack prompts present considerable safety hazards.

Updated: 2024-07-31 12:38:38

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2309.14122v2

Enhancing Interpretability of Vertebrae Fracture Grading using Human-interpretable Prototypes

Vertebral fracture grading classifies the severity of vertebral fractures, a challenging task in medical imaging to which Deep Learning (DL) models have recently been applied. Only a few works have attempted to make such models human-interpretable despite the need for transparency and trustworthiness in critical use cases like DL-assisted medical diagnosis. Moreover, such models either rely on post-hoc methods or additional annotations. In this work, we propose a novel interpretable-by-design method, ProtoVerse, to find relevant sub-parts of vertebral fractures (prototypes) that reliably explain the model's decision in a human-understandable way. Specifically, we introduce a novel diversity-promoting loss to mitigate prototype repetitions in small datasets with intricate semantics. In experiments on the VerSe'19 dataset, our method outperformed the existing prototype-based method. Further, our model provides superior interpretability compared with the post-hoc method. Importantly, expert radiologists validated the visual interpretability of our results, showing clinical applicability.

Updated: 2024-07-31 12:34:39

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.02830v2

Generative Sentiment Analysis via Latent Category Distribution and Constrained Decoding

Fine-grained sentiment analysis involves extracting and organizing sentiment elements from textual data. However, existing approaches often overlook issues of category semantic inclusion and overlap, as well as inherent structural patterns within the target sequence. This study introduces a generative sentiment analysis model. To address the challenges related to category semantic inclusion and overlap, a latent category distribution variable is introduced. By reconstructing the input of a variational autoencoder, the model learns the intensity of the relationship between categories and text, thereby improving sequence generation. Additionally, a trie data structure and constrained decoding strategy are utilized to exploit structural patterns, which in turn reduces the search space and regularizes the generation process. Experimental results on the Restaurant-ACOS and Laptop-ACOS datasets demonstrate a significant performance improvement compared to baseline models. Ablation experiments further confirm the effectiveness of latent category distribution and constrained decoding strategy.
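
The trie-plus-constrained-decoding idea can be sketched in a few lines. The example below is an illustration under assumptions, not the paper's implementation: the category sequences, the `<eos>` marker, the function names, and the toy `score_fn` standing in for a generative model are all hypothetical.

```python
END = "<eos>"  # hypothetical end-of-sequence marker


def build_trie(sequences):
    """Store every valid token sequence in a nested-dict trie."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[END] = {}  # mark that a valid sequence may stop here
    return root


def allowed_next(trie, prefix):
    """Return the tokens that keep `prefix` inside the trie."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return set(node)


def constrained_greedy(trie, score_fn, max_len=10):
    """Greedy decoding that only ever scores trie-approved tokens."""
    prefix = []
    for _ in range(max_len):
        candidates = allowed_next(trie, prefix)
        if not candidates:
            break
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=lambda t: score_fn(prefix, t))
        if best == END:
            break
        prefix.append(best)
    return prefix


# Hypothetical category labels and a toy scorer standing in for the model.
trie = build_trie([["food", "quality"], ["food", "prices"], ["service", "general"]])
toy_scores = {"food": 1.0, "prices": 2.0, "quality": 1.0, "laptop": 100.0}
print(constrained_greedy(trie, lambda prefix, tok: toy_scores.get(tok, 0.0)))
```

Even though the toy scorer strongly prefers the out-of-vocabulary token `laptop`, it is never considered, because only tokens reachable in the trie are scored. This is how the constraint shrinks the search space.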

Updated: 2024-07-31 12:29:17

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.21560v1

Self-Sovereign Identity for Consented and Content-Based Access to Medical Records using Blockchain

Electronic Health Records (EHRs) and medical data are classified as personal data in every privacy law, meaning that any related service that processes such data must provide full security, confidentiality, privacy and accountability. Solutions for health data management (storing, sharing and processing) are emerging quickly and were significantly boosted by the Covid-19 pandemic, which created a need to move things online. EHRs make up a crucial part of digital identity data, and the same digital identity trends, such as self-sovereign identity powered by decentralized ledger technologies like Blockchain, are being researched or implemented in contexts managing digital interactions between health facilities, patients and health professionals. In this paper, we propose a blockchain-based solution enabling secure exchange of EHRs between different parties, powered by a self-sovereign identity (SSI) wallet and decentralized identifiers (DIDs). We also make use of a consortium IPFS network for off-chain storage and attribute-based encryption (ABE) to ensure data confidentiality and integrity. Through our solution, we grant users full control over their medical data, and enable them to share it in total confidentiality over encrypted communication channels between user wallets. We also use DIDs for better user privacy and limit any possible correlations or identification by using pairwise DIDs. Overall, combining this set of technologies guarantees secure exchange of EHRs, secure storage and management along with by-design features inherited from the technological stack.

Updated: 2024-07-31 12:27:31

Categories: cs.CR,cs.ET

Download: http://arxiv.org/abs/2407.21559v1

Operator-based semantics for choice programs: is choosing losing? (full version)

Choice constructs are an important part of the language of logic programming, yet the study of their semantics has been a challenging task. So far, only two-valued semantics have been studied, and the different proposals for such semantics have not been compared in a principled way. In this paper, an operator-based framework that allows for the definition and comparison of different semantics in a principled way is proposed.

Updated: 2024-07-31 12:25:57

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2407.21556v1

Learning to Plan for Language Modeling from Unlabeled Data

By training to predict the next token in an unlabeled corpus, large language models learn to perform many tasks without any labeled data. However, their next-token-prediction objective arguably limits their performance in scenarios that require planning, such as writing a coherent article. In this paper, we train a module for planning the future writing process via a self-supervised learning objective. Given the textual context, this planning module learns to predict future abstract writing actions, which correspond to centroids in a clustered text embedding space. By conditioning on these actions, our model extends the successful language model formula to more abstract planning in an unsupervised way. Empirically, we demonstrate that our method improves language modeling performance in general, particularly with respect to the text structure. Because our framework uses a planner module that is unsupervised and external to the language model, new planner modules can be trained at large scale and easily be shared with the community.
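
As a rough illustration of "abstract writing actions" as centroids in an embedding space, the sketch below maps each sentence embedding to the index of its nearest centroid. The two-dimensional vectors, the centroids, and the function names are assumptions made up for the example. In the paper the embeddings come from a text encoder and the centroids from clustering, neither of which is reproduced here.

```python
import math


def nearest_centroid(embedding, centroids):
    """Index of the centroid closest to `embedding` in Euclidean distance."""
    best_index, best_dist = -1, math.inf
    for i, centroid in enumerate(centroids):
        d = math.dist(embedding, centroid)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


def actions_for_sentences(embeddings, centroids):
    """Turn a sequence of sentence embeddings into discrete action ids."""
    return [nearest_centroid(e, centroids) for e in embeddings]


# Hypothetical centroids (e.g. from k-means) and sentence embeddings.
centroids = [[0.0, 0.0], [10.0, 10.0]]
sentences = [[1.0, 1.0], [9.0, 8.0], [0.0, -1.0]]
print(actions_for_sentences(sentences, centroids))
```

A planner in this spirit would be trained to predict the next action id from the preceding text, and the language model would then condition on that prediction.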

Updated: 2024-07-31 12:25:14

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.00614v2

CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment

This paper presents the Customer Experience (CX) Simulator, a novel framework designed to assess the effects of untested web-marketing campaigns through user behavior simulations. The proposed framework leverages large language models (LLMs) to represent various events in a user's behavioral history, such as viewing an item, applying a coupon, or purchasing an item, as semantic embedding vectors. We train a model to predict transitions between events from their LLM embeddings, which can even generalize to unseen events by learning from diverse training data. In web-marketing applications, we leverage this transition prediction model to simulate how users might react differently when new campaigns or products are presented to them. This allows us to eliminate the need for costly online testing and enhance the marketers' abilities to reveal insights. Our numerical evaluation and user study, utilizing BigQuery Public Datasets from the Google Merchandise Store, demonstrate the effectiveness of our framework.

Updated: 2024-07-31 12:22:40

Categories: cs.LG,cs.SY,eess.SY,I.6.3; H.5.2

Download: http://arxiv.org/abs/2407.21553v1

First Analysis of the EU Artificial Intelligence Act: Towards a Global Standard for Trustworthy AI?

The EU Artificial Intelligence Act (AI Act) came into force in the European Union (EU) on 1 August 2024. It is a key piece of legislation both for the citizens at the heart of AI technologies and for the industry active in the internal market. The AI Act imposes progressive compliance on organisations - both private and public - involved in the global value chain of AI systems and models marketed and used in the EU. While the Act is unprecedented on an international scale in terms of its horizontal and binding regulatory scope, its global appeal in support of trustworthy AI is one of its major challenges.

Updated: 2024-07-31 12:16:03

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2408.08318v1

Manifold learning in Wasserstein space

This paper aims at building the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures on a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $\mathrm{W}$. We begin by introducing a construction of submanifolds $\Lambda$ of probability measures equipped with metric $\mathrm{W}_\Lambda$, the geodesic restriction of $\mathrm{W}$ to $\Lambda$. In contrast to other constructions, these submanifolds are not necessarily flat, but still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(\Lambda,\mathrm{W}_{\Lambda})$ can be learned from samples $\{\lambda_i\}_{i=1}^N$ of $\Lambda$ and pairwise extrinsic Wasserstein distances $\mathrm{W}$ only. In particular, we show that the metric space $(\Lambda,\mathrm{W}_{\Lambda})$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $\mathrm{W}(\lambda_i,\lambda_j)$. In addition, we demonstrate how the tangent space at a sample $\lambda$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" using optimal transport maps from $\lambda$ to sufficiently close and diverse samples $\{\lambda_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $\Lambda$ and numerical examples on the recovery of tangent spaces through spectral analysis.
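
The graph with edge weights $\mathrm{W}(\lambda_i,\lambda_j)$ can be illustrated in the simplest setting, which is an assumption of this sketch rather than the paper's setup: each measure is replaced by an equally weighted empirical sample on the real line. For two such samples of equal size, the Wasserstein-2 distance reduces to matching sorted points. The function names are hypothetical.

```python
import math


def w2_1d(xs, ys):
    """Wasserstein-2 distance between two equal-size empirical measures on R.

    In one dimension the optimal coupling pairs the sorted samples, so
    W2 is the root mean squared difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples per measure")
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def distance_graph(samples):
    """Dense matrix of pairwise W2 distances, i.e. the weighted graph's edges."""
    n = len(samples)
    return [[w2_1d(samples[i], samples[j]) for j in range(n)] for i in range(n)]


# Translating a measure by 3 moves it a W2 distance of exactly 3.
print(w2_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0]))
print(distance_graph([[0.0, 1.0], [1.0, 2.0], [0.0, 1.0]]))
```

The resulting matrix contains only extrinsic distances. Recovering the intrinsic metric $\mathrm{W}_\Lambda$ from it, as the abstract describes, needs further steps that are not shown here.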

Updated: 2024-07-31 12:10:56

Categories: stat.ML,cs.LG,math.DG,49Q22, 41A65, 58B20, 53Z50

Download: http://arxiv.org/abs/2311.08549v2

Black box meta-learning intrinsic rewards for sparse-reward environments

Despite the successes and progress of deep reinforcement learning over the last decade, several challenges remain that hinder its broader application. Some fundamental aspects to improve include data efficiency, generalization capability, and ability to learn in sparse-reward environments, which often require human-designed dense rewards. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Additionally, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. The focus is on meta-learning intrinsic rewards under a framework that doesn't rely on the use of meta-gradients. We analyze and compare this approach to the use of extrinsic rewards and a meta-learned advantage function. The developed algorithms are evaluated on distributions of continuous control tasks with both parametric and non-parametric variations, and with only sparse rewards accessible for the evaluation tasks.

Updated: 2024-07-31 12:09:33

Categories: cs.LG

Download: http://arxiv.org/abs/2407.21546v1

Can LLMs Understand Computer Networks? Towards a Virtual System Administrator

Recent advancements in Artificial Intelligence, and particularly Large Language Models (LLMs), offer promising prospects for aiding system administrators in managing the complexity of modern networks. However, despite this potential, a significant gap exists in the literature regarding the extent to which LLMs can understand computer networks. Without empirical evidence, system administrators might rely on these models without assurance of their efficacy in performing network-related tasks accurately. In this paper, we are the first to conduct an exhaustive study on LLMs' comprehension of computer networks. We formulate several research questions to determine whether LLMs can provide correct answers when supplied with a network topology and questions on it. To assess them, we developed a thorough framework for evaluating LLMs' capabilities in various network-related tasks. We evaluate our framework on multiple computer networks employing proprietary (e.g., GPT4) and open-source (e.g., Llama2) models. Our findings for general-purpose LLMs in a zero-shot scenario are promising, with the best model achieving an average accuracy of 79.3%. Proprietary LLMs achieve noteworthy results in small and medium networks, while challenges persist in comprehending complex network topologies, particularly for open-source models. Moreover, we provide insight into how prompt engineering can enhance the accuracy of some tasks.

Updated: 2024-07-31 12:02:56

Categories: cs.NI,cs.AI,cs.ET,C.2.1; C.2.5; I.2.1

Download: http://arxiv.org/abs/2404.12689v2

Enhanced Fault Detection and Cause Identification Using Integrated Attention Mechanism

This study introduces a novel methodology for fault detection and cause identification within the Tennessee Eastman Process (TEP) by integrating a Bidirectional Long Short-Term Memory (BiLSTM) neural network with an Integrated Attention Mechanism (IAM). The IAM combines the strengths of scaled dot product attention, residual attention, and dynamic attention to capture intricate patterns and dependencies crucial for TEP fault detection. Initially, the attention mechanism extracts important features from the input data, enhancing the model's interpretability and relevance. The BiLSTM network processes these features bidirectionally to capture long-range dependencies, and the IAM further refines the output, leading to improved fault detection results. Simulation results demonstrate the efficacy of this approach, showcasing superior performance in accuracy, false alarm rate, and misclassification rate compared to existing methods. This methodology provides a robust and interpretable solution for fault detection and diagnosis in the TEP, highlighting its potential for industrial applications.
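Of the attention variants the abstract names, scaled dot-product attention has a standard textbook form, softmax(QK^T / sqrt(d_k)) V. The sketch below illustrates only that generic formula in plain Python; it is not the paper's Integrated Attention Mechanism, and the function names and toy inputs are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Textbook attention: softmax(Q K^T / sqrt(d_k)) V, with lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input (hypothetical): identical keys give uniform weights,
# so the output is the mean of the value vectors.
toy_output = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

With identical keys every value receives the same weight, which makes the toy case easy to check by hand.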

Updated: 2024-07-31 12:01:57

Categories: cs.AI,eess.SP

Download: http://arxiv.org/abs/2408.00033v1

Recording First-person Experiences to Build a New Type of Foundation Model

Foundation models have had a big impact in recent years and billions of dollars are being invested in them in the current AI boom. The more popular ones, such as Chat-GPT, are trained on large amounts of Internet data. However, it is becoming apparent that this data is likely to be exhausted soon, and technology companies are looking for new sources of data to train the next generation of foundation models. Reinforcement learning, RAG, prompt engineering and cognitive modelling are often used to fine-tune and augment the behaviour of foundation models. These techniques have been used to replicate people, such as Caryn Marjorie. These chatbots are not based on people's actual emotional and physiological responses to their environment, so they are, at best, a surface-level approximation to the characters they are imitating. To address these issues, we have developed a recording rig that captures what the wearer is seeing and hearing as well as their skin conductance (GSR), facial expression and brain state (14 channel EEG). AI algorithms are used to process this data into a rich picture of the environment and internal states of the subject. Foundation models trained on this data could replicate human behaviour much more accurately than the personality models that have been developed so far. This type of model has many potential applications, including recommendation, personal assistance, GAN systems, dating and recruitment. This paper gives some background to this work and describes the recording rig and preliminary tests of its functionality. It then suggests how a new type of foundation model could be created from the data captured by the rig and outlines some applications. Data gathering and model training are expensive, so we are currently working on the launch of a start-up that could raise funds for the next stage of the project.

Updated: 2024-07-31 11:51:26

Categories: cs.AI,cs.HC,cs.LG

Download: http://arxiv.org/abs/2408.02680v1

Probabilistic Scoring Lists for Interpretable Machine Learning

A scoring system is a simple decision model that checks a set of features, adds a certain number of points to a total score for each feature that is satisfied, and finally makes a decision by comparing the total score to a threshold. Scoring systems have a long history of active use in safety-critical domains such as healthcare and justice, where they provide guidance for making objective and accurate decisions. Given their genuine interpretability, the idea of learning scoring systems from data is obviously appealing from the perspective of explainable AI. In this paper, we propose a practically motivated extension of scoring systems called probabilistic scoring lists (PSL), as well as a method for learning PSLs from data. Instead of making a deterministic decision, a PSL represents uncertainty in the form of probability distributions, or, more generally, probability intervals. Moreover, in the spirit of decision lists, a PSL evaluates features one by one and stops as soon as a decision can be made with enough confidence. To evaluate our approach, we conduct a case study in the medical domain.
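The plain (deterministic) scoring system described in the first sentence can be sketched in a few lines. This covers only the classical model, not the probabilistic scoring lists the paper proposes; the feature names, point values, and threshold are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class ScoredFeature:
    name: str
    points: int
    is_satisfied: Callable[[Mapping[str, float]], bool]


def total_score(features: Sequence[ScoredFeature], record: Mapping[str, float]) -> int:
    """Add the points of every feature that the record satisfies."""
    return sum(f.points for f in features if f.is_satisfied(record))


def decide(features: Sequence[ScoredFeature], record: Mapping[str, float], threshold: int) -> bool:
    """Positive decision when the total score reaches the threshold."""
    return total_score(features, record) >= threshold


# Hypothetical features and point values, not taken from the paper.
FEATURES = [
    ScoredFeature("age_over_60", 2, lambda r: r["age"] > 60),
    ScoredFeature("high_blood_pressure", 1, lambda r: r["systolic_bp"] >= 140),
    ScoredFeature("smoker", 1, lambda r: r["smoker"] == 1),
]

patient = {"age": 67, "systolic_bp": 150, "smoker": 0}
patient_score = total_score(FEATURES, patient)  # 2 + 1 + 0
```

A PSL, as the abstract describes it, would instead evaluate the features one at a time and attach a probability estimate to each partial score, stopping once the decision is confident enough.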

Updated: 2024-07-31 11:44:54

Categories: cs.LG

Download: http://arxiv.org/abs/2407.21535v1

V-RECS, a Low-Cost LLM4VIS Recommender with Explanations, Captioning and Suggestions

NL2VIS (natural language to visualization) is a promising and recent research area that involves interpreting natural language queries and translating them into visualizations that accurately represent the underlying data. As we navigate the era of big data, NL2VIS holds considerable application potential since it greatly facilitates data exploration by non-expert users. Following the increasingly widespread usage of generative AI in NL2VIS applications, in this paper we present V-RECS, the first LLM-based Visual Recommender augmented with explanations (E), captioning (C), and suggestions (S) for further data exploration. V-RECS' visualization narratives facilitate both response verification and data exploration by non-expert users. Furthermore, our proposed solution mitigates computational, controllability, and cost issues associated with using powerful LLMs by leveraging a methodology to effectively fine-tune small models. To generate insightful visualization narratives, we use Chain-of-Thoughts (CoT), a prompt engineering technique that helps an LLM identify and generate the logical steps needed to produce a correct answer. Since CoT is reported to perform poorly with small LLMs, we adopted a strategy in which a large LLM (GPT-4), acting as a Teacher, generates CoT-based instructions to fine-tune a small model, Llama-2-7B, which plays the role of a Student. Extensive experiments, based on a framework for the quantitative evaluation of AI-based visualizations and on manual assessment by a group of participants, show that V-RECS achieves performance scores comparable to GPT-4, at a much lower cost. The efficacy of the V-RECS teacher-student paradigm is also demonstrated by the fact that the un-tuned Llama fails to perform the task in the vast majority of test cases. We release V-RECS for the visualization community to assist visualization designers throughout the entire visualization generation process.

Updated: 2024-07-31 11:39:32

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2406.15259v2

Data Contamination Report from the 2024 CONDA Shared Task

The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora used to train large-scale models, compromising evaluation results. The workshop fostered a shared task to collect evidence on data contamination in currently available datasets and models. The goal of the shared task and associated database is to assist the community in understanding the extent of the problem and to assist researchers in avoiding reporting evaluation results on known contaminated resources. The shared task provides a structured, centralized public database for the collection of contamination evidence, open to contributions from the community via GitHub pull requests. This first compilation paper is based on 566 reported entries over 91 contaminated sources from a total of 23 contributors. The details of the individual contamination events are available in the platform. The platform continues to be online, open to contributions from the community.

Updated: 2024-07-31 11:26:57

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.21530v1

The syzygy distinguisher

We present a new distinguisher for alternant and Goppa codes, whose complexity is subexponential in the error-correcting capability, hence better than that of generic decoding algorithms. Moreover, it does not suffer from the strong regime limitations of the previous distinguishers or structure recovery algorithms: in particular, it applies to the codes used in the Classic McEliece candidate for postquantum cryptography standardization. The invariants that allow us to distinguish are graded Betti numbers of the homogeneous coordinate ring of a shortening of the dual code. This is the first time since the introduction of the McEliece cryptosystem in 1978 that an analysis of it breaks the exponential barrier.

Updated: 2024-07-31 11:18:35

Categories: cs.CR,cs.IT,math.AG,math.IT

Download: http://arxiv.org/abs/2407.15740v4

A New Type of Foundation Model Based on Recordings of People's Emotions and Physiology

Foundation models have had a big impact in recent years and billions of dollars are being invested in them in the current AI boom. The more popular ones, such as Chat-GPT, are trained on large amounts of data from the Internet, and then reinforcement learning, RAG, prompt engineering and cognitive modelling are used to fine-tune and augment their behavior. This technology has been used to create models of individual people, such as Caryn Marjorie. However, these chatbots are not based on people's actual emotional and physiological responses to their environment, so they are, at best, surface-level approximations to the characters they are imitating. This paper describes how a new type of foundation model - a first-person foundation model - could be created from recordings of what a person sees and hears as well as their emotional and physiological reactions to these stimuli. A first-person foundation model would map environmental stimuli to a person's emotional and physiological states, and map a person's emotional and physiological states to their behavior. First-person foundation models have many exciting applications, including a new type of recommendation engine, personal assistants, generative adversarial networks, dating and recruitment. To obtain training data for a first-person foundation model, we have developed a recording rig that captures what the wearer is seeing and hearing as well as their emotional and physiological states. This novel source of data could help to address the shortage of new data for building the next generation of foundation models.

Updated: 2024-07-31 11:14:45

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.00030v1

Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory

We consider inverse reinforcement learning problems with concave utilities. Concave Utility Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which employs a concave function of the state occupancy measure, rather than a linear function. CURL has garnered recent attention for its ability to represent instances of many important applications, including standard RL as well as imitation learning, pure exploration, constrained MDPs, offline RL, human-regularized RL, and others. Inverse reinforcement learning is a powerful paradigm that focuses on recovering an unknown reward function that can rationalize the observed behaviour of an agent. There have been recent theoretical advances in inverse RL where the problem is formulated as identifying the set of feasible reward functions. However, inverse RL for CURL problems has not been considered previously. In this paper we show that most of the standard IRL results do not apply to CURL in general, since CURL invalidates the classical Bellman equations. This calls for a new theoretical framework for the inverse CURL problem. Using a recent equivalence result between CURL and Mean-field Games, we propose a new definition for the feasible rewards for I-CURL by proving that this problem is equivalent to an inverse game theory problem in a subclass of mean-field games. We present initial query and sample complexity results for the I-CURL problem under assumptions such as Lipschitz-continuity. Finally, we outline future directions and applications in human-AI collaboration enabled by our results.

Updated: 2024-07-31 11:06:56

Categories: cs.LG,cs.AI,cs.GT,cs.MA

Download: http://arxiv.org/abs/2405.19024v2

Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution

Human Activity Recognition (HAR) is a field of study that focuses on identifying and classifying human activities. Skeleton-based Human Activity Recognition has received much attention in recent years, with Graph Convolutional Network (GCN) based methods widely used and achieving remarkable results. However, the representation of skeleton data and the issue of over-smoothing in GCN still need to be studied. (1) Compared to central nodes, edge nodes can only aggregate limited neighbor information, and different edge nodes of the human body are always structurally related. However, the information from edge nodes is crucial for fine-grained activity recognition. (2) The Graph Convolutional Network suffers from a significant over-smoothing issue, causing nodes to become increasingly similar as the number of network layers increases. Based on these two observations, we propose a two-stream graph convolution method called Spatial-Structural GCN (SpSt-GCN). Spatial GCN performs information aggregation based on the topological structure of the human body, and structural GCN performs differentiation based on the similarity of edge node sequences. The spatial connection is fixed, and the human skeleton naturally maintains this topology regardless of the actions performed by humans. However, the structural connection is dynamic and depends on the type of movement the human body is performing. Based on this idea, we also propose an entirely data-driven structural connection, which greatly increases flexibility. We evaluate our method on two large-scale datasets, i.e., NTU RGB+D and NTU RGB+D 120. The proposed method achieves good results while being efficient.

Updated: 2024-07-31 11:04:41

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.21525v1

Tabular Data Augmentation for Machine Learning: Progress and Prospects of Embracing Generative AI

Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality tabular data for model training remains a significant obstacle. Numerous works have focused on tabular data augmentation (TDA) to enhance the original table with additional data, thereby improving downstream ML tasks. Recently, there has been a growing interest in leveraging the capabilities of generative AI for TDA. Therefore, we believe it is time to provide a comprehensive review of the progress and future prospects of TDA, with a particular emphasis on the trending generative AI. Specifically, we present an architectural view of the TDA pipeline, comprising three main procedures: pre-augmentation, augmentation, and post-augmentation. Pre-augmentation encompasses preparation tasks that facilitate subsequent TDA, including error handling, table annotation, table simplification, table representation, table indexing, table navigation, schema matching, and entity matching. Augmentation systematically analyzes current TDA methods, categorized into retrieval-based methods, which retrieve external data, and generation-based methods, which generate synthetic data. We further subdivide these methods based on the granularity of the augmentation process at the row, column, cell, and table levels. Post-augmentation focuses on the datasets, evaluation and optimization aspects of TDA. We also summarize current trends and future directions for TDA, highlighting promising opportunities in the era of generative AI. In addition, the accompanying papers and related resources are continuously updated and maintained in the GitHub repository at https://github.com/SuDIS-ZJU/awesome-tabular-data-augmentation to reflect ongoing advancements in the field.

Updated: 2024-07-31 10:56:20

Categories: cs.LG,cs.AI,cs.DB

Download: http://arxiv.org/abs/2407.21523v1

Scalable Bayesian uncertainty quantification with data-driven priors for radio interferometric imaging

Next-generation radio interferometers like the Square Kilometer Array have the potential to unlock scientific discoveries thanks to their unprecedented angular resolution and sensitivity. One key to unlocking their potential resides in handling the deluge and complexity of incoming data. This challenge requires building radio interferometric imaging methods that can cope with the massive data sizes and provide high-quality image reconstructions with uncertainty quantification (UQ). This work proposes a method coined QuantifAI to address UQ in radio-interferometric imaging with data-driven (learned) priors for high-dimensional settings. Our model, rooted in the Bayesian framework, uses a physically motivated model for the likelihood. The model exploits a data-driven convex prior, which can encode complex information learned implicitly from simulations and guarantee the log-concavity of the posterior. We leverage probability concentration phenomena of high-dimensional log-concave posteriors that let us obtain information about the posterior, avoiding MCMC sampling techniques. We rely on convex optimisation methods to compute the MAP estimation, which are known to be faster and to scale better with dimension than MCMC sampling strategies. Our method allows us to compute local credible intervals, i.e., Bayesian error bars, and perform hypothesis testing of structure on the reconstructed image. In addition, we propose a novel blazing-fast method to compute pixel-wise uncertainties at different scales. We demonstrate our method by reconstructing radio-interferometric images in a simulated setting and carrying out fast and scalable UQ, which we validate with MCMC sampling. Our method shows an improved image quality and more meaningful uncertainties than the benchmark method based on a sparsity-promoting prior. QuantifAI's source code: https://github.com/astro-informatics/QuantifAI.

Updated: 2024-07-31 10:51:03

Categories: astro-ph.IM,cs.LG

Download: http://arxiv.org/abs/2312.00125v3

The Impacts of AI Avatar Appearance and Disclosure on User Motivation

This study examines the influence of perceived AI features on user motivation in virtual interactions. AI avatars used in user-AI interactions may be disclosed as AI or may embody specific genders. Leveraging insights from AI and avatar research, we explore how AI disclosure and gender affect user motivation. We conducted a game-based experiment involving over 72,500 participants who solved search problems alone or with an AI companion. Different groups experienced varying AI appearances and disclosures. We measured play intensity. Results revealed that the presence of another avatar led to less intense play compared to solo play. Disclosure of the avatar as AI heightened effort intensity compared to non-disclosed AI companions. Additionally, a masculine AI appearance reduced effort intensity.

Updated: 2024-07-31 10:48:55

Categories: cs.HC,cs.AI,cs.CY

Download: http://arxiv.org/abs/2407.21521v1

Expanding the Medical Decathlon dataset: segmentation of colon and colorectal cancer from computed tomography images

Colorectal cancer is the third most common cancer in the Western Hemisphere. The segmentation of the colon and colorectal cancer in computed tomography images is an urgent problem in medicine. Indeed, a system capable of solving this problem will enable the detection of colorectal cancer at early stages of the disease, facilitate the search for pathology by the radiologist, and significantly accelerate the process of diagnosing the disease. However, scientific publications on medical image processing mostly use closed, non-public data. This paper presents an extension of the Medical Decathlon dataset with colorectal markups in order to improve the quality of segmentation algorithms. An experienced radiologist validated the data, categorized it into subsets by quality, and published it in the public domain. Based on the obtained results, we trained neural network models of the UNet architecture with 5-fold cross-validation and achieved a Dice metric quality of $0.6988 \pm 0.3$. The published markups will improve the quality of colorectal cancer detection and simplify the radiologist's job for study description.
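For readers unfamiliar with the Dice metric reported above, it measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened binary masks. The toy masks are hypothetical, and returning 1.0 when both masks are empty is a common convention assumed here, not something the abstract specifies.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / size_sum


# Hypothetical flattened masks: 1 marks a voxel labelled as tumour.
predicted_mask = [1, 1, 0, 0, 1, 0]
reference_mask = [1, 0, 0, 0, 1, 1]
dice = dice_coefficient(predicted_mask, reference_mask)  # 2 * 2 / (3 + 3)
```

A score of 1 means the two masks coincide exactly, and 0 means they do not overlap at all.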

Updated: 2024-07-31 10:36:41

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.21516v1

Empirical Capacity Model for Self-Attention Neural Networks

Large pretrained self-attention neural networks, or transformers, have been very successful in various tasks recently. The performance of a model on a given task depends on its ability to memorize and generalize the training data. Large transformer models, which may have billions of parameters, in theory have a huge capacity to memorize content. However, the current algorithms for the optimization fall short of the theoretical capacity, and the capacity is also highly dependent on the content. In this paper, we focus on the memory capacity of these models obtained using common training algorithms and synthetic training data. Based on the results, we derive an empirical capacity model (ECM) for a generic transformer. The ECM can be used to design task-specific transformer models with an optimal number of parameters in cases where the target memorization capability of the task can be defined.

Updated: 2024-07-31 10:27:37

Categories: cs.LG,cs.AI,cs.CL,stat.ML

Download: http://arxiv.org/abs/2407.15425v2

Understanding Prediction Discrepancies in Machine Learning Classifiers

A multitude of classifiers can be trained on the same data to achieve similar performances during test time, while having learned significantly different classification patterns. This phenomenon, which we call prediction discrepancies, is often associated with the blind selection of one model instead of another with similar performances. When making a choice, the machine learning practitioner has no understanding of the differences between models, their limits, where they agree and where they don't. But his/her choice will result in concrete consequences for instances to be classified in the discrepancy zone, since the final decision will be based on the selected classification pattern. Besides the arbitrary nature of the result, a bad choice could have further negative consequences such as loss of opportunity or lack of fairness. This paper proposes to address this question by analyzing the prediction discrepancies in a pool of best-performing models trained on the same data. A model-agnostic algorithm, DIG, is proposed to capture and explain discrepancies locally, to enable the practitioner to make the best-informed decision when selecting a model by anticipating its potential undesired consequences. All the code to reproduce the experiments is available.

Updated: 2024-07-31 10:26:55

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2104.05467v2

FSSC: Federated Learning of Transformer Neural Networks for Semantic Image Communication

In this paper, we address the problem of image semantic communication in a multi-user deployment scenario and propose a federated learning (FL) strategy for a Swin Transformer-based semantic communication system (FSSC). Firstly, we demonstrate that the adoption of a Swin Transformer for joint source-channel coding (JSCC) effectively extracts semantic information in the communication system. Next, the FL framework is introduced to collaboratively learn a global model by aggregating local model parameters, rather than directly sharing clients' data. This approach enhances user privacy protection and reduces the workload on the server or mobile edge. Simulation evaluations indicate that our method outperforms the typical JSCC algorithm and traditional separation-based communication algorithms. Particularly after integrating local semantics, the global aggregation model has further increased the Peak Signal-to-Noise Ratio (PSNR) by more than 2 dB, thoroughly proving the effectiveness of our algorithm.
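The phrase "aggregating local model parameters" can be illustrated with a FedAvg-style weighted average, where each client's parameters count in proportion to its local dataset size. The abstract does not state which aggregation rule FSSC uses, so this is a sketch under that assumption; the function name and toy numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average flat parameter vectors, weighting each client by its sample count."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            global_weights[i] += w * size / total
    return global_weights


# Hypothetical round: two clients with 2-parameter models and 1 vs 3 local samples.
local_models = [[1.0, 2.0], [3.0, 4.0]]
local_sizes = [1, 3]
global_model = federated_average(local_models, local_sizes)
```

Only the parameter vectors and sample counts reach the server in this scheme, which is the privacy argument the abstract makes: raw client images never leave the device.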

Updated: 2024-07-31 10:25:24

Categories: cs.AI,cs.LG,eess.IV

Download: http://arxiv.org/abs/2407.21507v1

Root Cause Analysis Of Productivity Losses In Manufacturing Systems Utilizing Ensemble Machine Learning

In today's rapidly evolving landscape of automation and manufacturing systems, the efficient resolution of productivity losses is paramount. This study introduces a data-driven ensemble approach, utilizing the cyclic multivariate time series data from binary sensors and signals from Programmable Logic Controllers (PLCs) within these systems. The objective is to automatically analyze productivity losses per cycle and pinpoint their root causes by assigning the loss to a system element. The ensemble approach introduced in this publication integrates various methods, including information theory and machine learning behavior models, to provide a robust analysis for each production cycle. To expedite the resolution of productivity losses and ensure short response times, stream processing becomes a necessity. Addressing this, the approach is implemented as data-stream analysis and can be transferred to batch processing, seamlessly integrating into existing systems without the need for extensive historical data analysis. This method has two positive effects. Firstly, the result of the analysis ensures that the period of lower productivity is reduced by identifying the likely root cause of the productivity loss. Secondly, these results are more reliable due to the ensemble approach and therefore avoid dependency on technical experts. The approach is validated using a semi-automated welding manufacturing system, an injection molding automation system, and a synthetically generated test PLC dataset. The results demonstrate the method's efficacy in offering a data-driven understanding of process behavior and mark an advancement in autonomous manufacturing system analysis.

Updated: 2024-07-31 10:21:20

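Title: Root Cause Analysis Of Productivity Losses In Manufacturing Systems Utilizing Ensemble Machine Learning

Abstract: In today's rapidly evolving landscape of automation and manufacturing systems, the efficient resolution of productivity losses is paramount. This study introduces a data-driven ensemble approach that uses cyclic multivariate time series data from binary sensors and Programmable Logic Controllers (PLCs). The objective is to automatically analyze productivity losses per cycle and pinpoint their root causes by assigning the loss to a system element. The ensemble approach introduced here integrates various methods, including information theory and machine learning behavior models, to provide a robust analysis for each production cycle. To expedite the resolution of productivity losses and ensure short response times, stream processing becomes a necessity. To address this, the approach is implemented as data-stream analysis and can be integrated seamlessly into existing systems without extensive historical data analysis. This method has two positive effects. First, the analysis shortens the period of lower productivity by identifying the likely root cause of the productivity loss. Second, the results are more reliable thanks to the ensemble approach and therefore avoid dependency on technical experts. The approach is validated using a semi-automated welding manufacturing system, an injection molding automation system, and a synthetically generated test PLC dataset. The results demonstrate the method's efficacy in offering a data-driven understanding of process behavior and mark an advancement in autonomous manufacturing system analysis.

Updated: 2024-07-31 10:21:20

Categories: cs.LG

Download: http://arxiv.org/abs/2407.21503v1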

Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants

The effective alignment of Large Language Models (LLMs) with precise instructions is essential for their application in diverse real-world scenarios. Current methods focus on enhancing the diversity and complexity of training and evaluation samples, yet they fall short in accurately assessing LLMs' ability to follow similar instruction variants. We introduce an effective data augmentation technique that decomposes complex instructions into simpler sub-components, modifies these, and reconstructs them into new variants, thereby preserving the original instruction's context and complexity while introducing variability, which is critical for training and evaluating LLMs' instruction-following precision. We developed the DeMoRecon dataset using this method to both fine-tune and evaluate LLMs. Our findings show that LLMs fine-tuned with DeMoRecon gain a significant performance boost on both our own and commonly used instruction-following benchmarks.
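The decompose, modify, reconstruct cycle described above can be illustrated with a toy sketch. It assumes, purely for illustration, that sub-components are separated by semicolons and that a modification swaps one sub-component for a hand-written alternative; the abstract does not describe how DeMoRecon actually performs these steps, and all function names here are hypothetical.

```python
from typing import List


def decompose(instruction: str) -> List[str]:
    """Split an instruction into sub-components (assumed ';'-separated)."""
    return [part.strip() for part in instruction.split(";") if part.strip()]


def modify(parts: List[str], index: int, replacement: str) -> List[str]:
    """Return a copy of the sub-components with one of them replaced."""
    changed = list(parts)
    changed[index] = replacement
    return changed


def reconstruct(parts: List[str]) -> str:
    """Join sub-components back into a single instruction."""
    return "; ".join(parts)


original = "Write a poem about autumn; use exactly four lines; end with a question"
parts = decompose(original)
variant = reconstruct(modify(parts, 1, "use exactly six lines"))
```

The variant keeps the topic and the closing constraint of the original and differs in a single fine-grained requirement, which is the kind of near-identical pair the abstract argues is needed to measure instruction-following precision.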

Updated: 2024-07-31 10:18:50

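Title: Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants

Abstract: The effective alignment of Large Language Models (LLMs) with precise instructions is essential for their application in diverse real-world scenarios. Current methods focus on enhancing the diversity and complexity of training and evaluation samples, but they fall short in accurately assessing LLMs' ability to follow similar instruction variants. We introduce an effective data augmentation technique that decomposes complex instructions into simpler sub-components, modifies them, and reconstructs them into new variants, thereby preserving the original instruction's context and complexity while introducing variability, which is critical for training and evaluating LLMs' instruction-following precision. We used this method to develop the DeMoRecon dataset for fine-tuning and evaluating LLMs. Our findings show that LLMs fine-tuned with DeMoRecon gain a significant performance boost on both our own and commonly used instruction-following benchmarks.

Updated: 2024-07-31 10:18:50

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.11301v2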

MaskUno: Switch-Split Block For Enhancing Instance Segmentation

Instance segmentation is an advanced form of image segmentation which, beyond traditional segmentation, requires identifying individual instances of repeating objects in a scene. Mask R-CNN is the most common architecture for instance segmentation, and improvements to this architecture include steps such as benefiting from bounding box refinements, adding semantics, or backbone enhancements. In all the proposed variations to date, the problem of competing kernels (each class aims to maximize its own accuracy) persists when models try to synchronously learn numerous classes. In this paper, we propose mitigating this problem by replacing mask prediction with a Switch-Split block that processes refined ROIs, classifies them, and assigns them to specialized mask predictors. We name the method MaskUno and test it on various models from the literature, which are then trained on multiple classes using the benchmark COCO dataset. An increase in the mean Average Precision (mAP) of 2.03% was observed for the high-performing DetectoRS when trained on 80 classes. MaskUno proved to enhance the mAP of instance segmentation models regardless of the number and typ

Updated: 2024-07-31 10:12:14

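Title: MaskUno: Switch-Split Block For Enhancing Instance Segmentation

Abstract: Instance segmentation is an advanced form of image segmentation which, beyond traditional segmentation, requires identifying individual instances of repeating objects in a scene. Mask R-CNN is the most common architecture for instance segmentation, and improvements to this architecture include steps such as benefiting from bounding box refinements, adding semantics, or backbone enhancements. In all variations proposed to date, the problem of competing kernels (each class aims to maximize its own accuracy) persists when models try to learn numerous classes synchronously. In this paper, we propose mitigating this problem by replacing mask prediction with a Switch-Split block that processes refined regions of interest (ROIs), classifies them, and assigns them to specialized mask predictors. We name the method MaskUno and test it on various models from the literature, which are then trained on multiple classes using the benchmark COCO dataset. An increase of 2.03% in mean Average Precision (mAP) was observed for the high-performing DetectoRS when trained on 80 classes. MaskUno proved to enhance the mAP of instance segmentation models regardless of the number of instances and classes.

Updated: 2024-07-31 10:12:14

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.21498v1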

Synthetic Tabular Data Validation: A Divergence-Based Approach

The ever-increasing use of generative models in various fields where tabular data is used highlights the need for robust and standardized validation metrics to assess the similarity between real and synthetic data. Current methods lack a unified framework and rely on diverse and often inconclusive statistical measures. Divergences, which quantify discrepancies between data distributions, offer a promising avenue for validation. However, traditional approaches calculate divergences independently for each feature due to the complexity of joint distribution modeling. This paper addresses this challenge by proposing a novel approach that uses divergence estimation to overcome the limitations of marginal comparisons. Our core contribution lies in applying a divergence estimator to build a validation metric considering the joint distribution of real and synthetic data. We leverage a probabilistic classifier to approximate the density ratio between datasets, allowing the capture of complex relationships. We specifically calculate two divergences: the well-known Kullback-Leibler (KL) divergence and the Jensen-Shannon (JS) divergence. KL divergence offers an established use in the field, while JS divergence is symmetric and bounded, providing a reliable metric. The efficacy of this approach is demonstrated through a series of experiments with varying distribution complexities. The initial phase involves comparing estimated divergences with analytical solutions for simple distributions, setting a benchmark for accuracy. Finally, we validate our method on a real-world dataset and its corresponding synthetic counterpart, showcasing its effectiveness in practical applications. This research offers a significant contribution with applicability beyond tabular data and the potential to improve synthetic data validation in various fields.
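The classifier-based density-ratio idea in the abstract can be made concrete with a small sketch. If a probabilistic classifier outputs c(x) = P(real | x) on a balanced mix of real and synthetic samples, then p(x)/q(x) = c(x)/(1 - c(x)), and both KL and JS follow as sample averages. The example below assumes, for illustration only, two one-dimensional Gaussians and an oracle classifier written in closed form instead of a trained one; none of the function names come from the paper.

```python
import math
import random


def prob_real(x: float) -> float:
    """Oracle classifier P(real | x) for real ~ N(0, 1), synthetic ~ N(1, 1).

    Assumption: balanced classes, so log(p/q) = 0.5 - x and c = sigmoid(0.5 - x).
    """
    return 1.0 / (1.0 + math.exp(-(0.5 - x)))


def estimate_kl(real, classifier) -> float:
    """KL(p || q) as the mean log density ratio over real samples."""
    return sum(math.log(classifier(x)) - math.log(1.0 - classifier(x)) for x in real) / len(real)


def estimate_js(real, synthetic, classifier) -> float:
    """JS(p, q) using p/m = 2c and q/m = 2(1 - c), where m = (p + q) / 2."""
    term_p = sum(math.log(2.0 * classifier(x)) for x in real) / len(real)
    term_q = sum(math.log(2.0 * (1.0 - classifier(x))) for x in synthetic) / len(synthetic)
    return 0.5 * (term_p + term_q)


rng = random.Random(0)
real_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
synthetic_samples = [rng.gauss(1.0, 1.0) for _ in range(20000)]

kl = estimate_kl(real_samples, prob_real)  # analytic value is 0.5 nats
js = estimate_js(real_samples, synthetic_samples, prob_real)  # bounded above by ln 2
```

Comparing the KL estimate with the analytic value of 0.5 nats mirrors the first experimental phase described in the abstract. In practice the oracle would be replaced by a classifier trained on the joint feature vector, which is what lets the metric see relationships between columns.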

Updated: 2024-07-31 10:00:58

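Title: Synthetic Tabular Data Validation: A Divergence-Based Approach

Abstract: The ever-increasing use of generative models in fields that rely on tabular data highlights the need for robust and standardized validation metrics to assess the similarity between real and synthetic data. Current methods lack a unified framework and rely on diverse and often inconclusive statistical measures. Divergences, which quantify discrepancies between data distributions, offer a promising avenue for validation. However, because joint distribution modeling is complex, traditional approaches calculate divergences independently for each feature. This paper addresses this challenge by proposing a novel approach that uses divergence estimation to overcome the limitations of marginal comparisons. Our core contribution lies in applying a divergence estimator to build a validation metric that considers the joint distribution of real and synthetic data. We use a probabilistic classifier to approximate the density ratio between datasets, which allows complex relationships to be captured. We specifically calculate two divergences: the well-known Kullback-Leibler (KL) divergence and the Jensen-Shannon (JS) divergence. KL divergence has an established use in the field, while JS divergence is symmetric and bounded, providing a reliable metric. The efficacy of this approach is demonstrated through a series of experiments with varying distribution complexities. The initial phase compares estimated divergences with analytical solutions for simple distributions, setting a benchmark for accuracy. Finally, we validate our method on a real-world dataset and its synthetic counterpart, showcasing its effectiveness in practical applications. This research offers a significant contribution with applicability beyond tabular data and the potential to improve synthetic data validation in various fields.

Updated: 2024-07-31 10:00:58

Categories: cs.LG,cs.AI,I.2.0

Download: http://arxiv.org/abs/2405.07822v2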

Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation

Echocardiography video is a primary modality for diagnosing heart diseases, but the limited data poses challenges for both clinical teaching and machine learning training. Recently, video generative models have emerged as a promising strategy to alleviate this issue. However, previous methods often relied on holistic conditions during generation, hindering flexible movement control over specific cardiac structures. In this context, we propose an explainable and controllable method for echocardiography video generation, taking an initial frame and a motion curve as guidance. Our contributions are three-fold. First, we extract motion information from each heart substructure to construct motion curves, enabling the diffusion model to synthesize customized echocardiography videos by modifying these curves. Second, we propose the structure-to-motion alignment module, which can map semantic features onto motion curves across cardiac structures. Third, the position-aware attention mechanism is designed to enhance video consistency utilizing Gaussian masks with structural position information. Extensive experiments on three echocardiography datasets show that our method outperforms others regarding fidelity and consistency. The full code will be released at https://github.com/mlmi-2024-72/ECM.

Updated: 2024-07-31 09:59:20

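Title: Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation

Abstract: Echocardiography video is a primary modality for diagnosing heart diseases, but the limited data poses challenges for both clinical teaching and machine learning training. Recently, video generative models have emerged as a promising strategy to alleviate this issue. However, previous methods often relied on holistic conditions during generation, hindering flexible movement control over specific cardiac structures. In this context, we propose an explainable and controllable method for echocardiography video generation that takes an initial frame and a motion curve as guidance. Our contributions are three-fold. First, we extract motion information from each heart substructure to construct motion curves, enabling the diffusion model to synthesize customized echocardiography videos by modifying these curves. Second, we propose a structure-to-motion alignment module that maps semantic features onto motion curves across cardiac structures. Third, a position-aware attention mechanism is designed to enhance video consistency using Gaussian masks with structural position information. Extensive experiments on three echocardiography datasets show that our method outperforms others in fidelity and consistency. The full code will be released at https://github.com/mlmi-2024-72/ECM.

Updated: 2024-07-31 09:59:20

Categories: eess.IV,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.21490v1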

Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends

Large autoregressive generative models have emerged as the cornerstone for achieving the highest performance across several Natural Language Processing tasks. However, the urge to attain superior results has, at times, led to the premature replacement of carefully designed task-specific approaches without exhaustive experimentation. The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed - yet simple - pipeline, which enables running a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters with as few as 500 million parameters. Maverick achieves state-of-the-art performance on the CoNLL-2012 benchmark, training with up to 0.006x the memory resources and obtaining a 170x faster inference compared to previous state-of-the-art systems. We extensively validate the robustness of the Maverick framework with an array of diverse experiments, reporting improvements over prior systems in data-scarce, long-document, and out-of-domain settings. We release our code and models for research purposes at https://github.com/SapienzaNLP/maverick-coref.

Updated: 2024-07-31 09:58:48

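Title: Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends

Abstract: Large autoregressive generative models have become the cornerstone for achieving the highest performance across several Natural Language Processing tasks. However, the urge to attain superior results has, at times, led to the premature replacement of carefully designed task-specific approaches without exhaustive experimentation. The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed yet simple pipeline that makes it possible to run a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters while using as few as 500 million parameters. Maverick achieves state-of-the-art performance on the CoNLL-2012 benchmark, training with up to 0.006x the memory resources of previous state-of-the-art systems and running inference 170x faster. We extensively validate the robustness of the Maverick framework with an array of diverse experiments, reporting improvements over prior systems in data-scarce, long-document, and out-of-domain settings. We release our code and models for research purposes at https://github.com/SapienzaNLP/maverick-coref.

Updated: 2024-07-31 09:58:48

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.21489v1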

Breaking the Hourglass Phenomenon of Residual Quantization: Enhancing the Upper Bound of Generative Retrieval

Generative retrieval (GR) has emerged as a transformative paradigm in search and recommender systems, leveraging numeric-based identifier representations to enhance efficiency and generalization. Notably, methods like TIGER, which employ Residual Quantization-based Semantic Identifiers (RQ-SID), have shown significant promise in e-commerce scenarios by effectively managing item IDs. However, a critical issue termed the "Hourglass" phenomenon occurs in RQ-SID, where intermediate codebook tokens become overly concentrated, hindering the full utilization of generative retrieval methods. This paper analyses and addresses this problem by identifying data sparsity and long-tailed distribution as the primary causes. Through comprehensive experiments and detailed ablation studies, we analyze the impact of these factors on codebook utilization and data distribution. Our findings reveal that the "Hourglass" phenomenon substantially impacts the performance of RQ-SID in generative retrieval. We propose effective solutions to mitigate this issue, thereby significantly enhancing the effectiveness of generative retrieval in real-world e-commerce applications.
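To make the RQ-SID mechanism concrete, the sketch below shows plain residual quantization: at each level the nearest codeword is chosen and subtracted, and the sequence of chosen indices serves as the item's semantic identifier. The codebooks and the item vector are hypothetical toy values, and `layer_usage` is an illustrative helper for inspecting how concentrated the tokens of each level are; none of this is taken from the paper's implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], vector: Vector) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], vector)),
    )


def residual_quantize(vector: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[List[int], List[float]]:
    """Return the per-level token ids and the final residual."""
    residual = list(vector)
    ids: List[int] = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        ids.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return ids, residual


def layer_usage(all_ids: Sequence[Sequence[int]]) -> List[Counter]:
    """Token frequency per level; a level dominated by few tokens is concentrated."""
    return [Counter(ids[level] for ids in all_ids) for level in range(len(all_ids[0]))]


# Hypothetical three-level codebooks over 2-D embeddings.
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.5]],
]
semantic_id, leftover = residual_quantize([5.0, 4.4], codebooks)
```

Running `layer_usage` over the identifiers of a whole catalogue is one simple way to look for the concentration of intermediate-level tokens that the abstract calls the "Hourglass" phenomenon.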

Updated: 2024-07-31 09:52:53

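Title: Breaking the Hourglass Phenomenon of Residual Quantization: Enhancing the Upper Bound of Generative Retrieval

Abstract: Generative retrieval (GR) has emerged as a transformative paradigm in search and recommender systems, using numeric identifier representations to enhance efficiency and generalization. Notably, methods like TIGER, which use Residual Quantization-based Semantic Identifiers (RQ-SID), have shown significant promise in e-commerce scenarios by effectively managing item identifiers. However, a critical issue termed the "Hourglass" phenomenon occurs in RQ-SID, where intermediate codebook tokens become overly concentrated, hindering the full utilization of generative retrieval methods. This paper analyzes and addresses this problem by identifying data sparsity and long-tailed distribution as the primary causes. Through comprehensive experiments and detailed ablation studies, we analyze the impact of these factors on codebook utilization and data distribution. Our findings show that the "Hourglass" phenomenon substantially affects the performance of RQ-SID in generative retrieval. We propose effective solutions to mitigate this issue, thereby significantly enhancing the effectiveness of generative retrieval in real-world e-commerce applications.

Updated: 2024-07-31 09:52:53

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2407.21488v1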

Parallel Strategies for Best-First Generalized Planning

In recent years, there has been renewed interest in closing the performance gap between state-of-the-art planning solvers and generalized planning (GP), a research area of AI that studies the automated synthesis of algorithmic-like solutions capable of solving multiple classical planning instances. One of the current advancements has been the introduction of Best-First Generalized Planning (BFGP), a GP algorithm based on a novel solution space that can be explored with heuristic search, one of the foundations of modern planners. This paper evaluates the application of parallel search techniques to BFGP, another critical component in closing the performance gap. We first discuss why BFGP is well suited for parallelization and some of its differentiating characteristics from classical planners. Then, we propose two simple shared-memory parallel strategies with good scaling with the number of cores.

Updated: 2024-07-31 09:50:22

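Title: Parallel Strategies for Best-First Generalized Planning

Abstract: In recent years, there has been renewed interest in closing the performance gap between state-of-the-art planning solvers and generalized planning (GP), a research area of AI that studies the automated synthesis of algorithm-like solutions capable of solving multiple classical planning instances. One recent advancement is the introduction of Best-First Generalized Planning (BFGP), a GP algorithm based on a novel solution space that can be explored with heuristic search, one of the foundations of modern planners. This paper evaluates the application of parallel search techniques to BFGP, another critical component in closing the performance gap. We first discuss why BFGP is well suited for parallelization and how it differs from classical planners. We then propose two simple shared-memory parallel strategies that scale well with the number of cores.

Updated: 2024-07-31 09:50:22

Categories: cs.AI,I.2.8; D.1.3

Download: http://arxiv.org/abs/2407.21485v1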

eSPARQL: Representing and Reconciling Agnostic and Atheistic Beliefs in RDF-star Knowledge Graphs

Over the past few years, we have seen the emergence of large knowledge graphs combining information from multiple sources. Sometimes, this information is provided in the form of assertions about other assertions, defining contexts where assertions are valid. A recent extension to RDF which admits statements over statements, called RDF-star, is in revision to become a W3C standard. However, there is no proposal for a semantics of these RDF-star statements nor a built-in facility to operate over them. In this paper, we propose a query language for epistemic RDF-star metadata based on a four-valued logic, called eSPARQL. Our proposed query language extends SPARQL-star, the query language for RDF-star, with a new type of FROM clause to facilitate operating with multiple and sometimes conflicting beliefs. We show that the proposed query language can express four use case queries, including the following features: (i) querying the belief of an individual, (ii) the aggregating of beliefs, (iii) querying who is conflicting with somebody, and (iv) beliefs about beliefs (i.e., nesting of beliefs).
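The abstract says eSPARQL is based on a four-valued logic but does not spell the values out. As a rough illustration only, the sketch below assumes a Belnap-style reading (unknown, true, false, conflicting) and shows how beliefs from several sources about one statement could be aggregated and checked for conflict. The agent names and helper functions are hypothetical and are not eSPARQL syntax.

```python
from enum import Enum
from typing import Iterable


class Belief(Enum):
    """Four truth values, encoded as the set of evidence seen (assumed Belnap-style)."""

    UNKNOWN = frozenset()
    TRUE = frozenset({True})
    FALSE = frozenset({False})
    CONFLICT = frozenset({True, False})


def aggregate(beliefs: Iterable[Belief]) -> Belief:
    """Combine beliefs by pooling the evidence behind each of them."""
    evidence = frozenset()
    for belief in beliefs:
        evidence |= belief.value
    return Belief(evidence)


def in_conflict(a: Belief, b: Belief) -> bool:
    """True when two individually consistent beliefs contradict each other."""
    return Belief.CONFLICT not in (a, b) and aggregate([a, b]) is Belief.CONFLICT


# Hypothetical agents and their belief about a single statement.
beliefs_by_agent = {"alice": Belief.TRUE, "bob": Belief.FALSE, "carol": Belief.UNKNOWN}
pooled = aggregate(beliefs_by_agent.values())
opponents_of_alice = [
    name for name, belief in beliefs_by_agent.items()
    if in_conflict(beliefs_by_agent["alice"], belief)
]
```

This corresponds loosely to use cases (ii) and (iii) in the abstract: aggregating beliefs and finding who conflicts with a given individual.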

Updated: 2024-07-31 09:48:27

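Title: eSPARQL: Representing and Reconciling Agnostic and Atheistic Beliefs in RDF-star Knowledge Graphs

Abstract: Over the past few years, we have seen the emergence of large knowledge graphs that combine information from multiple sources. Sometimes, this information is provided in the form of assertions about other assertions, defining the contexts in which assertions are valid. A recent extension to RDF that admits statements over statements, called RDF-star, is under revision to become a W3C standard. However, there is no proposal for a semantics of these RDF-star statements, nor a built-in facility to operate over them. In this paper, we propose eSPARQL, a query language for epistemic RDF-star metadata based on a four-valued logic. Our query language extends SPARQL-star, the query language for RDF-star, with a new type of FROM clause that facilitates operating with multiple and sometimes conflicting beliefs. We show that the proposed query language can express four use case queries, covering the following features: (i) querying the belief of an individual, (ii) aggregating beliefs, (iii) querying who is in conflict with somebody, and (iv) beliefs about beliefs (i.e., nesting of beliefs).

Updated: 2024-07-31 09:48:27

Categories: cs.AI,cs.DB

Download: http://arxiv.org/abs/2407.21483v1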

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition

The rapid development of neural text-to-speech (TTS) systems has enabled their use in other areas of natural language processing such as automatic speech recognition (ASR) or spoken language translation (SLT). Due to the large number of different TTS architectures and their extensions, selecting which TTS systems to use for synthetic data creation is not an easy task. We compare five different TTS decoder architectures in the scope of synthetic data generation to show their impact on CTC-based speech recognition training. We compare the recognition results to computable metrics like NISQA MOS and intelligibility, finding that there are no clear relations to the ASR performance. We also observe that for data generation auto-regressive decoding performs better than non-autoregressive decoding, and propose an approach to quantify TTS generalization capabilities.

Updated: 2024-07-31 09:37:27

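Title: On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition

Abstract: The rapid development of neural text-to-speech (TTS) systems has enabled their use in other areas of natural language processing, such as automatic speech recognition (ASR) or spoken language translation (SLT). Because of the large number of different TTS architectures and their extensions, selecting which TTS systems to use for synthetic data creation is not an easy task. We compare five different TTS decoder architectures in the scope of synthetic data generation to show their impact on CTC-based speech recognition training. We compare the recognition results with computable metrics such as NISQA MOS and intelligibility, and find no clear relation to ASR performance. We also observe that auto-regressive decoding performs better than non-autoregressive decoding for data generation, and we propose an approach to quantify TTS generalization capabilities.

Updated: 2024-07-31 09:37:27

Categories: cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2407.21476v1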

Fine-gained Zero-shot Video Sampling

Incorporating a temporal dimension into pretrained image diffusion models for video generation is a prevalent approach. However, this method is computationally demanding and necessitates large-scale video datasets. More critically, the heterogeneity between image and video datasets often results in catastrophic forgetting of the image expertise. Recent attempts to directly extract video snippets from image diffusion models have somewhat mitigated these problems. Nevertheless, these methods can only generate brief video clips with simple movements and fail to capture fine-grained motion or non-grid deformation. In this paper, we propose a novel Zero-Shot video Sampling algorithm, denoted as $\mathcal{ZS}^2$, capable of directly sampling high-quality video clips from existing image synthesis methods, such as Stable Diffusion, without any training or optimization. Specifically, $\mathcal{ZS}^2$ utilizes the dependency noise model and temporal momentum attention to ensure content consistency and animation coherence, respectively. This ability enables it to excel in related tasks, such as conditional and context-specialized video generation and instruction-guided video editing. Experimental results demonstrate that $\mathcal{ZS}^2$ achieves state-of-the-art performance in zero-shot video generation, occasionally outperforming recent supervised methods. Homepage: https://densechen.github.io/zss/.

Updated: 2024-07-31 09:36:58

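Title: Fine-grained Zero-shot Video Sampling

Abstract: Incorporating a temporal dimension into pretrained image diffusion models for video generation is a prevalent approach. However, this method is computationally demanding and requires large-scale video datasets. More critically, the heterogeneity between image and video datasets often results in catastrophic forgetting of the image expertise. Recent attempts to extract video snippets directly from image diffusion models have somewhat mitigated these problems. Nevertheless, these methods can only generate brief video clips with simple movements and fail to capture fine-grained motion or non-grid deformation. In this paper, we propose a novel Zero-Shot video Sampling algorithm, denoted $\mathcal{ZS}^2$, capable of directly sampling high-quality video clips from existing image synthesis methods, such as Stable Diffusion, without any training or optimization. Specifically, $\mathcal{ZS}^2$ uses the dependency noise model and temporal momentum attention to ensure content consistency and animation coherence, respectively. This ability enables it to excel in related tasks, such as conditional and context-specialized video generation and instruction-guided video editing. Experimental results demonstrate that $\mathcal{ZS}^2$ achieves state-of-the-art performance in zero-shot video generation, occasionally outperforming recent supervised methods. Homepage: https://densechen.github.io/zss/.

Updated: 2024-07-31 09:36:58

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.21475v1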

An Invertible State Space for Process Trees

Process models are, like event data, first-class citizens in most process mining approaches. Several process modeling formalisms have been proposed and used, e.g., Petri nets, BPMN, and process trees. Despite their frequent use, little research addresses the formal properties of process trees and the corresponding potential to improve the efficiency of solving common computational problems. Therefore, in this paper, we propose an invertible state space definition for process trees and demonstrate that the corresponding state space graph is isomorphic to the state space graph of the tree's inverse. Our result supports the development of novel, time-efficient, decomposition strategies for applications of process trees. Our experiments confirm that our state space definition allows for the adoption of bidirectional state space search, which significantly improves the overall performance of state space searches.
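Bidirectional search, which the abstract says an invertible state space makes possible, can be sketched generically: one search runs forward from the start state using successors, a second runs backward from the goal using predecessors, and the two meet in the middle. The code below is a plain bidirectional breadth-first search over a hypothetical toy graph; it does not implement the paper's process-tree state space, only the search pattern that requires predecessors to be computable.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Optional

Neighbours = Callable[[Hashable], Iterable[Hashable]]


def _expand_level(frontier: deque, dist_here: Dict, dist_other: Dict, neighbours: Neighbours) -> Optional[int]:
    """Expand one full BFS level; return the best meeting distance found, if any."""
    best = None
    for _ in range(len(frontier)):
        state = frontier.popleft()
        for nxt in neighbours(state):
            if nxt in dist_here:
                continue
            dist_here[nxt] = dist_here[state] + 1
            if nxt in dist_other:
                total = dist_here[nxt] + dist_other[nxt]
                best = total if best is None else min(best, total)
            frontier.append(nxt)
    return best


def bidirectional_bfs(start, goal, successors: Neighbours, predecessors: Neighbours) -> Optional[int]:
    """Length of a shortest path from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    dist_fwd, dist_bwd = {start: 0}, {goal: 0}
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])
    while frontier_fwd and frontier_bwd:
        # Always grow the smaller frontier to keep both searches shallow.
        if len(frontier_fwd) <= len(frontier_bwd):
            best = _expand_level(frontier_fwd, dist_fwd, dist_bwd, successors)
        else:
            best = _expand_level(frontier_bwd, dist_bwd, dist_fwd, predecessors)
        if best is not None:
            return best
    return None


# Hypothetical toy state space with explicit forward and inverse edges.
forward = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
inverse = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["d"]}
steps = bidirectional_bfs("a", "e", forward.__getitem__, inverse.__getitem__)
```

The `inverse` mapping plays the role that the inverse tree's state space plays in the abstract: without a way to enumerate predecessors, the backward half of the search could not run.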

Updated: 2024-07-31 09:26:35

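Title: An Invertible State Space for Process Trees

Abstract: Process models are, like event data, first-class citizens in most process mining approaches. Several process modeling formalisms have been proposed and used, e.g., Petri nets, BPMN, and process trees. Despite their frequent use, little research addresses the formal properties of process trees and the corresponding potential to improve the efficiency of solving common computational problems. Therefore, in this paper, we propose an invertible state space definition for process trees and show that the corresponding state space graph is isomorphic to the state space graph of the tree's inverse. Our result supports the development of novel, time-efficient decomposition strategies for applications of process trees. Our experiments confirm that our state space definition allows the adoption of bidirectional state space search, which significantly improves the overall performance of state space searches.

Updated: 2024-07-31 09:26:35

Categories: cs.DS,cs.AI

Download: http://arxiv.org/abs/2407.21468v1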

Deep Learning-Based Longitudinal Prediction of Childhood Myopia Progression Using Fundus Image Sequences and Baseline Refraction Data

Childhood myopia constitutes a significant global health concern. It exhibits an escalating prevalence and has the potential to evolve into severe, irreversible conditions that detrimentally impact familial well-being and create substantial economic costs. Contemporary research underscores the importance of precisely predicting myopia progression to enable timely and effective interventions, thereby averting severe visual impairment in children. Such predictions predominantly rely on subjective clinical assessments, which are inherently biased and resource-intensive, thus hindering their widespread application. In this study, we introduce a novel, high-accuracy method for quantitatively predicting the myopic trajectory and myopia risk in children using only fundus images and baseline refraction data. This approach was validated through a six-year longitudinal study of 3,408 children in Henan, utilizing 16,211 fundus images and corresponding refractive data. Our method based on deep learning demonstrated predictive accuracy with an error margin of 0.311D per year and AUC scores of 0.944 and 0.995 for forecasting the risks of developing myopia and high myopia, respectively. These findings confirm the utility of our model in supporting early intervention strategies and in significantly reducing healthcare costs, particularly by obviating the need for additional metadata and repeated consultations. Furthermore, our method was designed to rely only on fundus images and refractive error data, without the need for meta data or multiple inquiries from doctors, strongly reducing the associated medical costs and facilitating large-scale screening. Our model can even provide good predictions based on only a single time measurement. Consequently, the proposed method is an important means to reduce medical inequities caused by economic disparities.

Updated: 2024-07-31 09:26:20

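Title: Deep Learning-Based Longitudinal Prediction of Childhood Myopia Progression Using Fundus Image Sequences and Baseline Refraction Data

Abstract: Childhood myopia constitutes a significant global health concern. Its prevalence is rising, and it can evolve into severe, irreversible conditions that harm family well-being and create substantial economic costs. Contemporary research underscores the importance of precisely predicting myopia progression to enable timely and effective interventions, thereby averting severe visual impairment in children. Such predictions rely predominantly on subjective clinical assessments, which are inherently biased and resource-intensive, hindering their widespread application. In this study, we introduce a novel, high-accuracy method for quantitatively predicting the myopic trajectory and myopia risk in children using only fundus images and baseline refraction data. The approach was validated through a six-year longitudinal study of 3,408 children in Henan, using 16,211 fundus images and the corresponding refractive data. Our deep learning method achieved an error margin of 0.311D per year and AUC scores of 0.944 and 0.995 for forecasting the risks of developing myopia and high myopia, respectively. These findings confirm the utility of our model in supporting early intervention strategies and in significantly reducing healthcare costs, particularly by removing the need for additional metadata and repeated consultations. Furthermore, our method was designed to rely only on fundus images and refractive error data, without metadata or multiple inquiries from doctors, which strongly reduces the associated medical costs and facilitates large-scale screening. Our model can even provide good predictions from a single measurement. Consequently, the proposed method is an important means of reducing medical inequities caused by economic disparities.

Updated: 2024-07-31 09:26:20

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.21467v1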

Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling

We present a novel time series anomaly detection method that achieves excellent detection accuracy while offering a superior level of explainability. Our proposed method, TimeVQVAE-AD, leverages masked generative modeling adapted from the cutting-edge time series generation method known as TimeVQVAE. The prior model is trained on the discrete latent space of a time-frequency domain. Notably, the dimensional semantics of the time-frequency domain are preserved in the latent space, enabling us to compute anomaly scores across different frequency bands, which provides a better insight into the detected anomalies. Additionally, the generative nature of the prior model allows for sampling likely normal states for detected anomalies, enhancing the explainability of the detected anomalies through counterfactuals. Our experimental evaluation on the UCR Time Series Anomaly archive demonstrates that TimeVQVAE-AD significantly surpasses the existing methods in terms of detection accuracy and explainability. We provide our implementation on GitHub: https://github.com/ML4ITS/TimeVQVAE-AnomalyDetection.

Updated: 2024-07-31 09:20:43

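Title: Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling

Abstract: We present a novel time series anomaly detection method that achieves excellent detection accuracy while offering a superior level of explainability. Our proposed method, TimeVQVAE-AD, uses masked generative modeling adapted from the cutting-edge time series generation method TimeVQVAE. The prior model is trained on the discrete latent space of a time-frequency domain. Notably, the dimensional semantics of the time-frequency domain are preserved in the latent space, which lets us compute anomaly scores across different frequency bands and gives better insight into the detected anomalies. In addition, the generative nature of the prior model allows likely normal states to be sampled for detected anomalies, enhancing their explainability through counterfactuals. Our experimental evaluation on the UCR Time Series Anomaly archive shows that TimeVQVAE-AD significantly surpasses existing methods in detection accuracy and explainability. We provide our implementation on GitHub: https://github.com/ML4ITS/TimeVQVAE-AnomalyDetection.

Updated: 2024-07-31 09:20:43

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2311.12550v5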

Impact of data for forecasting on performance of model predictive control in buildings with smart energy storage

Data is required to develop forecasting models for use in Model Predictive Control (MPC) schemes in building energy systems. However, data is costly to both collect and exploit. Determining cost optimal data usage strategies requires understanding of the forecast accuracy and resulting MPC operational performance it enables. This study investigates the performance of both simple and state-of-the-art machine learning prediction models for MPC in multi-building energy systems using a simulated case study with historic building energy data. The impact on forecast accuracy of measures to improve model data efficiency are quantified, specifically for: reuse of prediction models, reduction of training data duration, reduction of model data features, and online model training. A simple linear multi-layer perceptron model is shown to provide equivalent forecast accuracy to state-of-the-art models, with greater data efficiency and generalisability. The use of more than 2 years of training data for load prediction models provided no significant improvement in forecast accuracy. Forecast accuracy and data efficiency were improved simultaneously by using change-point analysis to screen training data. Reused models and those trained with 3 months of data had on average 10% higher error than baseline, indicating that deploying MPC systems without prior data collection may be economic.

Updated: 2024-07-31 09:17:54

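Title: Impact of data for forecasting on performance of model predictive control in buildings with smart energy storage

Abstract: Data is required to develop forecasting models for use in Model Predictive Control (MPC) schemes in building energy systems. However, data is costly to both collect and exploit. Determining cost-optimal data usage strategies requires an understanding of forecast accuracy and the MPC operational performance it enables. This study investigates the performance of both simple and state-of-the-art machine learning prediction models for MPC in multi-building energy systems, using a simulated case study with historic building energy data. The impact on forecast accuracy of measures to improve model data efficiency is quantified, specifically for: reuse of prediction models, reduction of training data duration, reduction of model data features, and online model training. A simple linear multi-layer perceptron model is shown to provide forecast accuracy equivalent to state-of-the-art models, with greater data efficiency and generalisability. Using more than 2 years of training data for load prediction models provided no significant improvement in forecast accuracy. Forecast accuracy and data efficiency were improved simultaneously by using change-point analysis to screen training data. Reused models and those trained with 3 months of data had on average 10% higher error than baseline, indicating that deploying MPC systems without prior data collection may be economic.

Updated: 2024-07-31 09:17:54

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2402.12539v2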

Multi-agent Assessment with QoS Enhancement for HD Map Updates in a Vehicular Network

Reinforcement Learning (RL) algorithms have been used to address the challenging problems in the offloading process of vehicular ad hoc networks (VANET). More recently, they have been utilized to improve the dissemination of high-definition (HD) Maps. Nevertheless, implementing solutions such as deep Q-learning (DQN) and Actor-critic at the autonomous vehicle (AV) may lead to an increase in the computational load, causing a heavy burden on the computational devices and higher costs. Moreover, their implementation might raise compatibility issues between technologies due to the required modifications to the standards. Therefore, in this paper, we assess the scalability of an application utilizing a Q-learning single-agent solution in a distributed multi-agent environment. This application improves the network performance by taking advantage of a smaller state and action space whilst using a multi-agent approach. The proposed solution is extensively evaluated with different test cases involving reward functions that consider individual or overall network performance, the number of agents, and a comparison of centralized and distributed learning. The experimental results demonstrate that the time latencies of our proposed solution in the voice, video, HD Map, and best-effort cases improve significantly, by 40.4%, 36%, 43%, and 12% respectively, compared to the single-agent approach.
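The tabular Q-learning rule that underlies the single-agent solution mentioned above fits in a few lines, and running one independent learner per agent is one simple way to picture the distributed multi-agent setting. The sketch below uses the standard update Q(s,a) += alpha * (r + gamma * max Q(s',a') - Q(s,a)); the state names, actions, rewards and agent labels are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy action selection."""

    def __init__(self, actions: Sequence[Hashable], alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> None:
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # maps (state, action) to estimated value
        self.rng = random.Random(seed)

    def choose(self, state: Hashable) -> Hashable:
        """Explore with probability epsilon, otherwise act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward: float, next_state) -> None:
        """Apply the one-step Q-learning temporal-difference update."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# Hypothetical setup: one small independent learner per vehicle.
agents = {name: QLearningAgent(["low", "high"], alpha=0.5, seed=i)
          for i, name in enumerate(["vehicle_1", "vehicle_2"])}
agents["vehicle_1"].update("congested", "high", reward=1.0, next_state="clear")
learned = agents["vehicle_1"].q[("congested", "high")]
```

Because each agent keeps its own small table, the state and action spaces stay compact, which is the scalability argument the abstract makes against heavier DQN or actor-critic solutions.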

Updated: 2024-07-31 09:17:09

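Title: Multi-agent Assessment with QoS Enhancement for HD Map Updates in a Vehicular Network

Abstract: Reinforcement Learning (RL) algorithms have been used to address challenging problems in the offloading process of vehicular ad hoc networks (VANET). More recently, they have also been used to improve the dissemination of high-definition (HD) Maps. Nevertheless, implementing solutions such as deep Q-learning (DQN) and Actor-critic at the autonomous vehicle (AV) may increase the computational load, placing a heavy burden on the computational devices and raising costs. Moreover, their implementation might raise compatibility issues between technologies because of the required modifications to the standards. Therefore, in this paper, we assess the scalability of an application that uses a Q-learning single-agent solution in a distributed multi-agent environment. The application improves network performance by taking advantage of a smaller state and action space while using a multi-agent approach. The proposed solution is extensively evaluated with different test cases that consider individual or overall network performance, the number of agents, and a comparison of centralized and distributed learning. The experimental results show that the time latencies of our proposed solution in the voice, video, HD Map, and best-effort cases improve by 40.4%, 36%, 43%, and 12% respectively, compared to the single-agent approach.

Updated: 2024-07-31 09:17:09

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.21460v1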

KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making

Data is crucial for evidence-based policymaking and enhancing public services, including those at the Ministry of Finance of the Republic of Indonesia. However, the complexity and dynamic nature of governmental financial data and regulations can hinder decision-making. This study investigates the potential of Large Language Models (LLMs) to address these challenges, focusing on Indonesia's financial data and regulations. While LLMs are effective in the financial sector, their use in the public sector in Indonesia is unexplored. This study undertakes an iterative process to develop KemenkeuGPT using the LangChain with Retrieval-Augmented Generation (RAG), prompt engineering and fine-tuning. The dataset from 2003 to 2023 was collected from the Ministry of Finance, Statistics Indonesia and the International Monetary Fund (IMF). Surveys and interviews with Ministry officials informed, enhanced and fine-tuned the model. We evaluated the model using human feedback, LLM-based evaluation and benchmarking. The model's accuracy improved from 35% to 61%, with correctness increasing from 48% to 64%. The Retrieval-Augmented Generation Assessment (RAGAS) framework showed that KemenkeuGPT achieved 44% correctness with 73% faithfulness, 40% precision and 60% recall, outperforming several other base models. An interview with an expert from the Ministry of Finance indicated that KemenkeuGPT has the potential to become an essential tool for decision-making. These results are expected to improve with continuous human feedback.

Updated: 2024-07-31 09:16:33

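Title: KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making

Abstract: Data is crucial for evidence-based policymaking and for enhancing public services, including those of the Ministry of Finance of the Republic of Indonesia. However, the complexity and dynamic nature of governmental financial data and regulations can hinder decision-making. This study investigates the potential of Large Language Models (LLMs) to address these challenges, focusing on Indonesia's financial data and regulations. While LLMs are effective in the financial sector, their use in Indonesia's public sector is unexplored. This study follows an iterative process to develop KemenkeuGPT using Retrieval-Augmented Generation (RAG), prompt engineering and fine-tuning. The dataset, covering 2003 to 2023, was collected from the Ministry of Finance, Statistics Indonesia and the International Monetary Fund (IMF). Surveys and interviews with Ministry officials informed, enhanced and fine-tuned the model. We evaluated the model using human feedback, LLM-based evaluation and benchmarking. The model's accuracy improved from 35% to 61%, with correctness increasing from 48% to 64%. The Retrieval-Augmented Generation Assessment (RAGAS) framework showed that KemenkeuGPT achieved 44% correctness with 73% faithfulness, 40% precision and 60% recall, outperforming several other base models. An interview with an expert from the Ministry of Finance indicated that KemenkeuGPT has the potential to become an essential tool for decision-making. These results are expected to improve with continuous human feedback.

Updated: 2024-07-31 09:16:33

Categories: cs.AI,I.2.7

Download: http://arxiv.org/abs/2407.21459v1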

Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression of knowledge within individual and group LLMs. Moreover, we discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address. We hope this work can help understand knowledge in LLMs and provide insights for future research.

Updated: 2024-07-31 09:14:29

标题: 大型语言模型中的知识机制：调查和展望

摘要: 理解大型语言模型（LLMs）中的知识机制对于向可信任的AGI迈进至关重要。本文从新颖的分类法中审视了知识机制分析，包括知识利用和演化。知识利用深入探讨了记忆、理解和应用、以及创造的机制。知识演化关注于个体和群体LLMs内知识的动态进展。此外，我们讨论了LLMs所学得的知识、参数化知识脆弱性的原因，以及可能具有挑战性的潜在黑暗知识（假设）。我们希望这项工作能帮助理解LLMs中的知识，并为未来研究提供启示。

更新时间: 2024-07-31 09:14:29

领域: cs.CL,cs.AI,cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2407.15017v2

RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot

Performance bugs are non-functional bugs that can manifest even in well-tested commercial products. Fixing these performance bugs is an important yet challenging problem. In this work, we address this challenge and present a new approach called Retrieval-Augmented Prompt Generation (RAPGen). Given a code snippet with a performance issue, RAPGen first retrieves a prompt instruction from a pre-constructed knowledge base of previous performance bug fixes and then generates a prompt using the retrieved instruction. It then uses this prompt on a Large Language Model (such as Codex) in a zero-shot setting to generate a fix. We compare our approach with various prompt variations and state-of-the-art methods on the task of performance bug fixing. Our evaluation shows that RAPGen can generate performance improvement suggestions equivalent to or better than a developer's in ~60% of the cases, getting ~42% of them verbatim, on an expert-verified dataset of past performance changes made by C# developers.
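The retrieve-then-prompt flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the knowledge-base layout (a list of dicts with "example" and "instruction" keys), the Jaccard overlap on identifiers used for retrieval, and the prompt template are all hypothetical and are not taken from the RAPGen paper. The call to the language model is left out.

```python
import re


def tokens(code):
    """Extract the set of identifier-like tokens from a code snippet."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def retrieve_instruction(snippet, knowledge_base):
    """Return the instruction whose stored example overlaps most with the snippet.

    Similarity is plain Jaccard overlap on identifiers (an assumption made
    for this sketch, not the retrieval method of the paper).
    """
    query = tokens(snippet)

    def score(entry):
        example = tokens(entry["example"])
        union = query | example
        return len(query & example) / len(union) if union else 0.0

    return max(knowledge_base, key=score)["instruction"]


def build_prompt(snippet, instruction):
    """Assemble a zero-shot prompt from the retrieved instruction (hypothetical template)."""
    return f"// Instruction: {instruction}\n{snippet}\n// Fixed version:\n"


# Hypothetical two-entry knowledge base of past performance fixes.
KNOWLEDGE_BASE = [
    {"example": "list.Where(x => x.Ok).Count()",
     "instruction": "Use Count(predicate) instead of Where(...).Count()"},
    {"example": "string s = a + b + c;",
     "instruction": "Use StringBuilder for repeated concatenation"},
]
```

The resulting prompt string would then be sent to whichever model is in use; no fine-tuning or in-context examples are involved, which is what makes the setting zero-shot.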

Updated: 2024-07-31 09:09:24

标题: RAPGen：一种以零样本方式修复代码低效问题的方法

摘要: 性能错误是一种在经过充分测试的商业产品中甚至会出现的非功能性错误。修复这些性能错误是一个重要但具有挑战性的问题。在这项工作中，我们解决了这一挑战，并提出了一种名为检索增强提示生成（RAPGen）的新方法。给定一个存在性能问题的代码片段，RAPGen首先从预先构建的知识库中检索出一个提示指令，然后使用检索到的指令生成一个提示。然后，在一个大型语言模型（如Codex）上使用这个提示来生成一个修复方案。我们将我们的方法与各种提示变体和最新方法在性能错误修复任务中进行比较。我们的评估表明，在一个经过专家验证的数据集中，RAPGen在大约60%的情况下可以生成与开发人员相当或更好的性能改进建议，其中有大约42%是逐字逐句的，这些数据集包括过去由C#开发人员进行的性能更改。

更新时间: 2024-07-31 09:09:24

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2306.17077v3

TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors

Monitoring biodiversity at scale is challenging. Detecting and identifying species in fine-grained taxonomies requires highly accurate machine learning (ML) methods. Training such models requires large, high-quality data sets. Deploying these models to low-power devices, in turn, requires novel compression techniques and model architectures. While species classification methods have profited from novel data sets and advances in ML methods, in particular neural networks, deploying these state-of-the-art models to low-power devices remains difficult. Here we present a comprehensive empirical comparison of various tinyML neural network architectures and compression techniques for species classification. We focus on the example of bird song detection, more concretely on a data set curated for studying the corn bunting bird species. The data set is released along with all code and experiments of this study. In our experiments we compare the predictive performance, memory and time complexity of classical spectrogram-based methods and recent approaches operating on raw audio signals. Our results indicate that individual bird species can be robustly detected with relatively simple architectures that can be readily deployed to low-power devices.

Updated: 2024-07-31 08:57:42

标题: 微型鸟鸣：使用TinyML模型在低功耗无线声学传感器上进行鸟鸣识别

摘要: 在大规模监测生物多样性是具有挑战性的。在细粒度分类中检测和识别物种需要高度准确的机器学习（ML）方法。训练这些模型需要大量高质量的数据集。而将这些模型部署到低功耗设备则需要新型的压缩技术和模型架构。虽然物种分类方法受益于新型数据集和机器学习方法的进步，尤其是神经网络，但将这些最先进的模型部署到低功耗设备仍然困难。本文介绍了各种微型ML神经网络架构和压缩技术在物种分类中的全面实证比较。我们以鸟类鸣声检测为例，更具体地说是针对研究玉米鹀鸟种类而策划的数据集。该数据集与本研究的所有代码和实验一同发布。在我们的实验中，我们比较了基于经典频谱图的方法和最近在原始音频信号上操作的方法的预测性能、内存和时间复杂度。我们的结果表明，可以使用相对简单的架构稳健地检测到个体鸟类，这些架构可以轻松部署到低功耗设备中。

更新时间: 2024-07-31 08:57:42

领域: cs.LG,cs.AI,cs.SD,eess.AS,eess.SP

下载: http://arxiv.org/abs/2407.21453v1

Improving Faithfulness of Large Language Models in Summarization via Sliding Generation and Self-Consistency

Although large language models (LLMs) have demonstrated impressive performance in various tasks, they still suffer from the factual inconsistency problem known as hallucination. For instance, LLMs occasionally generate content that diverges from the source article, and they prefer to extract information that appears at the beginning and end of the context, especially in long document summarization. Inspired by these findings, we propose to improve the faithfulness of LLMs in summarization by impelling them to process the entire article more fairly and faithfully. We present a novel summary generation strategy, namely SliSum, which exploits the ideas of sliding windows and self-consistency. Specifically, SliSum divides the source article into overlapping windows and uses an LLM to generate local summaries for the content in each window. Finally, SliSum aggregates all local summaries using clustering and a majority voting algorithm to produce a more faithful summary of the entire article. Extensive experiments demonstrate that SliSum significantly improves the faithfulness of diverse LLMs, including LLaMA-2, Claude-2 and GPT-3.5, in both short and long text summarization, while maintaining their fluency and informativeness and without additional fine-tuning or resources. We further conduct qualitative and quantitative studies to investigate why SliSum works and how its hyperparameters affect performance.
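To make the windowing and voting steps concrete, here is a minimal sketch. It assumes the article is already split into sentences, and it replaces the clustering of semantically similar statements with exact-match voting; both simplifications, along with the function names and parameters, are assumptions of this example rather than details from the SliSum paper. The LLM call that produces each local summary is omitted.

```python
from collections import Counter


def sliding_windows(sentences, window_size, stride):
    """Split a list of sentences into overlapping windows.

    With stride < window_size, consecutive windows share sentences, so most
    of the article is summarized more than once.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(sentences[start:start + window_size])
        if start + window_size >= len(sentences):
            break  # the last window reached the end of the article
        start += stride
    return windows


def majority_vote(candidates):
    """Return the most frequent candidate statement; ties go to the first seen."""
    counts = Counter(candidates)
    return counts.most_common(1)[0][0]
```

In a fuller pipeline, the statements that different windows produce about the same part of the article would first be grouped by similarity, and `majority_vote` would then be applied within each group to keep the statement that the repeated local summaries agree on.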

Updated: 2024-07-31 08:48:48

标题: 通过滑动生成和自一致性提高大型语言模型在摘要中的忠实度

摘要: 尽管大型语言模型（LLMs）在各种任务中展示出令人印象深刻的性能，它们仍然受到被称为幻觉的事实不一致问题的困扰。例如，LLMs偶尔会生成与源文章不符的内容，并倾向于提取出现在文本开头和结尾的信息，特别是在长文档摘要中。受这些发现的启发，我们提出通过促使它们更公平和忠实地处理整篇文章来提高LLMs在摘要生成中的忠实度。我们提出了一种新的摘要生成策略，即SliSum，它利用滑动窗口和自一致性的思想。具体而言，SliSum将源文章分成重叠的窗口，并利用LLM为窗口中的内容生成局部摘要。最后，SliSum利用聚类和多数表决算法汇总所有局部摘要，以产生更忠实的整篇文章摘要。大量实验证明，SliSum显著提高了各种LLMs（包括LLaMA-2、Claude-2和GPT-3.5）在短文本和长文本摘要中的忠实度，同时保持它们的流畅性和信息量，并且不需要额外的微调和资源。我们进一步进行定性和定量研究，以探讨SliSum为何有效以及SliSum中超参数对性能的影响。

更新时间: 2024-07-31 08:48:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.21443v1

Ontologies for Models and Algorithms in Applied Mathematics and Related Disciplines

In applied mathematics and related disciplines, the modeling-simulation-optimization workflow is a prominent scheme, with mathematical models and numerical algorithms playing a crucial role. For these types of mathematical research data, the Mathematical Research Data Initiative has developed, merged and implemented ontologies and knowledge graphs. This contributes to making mathematical research data FAIR by introducing semantic technology and documenting the mathematical foundations accordingly. Using the concrete example of microfracture analysis of porous media, it is shown how the knowledge of the underlying mathematical model and the corresponding numerical algorithms for its solution can be represented by the ontologies.

Updated: 2024-07-31 08:47:41

标题: 应用数学和相关学科中的模型和算法本体论

摘要: 在应用数学及相关学科中，建模-模拟-优化工作流程是一个突出的方案，数学模型和数值算法发挥着至关重要的作用。针对这类数学研究数据，数学研究数据倡议已经开发、合并和实施了本体论和知识图谱。通过引入语义技术并相应记录数学基础，这有助于使数学研究数据变得FAIR。以多孔介质微裂纹分析为具体例子，展示了如何通过本体论来表示基础数学模型及其解决方案的对应数值算法。

更新时间: 2024-07-31 08:47:41

领域: cs.AI,cs.DB,cs.DL,cs.IR,H.3; H.4; I.2.4

下载: http://arxiv.org/abs/2310.20443v2

Explain the Black Box for the Sake of Science: Revisiting the Scientific Method in the Era of Generative Artificial Intelligence

The scientific method is the cornerstone of human progress across all branches of the natural and applied sciences, from understanding the human body to explaining how the universe works. The scientific method is based on identifying systematic rules or principles that describe the phenomenon of interest in a reproducible way that can be validated through experimental evidence. In the era of artificial intelligence (AI), there are discussions on how AI systems may discover new knowledge. We argue that complex human reasoning for scientific discovery remains of vital importance, at least before the advent of artificial general intelligence. Yet, AI can be leveraged for scientific discovery via explainable AI. More specifically, knowing what data AI systems deemed important to make decisions can be a point of contact with domain experts and scientists, which can lead to divergent or convergent views on a given scientific problem. Divergent views may spark further scientific investigations leading to new scientific knowledge.

Updated: 2024-07-31 08:47:36

标题: 为了科学解释黑盒子：在生成人工智能时代重新审视科学方法

摘要: 科学方法是人类在自然和应用科学的所有领域取得进步的基石，从理解人体到解释宇宙如何运作。科学方法基于识别系统性规则或原则，以可重复验证的方式描述感兴趣的现象，并通过实验证据加以验证。在人工智能（AI）时代，人们讨论AI系统如何发现新知识。我们认为，在人工通用智能出现之前，人类复杂的科学发现推理仍然至关重要。然而，通过可解释的人工智能，AI可以被利用于科学发现。更具体地说，了解AI系统认为重要的数据以做出决策可以成为与领域专家和科学家接触的一个点，这可能导致给定科学问题的不同或相同观点。不同观点可能引发进一步的科学调查，从而产生新的科学知识。

更新时间: 2024-07-31 08:47:36

领域: cs.AI,cs.CY,math.DS

下载: http://arxiv.org/abs/2406.10557v2

MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in processing and generating content across multiple data modalities, including text, images, audio, and video. However, a significant drawback of MLLMs is their reliance on static training data, leading to outdated information and limited contextual awareness. This static nature hampers their ability to provide accurate, up-to-date responses, particularly in dynamic or rapidly evolving contexts. Integrating Multimodal Retrieval-augmented Generation (Multimodal RAG) offers a promising solution, but the system would inevitably encounter the multi-granularity noisy correspondence (MNC) problem, which involves two types of noise: coarse-grained (query-caption) and fine-grained (query-image). This noise hinders accurate retrieval and generation. In this work, we propose RagLLaVA, a novel framework with knowledge-enhanced reranking and noise-injected training, to address these limitations. We instruction-tune the MLLM with a simple yet effective instruction template to induce its ranking ability and use it as a reranker to precisely filter the top-k retrieved images. For generation, we inject visual noise during training at the data and token levels to enhance the generator's robustness. Extensive experiments are conducted on subsets of two datasets that require retrieving and reasoning over images to answer a given query. Our results demonstrate the superiority of RagLLaVA in retrieving accurately and generating robustly. Code and models are available at https://github.com/IDEA-FinAI/RagLLaVA.

Updated: 2024-07-31 08:43:17

标题: MLLM是一个强大的重新排序器：通过知识增强的重新排序和注入噪声的训练推进多模式检索增强生成

摘要: 多模态大型语言模型（MLLMs）已经展示出在处理和生成跨多种数据模态（包括文本、图像、音频和视频）内容方面的显著能力。然而，MLLMs的一个重要缺点是它们依赖于静态训练数据，导致信息过时和上下文意识有限。这种静态性质阻碍了它们在动态或快速发展的环境中提供准确、最新响应的能力。整合多模态检索增强生成（Multimodal RAG）提供了一个有希望的解决方案，但系统不可避免地会遇到多粒度嘈杂对应（MNC）问题，涉及两种类型的噪声：粗粒度（查询-标题）和细粒度（查询-图像）噪声。这种噪声阻碍了准确的检索和生成。在这项工作中，我们提出了一种新颖的框架RagLLaVA，具有知识增强的重新排名和噪声注入训练，以解决这些限制。我们使用简单但有效的指导模板对MLLM进行指令调整，以诱导其排序能力，并将其作为重新排名器，精确地过滤出前k个检索到的图像。对于生成，我们在数据和标记级别注入视觉噪声以增强生成器的稳健性。我们在需要检索和推理图像以回答给定查询的两个数据集的子集上进行了大量实验。我们的结果表明RagLLaVA在准确检索和稳健生成方面的优越性。代码和模型可在https://github.com/IDEA-FinAI/RagLLaVA获得。

更新时间: 2024-07-31 08:43:17

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.21439v1

Transient anisotropic kernel for probabilistic learning on manifolds

PLoM (Probabilistic Learning on Manifolds) is a method introduced in 2016 for handling small training datasets by projecting an Itô equation from a stochastic dissipative Hamiltonian dynamical system, acting as the MCMC generator, for which the KDE-estimated probability measure with the training dataset is the invariant measure. PLoM performs a projection on a reduced-order vector basis related to the training dataset, using the diffusion maps (DMAPS) basis constructed with a time-independent isotropic kernel. In this paper, we propose a new ISDE projection vector basis built from a transient anisotropic kernel, providing an alternative to the DMAPS basis to improve statistical surrogates for stochastic manifolds with heterogeneous data. The construction ensures that for times near the initial time, the DMAPS basis coincides with the transient basis. For larger times, the differences between the two bases are characterized by the angle of their spanned vector subspaces. The optimal instant yielding the optimal transient basis is determined using an estimation of mutual information from Information Theory, which is normalized by the entropy estimation to account for the effects of the number of realizations used in the estimations. Consequently, this new vector basis better represents statistical dependencies in the learned probability measure for any dimension. Three applications with varying levels of statistical complexity and data heterogeneity validate the proposed theory, showing that the transient anisotropic kernel improves the learned probability measure.

Updated: 2024-07-31 08:38:39

标题: 在流形上的概率学习中的瞬时各向异性核

摘要: PLoM（Probabilistic Learning on Manifolds）是一种在2016年引入的方法，用于处理小训练数据集，通过从随机耗散哈密顿动力系统中投影出一个Itô方程，作为MCMC生成器，其中使用KDE估计的概率测度与训练数据集是不变的测度。PLoM对与训练数据集相关的降阶向量基执行投影，使用由时间独立各向同性核构建的扩散映射（DMAPS）基础。在本文中，我们提出了一个从瞬态各向异性核构建的新的ISDE投影向量基，为具有异质数据的随机流形改进统计代理提供了一种替代方法。构造确保在接近初始时间的时间点，DMAPS基础与瞬态基础重合。对于较长时间，两个基础之间的差异由它们跨越的向量子空间的角度来表征。利用信息理论中的互信息估计确定产生最佳瞬态基础的最佳时刻，该估计通过熵估计进行归一化，以考虑在估计中使用的实现数量的影响。因此，这个新的向量基更好地代表了学习概率测度中的统计依赖关系，无论维度如何。通过三个具有不同统计复杂性和数据异质性水平的应用来验证所提出的理论，结果显示瞬态各向异性核改进了学习的概率测度。

更新时间: 2024-07-31 08:38:39

领域: stat.ML,cs.LG,68Q32, 68T05, 62R30, 60J20,G.3

下载: http://arxiv.org/abs/2407.21435v1

Analysis and Predictive Modeling of Solar Coronal Holes Using Computer Vision and ARIMA-LSTM Networks

In the era of space exploration, coronal holes on the Sun play a significant role due to their impact on satellites and aircraft through their open magnetic fields and increased solar wind emissions. This study employs computer vision techniques to detect coronal hole regions and estimate their sizes using imagery from the Solar Dynamics Observatory (SDO). Additionally, we utilize a hybrid time series prediction model, specifically a combination of Long Short-Term Memory (LSTM) networks and ARIMA, to analyze trends in the area of coronal holes and predict their areas across various solar regions over a span of seven days. By examining time series data, we aim to identify patterns in coronal hole behavior and understand their potential effects on space weather.

Updated: 2024-07-31 08:28:48

标题: 利用计算机视觉和ARIMA-LSTM网络分析和预测太阳日冕空洞

摘要: 在太空探索时代，太阳上的日冕空洞由于其对卫星和飞行器的影响而发挥着重要作用，这是由于它们的开放磁场和增加的太阳风排放。本研究利用计算机视觉技术检测太阳动力学观测卫星（SDO）图像中的日冕空洞区域并估计其大小。此外，我们利用混合时间序列预测模型，具体是长短期记忆（LSTM）网络和ARIMA的组合，分析日冕空洞面积的趋势，并预测在七天的时间跨度内在各种太阳区域的面积。通过检查时间序列数据，我们旨在识别日冕空洞行为中的模式，并了解它们对空间天气的潜在影响。

更新时间: 2024-07-31 08:28:48

领域: astro-ph.SR,astro-ph.EP,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.09802v3

Managing Large Enclaves in a Data Center

Live migration of applications and VMs in data centers is an old and quintessential problem. In this large body of work, an important open problem still remains: the migration of secure enclaves (sandboxes) running on trusted execution environments (TEEs) like Intel SGX. Here, the decade-old stop-and-copy-based method is used, in which the entire application's execution is stopped and the state is collected and transferred. This method has an exceedingly long downtime when we consider enclaves with large memory footprints. Better solutions have eluded us because of some design limitations posed by TEEs like Intel SGX, such as the opacity of data within enclaves (not visible to the OS/hypervisor) and the lack of mechanisms to track writes on secure pages. We propose a new technique, OptMig, to circumvent these limitations and implement secure enclave migration with a near-zero downtime. We rely on a short compiler pass and propose a novel migration mechanism. Our optimizations reduce the total downtime by 77-96% for a suite of Intel SGX applications that have multi-GB memory footprints. We show results for our system on a real cloud and in settings that use containers, VMs, and microVMs.

Updated: 2024-07-31 08:28:27

标题: 在数据中心中管理大型飞地

摘要: 数据中心中应用程序和虚拟机的实时迁移是一个古老而基本的问题。在这一大量的工作中，仍然存在一个重要的开放问题，即安全隔离区（沙盒）在受信任执行环境（TEE）上运行时的迁移，如英特尔SGX。在这里，使用了十年前的基于停止和复制的方法，即停止整个应用程序的执行并收集和传输状态。当考虑具有大内存占用的隔离区时，该方法的停机时间过长。更好的解决方案一直困扰着我们，因为TEE（如英特尔SGX）提出了一些设计限制，如隔离区内数据的不透明性（对操作系统/虚拟化程序不可见）以及缺乏跟踪安全页面写入的机制。我们提出了一种新技术OptMig，以规避这些限制，并实现接近零停机时间的安全隔离区迁移。我们依赖于一个简短的编译器传递，并提出了一种新颖的迁移机制。我们的优化可将具有多GB内存占用的套件的英特尔SGX应用程序的总停机时间减少77-96%。我们展示了我们系统在真实云中以及使用容器、虚拟机和微虚拟机的环境中的结果。

更新时间: 2024-07-31 08:28:27

领域: cs.CR

下载: http://arxiv.org/abs/2311.06991v2

Cost-Based Semantics for Querying Inconsistent Weighted Knowledge Bases

In this paper, we explore a quantitative approach to querying inconsistent description logic knowledge bases. We consider weighted knowledge bases in which both axioms and assertions have (possibly infinite) weights, which are used to assign a cost to each interpretation based upon the axioms and assertions it violates. Two notions of certain and possible answer are defined by either considering interpretations whose cost does not exceed a given bound or restricting attention to optimal-cost interpretations. Our main contribution is a comprehensive analysis of the combined and data complexity of bounded cost satisfiability and certain and possible answer recognition, for description logics between ELbot and ALCO.
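The cost-based definitions above can be illustrated with a toy sketch. It is a strong simplification: interpretations are modelled as finite sets of facts drawn from an explicitly enumerated candidate list, and axioms and assertions are modelled as Python predicates paired with weights. Description logic interpretations are not finitely enumerable in general, so the functions and data below are hypothetical illustrations of the definitions, not a reasoning procedure from the paper.

```python
import math


def cost(interpretation, weighted_constraints):
    """Sum the weights of the constraints that the interpretation violates."""
    return sum(w for holds, w in weighted_constraints if not holds(interpretation))


def certain_under_bound(query, interpretations, weighted_constraints, bound):
    """True if the query holds in every candidate whose cost is at most `bound`.

    Note: this is vacuously True when no candidate stays within the bound.
    """
    admitted = [i for i in interpretations if cost(i, weighted_constraints) <= bound]
    return all(query(i) for i in admitted)


def possible_under_bound(query, interpretations, weighted_constraints, bound):
    """True if the query holds in at least one candidate whose cost is at most `bound`."""
    admitted = [i for i in interpretations if cost(i, weighted_constraints) <= bound]
    return any(query(i) for i in admitted)


def optimal_cost(interpretations, weighted_constraints):
    """Smallest cost among the candidates (the optimal-cost variant uses this as the bound)."""
    return min(cost(i, weighted_constraints) for i in interpretations)


# Hypothetical example: an infinite weight marks a constraint that must never be violated.
CANDIDATES = [{"Bird", "Flies"}, {"Bird"}, {"Flies"}]
CONSTRAINTS = [
    (lambda i: "Bird" in i, math.inf),
    (lambda i: "Flies" in i, 1),
]
```

Restricting attention to optimal-cost interpretations corresponds to calling the two answer functions with `bound=optimal_cost(...)`.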

Updated: 2024-07-31 08:26:28

标题: 基于成本的语义查询不一致加权知识库

摘要: 在这篇论文中，我们探讨了一种定量方法来查询不一致的描述逻辑知识库。我们考虑加权知识库，其中公理和断言都具有（可能是无限的）权重，这些权重用于根据违反的公理和断言为每个解释分配成本。通过考虑成本不超过给定界限的解释或将注意力限制在最优成本解释上，定义了确定和可能答案的两个概念。我们的主要贡献是对有界成本可满足性以及确定和可能答案识别的综合分析，适用于ELbot和ALCO之间的描述逻辑的组合和数据复杂性。

更新时间: 2024-07-31 08:26:28

领域: cs.LO,cs.AI,cs.DB

下载: http://arxiv.org/abs/2407.20754v2

Deformable 3D Shape Diffusion Model

The Gaussian diffusion model, initially designed for image generation, has recently been adapted for 3D point cloud generation. However, these adaptations have not fully considered the intrinsic geometric characteristics of 3D shapes, thereby constraining the diffusion model's potential for 3D shape manipulation. To address this limitation, we introduce a novel deformable 3D shape diffusion model that facilitates comprehensive 3D shape manipulation, including point cloud generation, mesh deformation, and facial animation. Our approach innovatively incorporates a differential deformation kernel, which deconstructs the generation of geometric structures into successive non-rigid deformation stages. By leveraging a probabilistic diffusion model to simulate this step-by-step process, our method provides a versatile and efficient solution for a wide range of applications, spanning from graphics rendering to facial expression animation. Empirical evidence highlights the effectiveness of our approach, demonstrating state-of-the-art performance in point cloud generation and competitive results in mesh deformation. Additionally, extensive visual demonstrations reveal the significant potential of our approach for practical applications. Our method presents a unique pathway for advancing 3D shape manipulation and unlocking new opportunities in the realm of virtual reality.

Updated: 2024-07-31 08:24:42

标题: 可变形三维形状扩散模型

摘要: 高斯扩散模型最初设计用于图像生成，最近已经被调整为用于3D点云生成。然而，这些调整并没有充分考虑3D形状的固有几何特征，从而限制了扩散模型在3D形状操作方面的潜力。为了解决这一限制，我们引入了一种新颖的可变形3D形状扩散模型，促进了全面的3D形状操作，包括点云生成、网格变形和面部动画。我们的方法创新地融入了差分变形核，将几何结构的生成分解为连续的非刚性变形阶段。通过利用概率扩散模型模拟这一逐步过程，我们的方法为涵盖从图形渲染到面部表情动画的广泛应用提供了多功能且高效的解决方案。实证证据突显了我们方法的有效性，在点云生成方面展现出最先进的性能，并在网格变形方面取得了有竞争力的结果。此外，广泛的视觉演示揭示了我们方法在实际应用中的巨大潜力。我们的方法为推进3D形状操作提供了一条独特的途径，并在虚拟现实领域开辟了新的机遇。

更新时间: 2024-07-31 08:24:42

领域: cs.GR,cs.AI

下载: http://arxiv.org/abs/2407.21428v1

Towards a FAIR Documentation of Workflows and Models in Applied Mathematics

Modeling-Simulation-Optimization workflows play a fundamental role in applied mathematics. The Mathematical Research Data Initiative, MaRDI, responded to this by developing a FAIR and machine-interpretable template for a comprehensive documentation of such workflows. MaRDMO, a Plugin for the Research Data Management Organiser, enables scientists from diverse fields to document and publish their workflows on the MaRDI Portal seamlessly using the MaRDI template. Central to these workflows are mathematical models. MaRDI addresses them with the MathModDB ontology, offering a structured formal model description. Here, we showcase the interaction between MaRDMO and the MathModDB Knowledge Graph through an algebraic modeling workflow from the Digital Humanities. This demonstration underscores the versatility of both services beyond their original numerical domain.

Updated: 2024-07-31 08:19:16

标题: 走向应用数学中工作流程和模型的FAIR文档化

摘要: 建模-仿真-优化工作流在应用数学中起着基础性作用。数学研究数据倡议（MaRDI）通过开发一个符合FAIR和机器可解释性的模板，为这种工作流的全面文档化做出了响应。研究数据管理组织器的插件MaRDMO使来自不同领域的科学家可以使用MaRDI模板无缝地记录和发布他们的工作流至MaRDI门户网站。这些工作流的核心是数学模型。MaRDI通过MathModDB本体论解决了这些问题，提供了一个结构化的形式模型描述。在这里，我们展示了数字人文领域的代数建模工作流程中MaRDMO和MathModDB知识图之间的互动。这个演示突显了这两项服务在原始数值领域之外的多功能性。

更新时间: 2024-07-31 08:19:16

领域: cs.AI,cs.DB,cs.DL,H.3.3; H.3.7; E.0

下载: http://arxiv.org/abs/2403.17778v2

Cost-Effective Hallucination Detection for LLMs

Large language models (LLMs) can be prone to hallucinations: generating unreliable outputs that are unfaithful to their inputs or to external facts, or that are internally inconsistent. In this work, we address several challenges for post-hoc hallucination detection in production settings. Our pipeline for hallucination detection entails: first, producing a confidence score representing the likelihood that a generated answer is a hallucination; second, calibrating the score conditional on attributes of the inputs and candidate response; finally, performing detection by thresholding the calibrated score. We benchmark a variety of state-of-the-art scoring methods on different datasets, encompassing question answering, fact checking, and summarization tasks. We employ diverse LLMs to ensure a comprehensive assessment of performance. We show that calibrating individual scoring methods is critical for ensuring risk-aware downstream decision making. Based on findings that no individual score performs best in all situations, we propose a multi-scoring framework, which combines different scores and achieves top performance across all datasets. We further introduce cost-effective multi-scoring, which can match or even outperform more expensive detection methods, while significantly reducing computational overhead.
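The three-stage pipeline (score, calibrate, threshold) can be sketched as below. The calibration method shown, histogram binning per attribute group, is an assumption chosen for brevity and is not necessarily what the paper uses. The raw scores are assumed to lie in [0, 1], and the "group" argument stands in for whatever input or response attributes the calibration is conditioned on.

```python
from collections import defaultdict


def fit_binned_calibrator(scores, labels, groups, n_bins=5):
    """Estimate P(hallucination | score bin, group) from labelled examples.

    scores: raw scores in [0, 1]; labels: 1 for hallucination, 0 otherwise;
    groups: one attribute value per example (e.g. the task type).
    """
    totals = defaultdict(lambda: [0, 0])  # (group, bin) -> [positives, count]
    for s, y, g in zip(scores, labels, groups):
        b = min(int(s * n_bins), n_bins - 1)
        totals[(g, b)][0] += y
        totals[(g, b)][1] += 1
    table = {key: pos / cnt for key, (pos, cnt) in totals.items()}

    def calibrate(score, group):
        b = min(int(score * n_bins), n_bins - 1)
        # Fall back to the raw score when this (group, bin) was never observed.
        return table.get((group, b), score)

    return calibrate


def detect(score, group, calibrate, threshold=0.5):
    """Flag a response as a hallucination when its calibrated score reaches the threshold."""
    return calibrate(score, group) >= threshold
```

Because the threshold is applied to a calibrated probability rather than a raw score, it can be read directly as a risk tolerance, which is the point the abstract makes about risk-aware decision making.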

Updated: 2024-07-31 08:19:06

标题: 低成本的用于LLMs的幻觉检测

摘要: 大型语言模型(LLMs)可能容易出现幻觉-生成不可靠的输出，与其输入、外部事实或内部不一致。在这项工作中，我们解决了生产环境中后期幻觉检测的几个挑战。我们的幻觉检测流程包括：首先，生成表示生成答案为幻觉可能性的置信分数；其次，根据输入和候选响应的属性进行分数校准；最后，通过对校准分数进行阈值处理进行检测。我们在不同数据集上对各种最先进的评分方法进行基准测试，涵盖问答、事实核实和总结任务。我们使用不同的LLMs来确保综合评估性能。我们表明，调整单个评分方法对于确保风险感知的下游决策至关重要。根据发现，没有单个评分在所有情况下表现最佳，我们提出了一个多评分框架，结合不同的评分并在所有数据集上实现最佳性能。我们进一步介绍了经济高效的多评分方法，可以与甚至超越更昂贵的检测方法相匹配，同时显着减少计算开销。

更新时间: 2024-07-31 08:19:06

领域: cs.CL,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.21424v1

Microservice Vulnerability Analysis: A Literature Review with Empirical Insights

Microservice architectures are revolutionizing both small businesses and large corporations, igniting a new era of innovation with their exceptional advantages in maintainability, reusability, and scalability. However, these benefits come with significant security challenges, as the increased complexity of service interactions, expanded attack surfaces, and intricate dependency management introduce a new array of cybersecurity vulnerabilities. While security concerns are mounting, there is a lack of comprehensive research that integrates a review of existing knowledge with empirical analysis of microservice vulnerabilities. This study aims to fill this gap by gathering, analyzing, and synthesizing existing literature on security vulnerabilities associated with microservice architectures. Through a thorough examination of 62 studies, we identify, analyze, and report 126 security vulnerabilities inherent in microservice architectures. This comprehensive analysis enables us to (i) propose a taxonomy that categorizes microservice vulnerabilities based on the distinctive features of microservice architectures; (ii) conduct an empirical analysis by performing vulnerability scans on four diverse microservice benchmark applications using three different scanning tools to validate our taxonomy; and (iii) map our taxonomy vulnerabilities with empirically identified vulnerabilities, providing an in-depth vulnerability analysis at microservice, application, and scanning tool levels. Our study offers crucial guidelines for practitioners and researchers to advance both the state-of-the-practice and the state-of-the-art in securing microservice architectures.

Updated: 2024-07-31 08:13:42

标题: 微服务漏洞分析：具有实证见解的文献综述

摘要: 微服务架构正在彻底改变小型企业和大型公司，以其在可维护性、可重用性和可扩展性方面的出色优势引发了创新的新时代。然而，这些优势伴随着重大的安全挑战，因为服务交互的复杂性增加、攻击面扩大以及错综复杂的依赖管理引入了一系列新的网络安全漏洞。尽管安全问题日益加剧，但目前缺乏综合研究，即整合对现有知识的审查和对微服务漏洞的实证分析。本研究旨在通过收集、分析和综合现有文献，探讨与微服务架构相关的安全漏洞。通过对62项研究的彻底审查，我们确定、分析并报告了微服务架构中固有的126个安全漏洞。这一综合分析使我们能够（i）提出一个基于微服务架构独特特征对微服务漏洞进行分类的分类法；（ii）通过使用三种不同的扫描工具对四个不同的微服务基准应用程序进行漏洞扫描，进行实证分析以验证我们的分类法；以及（iii）将我们的分类法漏洞与实证确定的漏洞进行映射，提供微服务、应用程序和扫描工具级别的深入漏洞分析。我们的研究为从业者和研究人员提供了重要的指导，以推进微服务架构安全的最佳实践和最新技术的发展。

更新时间: 2024-07-31 08:13:42

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2408.03960v1

Need of AI in Modern Education: in the Eyes of Explainable AI (xAI)

Modern Education is not "Modern" without AI. However, AI's complex nature makes understanding and fixing problems challenging. Research worldwide shows that a parent's income greatly influences a child's education. This led us to explore how AI, especially complex models, makes important decisions using Explainable AI tools. Our research uncovered many complexities linked to parental income and offered reasonable explanations for these decisions. However, we also found biases in AI that go against what we want from AI in education: clear transparency and equal access for everyone. These biases can impact families and children's schooling, highlighting the need for better AI solutions that offer fair opportunities to all. This chapter tries to shed light on the complex ways AI operates, especially concerning biases. These are the foundational steps towards better educational policies, which include using AI in ways that are more reliable, accountable, and beneficial for everyone involved.

Updated: 2024-07-31 08:11:33

标题: 现代教育中的人工智能需求：从可解释人工智能(xAI)的角度看

摘要: 现代教育如果没有人工智能就不算现代。然而，人工智能的复杂性使得理解和解决问题具有挑战性。全球研究显示，父母的收入极大地影响着孩子的教育。这促使我们探索人工智能，特别是复杂模型，如何使用可解释人工智能工具做出重要决策。我们的研究揭示了许多与家庭收入相关的复杂性，并为这些决策提供了合理的解释。然而，我们也发现了人工智能中存在的偏见，这与我们期望从教育人工智能中得到的透明度和公平准入相矛盾。这些偏见可能影响家庭和孩子的学校教育，突显了需要更好的人工智能解决方案，为所有人提供公平机会。本章试图揭示人工智能运作的复杂方式，特别是涉及偏见。这些是朝着更好的教育政策的基础步骤，其中包括以更可靠、负责任和有益于所有参与者的方式使用人工智能。

更新时间: 2024-07-31 08:11:33

领域: cs.AI

下载: http://arxiv.org/abs/2408.00025v1

FTuner: A Fast Dynamic Shape Tensors Program Auto-Tuner for Deep Learning Compilers

Many artificial intelligence models process input data of different lengths and resolutions, making the shape of the tensors dynamic. The performance of these models depends on the shape of the tensors, which makes it difficult to optimize the tensors before the model runs. There are two common solutions to this problem. The first is to add useless data to the input to match a pre-optimized tensor library. The second is to use small basic tensors to create a tensor that is closest in size to the input data and then tune it to minimize padding. However, this second solution can be time-consuming. This paper proposes a new technique for deep learning compilers called FTuner. Instead of using a large design space or training a cost model, we use an abstract computational unit called the uKernel to patch together small, various-sized tensors to match the shape of the input tensor. We determine the shape of the uKernel using an analytic hardware information model. Experiments show that FTuner achieves operator and end-to-end performance comparable to vendor libraries and a 3% speedup over the existing auto-tuner with the model-training compiler, while reducing tuning time by two orders of magnitude.
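The trade-off between the two common solutions can be made concrete with a one-dimensional toy calculation. The sketch below compares padding an input up to a pre-optimized bucket size with covering it by several smaller tiles. The bucket and tile sizes are hypothetical, and the greedy largest-first rule is an assumption of this example; it is not how FTuner chooses uKernel shapes, which the abstract says is done with an analytic hardware information model.

```python
def padding_waste(length, bucket_sizes):
    """Wasted elements when padding up to the smallest bucket that fits."""
    fitting = [b for b in bucket_sizes if b >= length]
    if not fitting:
        raise ValueError("no bucket is large enough")
    return min(fitting) - length


def greedy_tiling(length, tile_sizes):
    """Cover `length` with tiles, largest first, padding only the final tile.

    Returns (tiles, padding), where padding is the number of wasted elements.
    """
    tiles = []
    remaining = length
    for size in sorted(tile_sizes, reverse=True):
        while remaining >= size:
            tiles.append(size)
            remaining -= size
    if remaining > 0:
        # Whatever is left is smaller than every tile: use the smallest one.
        smallest = min(tile_sizes)
        tiles.append(smallest)
        remaining -= smallest
    return tiles, -remaining
```

For an input of length 100, padding to buckets of 64, 128, or 256 wastes 28 elements, while tiling with sizes 64, 32, and 8 wastes 4. The price of the second approach is that the combination of tiles has to be chosen per input shape, which is the tuning cost the abstract refers to.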

Updated: 2024-07-31 08:05:33

标题: FTuner：深度学习编译器的快速动态形状张量程序自动调整器

摘要: 许多人工智能模型处理不同长度和分辨率的输入数据，使得张量的形状动态变化。这些模型的性能取决于张量的形状，这使得在模型运行之前优化张量变得困难。解决这个问题有两种常见方法。第一种是向输入中添加无用数据以匹配预优化的张量库。第二种是使用小型基本张量来创建一个尺寸最接近输入数据的张量，然后调整以最小化填充。然而，这第二种解决方案可能会耗费大量时间。本文提出了一种新的深度学习编译器技术，称为FTuner。我们使用一个抽象的计算单元uKernel，将小型、不同尺寸的张量拼接在一起，以匹配输入张量的形状，而不是使用大型设计空间或训练成本模型。我们使用一个分析硬件信息模型来确定uKernel的形状。实验证明，FTuner可以实现与供应商库相当的运算符和端到端性能，并且在使用模型训练编译器时实现3%的加速，同时将调整时间缩短了两个数量级。

更新时间: 2024-07-31 08:05:33

领域: cs.LG,cs.DC,68M20 (Primary)

下载: http://arxiv.org/abs/2407.21418v1

Deep Fréchet Regression

Advancements in modern science have led to the increasing availability of non-Euclidean data in metric spaces. This paper addresses the challenge of modeling relationships between non-Euclidean responses and multivariate Euclidean predictors. We propose a flexible regression model capable of handling high-dimensional predictors without imposing parametric assumptions. Two primary challenges are addressed: the curse of dimensionality in nonparametric regression and the absence of linear structure in general metric spaces. The former is tackled using deep neural networks, while for the latter we demonstrate the feasibility of mapping the metric space where responses reside to a low-dimensional Euclidean space using manifold learning. We introduce a reverse mapping approach, employing local Fréchet regression, to map the low-dimensional manifold representations back to objects in the original metric space. We develop a theoretical framework, investigating the convergence rate of deep neural networks under dependent sub-Gaussian noise with bias. The convergence rate of the proposed regression model is then obtained by expanding the scope of local Fréchet regression to accommodate multivariate predictors in the presence of errors in predictors. Simulations and case studies show that the proposed model outperforms existing methods for non-Euclidean responses, focusing on the special cases of probability measures and networks.

Updated: 2024-07-31 07:54:14

标题: 深度弗雷歇回归

摘要: 现代科学的进步导致在度量空间中非欧几里得数据的可用性不断增加。本文解决了建立非欧几里得响应和多变量欧几里得预测变量之间关系的挑战。我们提出了一种灵活的回归模型，能够处理高维预测变量而不施加参数假设。主要解决了两个挑战：在非参数回归中维数灾难和一般度量空间中缺乏线性结构。前者使用深度神经网络来解决，而对于后者，我们展示了利用流形学习将响应所在的度量空间映射到低维欧几里得空间的可行性。我们引入了一种反向映射方法，利用局部Fréchet回归，将低维流形表示映射回原始度量空间中的对象。我们制定了一个理论框架，研究了在存在偏差的依赖性子高斯噪声下深度神经网络的收敛速率。然后，通过扩展局部Fréchet回归的范围以适应存在预测变量误差的多变量预测变量，得到了所提出的回归模型的收敛速率。模拟和案例研究表明，所提出的模型在非欧几里得响应方面优于现有方法，重点关注概率测度和网络的特殊情况。

更新时间: 2024-07-31 07:54:14

领域: stat.ME,cs.LG

下载: http://arxiv.org/abs/2407.21407v1

ZeroDDI: A Zero-Shot Drug-Drug Interaction Event Prediction Method with Semantic Enhanced Learning and Dual-Modal Uniform Alignment

Drug-drug interactions (DDIs) can result in various pharmacological changes, which can be categorized into different classes known as DDI events (DDIEs). In recent years, previously unobserved/unseen DDIEs have been emerging, posing a new classification task in which unseen classes have no labelled instances in the training stage; this is formulated as a zero-shot DDIE prediction (ZS-DDIE) task. However, existing computational methods are not directly applicable to ZS-DDIE, which poses two primary challenges: obtaining suitable DDIE representations and handling the class imbalance issue. To overcome these challenges, we propose a novel method named ZeroDDI for the ZS-DDIE task. Specifically, we design a biological semantic enhanced DDIE representation learning module, which emphasizes the key biological semantics and distills discriminative molecular substructure-related semantics for DDIE representation learning. Furthermore, we propose a dual-modal uniform alignment strategy to distribute drug pair representations and DDIE semantic representations uniformly in a unit sphere and align the matched ones, which can mitigate the issue of class imbalance. Extensive experiments show that ZeroDDI surpasses the baselines and indicate that it is a promising tool for detecting unseen DDIEs. Our code has been released at https://github.com/wzy-Sarah/ZeroDDI.

Updated: 2024-07-31 07:51:56

标题: ZeroDDI: 一个使用语义增强学习和双模态统一对齐的零样本药物相互作用事件预测方法

摘要: 药物相互作用（DDIs）可能导致各种药理学变化，可以被分类为不同的类别，即DDI事件（DDIEs）。近年来，先前未被观察到/未被看到的DDIEs不断出现，提出了一个新的分类任务，当未见过的类别在训练阶段没有标记实例时，这被规定为零样本DDIE预测（ZS-DDIE）任务。然而，现有的计算方法并不直接适用于ZS-DDIE，它面临两个主要挑战：获得适当的DDIE表示和处理类别不平衡问题。为了克服这些挑战，我们提出了一种名为ZeroDDI的新方法，用于ZS-DDIE任务。具体来说，我们设计了一个生物语义增强的DDIE表示学习模块，强调关键的生物语义并提炼出与DDIE表示学习相关的具有区别性的分子亚结构语义。此外，我们提出了一种双模态统一对齐策略，将药物对表示和DDIE语义表示均匀分布在一个单位球体中，并对齐匹配的部分，从而缓解类别不平衡问题。大量实验证明，ZeroDDI超越了基线，并表明它是一个检测未知DDIEs的有前途的工具。我们的代码已在https://github.com/wzy-Sarah/ZeroDDI上发布。

更新时间: 2024-07-31 07:51:56

领域: cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2407.00891v2

Fingerprint Theft Using Smart Padlocks: Droplock Exploits and Defenses

There is growing adoption of smart devices such as digital locks with remote control and sophisticated authentication mechanisms. However, a lack of attention to device security and user-awareness beyond the primary function of these IoT devices may be exposing users to invisible risks. This paper extends prior work that defined the "droplock", an attack whereby a smart lock is turned into a wireless fingerprint harvester. We perform a more in-depth analysis of a broader range of vulnerabilities and exploits that make a droplock attack easier to perform and harder to detect. Analysis is extended to a range of other smart lock models, and a threat model is used as the basis to recommend stronger security controls that may mitigate the risks of such an attack.

Updated: 2024-07-31 07:40:05

标题: 指纹盗窃利用智能挂锁：Droplock的利用和防御

摘要: 越来越多的智能设备（如带远程控制和复杂认证机制的数字锁）被广泛采用。然而，对设备安全和用户意识的关注不足，仅仅关注这些物联网设备的主要功能，可能会让用户暴露于看不见的风险之中。本文延续了先前定义的“droplock”攻击的工作，该攻击将智能锁变成无线指纹收集器。我们对更广泛范围的漏洞和攻击进行了更深入的分析，这些漏洞和攻击使得droplock攻击更容易执行，更难被检测到。分析还扩展到其他智能锁型号，并使用威胁模型作为基础，推荐更强大的安全控制措施，以减轻此类攻击的风险。

更新时间: 2024-07-31 07:40:05

领域: cs.CR

下载: http://arxiv.org/abs/2407.21398v1

A learning theory for quantum photonic processors and beyond

We consider the tasks of learning quantum states, measurements and channels generated by continuous-variable (CV) quantum circuits. This family of circuits is suited to describe optical quantum technologies and in particular it includes state-of-the-art photonic processors capable of showing quantum advantage. We define classes of functions that map classical variables, encoded into the CV circuit parameters, to outcome probabilities evaluated on those circuits. We then establish efficient learnability guarantees for such classes, by computing bounds on their pseudo-dimension or covering numbers, showing that CV quantum circuits can be learned with a sample complexity that scales polynomially with the circuit's size, i.e., the number of modes. Our results show that CV circuits can be trained efficiently using a number of training samples that, unlike their finite-dimensional counterpart, does not scale with the circuit depth.

Updated: 2024-07-31 07:39:48

标题: 一个学习理论用于量子光子处理器和更多领域

摘要: 我们考虑学习由连续变量（CV）量子电路生成的量子态、测量和通道的任务。这类电路适合描述光学量子技术，特别是包括能够展示量子优势的最新光子处理器。我们定义了一类将经编码为CV电路参数的经典变量映射到在这些电路上评估的结果概率的函数。然后，通过计算它们的伪维度或覆盖数的上界，我们为这些类建立了高效的可学习性保证，表明CV量子电路可以用与电路大小（即模式数）多项式地扩展的样本复杂度来学习。我们的结果表明，与其有限维对应物不同，CV电路可以通过少量训练样本有效地训练，而这些样本的数量不会随着电路深度的增加而增加。

更新时间: 2024-07-31 07:39:48

领域: quant-ph,cs.CC,cs.IT,cs.LG,math-ph,math.IT,math.MP

下载: http://arxiv.org/abs/2209.03075v4

QQQ: Quality Quattuor-Bit Quantization for Large Language Models

Quantization is a proven, effective method for compressing large language models. Although popular techniques like W8A8 and W4A16 effectively maintain model performance, they often fail to concurrently speed up the prefill and decoding stages of inference. W4A8 is a promising strategy to accelerate both stages, but it usually leads to a significant performance degradation. To address these issues, we present QQQ, a Quality Quattuor-bit Quantization method with 4-bit weights and 8-bit activations. QQQ employs adaptive smoothing and Hessian-based compensation, significantly enhancing the performance of quantized models without extensive training. Furthermore, we meticulously engineer W4A8 GEMM kernels to increase inference speed. Our specialized per-channel W4A8 GEMM and per-group W4A8 GEMM achieve impressive speed increases of 3.67$\times$ and 3.29$\times$ over FP16 GEMM. Our extensive experiments show that QQQ achieves performance on par with existing state-of-the-art LLM quantization methods while significantly accelerating inference, achieving speed boosts up to 2.24$\times$, 2.10$\times$, and 1.25$\times$ compared to FP16, W8A8, and W4A16, respectively.

Updated: 2024-07-31 07:35:42

标题: QQQ：大型语言模型的质量四位量化

摘要: 量化是一种有效的压缩大型语言模型的方法。尽管像W8A8和W4A16这样的流行技术有效地保持了模型性能，但它们通常无法同时加快推理的预填充和解码阶段。W4A8是一种有希望加速这两个阶段的策略，但通常会导致显著的性能降级。为了解决这些问题，我们提出了QQQ，一种使用4位权重和8位激活的质量四位量化方法。QQQ采用自适应平滑和基于Hessian的补偿，显著提高了量化模型的性能，而无需进行大量训练。此外，我们精心设计了W4A8 GEMM核以提高推理速度。我们专门设计的每通道W4A8 GEMM和每组W4A8 GEMM分别比FP16 GEMM提速3.67倍和3.29倍。我们的大量实验表明，QQQ在与现有最先进的LLM量化方法性能相媲美的同时，显著加速了推理，与FP16、W8A8和W4A16相比，分别提速了2.24倍、2.10倍和1.25倍。

更新时间: 2024-07-31 07:35:42

领域: cs.LG

下载: http://arxiv.org/abs/2406.09904v3
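
To make the "4-bit weights" part of the QQQ abstract above concrete, here is a minimal sketch of plain round-to-nearest symmetric int4 quantization with one scale per output channel. It is only a baseline illustration: it does not implement QQQ's adaptive smoothing, Hessian-based compensation, or GEMM kernels, and the function names and the choice of a symmetric range scaled to ±7 are assumptions.

```python
def quantize_per_channel_int4(weights):
    """Symmetric 4-bit quantization with one scale per row (output channel).

    Returns integer codes in [-8, 7] and the per-row scales.
    """
    codes, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to the code 7; guard all-zero rows.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        codes.append([max(-8, min(7, round(w / scale))) for w in row])
        scales.append(scale)
    return codes, scales


def dequantize_per_channel(codes, scales):
    """Reconstruct approximate weights from integer codes and per-row scales."""
    return [[c * s for c in row] for row, s in zip(codes, scales)]
```

With this scheme the reconstruction error of each weight is at most half of its row's scale, which is why per-channel (or per-group) scales help when rows have very different magnitudes.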

Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation

Safety-critical applications such as autonomous driving require robust 3D environment perception algorithms capable of handling diverse and ambiguous surroundings. The predictive performance of classification models is heavily influenced by the dataset and the prior knowledge provided by the annotated labels. While labels guide the learning process, they often fail to capture the inherent relationships between classes that are naturally understood by humans. We propose a training strategy for a 3D LiDAR semantic segmentation model that learns structural relationships between classes through abstraction. This is achieved by implicitly modeling these relationships using a learning rule for hierarchical multi-label classification (HMC). Our detailed analysis demonstrates that this training strategy not only improves the model's confidence calibration but also retains additional information useful for downstream tasks such as fusion, prediction, and planning.

Updated: 2024-07-31 07:32:55

标题: 层次性洞察：利用结构相似性实现可靠的3D语义分割

摘要: 安全关键应用，如自动驾驶，需要强大的3D环境感知算法，能够处理多样化和模糊的环境。分类模型的预测性能受数据集和注释标签提供的先验知识的影响很大。虽然标签指导学习过程，但它们经常无法捕捉人类自然理解的类别之间的内在关系。我们提出了一种用于3D LiDAR语义分割模型的训练策略，通过抽象学习类别之间的结构关系。这是通过使用层次多标签分类（HMC）的学习规则隐式建模这些关系实现的。我们的详细分析表明，这种训练策略不仅改善了模型的置信度校准，还保留了用于下游任务（如融合、预测和规划）的额外信息。

更新时间: 2024-07-31 07:32:55

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2404.06124v3

Model Free Prediction with Uncertainty Assessment

Deep nonparametric regression, characterized by the utilization of deep neural networks to learn target functions, has emerged as a focus of research attention in recent years. Despite considerable progress in understanding convergence rates, the absence of asymptotic properties hinders rigorous statistical inference. To address this gap, we propose a novel framework that transforms the deep estimation paradigm into a platform conducive to conditional mean estimation, leveraging the conditional diffusion model. Theoretically, we develop an end-to-end convergence rate for the conditional diffusion model and establish the asymptotic normality of the generated samples. Consequently, we are equipped to construct confidence regions, facilitating robust statistical inference. Furthermore, through numerical experiments, we empirically validate the efficacy of our proposed methodology.

Updated: 2024-07-31 07:27:56

标题: 无模型预测与不确定性评估

摘要: 深度非参数回归，以利用深度神经网络学习目标函数为特征，近年来已成为研究关注的焦点。尽管在理解收敛速度方面取得了相当大的进展，但缺乏渐近性质限制了严格的统计推断。为了填补这一空白，我们提出了一个新颖的框架，将深度估计范式转化为有利于条件均值估计的平台，利用条件扩散模型。理论上，我们为条件扩散模型开发了一个端到端的收敛速度，并建立了生成样本的渐近正态性。因此，我们能够构建置信区间，促进稳健的统计推断。此外，通过数值实验，我们经验性地验证了我们提出的方法的有效性。

更新时间: 2024-07-31 07:27:56

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.12684v4

A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation

Ophthalmology consultations are crucial for diagnosing, treating, and preventing eye diseases. However, the growing demand for consultations exceeds the availability of ophthalmologists. By leveraging large pre-trained language models, we can design effective dialogues for specific scenarios, aiding in consultations. Traditional fine-tuning strategies for question-answering tasks are impractical due to increasing model size and often ignore the patient-doctor role function during consultations. In this paper, we propose EyeDoctor, an ophthalmic medical questioning large language model that enhances accuracy through doctor-patient role perception guidance and a knowledge base augmented with external disease information. Experimental results show EyeDoctor achieves higher question-answering precision in ophthalmology consultations. Notably, EyeDoctor demonstrated a 7.25% improvement in Rouge-1 scores and a 10.16% improvement in F1 scores on multi-round datasets compared to the second-best model, ChatGPT, highlighting the importance of doctor-patient role differentiation and dynamic knowledge base expansion for intelligent medical consultations. EyeDoc is also available as a free web-based service, and source code is available at https://github.com/sperfu/EyeDoc.

Updated: 2024-07-31 07:24:30

标题: 基于风格差异的眼科会诊专用引导式大型语言模型

摘要: 眼科咨询对于诊断、治疗和预防眼部疾病至关重要。然而，对咨询的需求不断增长，超过了眼科医生的可用性。通过利用大型预训练语言模型，我们可以为特定场景设计有效的对话，帮助咨询。传统的针对问答任务的微调策略由于模型大小不断增加，通常忽略了在咨询过程中的患者-医生角色功能，因此变得不切实际。在本文中，我们提出了EyeDoctor，一个改进了准确性的眼科医疗问答大型语言模型，通过医生-患者角色感知和增强的知识库与外部疾病信息相结合。实验结果显示，EyeDoctor在眼科咨询中实现了更高的问答精度。值得注意的是，与第二好的模型ChatGPT相比，EyeDoctor在多轮数据集上的Rouge-1得分提高了7.25%，F1得分提高了10.16%，突显了医生-患者角色区分和动态知识库扩展对智能医疗咨询的重要性。EyeDoc也作为一个免费提供的基于web的服务，源代码可在https://github.com/sperfu/EyeDoc获取。

更新时间: 2024-07-31 07:24:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.18483v4

Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint

In underwater image enhancement (UIE), convolutional neural networks (CNN) have inherent limitations in modeling long-range dependencies and are less effective in recovering global features. While Transformers excel at modeling long-range dependencies, their quadratic computational complexity with increasing image resolution presents significant efficiency challenges. Additionally, most supervised learning methods lack effective physical model constraints, which can lead to insufficient realism and overfitting in generated images. To address these issues, we propose a physical model constraint-based underwater image enhancement framework, Mamba-UIE. Specifically, we decompose the input image into four components: underwater scene radiance, direct transmission map, backscatter transmission map, and global background light. These components are reassembled according to the revised underwater image formation model, and the reconstruction consistency constraint is applied between the reconstructed image and the original image, thereby achieving effective physical constraint on the underwater image enhancement process. To tackle the quadratic computational complexity of Transformers when handling long sequences, we introduce the Mamba-UIE network based on linear complexity state space models. By incorporating the Mamba in Convolution block, long-range dependencies are modeled at both the channel and spatial levels, while the CNN backbone is retained to recover local features and details. Extensive experiments on three public datasets demonstrate that our proposed Mamba-UIE outperforms existing state-of-the-art methods, achieving a PSNR of 27.13 and an SSIM of 0.93 on the UIEB dataset. Our method is available at https://github.com/zhangsong1213/Mamba-UIE.

Updated: 2024-07-31 07:20:53

标题: Mamba-UIE：利用物理模型约束增强水下图像

摘要: 在水下图像增强（UIE）中，卷积神经网络（CNN）在建模长程依赖方面存在固有限制，在恢复全局特征方面效果较差。而Transformer在建模长程依赖方面表现出色，但随着图像分辨率的增加，其二次计算复杂性带来了显著的效率挑战。此外，大多数监督学习方法缺乏有效的物理模型约束，这可能导致生成图像缺乏真实感和过拟合。为了解决这些问题，我们提出了基于物理模型约束的水下图像增强框架Mamba-UIE。具体地，我们将输入图像分解为四个部分：水下场景辐射、直接传输图、回波传输图和全局背景光。根据修订后的水下图像形成模型重新组装这些部分，并在重建图像和原始图像之间应用重建一致性约束，从而在水下图像增强过程中实现有效的物理约束。为了解决Transformer处理长序列时的二次计算复杂性，我们引入基于线性复杂度状态空间模型的Mamba-UIE网络。通过在卷积块中引入Mamba，在通道和空间级别建模长程依赖关系，同时保留CNN骨干以恢复局部特征和细节。在三个公共数据集上进行的大量实验表明，我们提出的Mamba-UIE优于现有的最先进方法，在UIEB数据集上实现了27.13的PSNR和0.93的SSIM。我们的方法可在https://github.com/zhangsong1213/Mamba-UIE找到。

更新时间: 2024-07-31 07:20:53

领域: cs.AI

下载: http://arxiv.org/abs/2407.19248v2
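
The reconstruction consistency constraint described in the Mamba-UIE abstract above can be sketched as follows. The abstract does not give the formula, so this assumes the commonly used form of the revised underwater image formation model, I = J * T_D + B * (1 - T_B), with scene radiance J, direct transmission T_D, backscatter transmission T_B, and global background light B. The function names, the flat per-pixel lists, and the L1 loss are illustrative assumptions, not the authors' implementation.

```python
def reconstruct_image(radiance, direct_t, backscatter_t, background_light):
    """Reassemble an image from its four physical components, pixel by pixel.

    Assumed model: I = J * T_D + B * (1 - T_B).
    """
    return [
        j * td + background_light * (1.0 - tb)
        for j, td, tb in zip(radiance, direct_t, backscatter_t)
    ]


def reconstruction_consistency_loss(reconstructed, original):
    """Mean absolute difference between the reassembled and the input image."""
    return sum(abs(r - o) for r, o in zip(reconstructed, original)) / len(original)
```

In training, the four components would be network outputs, and this loss would penalise decompositions that cannot reproduce the observed underwater image.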

SmileyNet -- Towards the Prediction of the Lottery by Reading Tea Leaves with AI

We introduce SmileyNet, a novel neural network with psychic abilities. It is inspired by the fact that a positive mood can lead to improved cognitive capabilities including classification tasks. The network is hence presented in a first phase with smileys and an encouraging loss function is defined to bias it into a good mood. SmileyNet is then used to forecast the flipping of a coin based on an established method of Tasseology, namely by reading tea leaves. Training and testing in this second phase are done with a high-fidelity simulation based on real-world pixels sampled from a professional tea-reading cup. SmileyNet has an amazing accuracy of 72% in correctly predicting the flip of a coin. ResNet-34 and YOLOv5 achieve only 49% and 53%, respectively. It is then shown how multiple SmileyNets can be combined to win the lottery.

Updated: 2024-07-31 07:16:40

标题: SmileyNet - 通过人工智能读茶叶朝向彩票预测

摘要: 我们介绍了SmileyNet，这是一个具有心灵能力的新型神经网络。它受到一个积极心情可以提高认知能力的事实的启发，包括分类任务。因此，该网络首先以笑脸的形式呈现，并定义了一种鼓励性的损失函数，以将其偏向良好心情。然后，SmileyNet被用于根据已建立的Tasseology方法来预测硬币的翻转，即通过读茶叶。在第二阶段的训练和测试中，基于从专业读茶杯中采样的真实世界像素进行高保真模拟。SmileyNet有惊人的准确率，可以正确预测硬币翻转的概率为72%。Resnet-34和YOLOv5分别仅达到49%和53%的准确率。然后展示了如何将多个SmileyNet组合起来赢得彩票。

更新时间: 2024-07-31 07:16:40

领域: cs.AI,cs.CV,cs.CY,cs.LG,cs.RO,I.2; I.4; I.5; I.6; K.3.2

下载: http://arxiv.org/abs/2407.21385v1

GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction

Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text. Compared to sentence-level relation extraction, it requires more complex semantic understanding from a broader text context. Currently, some studies are utilizing logical rules within evidence sentences to enhance the performance of DocRE. However, in the data without provided evidence sentences, researchers often obtain a list of evidence sentences for the entire document through evidence retrieval (ER). Therefore, DocRE suffers from two challenges: firstly, the relevance between evidence and entity pairs is weak; secondly, there is insufficient extraction of complex cross-relations between long-distance multi-entities. To overcome these challenges, we propose GEGA, a novel model for DocRE. The model leverages graph neural networks to construct multiple weight matrices, guiding attention allocation to evidence sentences. It also employs multi-scale representation aggregation to enhance ER. Subsequently, we integrate the most efficient evidence information to implement both fully supervised and weakly supervised training processes for the model. We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED. The experimental results indicate that our model has achieved comprehensive improvements compared to the existing SOTA model.

Updated: 2024-07-31 07:15:33

标题: GEGA：图卷积网络和证据检索引导的注意力用于增强文档级关系抽取

摘要: 文档级关系抽取（DocRE）旨在从非结构化文档文本中提取实体之间的关系。与句子级关系抽取相比，它需要更复杂的语义理解来自更广泛的文本上下文。目前，一些研究正在利用证据句子中的逻辑规则来增强DocRE的性能。然而，在没有提供证据句子的数据中，研究人员通常通过证据检索（ER）获得整个文档的证据句子列表。因此，DocRE面临两个挑战：首先，证据和实体对之间的相关性较弱；其次，长距离多实体之间的复杂交叉关系提取不足。为了克服这些挑战，我们提出了GEGA，一种用于DocRE的新颖模型。该模型利用图神经网络构建多个权重矩阵，引导注意力分配到证据句子上。它还采用多尺度表示聚合来增强ER。随后，我们整合最有效的证据信息，为模型实现完全监督和弱监督训练过程。我们在三个广泛使用的基准数据集上评估了GEGA模型：DocRED、Re-DocRED和Revisit-DocRED。实验结果表明，与现有的SOTA模型相比，我们的模型取得了全面的改进。

更新时间: 2024-07-31 07:15:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.21384v1

FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning

Continual graph learning (CGL) is an important and challenging task that aims to extend static GNNs to dynamic task flow scenarios. As one of the mainstream CGL methods, the experience replay (ER) method receives widespread attention due to its superior performance. However, existing ER methods focus on identifying samples by feature significance or topological relevance, which limits their utilization of comprehensive graph data. In addition, the topology-based ER methods only consider local topological information and add neighboring nodes to the buffer, which ignores the global topological information and increases memory overhead. To bridge these gaps, we propose a novel method called Feature-Topology Fusion-based Experience Replay (FTF-ER) to effectively mitigate the catastrophic forgetting issue with enhanced efficiency. Specifically, from an overall perspective to maximize the utilization of the entire graph data, we propose a highly complementary approach including both feature and global topological information, which can significantly improve the effectiveness of the sampled nodes. Moreover, to further utilize global topological information, we propose Hodge Potential Score (HPS) as a novel module to calculate the topological importance of nodes. HPS derives a global node ranking via Hodge decomposition on graphs, providing more accurate global topological information compared to neighbor sampling. By excluding neighbor sampling, HPS significantly reduces buffer storage costs for acquiring topological information and simultaneously decreases training time. Compared with state-of-the-art methods, FTF-ER achieves a significant improvement of 3.6% in AA and 7.1% in AF on the OGB-Arxiv dataset, demonstrating its superior performance in the class-incremental learning setting.

Updated: 2024-07-31 07:15:15

标题: FTF-ER：基于特征拓扑融合的经验重播方法，用于持续图学习

摘要: Continual graph learning (CGL) is a crucial and challenging task that aims to extend static Graph Neural Networks (GNNs) to dynamic task flow scenarios. Among mainstream CGL methods, the Experience Replay (ER) method has gained widespread attention due to its superior performance. However, existing ER methods mainly focus on identifying samples based on feature significance or topological relevance, which limits their utilization of comprehensive graph data. Additionally, topology-based ER methods only consider local topological information and add neighboring nodes to the buffer, neglecting global topological information and increasing memory overhead. To address these limitations, we propose a novel method called Feature-Topology Fusion-based Experience Replay (FTF-ER) to effectively mitigate the catastrophic forgetting issue with enhanced efficiency. Specifically, we propose a highly complementary approach that incorporates both feature and global topological information to maximize the utilization of entire graph data, significantly improving the effectiveness of sampled nodes. Furthermore, to leverage global topological information, we introduce Hodge Potential Score (HPS) as a novel module for calculating the topological importance of nodes. HPS derives a global node ranking through Hodge decomposition on graphs, providing more accurate global topological information compared to neighbor sampling. By eliminating neighbor sampling, HPS reduces buffer storage costs for acquiring topological information and decreases training time. Compared to state-of-the-art methods, FTF-ER achieves a significant improvement of 3.6% in AA and 7.1% in AF on the OGB-Arxiv dataset, demonstrating its superior performance in the class-incremental learning setting.

更新时间: 2024-07-31 07:15:15

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2407.19429v2

An Extended Kalman Filter Integrated Latent Feature Model on Dynamic Weighted Directed Graphs

A dynamic weighted directed graph (DWDG) is commonly encountered in various application scenarios. It involves extensive dynamic interactions among numerous nodes. Most existing approaches explore the intricate temporal patterns hidden in a DWDG from a purely data-driven perspective, which suffers from accuracy loss when a DWDG exhibits strong fluctuations over time. To address this issue, this study proposes a novel Extended-Kalman-Filter-Incorporated Latent Feature (EKLF) model to represent a DWDG from a model-driven perspective. Its main idea is two-fold: a) adopting a control model, i.e., the Extended Kalman Filter (EKF), to track the complex temporal patterns precisely with its nonlinear state-transition and observation functions; and b) introducing an alternating least squares (ALS) algorithm to train the latent features (LFs) alternately for precisely representing a DWDG. Empirical studies on DWDG datasets demonstrate that the proposed EKLF model outperforms state-of-the-art models in prediction accuracy and computational efficiency for missing edge weights of a DWDG. It unveils the potential for precisely representing a DWDG by incorporating a control model.

Updated: 2024-07-31 06:57:27

标题: 一种在动态加权有向图上集成潜在特征模型的扩展卡尔曼滤波器

摘要: 一个动态加权有向图（DWDG）在各种应用场景中经常遇到。它涉及许多节点之间的广泛动态交互。大多数现有方法从纯数据驱动的角度探索DWDG中隐藏的复杂时间模式，当DWDG随时间展现出强烈波动时，会出现精度损失。为了解决这个问题，本研究提出了一个新颖的Extended-Kalman-Filter-Incorporated Latent Feature（EKLF）模型，以模型驱动的角度表示DWDG。其主要思想分为以下两个方面：a）采用控制模型，即扩展卡尔曼滤波器（EKF），精确跟踪复杂的时间模式，具有非线性状态转移和观测函数；b）引入交替最小二乘（ALS）算法，交替训练潜在特征（LFs），以精确表示DWDG。对DWDG数据集的实证研究表明，提出的EKLF模型在预测准确性和计算效率方面优于最先进的模型，用于预测DWDG的缺失边权重。它揭示了通过结合控制模型来精确表示DWDG的潜力。

更新时间: 2024-07-31 06:57:27

领域: cs.AI

下载: http://arxiv.org/abs/2407.21376v1
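
The EKLF abstract above builds on the Extended Kalman Filter with nonlinear state-transition and observation functions. As a reminder of what one EKF iteration does, here is a minimal scalar sketch: predict with the transition function, linearise, then correct with the new observation. It tracks a single scalar quantity and is not the EKLF model itself (latent features and the ALS training are omitted); all names and the scalar setting are illustrative assumptions.

```python
def ekf_step(x, P, z, f, f_jac, h, h_jac, Q, R):
    """One predict/update cycle of a scalar Extended Kalman Filter.

    x, P    : previous state estimate and its variance
    z       : new observation
    f, h    : nonlinear state-transition and observation functions
    f_jac   : derivative of f, evaluated at the previous estimate
    h_jac   : derivative of h, evaluated at the predicted state
    Q, R    : process and observation noise variances
    """
    # Predict: propagate the estimate and its variance through the dynamics.
    x_pred = f(x)
    F = f_jac(x)
    P_pred = F * P * F + Q

    # Update: weigh the observation residual by the Kalman gain.
    H = h_jac(x_pred)
    S = H * P_pred * H + R
    K = P_pred * H / S
    x_new = x_pred + K * (z - h(x_pred))
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new
```

With identity dynamics and observations this reduces to the ordinary Kalman filter, which makes the step easy to check by hand.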

Dynamic Gesture Recognition in Ultra-Range Distance for Effective Human-Robot Interaction

This paper presents a novel approach for ultra-range gesture recognition, addressing Human-Robot Interaction (HRI) challenges over extended distances. By leveraging human gestures in video data, we propose the Temporal-Spatiotemporal Fusion Network (TSFN) model that surpasses the limitations of current methods, enabling robots to understand gestures from long distances. With applications in service robots, search and rescue operations, and drone-based interactions, our approach enhances HRI in expansive environments. Experimental validation demonstrates significant advancements in gesture recognition accuracy, particularly in prolonged gesture sequences.

Updated: 2024-07-31 06:56:46

标题: 在超远距离下的动态手势识别，用于有效的人机交互

摘要: 本文提出了一种新颖的超远程手势识别方法，解决了在长距离下人机交互（HRI）所面临的挑战。通过利用视频数据中的人类手势，我们提出了时间-时空融合网络（TSFN）模型，超越了当前方法的局限性，使机器人能够理解来自长距离的手势。在服务机器人、搜救行动和基于无人机的交互等领域应用中，我们的方法增强了在广阔环境中的HRI。实验验证表明，在手势识别准确性方面取得了显著进展，特别是在长时间手势序列中。

更新时间: 2024-07-31 06:56:46

领域: cs.RO,cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.21374v1

Two Completely Parameter-Free Alternating Gradient Projection Algorithms for Nonconvex-(strongly) Concave Minimax Problems

Due to their importance in various emerging applications, efficient algorithms for solving minimax problems have recently received increasing attention. However, many existing algorithms require prior knowledge of the problem parameters in order to achieve optimal iteration complexity. In this paper, we propose a completely parameter-free alternating gradient projection (PF-AGP) algorithm to solve the smooth nonconvex-(strongly) concave minimax problems using a backtracking strategy, which does not require prior knowledge of parameters such as the Lipschitz constant $L$ or the strong concavity constant $\mu$. The PF-AGP algorithm utilizes a parameter-free gradient projection step to alternately update the outer and inner variables in each iteration. We show that the total number of gradient calls of the PF-AGP algorithm to obtain an $\varepsilon$-stationary point for nonconvex-strongly concave minimax problems is upper bounded by $\mathcal{O}\left( L\kappa^3\varepsilon^{-2} \right)$ where $\kappa$ is the condition number, while the total number of gradient calls to obtain an $\varepsilon$-stationary point for nonconvex-concave minimax problems is upper bounded by $\mathcal{O}\left( L^4\varepsilon^{-4} \right)$. As far as we know, this is the first completely parameter-free algorithm for solving nonconvex-strongly concave minimax problems, and, among single-loop methods for nonconvex-concave minimax problems, it is the completely parameter-free algorithm that achieves the best iteration complexity. Numerical results validate the efficiency of the proposed PF-AGP algorithm.

Updated: 2024-07-31 06:54:24

标题: 两种完全无参数的交替梯度投影算法用于非凸-（强）凹极小极大问题

摘要: 由于它们在各种新兴应用中的重要性，解决极小极大问题的高效算法最近受到越来越多的关注。然而，许多现有算法需要先前了解问题参数才能实现最佳迭代复杂度。在本文中，我们提出了一个完全无参数的交替梯度投影（PF-AGP）算法，使用回溯策略解决平滑非凸-(强)凹极小极大问题，该算法不需要先前了解参数，如Lipschitz常数$L$或强凹常数$\mu$。PF-AGP算法利用无参数梯度投影步骤，交替更新每次迭代中的外部和内部变量。我们证明，PF-AGP算法获取非凸-强凹极小极大问题的$\varepsilon$-稳定点所需的梯度调用总数上界为$\mathcal{O}(L\kappa^3\varepsilon^{-2})$，其中$\kappa$是条件数，而获取非凸-凹极小极大问题的$\varepsilon$-稳定点所需的梯度调用总数上界为$\mathcal{O}(L^4\varepsilon^{-4})$。据我们所知，这是解决非凸-强凹极小极大问题的第一个完全无参数算法，也是在解决非凸-凹极小极大问题的单循环方法中实现最佳迭代复杂度的完全无参数算法。数值结果验证了提出的PF-AGP算法的效率。

更新时间: 2024-07-31 06:54:24

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.21372v1
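
The idea in the PF-AGP abstract above, alternating projected gradient steps whose step sizes come from backtracking rather than from a known Lipschitz constant, can be illustrated on a toy problem. The sketch below is a generic alternating gradient projection loop with a doubling backtracking rule; it is not the paper's exact update or its regularisation, and the toy objective (which is convex in x for simplicity), the box constraint, the initial estimate `L0`, and all function names are assumptions made for illustration.

```python
def project(v, lo=-1.0, hi=1.0):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def f(x, y):
    """Toy smooth objective: minimise over x, maximise over y (strongly concave in y)."""
    return 0.5 * x * x + x * y - 0.5 * y * y


def grad_x(x, y):
    return x + y


def grad_y(x, y):
    return x - y


def backtracking_projected_step(value, grad, objective, sign, L, tol=1e-12):
    """One projected gradient step with a backtracked curvature estimate L.

    sign=-1 is a descent step, sign=+1 an ascent step. L is doubled until the
    quadratic model with curvature L bounds the change in the objective.
    """
    base = objective(value)
    while True:
        new = project(value + sign * grad / L)
        d = new - value
        model_gap = -sign * (objective(new) - base - grad * d)
        if model_gap <= 0.5 * L * d * d + tol:
            return new, L
        L *= 2.0


def alternating_gradient_projection(x, y, iterations=200, L0=0.1):
    """Alternate a descent step in x and an ascent step in y."""
    Lx = Ly = L0
    for _ in range(iterations):
        x, Lx = backtracking_projected_step(
            x, grad_x(x, y), lambda u: f(u, y), -1, Lx)
        y, Ly = backtracking_projected_step(
            y, grad_y(x, y), lambda v: f(x, v), +1, Ly)
    return x, y
```

On this toy objective the only saddle point in the box is (0, 0), and the loop reaches it without ever being told the curvature of `f`.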

Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/ .

Updated: 2024-07-31 06:46:16

标题: 机器人共感觉：利用视触觉传感进行手中操作

摘要: 执行接触丰富的操作任务需要融合触觉和视觉反馈。然而，这些模态的不同性质带来了重大挑战。在本文中，我们介绍了一种利用视觉和触觉感觉输入实现灵巧手部操作的系统。具体地，我们提出了Robot Synesthesia，这是一种受人类触觉-视觉共感觉启发的基于点云的触觉表示。这种方法允许同时和无缝地整合两种感觉输入，提供更丰富的空间信息，并促进对机器人动作的更好推理。该方法在模拟环境中进行训练，然后部署到真实机器人上，适用于各种手持物体旋转任务。我们进行了全面的消融实验，探讨视觉和触觉整合如何改善强化学习和从Sim2Real的表现。我们的项目页面位于https://yingyuan0414.github.io/visuotactile/。

更新时间: 2024-07-31 06:46:16

领域: cs.RO,cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.01853v3

Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering

Large Vision-Language Models (LVLMs) have achieved significant success in recent years, and they have been extended to the medical domain. Although demonstrating satisfactory performance on medical Visual Question Answering (VQA) tasks, Medical LVLMs (MLVLMs) suffer from the hallucination problem, which makes them fail to diagnose complex pathologies. Moreover, they readily fail to learn minority pathologies due to imbalanced training data. We propose two prompting strategies for MLVLMs that reduce hallucination and improve VQA performance. In the first strategy, we provide a detailed explanation of the queried pathology. In the second strategy, we fine-tune a cheap, weak learner to achieve high performance on a specific metric, and textually provide its judgment to the MLVLM. Tested on the MIMIC-CXR-JPG and CheXpert datasets, our methods significantly improve the diagnostic F1 score, with the highest increase being 0.27. We also demonstrate that our prompting strategies can be extended to general LVLM domains. Based on POPE metrics, they effectively suppress the false negative predictions of existing LVLMs and improve Recall by approximately 0.07.

Updated: 2024-07-31 06:34:38

标题: 用视觉问答激发医学大型视觉语言模型对病变进行诊断

摘要: 大型视觉语言模型（LVLMs）近年来取得了显著的成功，并已扩展到医学领域。尽管在医学视觉问答（VQA）任务上表现出令人满意的性能，但医学LVLMs（MLVLMs）存在幻觉问题，导致它们无法诊断复杂病变。此外，由于训练数据不平衡，它们很容易无法学习少数病变。我们提出了两种提示策略，用于降低MLVLMs的幻觉并改善VQA性能。在第一种策略中，我们提供了有关被询问病变的详细解释。在第二种策略中，我们通过微调一个廉价、弱学习器，在特定指标上实现高性能，并将其判断以文本形式提供给MLVLM。在MIMIC-CXR-JPG和Chexpert数据集上进行测试，我们的方法显著提高了诊断F1分数，最高增加为0.27。我们还证明了我们的提示策略可以扩展到一般的LVLM领域。基于POPE指标，它有效抑制了现有LVLMs的假阴性预测，并将召回率提高了约0.07。

更新时间: 2024-07-31 06:34:38

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.21368v1

Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework

Semantic Communication (SC) is an emerging technology aiming to surpass the Shannon limit. Traditional SC strategies often minimize signal distortion between the original and reconstructed data, neglecting perceptual quality, especially in low Signal-to-Noise Ratio (SNR) environments. To address this issue, we introduce a novel Generative AI Semantic Communication (GSC) system for single-user scenarios. This system leverages deep generative models to establish a new paradigm in SC. Specifically, at the transmitter end, it employs a joint source-channel coding mechanism based on the Swin Transformer for efficient semantic feature extraction and compression. At the receiver end, an advanced Diffusion Model (DM) reconstructs high-quality images from degraded signals, enhancing perceptual details. Additionally, we present a Multi-User Generative Semantic Communication (MU-GSC) system utilizing an asynchronous processing model. This model effectively manages multiple user requests and optimally utilizes system resources for parallel processing. Simulation results on public datasets demonstrate that our generative AI semantic communication systems achieve superior transmission efficiency and enhanced communication content quality across various channel conditions. Compared to CNN-based DeepJSCC, our methods improve the Peak Signal-to-Noise Ratio (PSNR) by 17.75% in Additive White Gaussian Noise (AWGN) channels and by 20.86% in Rayleigh channels.

Updated: 2024-07-31 06:08:51

标题: 语义逐步细化：一种生成式AI辅助语义通信框架

摘要: 语义通信（SC）是一种新兴技术，旨在超越Shannon极限。传统的SC策略通常在原始数据和重建数据之间最小化信号失真，忽视感知质量，尤其是在低信噪比（SNR）环境中。为了解决这个问题，我们引入了一种新颖的单用户场景下的生成式AI语义通信（GSC）系统。该系统利用深度生成模型建立了SC领域的新范式。具体地，在传输端，它采用基于Swin Transformer的联合源-信道编码机制，实现高效的语义特征提取和压缩。在接收端，先进的扩散模型（DM）从受损信号中重建高质量图像，增强感知细节。此外，我们提出了一种利用异步处理模型的多用户生成式语义通信（MU-GSC）系统。这种模型有效管理多个用户请求，并最大限度地利用系统资源进行并行处理。在公共数据集上的模拟结果表明，我们的生成式AI语义通信系统在各种信道条件下实现了卓越的传输效率和增强的通信内容质量。与基于CNN的DeepJSCC相比，我们的方法在加性白高斯噪声（AWGN）信道上将峰值信噪比（PSNR）提高了17.75％，在雷利信道上提高了20.86％。

更新时间: 2024-07-31 06:08:51

领域: cs.LG,cs.AI,eess.IV

下载: http://arxiv.org/abs/2408.05112v1
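
The semantic communication abstract above reports its gains as PSNR improvements. For reference, PSNR is 10 * log10(MAX^2 / MSE), and the sketch below computes it for flat lists of pixel values together with a relative gain. The helper names are hypothetical, and reading the reported percentages as relative changes in the dB value is an assumption, since the abstract does not spell out how they were computed.

```python
import math


def psnr(original, reconstructed, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel lists."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain_percent(baseline_db, improved_db):
    """Relative change of a dB value, expressed in percent of the baseline."""
    return 100.0 * (improved_db - baseline_db) / baseline_db
```

Under that reading, going from a 20 dB baseline to 23.55 dB would be reported as a 17.75% improvement.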

ProSpec RL: Plan Ahead, then Execute

Imagining potential outcomes of actions before execution helps agents make more informed decisions, a prospective thinking ability fundamental to human cognition. However, mainstream model-free Reinforcement Learning (RL) methods lack the ability to proactively envision future scenarios, plan, and guide strategies. These methods typically rely on trial and error to adjust policy functions, aiming to maximize cumulative rewards or long-term value, even if such high-reward decisions place the environment in extremely dangerous states. To address this, we propose the Prospective (ProSpec) RL method, which makes higher-value, lower-risk optimal decisions by imagining future n-stream trajectories. Specifically, ProSpec employs a dynamic model to predict future states (termed "imagined states") based on the current state and a series of sampled actions. Furthermore, we integrate the concept of Model Predictive Control and introduce a cycle consistency constraint that allows the agent to evaluate and select the optimal actions from these trajectories. Moreover, ProSpec employs cycle consistency to mitigate two fundamental issues in RL: augmenting state reversibility to avoid irreversible events (low risk) and augmenting actions to generate numerous virtual trajectories, thereby improving data efficiency. We validated the effectiveness of our method on the DMControl benchmarks, where our approach achieved significant performance improvements. Code will be open-sourced upon acceptance.

Updated: 2024-07-31 06:04:55

Categories: cs.LG,cs.AI,cs.IR

Download: http://arxiv.org/abs/2407.21359v1

Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs

Knowledge graphs (KGs) complement Large Language Models (LLMs) by providing reliable, structured, domain-specific, and up-to-date external knowledge. However, KGs and LLMs are often developed separately and must be integrated after training. We introduce Tree-of-Traversals, a novel zero-shot reasoning algorithm that enables augmentation of black-box LLMs with one or more KGs. The algorithm equips an LLM with actions for interfacing with a KG and enables the LLM to perform tree search over possible thoughts and actions to find high-confidence reasoning paths. We evaluate on two popular benchmark datasets. Our results show that Tree-of-Traversals significantly improves performance on question answering and KG question answering tasks. Code is available at https://github.com/amazon-science/tree-of-traversals

Updated: 2024-07-31 06:01:24

Categories: cs.AI

Download: http://arxiv.org/abs/2407.21358v1

Building AI Agents for Autonomous Clouds: Challenges and Design Principles

The rapid growth in the use of Large Language Models (LLMs) and AI Agents as part of software development and deployment is revolutionizing the information technology landscape. While code generation receives significant attention, a higher-impact application lies in using AI agents for operational resilience of cloud services, which currently requires significant human effort and domain knowledge. There is a growing interest in AI for IT Operations (AIOps), which aims to automate complex operational tasks, like fault localization and root cause analysis, thereby reducing human intervention and customer impact. However, achieving the vision of autonomous and self-healing clouds through AIOps is hampered by the lack of standardized frameworks for building, evaluating, and improving AIOps agents. This vision paper lays the groundwork for such a framework by first framing the requirements and then discussing design decisions that satisfy them. We also propose AIOpsLab, a prototype implementation leveraging an agent-cloud interface that orchestrates an application, injects real-time faults using chaos engineering, and interfaces with an agent to localize and resolve the faults. We report promising results and lay the groundwork to build a modular and robust framework for building, evaluating, and improving agents for autonomous clouds.

Updated: 2024-07-31 06:01:15

Categories: cs.SE,cs.AI,cs.DC

Download: http://arxiv.org/abs/2407.12165v2

FrameQuant: Flexible Low-Bit Quantization for Transformers

Transformers are the backbone of powerful foundation models for many Vision and Natural Language Processing tasks. But their compute and memory/storage footprint is large, so serving such models is expensive, often requiring high-end hardware. To mitigate this difficulty, Post-Training Quantization seeks to modify a pre-trained model and quantize it to eight bits or lower, significantly boosting compute/memory/latency efficiency. Such models have been successfully quantized to four bits with some performance loss. In this work, we outline a simple scheme to quantize Transformer-based models to just two bits (plus some overhead) with only a small drop in accuracy. Key to our formulation is a concept borrowed from Harmonic analysis called Fusion Frames. Our main finding is that the quantization must take place not in the original weight space, but instead in the Fusion Frame representations. If quantization is interpreted as the addition of noise, our casting of the problem allows invoking an extensive body of known consistent recovery and noise robustness guarantees. Further, if desired, de-noising filters are known in closed form. We show empirically, via a variety of experiments, that (almost) two-bit quantization for Transformer models promises sizable efficiency gains. The code is available at https://github.com/vsingh-group/FrameQuant
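
To make "quantize in a frame representation rather than in the original weight space" concrete, the sketch below uses an ordinary tight frame of three unit vectors in the plane. It is an illustration under stated assumptions only: it uses a plain frame rather than Fusion Frames, the 2-bit uniform quantizer and all function names are hypothetical, and it is not the FrameQuant implementation.

```python
import math

# Three unit vectors 120 degrees apart: a tight frame for R^2 with frame bound 3/2.
FRAME = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]

def analyze(w):
    """Redundant frame coefficients <w, f_k> of a 2-D weight vector."""
    return [w[0] * fx + w[1] * fy for fx, fy in FRAME]

def synthesize(coeffs):
    """Reconstruct the weight vector; the factor 2/3 inverts the frame bound."""
    scale = 2.0 / 3.0
    x = scale * sum(c * fx for c, (fx, _) in zip(coeffs, FRAME))
    y = scale * sum(c * fy for c, (_, fy) in zip(coeffs, FRAME))
    return (x, y)

def quantize_2bit(c, lo=-1.0, hi=1.0):
    """Map a coefficient to one of four codes (0..3) on a uniform grid over [lo, hi]."""
    step = (hi - lo) / 3
    return min(3, max(0, round((c - lo) / step)))

def dequantize_2bit(code, lo=-1.0, hi=1.0):
    """Map a 2-bit code back to its grid value."""
    return lo + code * (hi - lo) / 3

w = (0.3, -0.5)
codes = [quantize_2bit(c) for c in analyze(w)]           # what would be stored
w_hat = synthesize([dequantize_2bit(c) for c in codes])  # approximate weights
```

Because the frame is redundant, three coefficients are stored for two weights, which corresponds to the "plus some overhead" caveat; the quantization error of each coefficient is spread over the reconstruction.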

Updated: 2024-07-31 05:59:31

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2403.06082v2

Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data

As language models (LMs) deliver increasing performance on a range of NLP tasks, probing classifiers have become an indispensable technique in the effort to better understand their inner workings. A typical setup involves (1) defining an auxiliary task consisting of a dataset of text annotated with labels, then (2) supervising small classifiers to predict the labels from the representations of a pretrained LM as it processed the dataset. A high probing accuracy is interpreted as evidence that the LM has learned to perform the auxiliary task as an unsupervised byproduct of its original pretraining objective. Despite the widespread usage of probes, however, the robust design and analysis of probing experiments remains a challenge. We develop a formal perspective on probing using structural causal models (SCM). Specifically, given an SCM which explains the distribution of tokens observed during training, we frame the central hypothesis as whether the LM has learned to represent the latent variables of the SCM. Empirically, we extend a recent study of LMs in the context of a synthetic grid-world navigation task, where having an exact model of the underlying causal structure allows us to draw strong inferences from the result of probing experiments. Our techniques provide robust empirical evidence for the ability of LMs to induce the latent concepts underlying text.

Updated: 2024-07-31 05:57:07

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.13765v2

Small Object Few-shot Segmentation for Vision-based Industrial Inspection

Vision-based industrial inspection (VII) aims to locate defects quickly and accurately. Supervised learning under a closed-set setting and industrial anomaly detection, two common paradigms in VII, face different problems in practice. For the former, sufficiently many and varied defect samples are difficult to obtain; for the latter, specific defects cannot be located. To solve these problems, in this paper, we focus on the few-shot semantic segmentation (FSS) method, which can locate unseen defects conditioned on a few annotations without retraining. Compared to common objects in natural images, the defects in VII are small. This brings two problems to current FSS methods: (1) distortion of target semantics and (2) many false positives for backgrounds. To alleviate these problems, we propose a small object few-shot segmentation (SOFS) model. The key idea for alleviating (1) is to avoid resizing the original image and to correctly indicate the intensity of target semantics. SOFS achieves this via a non-resizing procedure and prototype intensity downsampling of support annotations. To alleviate (2), we design an abnormal prior map in SOFS to guide the model to reduce false positives and propose a mixed normal Dice loss to preferentially prevent the model from predicting false positives. SOFS can perform either FSS or few-shot anomaly detection, as determined by the support masks. Diverse experiments substantiate the superior performance of SOFS. Code is available at https://github.com/zhangzilongc/SOFS.

Updated: 2024-07-31 05:43:36

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.21351v1

Time Series Imputation with Multivariate Radial Basis Function Neural Network

Researchers have been persistently working to address the issue of missing values in time series data. Numerous models have been proposed, striving to estimate the distribution of the data. The Radial Basis Functions Neural Network (RBFNN) has recently exhibited exceptional performance in estimating data distribution. In this paper, we propose a time series imputation model based on RBFNN. Our imputation model learns local information from timestamps to create a continuous function. Additionally, we incorporate time gaps to facilitate learning information considering the missing terms of missing values. We name this model the Missing Imputation Multivariate RBFNN (MIM-RBFNN). However, MIM-RBFNN relies on a local information-based learning approach, which presents difficulties in utilizing temporal information. Therefore, we propose an extension called the Missing Value Imputation Recurrent Neural Network with Continuous Function (MIRNN-CF) using the continuous function generated by MIM-RBFNN. We evaluate the performance using two real-world datasets with non-random missing and random missing patterns, and conduct an ablation study comparing MIM-RBFNN and MIRNN-CF.

Updated: 2024-07-31 05:39:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.17040v2

Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation

Advanced Artificial Intelligence (AI) systems, specifically large language models (LLMs), have the capability to generate not just misinformation, but also deceptive explanations that can justify and propagate false information and erode trust in the truth. We examined the impact of deceptive AI generated explanations on individuals' beliefs in a pre-registered online experiment with 23,840 observations from 1,192 participants. We found that in addition to being more persuasive than accurate and honest explanations, AI-generated deceptive explanations can significantly amplify belief in false news headlines and undermine true ones as compared to AI systems that simply classify the headline incorrectly as being true/false. Moreover, our results show that personal factors such as cognitive reflection and trust in AI do not necessarily protect individuals from these effects caused by deceptive AI generated explanations. Instead, our results show that the logical validity of AI generated deceptive explanations, that is whether the explanation has a causal effect on the truthfulness of the AI's classification, plays a critical role in countering their persuasiveness - with logically invalid explanations being deemed less credible. This underscores the importance of teaching logical reasoning and critical thinking skills to identify logically invalid arguments, fostering greater resilience against advanced AI-driven misinformation.

Updated: 2024-07-31 05:39:07

Categories: cs.AI,cs.CY

Download: http://arxiv.org/abs/2408.00024v1

Differentially Private Block-wise Gradient Shuffle for Deep Learning

Traditional Differentially Private Stochastic Gradient Descent (DP-SGD) adds statistical noise drawn from a Gaussian distribution to gradients to ensure privacy. This paper introduces the novel Differentially Private Block-wise Gradient Shuffle (DP-BloGS) algorithm for deep learning. BloGS builds on the existing private deep learning literature, but makes a definitive shift by taking a probabilistic approach to gradient noise introduction through shuffling modeled after information theoretic privacy analyses. The theoretical results presented in this paper show that the combination of shuffling, parameter-specific block size selection, batch layer clipping, and gradient accumulation allows DP-BloGS to achieve training times close to those of non-private training while maintaining privacy and utility guarantees similar to those of DP-SGD. DP-BloGS is found to be significantly more resistant to data extraction attempts than DP-SGD. The theoretical results are validated by the experimental findings.
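
As a rough illustration of replacing additive noise with shuffling, the sketch below clips a flattened gradient and then permutes its entries within fixed-size blocks. This is one plausible reading of "block-wise gradient shuffle" and is an assumption: the abstract does not specify how blocks are formed or permuted, the helper names are hypothetical, and the sketch carries no privacy guarantee.

```python
import random

def clip(grad, max_norm):
    """Scale the gradient so that its L2 norm does not exceed max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def blockwise_shuffle(grad, block_size, rng):
    """Permute entries inside each consecutive block (assumed interpretation)."""
    out = []
    for start in range(0, len(grad), block_size):
        block = grad[start:start + block_size]   # slicing copies, input is untouched
        rng.shuffle(block)
        out.extend(block)
    return out

grad = [0.5, -1.0, 2.0, 0.1, 0.0, 3.0, -0.2, 0.7]
private_grad = blockwise_shuffle(clip(grad, max_norm=1.0), block_size=4, rng=random.Random(0))
```

Shuffling keeps every value inside its block, so each block's sum and norm are unchanged; only the assignment of values to parameters is randomized.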

Updated: 2024-07-31 05:32:37

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2407.21347v1

On the Perturbed States for Transformed Input-robust Reinforcement Learning

Reinforcement Learning (RL) agents demonstrating proficiency in a training environment exhibit vulnerability to adversarial perturbations in input observations during deployment. This underscores the importance of building a robust agent before its real-world deployment. To address this challenge, prior works focus on developing robust training-based procedures, encompassing efforts to fortify the deep neural network component's robustness or subject the agent to adversarial training against potent attacks. In this work, we propose a novel method referred to as Transformed Input-robust RL (TIRL), which explores another avenue to mitigate the impact of adversaries by employing input transformation-based defenses. Specifically, we introduce two principles for applying transformation-based defenses in learning robust RL agents: (1) autoencoder-styled denoising to reconstruct the original state and (2) bounded transformations (bit-depth reduction and vector quantization (VQ)) to achieve close transformed inputs. The transformations are applied to the state before feeding it into the policy network. Extensive experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses, i.e., VQ, defend against several adversaries in the state observations.
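
The two bounded transformations named above can be sketched generically as follows. This is a minimal illustration, assuming states scaled to [-1, 1] and a fixed hand-written codebook; the function names and parameters are hypothetical and are not taken from the TIRL code.

```python
def bit_depth_reduce(state, bits, lo=-1.0, hi=1.0):
    """Snap each state dimension to one of 2**bits evenly spaced levels in [lo, hi]."""
    levels = 2 ** bits - 1
    out = []
    for x in state:
        x = min(hi, max(lo, x))                       # clamp to the assumed range
        q = round((x - lo) / (hi - lo) * levels)      # index of the nearest level
        out.append(lo + q * (hi - lo) / levels)
    return out

def vector_quantize(state, codebook):
    """Replace the whole state vector by its nearest codebook entry (squared L2 distance)."""
    def sqdist(code):
        return sum((a - b) ** 2 for a, b in zip(state, code))
    return min(codebook, key=sqdist)

clean = bit_depth_reduce([0.10], bits=3)
perturbed = bit_depth_reduce([0.12], bits=3)          # small adversarial shift
nearest = vector_quantize((0.9, 0.1), [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
```

In this example the clean and perturbed inputs fall on the same level, so the policy network would receive identical inputs; a perturbation that crosses a level boundary would still change the output.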

Updated: 2024-07-31 05:31:28

Categories: cs.LG

Download: http://arxiv.org/abs/2408.00023v1

Dual-Constrained Dynamical Neural ODEs for Ambiguity-aware Continuous Emotion Prediction

There has been a significant focus on modelling emotion ambiguity in recent years, with advancements made in representing emotions as distributions to capture ambiguity. However, comparatively less effort has been devoted to temporal dependencies in emotion distributions, which encode ambiguity in perceived emotions that evolve smoothly over time. Recognizing the benefits of using constrained dynamical neural ordinary differential equations (CD-NODE) to model time series as dynamic processes, we propose an ambiguity-aware dual-constrained Neural ODE approach to model the dynamics of emotion distributions on arousal and valence. In our approach, we utilize ODEs parameterised by neural networks to estimate the distribution parameters, and we integrate additional constraints to restrict the range of the system outputs to ensure the validity of predicted distributions. We evaluated our proposed system on the publicly available RECOLA dataset and observed very promising performance across a range of evaluation metrics.

Updated: 2024-07-31 05:18:06

Categories: cs.AI

Download: http://arxiv.org/abs/2407.21344v1

MIST: A Simple and Scalable End-To-End 3D Medical Imaging Segmentation Framework

Medical imaging segmentation is a highly active area of research, with deep learning-based methods achieving state-of-the-art results in several benchmarks. However, the lack of standardized tools for training, testing, and evaluating new methods makes the comparison of methods difficult. To address this, we introduce the Medical Imaging Segmentation Toolkit (MIST), a simple, modular, and end-to-end medical imaging segmentation framework designed to facilitate consistent training, testing, and evaluation of deep learning-based medical imaging segmentation methods. MIST standardizes data analysis, preprocessing, and evaluation pipelines, accommodating multiple architectures and loss functions. This standardization ensures reproducible and fair comparisons across different methods. We detail MIST's data format requirements, pipelines, and auxiliary features and demonstrate its efficacy using the BraTS Adult Glioma Post-Treatment Challenge dataset. Our results highlight MIST's ability to produce accurate segmentation masks and its scalability across multiple GPUs, showcasing its potential as a powerful tool for future medical imaging research and development.

Updated: 2024-07-31 05:17:31

Categories: eess.IV,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.21343v1

LAPIS: Language Model-Augmented Police Investigation System

Crime situations are a race against time. Police officers need an AI-assisted criminal investigation system that provides prompt but precise legal counsel. We introduce LAPIS (Language Model Augmented Police Investigation System), an automated system that assists police officers in performing rational and legal investigative actions. We constructed a finetuning dataset and a retrieval knowledgebase specialized in the crime investigation legal reasoning task. We improved the dataset's quality by incorporating manual curation by a group of domain experts. We then finetuned the pretrained weights of a smaller Korean language model on the newly constructed dataset and integrated it with the crime investigation knowledgebase retrieval approach. Experimental results show LAPIS' potential in providing reliable legal guidance for police officers, even better than the proprietary GPT-4 model. Qualitative analysis of the rationales generated by LAPIS demonstrates the model's ability to leverage the premises and derive legally correct conclusions.

Updated: 2024-07-31 05:16:30

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2407.20248v2

Image-Based Deep Reinforcement Learning with Intrinsically Motivated Stimuli: On the Execution of Complex Robotic Tasks

Reinforcement Learning (RL) has been widely used to solve tasks where the environment consistently provides a dense reward value. However, in real-world scenarios, rewards can often be poorly defined or sparse. Auxiliary signals are indispensable for discovering efficient exploration strategies and aiding the learning process. In this work, inspired by intrinsic motivation theory, we postulate that the intrinsic stimuli of novelty and surprise can assist in improving exploration in complex, sparsely rewarded environments. We introduce a novel sample-efficient method able to learn directly from pixels, an image-based extension of TD3 with an autoencoder called NaSA-TD3. The experiments demonstrate that NaSA-TD3 is easy to train and an efficient method for tackling complex continuous-control robotic tasks, both in simulated environments and real-world settings. NaSA-TD3 outperforms existing state-of-the-art RL image-based methods in terms of final performance without requiring pre-trained models or human demonstrations.

Updated: 2024-07-31 05:11:06

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.21338v1

One-time Pad Encryption Model for Non-local Correlations

We present a cryptography-inspired framework for modeling Bell nonlocal correlations. Drawing inspiration from the renowned De Broglie-Bohm theory, we conceptualize nonlocal boxes as realistic systems featuring instantaneous signaling at the hidden variable level. By introducing randomness into the distribution of the hidden variable, the superluminal signaling model is made compatible with the operational no-signalling condition. As our design mimics the famous symmetric key encryption system called One-time Pad (OTP), we call this the OTP model for nonlocal boxes. We illustrate the efficacy of this model through various esoteric examples related to the non-classical nature of nonlocal boxes. In particular, the breakdown of communication complexity using nonlocal boxes can be better understood in this framework. Additionally, we delve into the Van Dam protocol, revealing its connection to homomorphic encryption studied in cryptography. Exploring potential avenues for encapsulating quantum-realizable nonlocal correlations within our framework, we highlight that the Information Causality principle imposes additional constraints at the hidden variable level. The present work thus draws on results from classical cryptography to improve our understanding of nonlocal correlations and welcomes further research into this connection.
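
The one-time-pad picture can be made concrete with the textbook Popescu-Rohrlich (PR) box, whose outputs satisfy a XOR b = x AND y. The sketch below is an illustration under stated assumptions, not the paper's construction: it assumes a single uniformly random hidden bit `lam` acting as the pad, lets Bob's output depend on Alice's input at the hidden-variable level, and then checks that this dependence is invisible in Bob's observable statistics. The function names are hypothetical.

```python
import itertools

def otp_box(x, y, lam):
    """Hidden-variable model of a PR box with signaling hidden behind the pad bit lam."""
    a = lam                # Alice outputs the shared random pad bit
    b = lam ^ (x & y)      # Bob's output "encrypts" x AND y with the pad (depends on x)
    return a, b

def bob_marginal(x, y):
    """P(b = 1 | x, y) when the pad lam is uniform on {0, 1}."""
    return sum(otp_box(x, y, lam)[1] for lam in (0, 1)) / 2

# PR-box correlation holds for every input pair and every pad value.
correlated = all(
    (a ^ b) == (x & y)
    for x, y, lam in itertools.product((0, 1), repeat=3)
    for a, b in [otp_box(x, y, lam)]
)

# Bob's marginal does not depend on Alice's input x: operational no-signalling.
no_signalling = all(bob_marginal(0, y) == bob_marginal(1, y) for y in (0, 1))
```

For a fixed `lam`, Bob's bit reveals x whenever y = 1, which is the signaling at the hidden-variable level; averaging over the uniform pad makes his output uniformly random, just as a ciphertext under a one-time pad is uniform without the key.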

Updated: 2024-07-31 05:00:59

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2307.03395v2

Multi-Tower Multi-Interest Recommendation with User Representation Repel

In the era of information overload, the value of recommender systems has been profoundly recognized in academia and industry alike. Multi-interest sequential recommendation, in particular, is a subfield that has been receiving increasing attention in recent years. By generating multiple user representations, multi-interest learning models demonstrate greater expressiveness than single-user representation models, both theoretically and empirically. Despite major advancements in the field, three major issues continue to plague the performance and adoptability of multi-interest learning methods: the difference between training and deployment objectives, the inability to access item information, and the difficulty of industrial adoption due to its single-tower architecture. We address these challenges by proposing a novel multi-tower multi-interest framework with user representation repel. Experimental results across multiple large-scale industrial datasets demonstrate the effectiveness and generalizability of our proposed framework.

Updated: 2024-07-31 04:58:56

Categories: cs.IR,cs.LG,H.3.3

Download: http://arxiv.org/abs/2403.05122v2

The Hard-Constraint PINNs for Interface Optimal Control Problems

We show that the physics-informed neural networks (PINNs), in combination with some recently developed discontinuity capturing neural networks, can be applied to solve optimal control problems subject to partial differential equations (PDEs) with interfaces and some control constraints. The resulting algorithm is mesh-free and scalable to different PDEs, and it ensures the control constraints rigorously. Since the boundary and interface conditions, as well as the PDEs, are all treated as soft constraints by lumping them into a weighted loss function, it is necessary to learn them simultaneously and there is no guarantee that the boundary and interface conditions can be satisfied exactly. This immediately causes difficulties in tuning the weights in the corresponding loss function and training the neural networks. To tackle these difficulties and guarantee the numerical accuracy, we propose to impose the boundary and interface conditions as hard constraints in PINNs by developing a novel neural network architecture. The resulting hard-constraint PINNs approach guarantees that both the boundary and interface conditions can be satisfied exactly or with a high degree of accuracy, and they are decoupled from the learning of the PDEs. Its efficiency is promisingly validated by some elliptic and parabolic interface optimal control problems.
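
The difference between soft and hard constraints can be seen in a one-dimensional example. The sketch below uses the common ansatz u(x) = g(x) + x(1 - x) N(x) on [0, 1], which satisfies Dirichlet boundary values exactly for any network N. It is an assumption-laden illustration: the abstract does not describe the paper's architecture, interface conditions are not handled here, and the tiny untrained network and all names are hypothetical.

```python
import math
import random

def make_network(hidden=8, seed=0):
    """A tiny untrained tanh network N(x), standing in for the trainable part."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    def net(x):
        return sum(w2[i] * math.tanh(w1[i] * x + b1[i]) for i in range(hidden))
    return net

def hard_constrained(net, u0, u1):
    """Wrap net so that u(0) = u0 and u(1) = u1 hold by construction."""
    def u(x):
        g = u0 * (1 - x) + u1 * x        # interpolates the boundary values
        return g + x * (1 - x) * net(x)  # the factor x(1 - x) vanishes on the boundary
    return u

u = hard_constrained(make_network(), u0=2.0, u1=-1.0)
boundary_values = (u(0.0), u(1.0))
```

Since the boundary values hold for every choice of weights, no boundary term is needed in the loss, which removes one set of weights to tune and leaves training to focus on the PDE residual.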

Updated: 2024-07-31 04:24:55

Categories: math.OC,cs.LG,49M41, 68T07, 35Q90, 35Q93, 90C25

Download: http://arxiv.org/abs/2308.06709v2

BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations

Real-world planning problems, including autonomous driving and sustainable energy applications like carbon storage and resource exploration, have recently been modeled as partially observable Markov decision processes (POMDPs) and solved using approximate methods. To solve high-dimensional POMDPs in practice, state-of-the-art methods use online planning with problem-specific heuristics to reduce planning horizons and make the problems tractable. Algorithms that learn approximations to replace heuristics have recently found success in large-scale fully observable domains. The key insight is the combination of online Monte Carlo tree search with offline neural network approximations of the optimal policy and value function. In this work, we bring this insight to partially observable domains and propose BetaZero, a belief-state planning algorithm for high-dimensional POMDPs. BetaZero learns offline approximations that replace heuristics to enable online decision making in long-horizon problems. We address several challenges inherent in large-scale partially observable domains; namely challenges of transitioning in stochastic environments, prioritizing action branching with a limited search budget, and representing beliefs as input to the network. To formalize the use of all limited search information, we train against a novel $Q$-weighted visit counts policy. We test BetaZero on various well-established POMDP benchmarks found in the literature and a real-world problem of critical mineral exploration. Experiments show that BetaZero outperforms state-of-the-art POMDP solvers on a variety of tasks.

Updated: 2024-07-31 04:13:56

Categories: cs.AI

Download: http://arxiv.org/abs/2306.00249v4

MetaOpenFOAM: an LLM-based multi-agent framework for CFD

Remarkable progress has been made in automated problem solving through societies of agents based on large language models (LLMs). Computational fluid dynamics (CFD), as a complex problem, presents unique challenges in automated simulations that require sophisticated solutions. MetaOpenFOAM, a novel multi-agent collaboration framework, aims to complete CFD simulation tasks with only natural language as input. These simulation tasks include mesh pre-processing, simulation, and post-processing. MetaOpenFOAM harnesses the power of MetaGPT's assembly line paradigm, which assigns diverse roles to various agents, efficiently breaking down complex CFD tasks into manageable subtasks. Langchain further complements MetaOpenFOAM by integrating Retrieval-Augmented Generation (RAG) technology, which enhances the framework's ability by giving LLMs a searchable database of OpenFOAM tutorials. Tests on a benchmark for natural language-based CFD solvers, consisting of 8 CFD simulation tasks, have shown that MetaOpenFOAM achieved a high pass rate per test (85%), with each test case costing only $0.22 on average. The 8 CFD simulation tasks include compressible and incompressible flows, 2D and 3D flows, heat transfer, and combustion, demonstrating the ability to automate CFD simulations using only natural language input and iteratively correct errors to achieve the desired simulation at a low cost. An ablation study was conducted to verify the necessity of each component in the multi-agent system and the RAG technology. A sensitivity study on the randomness of the LLM showed that an LLM with low randomness can obtain more stable and accurate results. Additionally, MetaOpenFOAM is able to identify and modify key parameters in user requirements and excels at correcting bugs when failures occur, with or without human participation, which demonstrates the generalization of MetaOpenFOAM.

Updated: 2024-07-31 04:01:08

标题: MetaOpenFOAM：一种基于LLM的CFD多代理框架

摘要: 通过基于大型语言模型（LLMs）的代理社团取得了自动问题解决方面的显著进展。作为一个复杂问题，计算流体动力学（CFD）在自动化仿真方面面临着独特的挑战，需要复杂的解决方案。MetaOpenFOAM作为一种新颖的多代理协作框架，旨在仅通过自然语言输入完成CFD仿真任务。这些仿真任务包括网格预处理、仿真和后处理等。MetaOpenFOAM利用MetaGPT的装配线范式，为各种代理分配不同的角色，有效地将复杂的CFD任务分解为可管理的子任务。Langchain通过整合检索增强生成（RAG）技术进一步补充了MetaOpenFOAM，该技术通过整合LLMs的OpenFOAM教程的可搜索数据库增强了框架的能力。在一个基于自然语言的CFD求解器基准测试中，包括8个CFD仿真任务，测试表明MetaOpenFOAM在每个测试中取得了较高的通过率（85%），每个测试案例的平均成本仅为0.22美元。这8个CFD仿真任务包括可压缩和不可压缩流动、2D和3D流动、传热和燃烧，展示了仅使用自然语言输入自动化CFD仿真的能力，并通过迭代纠正错误以低成本实现所需仿真。进行了消融研究以验证多代理系统和RAG技术的必要性。对LLMs随机性的敏感性研究表明，具有低随机性的LLMs可以获得更稳定和准确的结果。此外，MetaOpenFOAM具有识别和修改用户需求中的关键参数的能力，并在发生故障时优于纠正错误，无论是否有人参与，这展示了MetaOpenFOAM的泛化能力。

更新时间: 2024-07-31 04:01:08

领域: cs.AI,physics.flu-dyn

下载: http://arxiv.org/abs/2407.21320v1

Big Cooperative Learning

Cooperation plays a pivotal role in the evolution of human intelligence; moreover, it also underlies the recent revolutionary advancement of artificial intelligence (AI) that is driven by foundation models. Specifically, we reveal that the training of foundation models can be interpreted as a form of big cooperative learning (abbreviated as big learning), where massive learning individuals/tasks cooperate to approach the unique essence of data from diverse perspectives of data prediction, leveraging a universal model. The presented big learning therefore unifies most training objectives of foundation models within a consistent framework, where their underlying assumptions are exposed simultaneously. We design tailored simulations to demonstrate the principle of big learning, based on which we provide learning-perspective justifications for the successes of foundation models, with interesting side-products. Furthermore, we reveal that big learning is a new dimension for upgrading conventional machine learning paradigms, valuable for reinvigorating associated applications; as an illustrative example, we propose the BigLearn-GAN, which is a novel adversarially-trained foundation model with versatile data sampling capabilities. Code is available at https://github.com/YulaiCong/BigCooperativeLearning.

Updated: 2024-07-31 03:59:14

标题: 大型合作学习

摘要: 合作在人类智能进化中起着至关重要的作用；此外，它也是由基础模型驱动的人工智能（AI）最近革命性进步的基础。具体来说，我们揭示了基础模型的训练可以被解释为一种大型合作学习（缩写为大学习），在这种学习中，大量的学习个体/任务合作以从不同数据预测角度接近数据的独特本质，利用一个通用模型。所提出的大学习因此统一了基础模型的大多数训练目标，将它们的基本假设同时暴露出来。我们设计了定制的模拟来展示大学习的原则，基于这个原则，我们提供了对基础模型成功的学习观点的证明，以及有趣的副产品。此外，我们揭示了大学习是升级传统机器学习范例的新维度，对于赋予相关应用复兴具有价值；作为一个说明性例子，我们提出了BigLearn-GAN，这是一个具有多功能数据采样能力的新型对抗训练的基础模型。代码可在https://github.com/YulaiCong/BigCooperativeLearning找到。

更新时间: 2024-07-31 03:59:14

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.21319v1

Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models

Diffusion models (DMs) represent one of the most advanced classes of generative models today, yet recent studies suggest that DMs are vulnerable to backdoor attacks. Backdoor attacks establish hidden associations between particular input patterns and model behaviors, compromising model integrity by triggering undesirable actions with manipulated input data. This vulnerability poses substantial risks, including reputational damage to model owners and the dissemination of harmful content. To mitigate the threat of backdoor attacks, there have been some investigations on backdoor detection and model repair. However, previous work fails to purify the backdoored DMs created by state-of-the-art attacks, leaving the field much underexplored. To bridge this gap, we introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs. The first stage employs an innovative trigger inversion technique to detect the backdoor and reconstruct the trigger, and the second stage utilizes a structural pruning method to eliminate the backdoor. We evaluate our framework on hundreds of DMs attacked by 3 existing backdoor attack methods. Extensive experiments demonstrate that Diff-Cleanse achieves nearly 100% detection accuracy and effectively mitigates backdoor impacts, preserving the model's benign performance with minimal compromise. Our code is available at https://github.com/shymuel/diff-cleanse.

Updated: 2024-07-31 03:54:41

标题: Diff-Cleanse：在扩散模型中识别和减轻后门攻击

摘要: 扩散模型（DM）代表了当今最先进的生成模型之一，然而最近的研究表明DM对后门攻击是脆弱的。后门攻击建立了特定输入模式和模型行为之间的隐藏关联，通过操纵输入数据触发不良操作，从而破坏了模型的完整性。这种脆弱性带来了重大风险，包括对模型所有者声誉的损害和有害内容的传播。为了缓解后门攻击的威胁，已经进行了一些关于后门检测和模型修复的研究。然而，先前的工作未能清除最先进攻击创建的带有后门的DM，使得该领域未被充分探索。为了弥合这一差距，我们引入了Diff-Cleanse，这是一个专门为DM设计的新颖的两阶段后门防御框架。第一阶段采用创新的触发器反转技术来检测后门并重建触发器，第二阶段利用结构修剪方法消除后门。我们在数百个被3种现有后门攻击方法攻击的DM上评估了我们的框架。广泛的实验表明，Diff-Cleanse实现了近乎100%的检测准确率，有效地减轻了后门的影响，保持了模型的良性性能，几乎没有妥协。我们的代码可在https://github.com/shymuel/diff-cleanse找到。

更新时间: 2024-07-31 03:54:41

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2407.21316v1

Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances

This paper introduces a novel approach to emotion detection in speech using Large Language Models (LLMs). We address the limitation of LLMs in processing audio inputs by translating speech characteristics into natural language descriptions. Our method integrates these descriptions into text prompts, enabling LLMs to perform multimodal emotion analysis without architectural modifications. We evaluate our approach on two datasets: IEMOCAP and MELD, demonstrating significant improvements in emotion recognition accuracy, particularly for high-quality audio data. Our experiments show that incorporating speech descriptions yields a 2 percentage point increase in weighted F1 score on IEMOCAP (from 70.111% to 72.596%). We also compare various LLM architectures and explore the effectiveness of different feature representations. Our findings highlight the potential of this approach in enhancing emotion detection capabilities of LLMs and underscore the importance of audio quality in speech-based emotion recognition tasks. We'll release the source code on GitHub.
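The idea of turning speech characteristics into text that an LLM can read can be sketched as follows. This is only an illustration under stated assumptions: the three features (pitch, energy, speaking rate), every threshold, the function names, and the prompt wording are hypothetical placeholders and are not taken from the paper.

```python
def describe_speech(pitch_hz, energy_db, words_per_sec):
    """Turn three coarse acoustic measurements into a natural-language description.

    All thresholds are hypothetical placeholders chosen for illustration.
    """
    pitch = "high" if pitch_hz > 220 else "low" if pitch_hz < 120 else "moderate"
    volume = "loud" if energy_db > 70 else "quiet" if energy_db < 50 else "moderate"
    rate = "fast" if words_per_sec > 3.5 else "slow" if words_per_sec < 2.0 else "moderate"
    return f"The speaker uses a {pitch} pitch, {volume} volume, and a {rate} speaking rate."

def build_prompt(transcript, pitch_hz, energy_db, words_per_sec):
    """Attach the speech description to the transcript in a plain text prompt."""
    description = describe_speech(pitch_hz, energy_db, words_per_sec)
    return (
        f'Utterance: "{transcript}"\n'
        f"Speech description: {description}\n"
        "Question: Which emotion does the speaker most likely express?"
    )
```

Because the audio information arrives as ordinary text, a text-only LLM can consume the prompt without any change to its architecture, which is the property the abstract emphasizes.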

Updated: 2024-07-31 03:53:14

标题: 超越沉默的信件：通过声音细微差别增强LLMs在情绪识别中的作用

摘要: 本文介绍了一种在语音情绪检测中应用大型语言模型（LLMs）的新方法。我们通过将语音特征转化为自然语言描述，解决了LLMs在处理音频输入中的局限性。我们的方法将这些描述集成到文本提示中，使LLMs能够执行多模态情绪分析而无需进行架构修改。我们在两个数据集上评估了我们的方法：IEMOCAP和MELD，展示了在情绪识别准确性方面的显著改善，特别是对于高质量的音频数据。我们的实验表明，整合语音描述可以使IEMOCAP的加权F1分数增加2个百分点（从70.111％增加到72.596％）。我们还比较了各种LLM架构，并探讨了不同特征表示的有效性。我们的研究结果突出了这种方法在增强LLMs情绪检测能力方面的潜力，并强调了音频质量在基于语音的情绪识别任务中的重要性。我们将在Github上发布源代码。

更新时间: 2024-07-31 03:53:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.21315v1

Intent-Based Access Control: Using LLMs to Intelligently Manage Access Control

In every enterprise database, administrators must define an access control policy that specifies which users have access to which assets. Access control straddles two worlds: policy (organization-level principles that define who should have access) and process (database-level primitives that actually implement the policy). Assessing and enforcing process compliance with a policy is a manual and ad-hoc task. This paper introduces a new paradigm for access control called Intent-Based Access Control for Databases (IBAC-DB). In IBAC-DB, access control policies are expressed more precisely using a novel format, the natural language access control matrix (NLACM). Database access control primitives are synthesized automatically from these NLACMs. These primitives can be used to generate new DB configurations and/or evaluate existing ones. This paper presents a reference architecture for an IBAC-DB interface, an initial implementation for PostgreSQL (which we call LLM4AC), and initial benchmarks that evaluate the accuracy and scope of such a system. We find that our chosen implementation, LLM4AC, vastly outperforms other baselines, achieving high accuracies and F1 scores on our initial benchmarks, which include state-of-the-art NL2SQL data requiring external knowledge, and real-world role hierarchies from the Amazon Access dataset.
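To make the step from an access control matrix to database primitives concrete, here is a minimal sketch. It assumes a matrix keyed by (role, table) whose cells hold free-text access descriptions, and it replaces the LLM-based synthesis described in the abstract with a simple keyword lookup. The keyword table and the function name `synthesize_grants` are hypothetical; only the PostgreSQL `GRANT ... ON TABLE ... TO ...` statement form is standard.

```python
# Hypothetical keyword-to-privilege table standing in for the LLM step.
KEYWORD_TO_PRIVILEGE = {
    "read": "SELECT",
    "view": "SELECT",
    "add": "INSERT",
    "modify": "UPDATE",
    "remove": "DELETE",
}

def synthesize_grants(nlacm):
    """Map {(role, table): free-text description} to a list of GRANT statements.

    Identifiers are interpolated as-is; a real system would need to quote them.
    """
    statements = []
    for (role, table), description in sorted(nlacm.items()):
        words = description.lower().replace(",", " ").split()
        privileges = []
        for word in words:
            privilege = KEYWORD_TO_PRIVILEGE.get(word)
            if privilege and privilege not in privileges:
                privileges.append(privilege)
        # Cells with no recognized privilege produce no statement.
        if privileges:
            statements.append(
                f"GRANT {', '.join(privileges)} ON TABLE {table} TO {role};"
            )
    return statements
```

The same generated statements could be compared against an existing configuration to check whether the process matches the policy, which is the second use the abstract mentions.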

Updated: 2024-07-31 03:53:04

标题: 基于意图的访问控制：使用LLMs智能管理访问控制

摘要: 在每个企业数据库中，管理员必须定义一个访问控制策略，指定哪些用户可以访问哪些资产。访问控制跨越两个领域：政策（定义谁应该具有访问权限的组织级原则）和流程（实际执行政策的数据库级基本原语）。评估和执行流程与政策的一致性是一项手动和临时的任务。本文介绍了一种名为基于意图的数据库访问控制（IBAC-DB）的访问控制新范式。在IBAC-DB中，访问控制策略使用一种新颖的格式更精确地表达，即自然语言访问控制矩阵（NLACM）。数据库访问控制基本原语会自动从这些NLACM中合成。这些基本原语可以用于生成新的数据库配置和/或评估现有配置。本文提出了一个IBAC-DB界面的参考架构，一个针对PostgreSQL的初始实现（我们称之为LLM4AC），以及评估这种系统准确性和范围的初始基准测试。我们发现我们选择的实现，LLM4AC，在初始基准测试中远远优于其他基线，实现了高准确性和F1分数，其中包括需要外部知识的最新NL2SQL数据和来自亚马逊访问数据集的真实角色层次结构。

更新时间: 2024-07-31 03:53:04

领域: cs.DB,cs.CR

下载: http://arxiv.org/abs/2402.07332v2

Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations

To some, the advent of AI promises better decision-making and increased military effectiveness while reducing the influence of human error and emotions. However, there is still debate about how AI systems, especially large language models (LLMs) that can be applied to many tasks, behave compared to humans in high-stakes military decision-making scenarios with the potential for increased risks towards escalation and unnecessary conflicts. To test this potential and scrutinize the use of LLMs for such purposes, we use a new wargame experiment with 107 national security experts designed to examine crisis escalation in a fictional US-China scenario and compare the behavior of human player teams to LLM-simulated team responses in separate simulations. Here, we find that the LLM-simulated responses can be more aggressive and significantly affected by changes in the scenario. We show a considerable high-level agreement in the LLM and human responses and significant quantitative and qualitative differences in individual actions and strategic tendencies. These differences depend on intrinsic biases in LLMs regarding the appropriate level of violence following strategic instructions, the choice of LLM, and whether the LLMs are tasked to decide for a team of players directly or first to simulate dialog between a team of players. When simulating the dialog, the discussions lack quality and maintain a farcical harmony. The LLM simulations cannot account for human player characteristics, showing no significant difference even for extreme traits, such as "pacifist" or "aggressive sociopath." When probing behavioral consistency across individual moves of the simulation, the tested LLMs deviated from each other but generally showed somewhat consistent behavior. Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.

Updated: 2024-07-31 03:52:46

标题: 人类与机器：在战争游戏模拟中，专家人类与语言模型之间的行为差异

摘要: 对一些人来说，人工智能的出现承诺着更好的决策和增加军事效力，同时减少人为错误和情感的影响。然而，对于人工智能系统，特别是可以应用于许多任务的大型语言模型（LLMs），在高风险军事决策情景中与人类行为相比如何行事仍存在争议，存在潜在风险可能导致升级和不必要冲突。为了测试这种潜力并审查LLMs在这些目的上的使用，我们使用了一个新的战争游戏实验，参与者包括107位国家安全专家，旨在研究虚构的美中危机情景中的危机升级，并比较人类玩家团队与LLM模拟团队响应在独立仿真中的行为。在这里，我们发现LLM模拟响应可能更具侵略性，并且受到情景变化的显著影响。我们展示了LLM和人类响应之间的相当高水平一致性，以及在个体行动和战略倾向上的显著数量和质量差异。这些差异取决于LLMs在遵循战略指令时关于适当暴力水平的固有偏见，LLM的选择，以及LLMs是直接决定玩家团队还是首先模拟玩家团队之间的对话。在模拟对话时，讨论缺乏质量并保持荒谬的和谐。LLM模拟不能考虑到人类玩家的特征，即使对于极端特征如“和平主义者”或“侵略性反社会者”也没有显著差异。在测试LLM模拟中个体行动的行为一致性时，被测试的LLMs彼此偏离，但通常表现出一定程度的一致行为。我们的结果促使决策者在授予自主权或遵循基于人工智能的战略建议之前谨慎。

更新时间: 2024-07-31 03:52:46

领域: cs.CY,cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.03407v3

State-observation augmented diffusion model for nonlinear assimilation

Data assimilation has become a crucial technique aiming to combine physical models with observational data to estimate state variables. Traditional assimilation algorithms often face challenges of high nonlinearity brought by both the physical and observational models. In this work, we propose a novel data-driven assimilation algorithm based on generative models to address such concerns. Our State-Observation Augmented Diffusion (SOAD) model is designed to handle nonlinear physical and observational models more effectively. The marginal posterior associated with SOAD has been derived and then proved to match the real posterior under mild assumptions, which shows theoretical superiority over previous score-based assimilation works. Experimental results also indicate that our SOAD model may offer improved accuracy over existing data-driven methods.

Updated: 2024-07-31 03:47:20

标题: 非线性同化的状态观测增强扩散模型

摘要: 数据同化已经成为一种至关重要的技术，旨在将物理模型与观测数据结合起来估计状态变量。传统的同化算法通常面临物理模型和观测模型带来的高非线性挑战。在这项工作中，我们提出了一种基于生成模型的新型数据驱动同化算法来解决这些问题。我们设计的State-Observation Augmented Diffusion（SOAD）模型旨在更有效地处理非线性的物理模型和观测模型。与SOAD相关的边缘后验已经推导出来，并且在温和假设下被证明与真实后验匹配，这显示了其在理论上优于先前基于评分的同化工作。实验结果还表明，我们的SOAD模型可能比现有的数据驱动方法提供更高的准确性。

更新时间: 2024-07-31 03:47:20

领域: cs.LG,stat.ML,49N45, 60J60, 62F15, 68T20

下载: http://arxiv.org/abs/2407.21314v1

Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection

Brain-inspired Spiking Neural Networks (SNNs) have bio-plausibility and low-power advantages over Artificial Neural Networks (ANNs). Applications of SNNs are currently limited to simple classification tasks because of their poor performance. In this work, we focus on bridging the performance gap between ANNs and SNNs on object detection. Our design revolves around network architecture and spiking neuron. First, the overly complex module design causes spike degradation when the YOLO series is converted to the corresponding spiking version. We design a SpikeYOLO architecture to solve this problem by simplifying the vanilla YOLO and incorporating meta SNN blocks. Second, object detection is more sensitive to quantization errors in the conversion of membrane potentials into binary spikes by spiking neurons. To address this challenge, we design a new spiking neuron that activates integer values during training while remaining spike-driven during inference by extending virtual timesteps. The proposed method is validated on both static and neuromorphic object detection datasets. On the static COCO dataset, we obtain 66.2% mAP@50 and 48.9% mAP@50:95, which are 15.0% and 18.7% higher than the prior state-of-the-art SNN, respectively. On the neuromorphic Gen1 dataset, we achieve 67.2% mAP@50, which is 2.5% higher than the ANN with equivalent architecture, and the energy efficiency is improved by 5.7×. Code: https://github.com/BICLab/SpikeYOLO
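The relationship between integer-valued training and spike-driven inference can be illustrated with a small sketch. It assumes the neuron emits an integer capped at a maximum value during training, and that at inference the same integer is unrolled into that many binary spikes over virtual timesteps. The function names and the cap of 4 are hypothetical; the actual neuron in the paper may differ in its details.

```python
def integer_activation(membrane_potential, max_value=4):
    """Training-time view: round the potential and clip it to [0, max_value]."""
    return max(0, min(max_value, round(membrane_potential)))

def to_spike_train(integer_value, max_value=4):
    """Inference-time view: unroll the integer into max_value binary spikes.

    The spike count equals the integer, so downstream sums are unchanged
    while every individual activation stays 0 or 1.
    """
    return [1 if t < integer_value else 0 for t in range(max_value)]
```

For example, a membrane potential of 2.7 yields the integer 3 in training and the spike train `[1, 1, 1, 0]` at inference, so the quantization step is finer than a single binary spike while the inference path still uses only binary events.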

Updated: 2024-07-31 03:35:50

标题: 整数值训练和脉冲驱动推理的尖峰神经网络用于高性能和高能效目标检测

摘要: 脑启发的尖峰神经网络（SNNs）具有生物合理性和低功耗优势，相对于人工神经网络（ANNs）。目前，SNNs的应用仅限于简单的分类任务，因为它们性能较差。在这项工作中，我们专注于弥合ANNs和SNNs在目标检测上的性能差距。我们的设计围绕网络架构和尖峰神经元展开。首先，当YOLO系列转换为相应的尖峰版本时，过于复杂的模块设计会导致尖峰退化。我们通过设计一个SpikeYOLO架构来解决这个问题，简化了原始的YOLO并融入了元SNN块。其次，目标检测对于将膜电位转化为二进制尖峰时的量化误差更为敏感。为了解决这一挑战，我们设计了一个新的尖峰神经元，在训练时激活整数值，同时在推理过程中通过延长虚拟时间步长来维持尖峰驱动。该方法在静态和神经形态目标检测数据集上得到验证。在静态COCO数据集上，我们获得了66.2%的mAP@50和48.9%的mAP@50:95，分别比先前现有SNN的最新技术高出15.0%和18.7%。在神经形态Gen1数据集上，我们实现了67.2%的mAP@50，比具有相同架构的ANN提高了2.5%，能效提高了5.7倍。代码：https://github.com/BICLab/SpikeYOLO

更新时间: 2024-07-31 03:35:50

领域: cs.AI

下载: http://arxiv.org/abs/2407.20708v2

EUDA: An Efficient Unsupervised Domain Adaptation via Self-Supervised Vision Transformer

Unsupervised domain adaptation (UDA) aims to mitigate the domain shift issue, where the distribution of training (source) data differs from that of testing (target) data. Many models have been developed to tackle this problem, and recently vision transformers (ViTs) have shown promising results. However, the complexity and large number of trainable parameters of ViTs restrict their deployment in practical applications. This underscores the need for an efficient model that not only reduces trainable parameters but also allows for adjustable complexity based on specific needs while delivering comparable performance. To achieve this, in this paper we introduce an Efficient Unsupervised Domain Adaptation (EUDA) framework. EUDA employs DINOv2, a self-supervised ViT, as a feature extractor followed by a simplified bottleneck of fully connected layers to refine features for enhanced domain adaptation. Additionally, EUDA employs the synergistic domain alignment loss (SDAL), which integrates cross-entropy (CE) and maximum mean discrepancy (MMD) losses, to balance adaptation by minimizing classification errors in the source domain while aligning the source and target domain distributions. The experimental results indicate that EUDA produces results comparable to other state-of-the-art domain adaptation methods with significantly fewer trainable parameters, between 42% and 99.7% fewer. This showcases the ability to train the model in a resource-limited environment. The code of the model is available at: https://github.com/A-Abedi/EUDA.
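A loss that combines cross-entropy on labeled source samples with an MMD term between source and target features can be sketched in a few lines. This is a plain-Python illustration under assumptions: the RBF kernel, the biased MMD estimator, the weighting factor `lam`, and all function names are choices made here, since the abstract does not state how SDAL weights or computes the two terms.

```python
import math

def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class over labeled source samples."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))

def sdal(probs, labels, source_feats, target_feats, lam=1.0):
    """Hypothetical combination: classification loss plus weighted alignment loss."""
    return cross_entropy(probs, labels) + lam * mmd_squared(source_feats, target_feats)
```

The first term only needs source labels, while the second term needs no labels at all, which is what makes the combination usable when the target domain is unlabeled.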

Updated: 2024-07-31 03:29:28

标题: EUDA：一种通过自监督视觉变换器实现高效无监督领域自适应的方法

摘要: 无监督域自适应（UDA）旨在缓解域偏移问题，即训练（源）数据的分布与测试（目标）数据不同。已经开发了许多模型来解决这个问题，最近视觉变换器（ViTs）显示出有希望的结果。然而，ViTs的复杂性和可训练参数的数量限制了它们在实际应用中的部署。这凸显了需要一个既减少可训练参数又允许根据特定需求调整复杂性的高效模型，同时提供可比较的性能。为了实现这一目标，本文介绍了一种高效的无监督域自适应（EUDA）框架。EUDA采用DINOv2作为特征提取器，DINOv2是一种自监督ViT，之后是一系列简化的全连接层瓶颈，用于改进特征以增强域自适应。此外，EUDA采用协同域对齐损失（SDAL），该损失集成了交叉熵（CE）和最大均值差异（MMD）损失，通过最小化源域中的分类错误来平衡适应性，同时对齐源域和目标域分布。实验结果表明，与其他领先方法相比，EUDA在域自适应中产生可比较的结果，可训练参数明显减少，减少了42%至99.7%。这展示了在资源有限的环境中训练模型的能力。该模型的代码可在以下网址找到：https://github.com/A-Abedi/EUDA。

更新时间: 2024-07-31 03:29:28

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.21311v1

MSMA: Multi-agent Trajectory Prediction in Connected and Autonomous Vehicle Environment with Multi-source Data Integration

The prediction of surrounding vehicle trajectories is crucial for collision-free path planning. In this study, we focus on a scenario where a connected and autonomous vehicle (CAV) serves as the central agent, utilizing both sensors and communication technologies to perceive its surrounding traffic, consisting of autonomous vehicles (AVs), connected vehicles (CVs), and human-driven vehicles (HDVs). Our trajectory prediction task is aimed at all the detected surrounding vehicles. To effectively integrate the multi-source data from both sensor and communication technologies, we propose a deep learning framework called MSMA utilizing a cross-attention module for multi-source data fusion. Vector map data is utilized to provide contextual information. The trajectory dataset is collected in the CARLA simulator with synthesized data errors introduced. Numerical experiments demonstrate that in a mixed traffic flow scenario, the integration of data from different sources enhances our understanding of the environment. This notably improves trajectory prediction accuracy, particularly in situations with a high CV market penetration rate. The code is available at: https://github.com/xichennn/MSMA.

Updated: 2024-07-31 03:26:14

标题: MSMA：联网和自动驾驶车辆环境中的多智能体轨迹预测与多源数据集成

摘要: 周围车辆轨迹的预测对于无碰撞路径规划至关重要。在这项研究中，我们关注一个场景，即连接和自动驾驶车辆（CAV）作为中心代理，利用传感器和通信技术来感知由自动驾驶车辆（AVs）、连接车辆（CVs）和人驾驶车辆（HDVs）组成的周围交通。我们的轨迹预测任务针对所有检测到的周围车辆。为了有效整合来自传感器和通信技术的多源数据，我们提出了一个名为MSMA的深度学习框架，利用交叉注意模块进行多源数据融合。矢量地图数据用于提供上下文信息。轨迹数据集在CARLA模拟器中收集，并引入了合成数据错误。数值实验表明，在混合交通流场景中，来自不同来源的数据整合增强了我们对环境的理解。这显着提高了轨迹预测准确性，特别是在CV市场渗透率较高的情况下。代码可在以下网址获得：https://github.com/xichennn/MSMA。

更新时间: 2024-07-31 03:26:14

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2407.21310v1

Oblivious Monitoring for Discrete-Time STL via Fully Homomorphic Encryption

When monitoring a cyber-physical system (CPS) from a remote server, keeping the monitored data secret is crucial, particularly when they contain sensitive information, e.g., biological or location data. Recently, Banno et al. (CAV'22) proposed a protocol for online LTL monitoring that keeps data concealed from the server using Fully Homomorphic Encryption (FHE). We build on this protocol to allow arithmetic operations over encrypted values, e.g., to compute a safety measurement combining distance, velocity, and so forth. Overall, our protocol enables oblivious online monitoring of discrete-time real-valued signals against signal temporal logic (STL) formulas. Our protocol combines two FHE schemes, CKKS and TFHE, leveraging their respective strengths. We employ CKKS to evaluate arithmetic predicates in STL formulas while utilizing TFHE to process them using a DFA derived from the STL formula. We conducted case studies on monitoring blood glucose levels and vehicles' behavior against the Responsibility-Sensitive Safety (RSS) rules. Our results suggest the practical relevance of our protocol.
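For readers unfamiliar with what is being monitored, the following sketch evaluates bounded STL operators over a discrete-time signal in plaintext. It involves no encryption, no CKKS or TFHE, and no DFA construction; it only shows the kind of arithmetic predicate and temporal check that the protocol evaluates obliviously. The function names, the offline (whole-signal) evaluation, and the glucose threshold of 70 are assumptions made for illustration.

```python
def always(predicate_values, lower, upper):
    """Bounded 'always': out[t] holds if the predicate is true at every step in [t+lower, t+upper].

    Only positions whose full window lies inside the signal are reported.
    """
    n = len(predicate_values)
    return [all(predicate_values[t + lower : t + upper + 1]) for t in range(n - upper)]

def eventually(predicate_values, lower, upper):
    """Bounded 'eventually': out[t] holds if the predicate is true at some step in [t+lower, t+upper]."""
    n = len(predicate_values)
    return [any(predicate_values[t + lower : t + upper + 1]) for t in range(n - upper)]

# Arithmetic predicate over a real-valued signal (hypothetical threshold).
glucose = [90, 85, 65, 80, 95]
above_threshold = [g > 70 for g in glucose]

# "Glucose stays above 70 for the current and next sample."
stays_safe = always(above_threshold, 0, 1)
```

In the protocol described above, the comparison `g > 70` would correspond to the arithmetic part handled under CKKS and the temporal check to the automaton-based part handled under TFHE; this sketch merely shows the unencrypted semantics.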

Updated: 2024-07-31 03:22:52

标题: 通过完全同态加密进行离散时间STL的无意识监测

摘要: 当从远程服务器监控一个网络物理系统（CPS）时，保持监控数据的机密性是至关重要的，特别是当它们包含敏感信息，例如生物或位置数据。最近，Banno等人（CAV'22）提出了一种在线LTL监控协议，该协议使用完全同态加密（FHE）从服务器中隐藏数据。我们在此协议的基础上进行了改进，以允许对加密数值进行算术运算，例如计算结合距离、速度等的安全度量。总体而言，我们的协议实现了对离散时间实值信号针对信号时间逻辑（STL）公式的遗忘式在线监测。我们的协议结合了两种FHE方案，CKKS和TFHE，利用它们各自的优势。我们利用CKKS来评估STL公式中的算术谓词，同时利用TFHE来使用从STL公式中导出的DFA来处理它们。我们对监测血糖水平和车辆行为遵守责任敏感安全（RSS）规则进行了案例研究。我们的结果表明我们的协议的实际相关性。

更新时间: 2024-07-31 03:22:52

领域: cs.CR,cs.FL

下载: http://arxiv.org/abs/2405.16767v2

Prometheus Chatbot: Knowledge Graph Collaborative Large Language Model for Computer Components Recommendation

Knowledge graphs (KGs) are essential in applications such as network alignment, question-answering, and recommender systems (RSs) since they offer structured relational data that facilitate the inference of indirect relationships. However, the development of KG-based RSs capable of processing user inputs in natural language faces significant challenges. Firstly, natural language processing units must effectively handle the ambiguity and variability in human language to interpret user intents accurately. Secondly, the system must precisely identify and link entities, like product names, to their corresponding nodes in KGs. To overcome these challenges, supported by Lenovo, we developed a novel chatbot called "Prometheus," which integrates a KG with a large language model (LLM), specifically designed for recommending computer components. This chatbot can accurately decode user requests and deliver personalized recommendations derived from KGs, ensuring precise comprehension and response to their computer setup needs.

Updated: 2024-07-31 03:20:35

标题: 普罗米修斯聊天机器人：知识图谱协作大型语言模型用于计算机组件推荐

摘要: 知识图谱（KGs）在网络对齐、问答和推荐系统（RSs）等应用中至关重要，因为它们提供结构化的关系数据，有助于推断间接关系。然而，基于KG的RSs在处理自然语言用户输入时面临重大挑战。首先，自然语言处理单元必须有效处理人类语言中的歧义和变化，以准确解释用户意图。其次，系统必须准确识别和链接实体，如产品名称，到KG中的相应节点。为了克服这些挑战，我们在联想的支持下开发了一款名为“Prometheus”的新型聊天机器人，它将KG与大型语言模型（LLM）整合在一起，专门设计用于推荐计算机组件。这款聊天机器人可以准确解码用户请求，并从KG中提供个性化推荐，确保精确理解并满足他们的计算机设置需求。

更新时间: 2024-07-31 03:20:35

领域: cs.AI

下载: http://arxiv.org/abs/2407.19643v2

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. MINT-1T comprises one trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. As scaling multimodal interleaved datasets requires substantial engineering effort, sharing the data curation process and releasing the dataset greatly benefits the community. Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS. Our data and code will be released at https://github.com/mlfoundations/MINT-1T.

Updated: 2024-07-31 03:06:40

标题: MINT-1T: 将开源多模态数据扩展至10倍：一个包含一万亿标记的多模态数据集

摘要: 多模态交错数据集是训练前沿大型多模态模型（LMMs）至关重要的，其中包含自由形式的图像和文本交错序列。尽管开源LMMs的发展迅速，但大规模、多样化的开源多模态交错数据集仍然非常稀缺。为此，我们推出了迄今为止规模最大、最多样化的开源多模态交错数据集MINT-1T。MINT-1T包含一万亿文本标记和34亿图像，比现有开源数据集扩大了10倍。此外，我们还包括以前未开发的来源，如PDF和ArXiv论文。由于扩展多模态交错数据集需要大量工程工作，因此共享数据整理过程并释放数据集将极大地使社区受益。我们的实验表明，在MINT-1T上训练的LMMs与先前领先数据集OBELICS上训练的模型性能相媲美。我们的数据和代码将发布在https://github.com/mlfoundations/MINT-1T。

更新时间: 2024-07-31 03:06:40

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.11271v3

Implementing Streaming algorithm and k-means clusters to RAG

Retrieval-augmented generation (RAG) has achieved great success in information retrieval to assist large models because it builds an external knowledge database. However, it also has several problems: the huge database consumes a lot of memory, and when faced with massive streaming data, RAG is unable to update the established index database in time. To reduce the memory needed to build the database while maintaining accuracy, we propose a new approach combining a streaming algorithm and k-means clustering with RAG. Our approach applies a streaming algorithm to update the index and reduce memory consumption. It then uses the k-means algorithm to cluster highly similar documents together, which shortens the query time. We conducted comparative experiments on four methods, and the results show that RAG with a streaming algorithm and k-means clustering performs well in accuracy and memory. For massive streaming data, we find that our method performs better than traditional RAG.
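One way to picture the combination of streaming updates and clustering is an index that assigns each incoming document to its nearest centroid and updates that centroid incrementally. The abstract does not say which streaming algorithm is used, so this sketch substitutes sequential (online) k-means as a stand-in; the class name `StreamingKMeansIndex` and its methods are hypothetical.

```python
import math

def nearest(centroids, vec):
    """Index of the centroid closest to vec under Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], vec))

class StreamingKMeansIndex:
    """Keeps k running centroids and per-cluster document ids, not the raw vectors."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one running mean per cluster
        self.counts = []     # number of documents absorbed by each cluster
        self.members = []    # document ids per cluster

    def add(self, doc_id, vec):
        # Seed the first k clusters with the first k documents.
        if len(self.centroids) < self.k:
            self.centroids.append(list(vec))
            self.counts.append(1)
            self.members.append([doc_id])
            return
        i = nearest(self.centroids, vec)
        self.counts[i] += 1
        # Incremental mean update: c <- c + (x - c) / n
        self.centroids[i] = [
            c + (x - c) / self.counts[i] for c, x in zip(self.centroids[i], vec)
        ]
        self.members[i].append(doc_id)

    def candidates(self, query_vec):
        """Return only the documents in the cluster nearest to the query."""
        return self.members[nearest(self.centroids, query_vec)]
```

Each new document costs one pass over the k centroids, so the index can be updated as data arrives, and a query only has to search the documents of one cluster instead of the whole collection.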

Updated: 2024-07-31 03:00:59

标题: 实施流算法和k-means聚类到RAG

摘要: 检索增强生成（RAG）在信息检索中取得了巨大成功，因为它构建了一个外部知识数据库来辅助大型模型。然而，它也存在许多问题：由于庞大的数据库，它消耗大量内存。当面临大量流数据时，它无法及时更新已建立的索引数据库。为了节省构建数据库的内存并同时保持准确性，我们提出了一种新方法，将流算法和k均值聚类与RAG相结合。我们的方法应用流算法来更新索引并减少内存消耗。然后使用k均值算法将具有高相似性的文档聚类在一起，通过这样做可以缩短查询时间。我们对四种方法进行了比较实验，结果显示，具有流算法和k均值聚类的RAG在准确性和内存方面表现良好。对于大量流数据，我们发现我们的方法比传统的RAG表现更好。

更新时间: 2024-07-31 03:00:59

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2407.21300v1

Who should I trust? A Visual Analytics Approach for Comparing Net Load Forecasting Models

Net load forecasting is crucial for energy planning and facilitating informed decision-making regarding trade and load distributions. However, evaluating forecasting models' performance against benchmark models remains challenging, thereby impeding experts' trust in the model's performance. In this context, there is a demand for technological interventions that allow scientists to compare models across various timeframes and solar penetration levels. This paper introduces a visual analytics-based application designed to compare the performance of deep-learning-based net load forecasting models with other models for probabilistic net load forecasting. This application employs carefully selected visual analytic interventions, enabling users to discern differences in model performance across different solar penetration levels, dataset resolutions, and hours of the day over multiple months. We also present observations made using our application through a case study, demonstrating the effectiveness of visualizations in aiding scientists in making informed decisions and enhancing trust in net load forecasting models.

Updated: 2024-07-31 02:57:21

标题: 我应该信任谁？一种用于比较净负荷预测模型的视觉分析方法

摘要: 净负荷预测对于能源规划和促进有关贸易和负载分配的决策具有至关重要的作用。然而，评估预测模型与基准模型的性能仍然具有挑战性，从而阻碍了专家对模型性能的信任。在这种情况下，存在对技术干预的需求，使科学家能够在不同的时间范围和太阳能渗透水平下比较模型。本文介绍了一种基于视觉分析的应用程序，旨在比较基于深度学习的净负荷预测模型与其他模型在概率净负荷预测方面的性能。该应用程序采用精心选择的视觉分析干预措施，使用户能够辨别出不同太阳能渗透水平、数据集分辨率和多个月份内一天不同时间的模型性能差异。我们还通过案例研究介绍了使用我们的应用程序所得出的观察结果，展示了可视化在帮助科学家做出明智决策和增强对净负荷预测模型信任的有效性。

更新时间: 2024-07-31 02:57:21

领域: cs.HC,cs.AI,cs.LG,cs.SY,eess.SP,eess.SY

下载: http://arxiv.org/abs/2407.21299v1

A Vectorization Method Induced By Maximal Margin Classification For Persistent Diagrams

Persistent homology is an effective method for extracting topological information, represented as persistent diagrams, from spatial structure data. Hence it is well-suited for the study of protein structures. Attempts to incorporate persistent homology into machine learning methods for protein function prediction have resulted in several techniques for vectorizing persistent diagrams. However, current vectorization methods are excessively artificial and cannot ensure the effective utilization of information or the rationality of the methods. To address this problem, we propose a more geometrical vectorization method of persistent diagrams based on maximal margin classification for Banach space, and additionally propose a framework that utilizes topological data analysis to identify proteins with specific functions. We evaluated our vectorization method using a binary classification task on proteins and compared it with the statistical methods that exhibit the best performance among thirteen commonly used vectorization methods. The experimental results indicate that our approach surpasses the statistical methods in both robustness and precision.

Updated: 2024-07-31 02:55:01

标题: 一个由最大间隔分类引起的持续图的向量化方法

摘要: Persistent homology是一种提取空间结构数据拓扑信息的有效方法，表示为持久图。因此，它非常适合用于研究蛋白质结构。尝试将Persistent homology纳入蛋白功能预测的机器学习方法中已经导致了几种用于向量化持续图的技术。然而，当前的向量化方法过于人为，无法保证信息的有效利用或方法的合理性。为解决这一问题，我们提出了一种基于Banach空间最大边界分类的更加几何化的持久图向量化方法，并额外提出了一个利用拓扑数据分析来识别具有特定功能的蛋白质的框架。我们使用二元分类任务对蛋白质进行了我们的向量化方法评估，并将其与表现最佳的十三种常用向量化方法进行了比较。实验结果表明，我们的方法在稳健性和精度方面均优于统计方法。

更新时间: 2024-07-31 02:55:01

领域: cs.LG,cs.AI,q-bio.BM

下载: http://arxiv.org/abs/2407.21298v1

Disentangled Condensation for Large-scale Graphs

Graph condensation has emerged as an intriguing technique to save the expensive training costs of Graph Neural Networks (GNNs) by substituting the original graph with a small condensed graph. Despite the promising results achieved, previous methods usually employ an entangled paradigm of redundant parameters (nodes, edges, GNNs), which incurs complex joint optimization during condensation. This paradigm has considerably impeded the scalability of graph condensation, making it challenging to condense extremely large-scale graphs and generate high-fidelity condensed graphs. Therefore, we propose to disentangle the condensation process into a two-stage GNN-free paradigm, independently condensing nodes and generating edges while eliminating the need to optimize GNNs at the same time. The node condensation module avoids the complexity of GNNs by focusing on node feature alignment with anchors of the original graph, while the edge translation module constructs the edges of the condensed nodes by transferring the original structure knowledge with neighborhood anchors. This simple yet effective approach is at least 10 times faster than state-of-the-art methods with comparable accuracy on medium-scale graphs. Moreover, the proposed DisCo can successfully scale up to the Ogbn-papers100M graph with flexible reduction rates. Extensive downstream tasks and ablation studies on five common datasets further demonstrate the effectiveness of the proposed DisCo framework. The source code will be made publicly available.

Updated: 2024-07-31 02:52:20

Categories: cs.SI,cs.LG

Download: http://arxiv.org/abs/2401.12231v2

Patient-centered data science: an integrative framework for evaluating and predicting clinical outcomes in the digital health era

This study proposes a novel, integrative framework for patient-centered data science in the digital health era. We developed a multidimensional model that combines traditional clinical data with patient-reported outcomes, social determinants of health, and multi-omic data to create comprehensive digital patient representations. Our framework employs a multi-agent artificial intelligence approach, utilizing various machine learning techniques including large language models, to analyze complex, longitudinal datasets. The model aims to optimize multiple patient outcomes simultaneously while addressing biases and ensuring generalizability. We demonstrate how this framework can be implemented to create a learning healthcare system that continuously refines strategies for optimal patient care. This approach has the potential to significantly improve the translation of digital health innovations into real-world clinical benefits, addressing current limitations in AI-driven healthcare models.

Updated: 2024-07-31 02:36:17

Categories: cs.LG,cs.AI,cs.CY

Download: http://arxiv.org/abs/2408.02677v1

Decentralized and Uncoordinated Learning of Stable Matchings: A Game-Theoretic Approach

We consider the problem of learning stable matchings in a fully decentralized and uncoordinated manner. In this problem, there are $n$ men and $n$ women, each having preference over the other side. It is assumed that women know their preferences over men, but men are not aware of their preferences over women, and they only learn them if they propose and successfully get matched to women. A matching is called stable if no man and woman prefer each other over their current matches. When all the preferences are known a priori, the celebrated Deferred-Acceptance algorithm proposed by Gale and Shapley provides a decentralized and uncoordinated algorithm to obtain a stable matching. However, when the preferences are unknown, developing such an algorithm faces major challenges due to a lack of coordination. We achieve this goal by making a connection between stable matchings and learning Nash equilibria (NE) in noncooperative games. First, we provide a complete information game formulation for the stable matching problem with known preferences such that its set of pure NE coincides with the set of stable matchings, while its mixed NE can be rounded in a decentralized manner to a stable matching. Relying on such a game-theoretic formulation, we show that for hierarchical markets, adopting the exponential weight (EXP) learning algorithm for the stable matching game achieves logarithmic regret with polynomial dependence on the number of players, thus answering a question posed in previous literature. Moreover, we show that the same EXP learning algorithm converges locally and exponentially fast to a stable matching in general matching markets. We complement this result by introducing another decentralized and uncoordinated learning algorithm that globally converges to a stable matching with arbitrarily high probability, leveraging the weak acyclicity property of the stable matching game.
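As a point of reference for the known-preference setting mentioned above, the following is a minimal sketch of the classic men-proposing Deferred-Acceptance procedure together with a stability check. It is the textbook algorithm, not the learning algorithms proposed in the paper; the function names and the toy preference lists are illustrative assumptions.

```python
def deferred_acceptance(men_prefs, women_prefs):
    """Men-proposing Gale-Shapley; preference lists are most-preferred first."""
    n = len(men_prefs)
    # rank[w][m] = position of man m in woman w's list (lower is better).
    rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
    next_choice = [0] * n      # index of the next woman each man proposes to
    fiance = [None] * n        # fiance[w] = man currently held by woman w
    free_men = list(range(n))
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = fiance[w]
        if current is None:
            fiance[w] = m                  # w was unmatched, she accepts
        elif rank[w][m] < rank[w][current]:
            fiance[w] = m                  # w trades up, current is freed
            free_men.append(current)
        else:
            free_men.append(m)             # w rejects m
    return {fiance[w]: w for w in range(n)}  # man -> woman


def is_stable(matching, men_prefs, women_prefs):
    """True if no man and woman prefer each other over their current matches."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in enumerate(men_prefs):
        for w in prefs:
            if w == matching[m]:
                break  # every later woman is less preferred than his match
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False  # (m, w) is a blocking pair
    return True


# Toy market with three men and three women (hypothetical preferences).
men_prefs = [[0, 1, 2], [1, 0, 2], [0, 1, 2]]
women_prefs = [[1, 0, 2], [0, 1, 2], [0, 1, 2]]
matching = deferred_acceptance(men_prefs, women_prefs)
```

The sketch assumes complete preference lists on both sides, so every man is eventually matched.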

Updated: 2024-07-31 02:36:14

Categories: cs.GT,cs.LG,cs.MA,cs.SI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2407.21294v1

SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving

Many fields could benefit from the rapid development of large language models (LLMs). End-to-end autonomous driving (e2eAD) is a typical field facing new opportunities as LLMs support more and more modalities. Here, using a vision-language model (VLM), we propose an e2eAD method called SimpleLLM4AD. In our method, the e2eAD task is divided into four stages: perception, prediction, planning, and behavior. Each stage consists of several visual question answering (VQA) pairs, and the VQA pairs interconnect with each other to form a graph called Graph VQA (GVQA). By reasoning over each VQA pair in the GVQA with the VLM, stage by stage, our method achieves e2e driving with language. Vision transformer (ViT) models are employed to process nuScenes visual data, while the VLM is used to interpret and reason about the information extracted from the visual inputs. In the perception stage, the system identifies and classifies objects from the driving environment. The prediction stage involves forecasting the potential movements of these objects. The planning stage utilizes the gathered information to develop a driving strategy, ensuring the safety and efficiency of the autonomous vehicle. Finally, the behavior stage translates the planned actions into executable commands for the vehicle. Our experiments demonstrate that SimpleLLM4AD achieves competitive performance in complex driving scenarios.

Updated: 2024-07-31 02:35:33

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.21293v1

Saliency Guided Image Warping for Unsupervised Domain Adaptation

Driving is challenging in conditions like night, rain, and snow. The lack of good labeled datasets has hampered progress in scene understanding under such conditions. Unsupervised domain adaptation (UDA) using large labeled clear-day datasets is a promising research direction in such cases. Current UDA methods, however, treat all image pixels uniformly, leading to over-reliance on the dominant scene backgrounds (e.g., roads, sky, sidewalks) that appear dramatically different across domains. As a result, they struggle to learn effective features of smaller and often sparse foreground objects (e.g., people, vehicles, signs). In this work, we improve UDA training by using in-place image warping to focus on salient object regions. Our insight is that while backgrounds vary significantly across domains (e.g., snowy night vs. clear day), object appearances vary to a lesser extent. Therefore, we design instance-level saliency guidance to adaptively oversample object regions, which reduces adverse effects from background context and enhances backbone feature learning. We then unwarp the better learned features while adapting from source to target. Our approach improves adaptation across geographies, lighting, and weather conditions, and is agnostic to the task (segmentation, detection), domain adaptation algorithm, saliency guidance, and underlying model architecture. Result highlights include +6.1 mAP50 for BDD100K Clear $\rightarrow$ DENSE Foggy, +3.7 mAP50 for BDD100K Day $\rightarrow$ Night, +3.0 mAP50 for BDD100K Clear $\rightarrow$ Rainy, and +6.3 mIoU for Cityscapes $\rightarrow$ ACDC. Our method adds minimal training memory and incurs no additional inference latency. Please see Appendix for more results and analysis.

Updated: 2024-07-31 02:33:31

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2403.12712v2

SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow

Scene flow, which provides the 3D motion field of the first frame from two consecutive point clouds, is vital for dynamic scene perception. However, contemporary scene flow methods face three major challenges. First, they lack global flow embedding or only consider the context of individual point clouds before embedding, so embedded points struggle to perceive the consistent semantic relationship of the other frame. To address this issue, we propose a novel approach called Dual Cross Attentive (DCA) for latent fusion and alignment between two frames based on semantic contexts. This is then integrated into Global Fusion Flow Embedding (GF) to initialize flow embedding based on global correlations in both contextual and Euclidean spaces. Second, deformations exist in non-rigid objects after the warping layer, which distort the spatiotemporal relation between consecutive frames. For a more precise estimation of residual flow at the next level, the Spatial Temporal Re-embedding (STR) module is devised to update the point sequence features at the current level. Last, poor generalization is often observed due to the significant domain gap between synthetic and LiDAR-scanned datasets. We leverage novel domain adaptive losses to effectively bridge the gap in motion inference between synthetic and real-world data. Experiments demonstrate that our approach achieves state-of-the-art (SOTA) performance across various datasets, with particularly outstanding results in real-world LiDAR-scanned situations. Our code will be released upon publication.

Updated: 2024-07-31 02:28:40

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.07825v1

TrackSorter: A Transformer-based sorting algorithm for track finding in High Energy Physics

Track finding in particle data is a challenging pattern recognition problem in High Energy Physics. It takes as inputs a point cloud of space points and labels them so that space points created by the same particle have the same label. The list of space points with the same label is a track candidate. We argue that this pattern recognition problem can be formulated as a sorting problem, of which the inputs are a list of space points sorted by their distances away from the collision points and the outputs are the space points sorted by their labels. In this paper, we propose the TrackSorter algorithm: a Transformer-based algorithm for pattern recognition in particle data. TrackSorter uses a simple tokenization scheme to convert space points into discrete tokens. It then uses the tokenized space points as inputs and sorts the input tokens into track candidates. TrackSorter is a novel end-to-end track finding algorithm that leverages Transformer-based models to solve pattern recognition problems. It is evaluated on the TrackML dataset and has good track finding performance.
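The sorting formulation above can be made concrete with a small sketch: the input is the list of space points ordered by distance from the collision point, and the target output is the same points ordered by particle label. The binning-based `tokenize` function, the origin as collision point, and the toy points are all assumptions for illustration; in TrackSorter a Transformer predicts the output order, whereas here ground-truth labels stand in for the model.

```python
import math
from collections import defaultdict

ORIGIN = (0.0, 0.0, 0.0)  # assumed collision point


def tokenize(point, bin_size=10.0, bins_per_axis=64):
    """Hypothetical tokenization: quantize (x, y, z) into one integer token."""
    half = bins_per_axis // 2
    token = 0
    for coord in point:
        b = int(math.floor(coord / bin_size)) + half
        b = min(max(b, 0), bins_per_axis - 1)   # clamp to the grid
        token = token * bins_per_axis + b
    return token


def input_order(points):
    """Model input: space points sorted by distance from the collision point."""
    return sorted(points, key=lambda p: math.dist(p, ORIGIN))


def output_order(points, label_of):
    """Target output: the same points sorted by particle label."""
    # sorted() is stable, so points keep their distance order within a label.
    return sorted(input_order(points), key=lambda p: label_of[p])


def track_candidates(points, label_of):
    """Group the label-sorted points into one list per track candidate."""
    tracks = defaultdict(list)
    for p in output_order(points, label_of):
        tracks[label_of[p]].append(p)
    return dict(tracks)


# Two toy particles: one moving along x, one along y.
points = [(30.0, 0.0, 5.0), (10.0, 0.0, 1.0), (0.0, 40.0, -4.0), (0.0, 20.0, -2.0)]
label_of = {points[0]: 0, points[1]: 0, points[2]: 1, points[3]: 1}
tracks = track_candidates(points, label_of)
```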

Updated: 2024-07-31 02:27:57

Categories: cs.LG,hep-ex,physics.data-an

Download: http://arxiv.org/abs/2407.21290v1

Segment Anything for Videos: A Systematic Survey

The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various image segmentation and multi-modal segmentation (e.g., text-to-mask) tasks, but also in the video domain. Additionally, the recently released SAM 2 is once again sparking research enthusiasm in the realm of promptable visual segmentation for both images and videos. However, existing surveys mainly focus on SAM in various image processing tasks; a comprehensive and in-depth review in the video domain is notably absent. To address this gap, this work conducts a systematic review of SAM for videos in the era of foundation models. As the first to review the progress of SAM for videos, this work focuses on its applications to various tasks by discussing its recent advances and the innovation opportunities for developing foundation models for broad applications. We begin with a brief introduction to the background of SAM and video-related research domains. Subsequently, we present a systematic taxonomy that categorizes existing methods into three key areas: video understanding, video generation, and video editing, analyzing and summarizing their advantages and limitations. Furthermore, comparative results of SAM-based and current state-of-the-art methods on representative benchmarks, as well as insightful analysis, are offered. Finally, we discuss the challenges faced by current research and envision several future research directions in the field of SAM for video and beyond.

Updated: 2024-07-31 02:24:53

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.08315v1

Robust Box Prompt based SAM for Medical Image Segmentation

The Segment Anything Model (SAM) can achieve satisfactory segmentation performance under high-quality box prompts. However, SAM's robustness is compromised by the decline in box quality, limiting its practicality in clinical reality. In this study, we propose a novel Robust Box prompt based SAM (RoBox-SAM) to ensure SAM's segmentation performance under prompts of different qualities. Our contribution is three-fold. First, we propose a prompt refinement module to implicitly perceive the potential targets and output the offsets that directly transform a low-quality box prompt into a high-quality one. We then provide an online iterative strategy for further prompt refinement. Second, we introduce a prompt enhancement module to automatically generate point prompts that effectively assist the box-promptable segmentation. Last, we build a self-information extractor to encode the prior information from the input image. These features can optimize the image embeddings and attention calculation, further enhancing the robustness of SAM. Extensive experiments on a large medical segmentation dataset including 99,299 images, 5 modalities, and 25 organs/targets validated the efficacy of our proposed RoBox-SAM.

Updated: 2024-07-31 02:16:28

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.21284v1

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during inference. SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference. The datastore allows use of high-risk data without training on it, supports sentence-level data attribution, and enables data producers to opt out from the model by removing content from the store. These capabilities can foster compliance with data-use regulations such as the fair use doctrine in the United States and the GDPR in the European Union. Our experiments show that the parametric LM struggles on domains not covered by OLC. However, access to the datastore greatly improves out of domain performance, closing 90% of the performance gap with an LM trained on the Pile, a more diverse corpus with mostly high-risk text. We also analyze which nonparametric approach works best, where the remaining errors lie, and how performance scales with datastore size. Our results suggest that it is possible to build high quality language models while mitigating their legal risk.

Updated: 2024-07-31 02:15:31

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2308.04430v2

FedBChain: A Blockchain-enabled Federated Learning Framework for Improving DeepConvLSTM with Comparative Strategy Insights

Recent research in the field of Human Activity Recognition has shown that an improvement in prediction performance can be achieved by reducing the number of LSTM layers. However, this enhancement has only been shown to be significant for monolithic architectures; in large-scale distributed training, data security and privacy issues must be reconsidered, and the resulting prediction performance is unknown. In this paper, we introduce a novel framework, FedBChain, which integrates the federated learning paradigm with a modified DeepConvLSTM architecture that has a single LSTM layer. This framework performs comparative tests of prediction performance on three different real-world datasets, using three different hidden layer unit sizes (128, 256, and 512), each combined with five different federated learning strategies. The results show that our architecture achieves significant improvements in Precision, Recall and F1-score compared to the centralized training approach on all datasets, for all hidden layer unit sizes and all strategies: on average, FedAvg improves by 4.54%, FedProx by 4.57%, FedTrimmedAvg by 4.35%, Krum by 4.18%, and FedAvgM by 4.46%. Based on our results, FedBChain not only improves performance, but also guarantees the security and privacy of user data during training, compared to centralized training methods. The code for our experiments is publicly available (https://github.com/Glen909/FedBChain).
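Of the five strategies named above, FedAvg is the simplest to illustrate: the server averages client parameters, weighting each client by its number of training samples. The sketch below shows only that aggregation step on plain NumPy arrays; the function name and the toy client data are assumptions, and it does not reproduce the FedBChain code.

```python
import numpy as np


def fed_avg(client_weights, client_sizes):
    """Sample-weighted average of client parameters (one list of arrays per client)."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Each client's contribution is proportional to its sample count.
        acc = np.zeros_like(client_weights[0][layer], dtype=float)
        for weights, size in zip(client_weights, client_sizes):
            acc += weights[layer] * (size / total)
        averaged.append(acc)
    return averaged


# Two toy clients with a single parameter tensor each.
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 6.0])]]
sizes = [1, 3]
global_weights = fed_avg(clients, sizes)
```

With sample counts 1 and 3, the second client contributes three quarters of the global update.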

Updated: 2024-07-31 02:12:05

Categories: cs.LG,cs.HC

Download: http://arxiv.org/abs/2407.21282v1

Unlocking the Potential of Binding Corporate Rules (BCRs) in Health Data Transfers

This chapter explores the essential role of Binding Corporate Rules (BCRs) in managing and facilitating secure health data transfers within corporate groups under the EU General Data Protection Regulation (GDPR). BCRs are tailored to ensure compliance with the GDPR and similar international data protection laws, presenting a flexible mechanism for transferring sensitive health and genomic data. The chapter situates BCRs within the broader spectrum of the GDPR international data transfer mechanisms, addressing the unique challenges posed by the sensitive nature of health data and the increased adoption of AI technologies. The European Data Protection Board (EDPB) Recommendations 1/2022 on BCRs, issued following the Schrems II decision, are critically analyzed, highlighting their stringent requirements and the need for a balanced approach that prioritizes data protection and an AI governance framework. The chapter outlines the BCR approval process, stressing the importance of streamlining this process to encourage broader adoption. It underscores the necessity of a multidisciplinary approach in developing BCRs, incorporating recently adopted international standards and frameworks, which offer valuable guidance for organizations to build trustworthy AI management systems. They guarantee the ethical development, deployment, and operation of AI, which is essential for its successful integration and the broader digital transformation. In conclusion, BCRs are positioned as essential tools for secure health data management, fostering transparency, accountability, and collaboration across international borders. The chapter calls for proactive measures to incentivize BCR adoption, streamline approval processes, and promote more innovative approaches, ensuring BCRs remain a robust mechanism for global data protection and compliance.

Updated: 2024-07-31 02:09:52

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2407.21281v1

A2SF: Accumulative Attention Scoring with Forgetting Factor for Token Pruning in Transformer Decoder

Recently, large language models (LLMs) based on transformers have been facing memory bottlenecks due to the KV cache, especially when handling long sequences. Previous research proposed KV cache compression techniques that identify insignificant tokens based on Accumulative Attention Scores and remove their entries from the KV cache, noting that only a few tokens play an important role in attention operations. However, we have observed that the existing Accumulative Attention Score is not suitable for the transformer decoder structure. In the decoder model, the number of times the Attention Score accumulates varies with the order of token appearance due to the effect of masking, causing an uneven comparison between tokens. To solve this, we propose the Accumulative Attention Score with Forgetting Factor (A2SF) technique, which introduces a Forgetting Factor into the Attention Score accumulation process. A2SF applies a penalty to the past Attention Scores generated from old tokens by repeatedly multiplying the Attention Score by the Forgetting Factor over time. Therefore, older tokens receive a larger penalty, providing fairness among tokens of different ages. Through this fair comparison among tokens, we can select important tokens more effectively. We have verified the accuracy improvement from A2SF on the OPT and LLaMA models; A2SF improves the accuracy of LLaMA 2 by up to 7.8% and 5.1% in the 1-shot and 0-shot settings, respectively.
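The accumulation rule described above can be sketched in a few lines. At each decoding step the running score is multiplied by the forgetting factor before the new attention row is added, so contributions from earlier steps decay geometrically. This is a minimal illustration under that reading of the abstract; the exact formulation in the paper may differ, and the function names and the uniform toy attention are assumptions.

```python
import numpy as np


def accumulate_attention(attn, forgetting=1.0):
    """Accumulate per-key attention scores over decoding steps.

    attn: (T, T) lower-triangular matrix; row t holds the attention weights
          of the query at step t over keys 0..t (zeros for masked keys).
    forgetting: factor in (0, 1]; 1.0 gives the plain accumulative score.
    """
    score = np.zeros(attn.shape[1])
    for row in attn:
        # Decay everything accumulated so far, then add the new step.
        score = forgetting * score + row
    return score


def keep_top_k(score, k):
    """Indices of the k highest-scoring tokens, in original order."""
    return np.sort(np.argsort(score)[-k:])


# Toy causal attention: every query attends uniformly to its visible keys.
T = 4
uniform = np.tril(np.ones((T, T))) / np.arange(1, T + 1)[:, None]
plain = accumulate_attention(uniform)                    # forgetting = 1.0
decayed = accumulate_attention(uniform, forgetting=0.5)
```

In this toy case the plain score favors the first token purely because it has been visible for more steps; the decayed score narrows that gap, which is the unevenness the abstract describes.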

Updated: 2024-07-31 02:02:40

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.20485v2

Non-Overlapping Placement of Macro Cells based on Reinforcement Learning in Chip Design

Due to the increasing complexity of chip design, existing placement methods still have many shortcomings in handling macro cell overlap and optimization efficiency. To address the problems of layout overlap, inferior performance, and low optimization efficiency in existing chip design methods, this paper proposes an end-to-end placement method, SRLPlacer, based on reinforcement learning. First, the placement problem is transformed into a Markov decision process by establishing a coupling relationship graph model between macro cells, in order to learn a strategy for optimizing layouts. Second, the whole placement process is optimized after integrating the standard cell layout. Evaluated on the public benchmark ISPD2005, the proposed SRLPlacer effectively solves the overlap problem between macro cells while considering routing congestion and shortening the total wire length to ensure routability.
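Two quantities mentioned above, macro overlap and total wire length, are easy to state in code. The sketch below computes pairwise overlap area for axis-aligned rectangles and the half-perimeter wirelength (HPWL), a common wire-length proxy in placement. Using HPWL here is an assumption, since the abstract does not say how wire length is measured, and the rectangle format and toy layout are illustrative.

```python
from itertools import combinations


def overlap_area(a, b):
    """Overlap area of two rectangles given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def total_overlap(macros):
    """Sum of overlap areas over all macro pairs; 0 means a legal placement."""
    return sum(overlap_area(a, b) for a, b in combinations(macros, 2))


def center(rect):
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)


def hpwl(nets, macros):
    """Half-perimeter wirelength; each net is a list of macro indices."""
    total = 0.0
    for net in nets:
        xs, ys = zip(*(center(macros[i]) for i in net))
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Toy layout: macros 0 and 1 overlap, macro 2 is separate.
macros = [(0, 0, 4, 4), (2, 2, 4, 4), (10, 0, 2, 2)]
nets = [[0, 1, 2]]
```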

Updated: 2024-07-31 02:01:10

Categories: cs.AR,cs.AI

Download: http://arxiv.org/abs/2407.18499v2

Multi-Level Querying using A Knowledge Pyramid

This paper addresses the need for improved precision in existing Retrieval-Augmented Generation (RAG) methods that primarily focus on enhancing recall. We propose a multi-layer knowledge pyramid approach within the RAG framework to achieve a better balance between precision and recall. The knowledge pyramid consists of three layers: Ontologies, Knowledge Graphs (KGs), and chunk-based raw text. We employ cross-layer augmentation techniques for comprehensive knowledge coverage and dynamic updates of the Ontology schema and instances. To ensure compactness, we utilize cross-layer filtering methods for knowledge condensation in KGs. Our approach, named PolyRAG, follows a waterfall model for retrieval, starting from the top of the pyramid and progressing down until a confident answer is obtained. We introduce two benchmarks for domain-specific knowledge retrieval, one in the academic domain and the other in the financial domain. The effectiveness of the methods has been validated through comprehensive experiments, in which they outperform 19 SOTA methods. An encouraging observation is that the proposed method augments GPT-4, providing a 395% F1 gain by improving its performance from 0.1636 to 0.8109.
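The waterfall retrieval described above can be sketched as a loop over layers ordered from the top of the pyramid downward, stopping at the first answer whose confidence clears a threshold. The layer contents, the confidence values, the keyword-overlap retriever, and the 0.8 threshold are all hypothetical stand-ins, not PolyRAG's components. The `relative_gain` helper only checks the arithmetic behind the reported F1 gain.

```python
def waterfall_retrieve(query, layers, min_confidence=0.8):
    """Query layers top-down; return the first sufficiently confident answer."""
    for name, retrieve in layers:
        answer, confidence = retrieve(query)
        if answer is not None and confidence >= min_confidence:
            return name, answer
    return None, None


def relative_gain(before, after):
    """Relative improvement, e.g. 1.0 means a 100% gain."""
    return (after - before) / before


# Hypothetical toy layers, ordered from the top of the pyramid downward.
ontology = {"what is a bond?": ("a debt security", 0.95)}
knowledge_graph = {"who issued bond X?": ("ACME Corp", 0.85)}
raw_chunks = ["Bond X matures in 2030 according to the prospectus."]


def lookup(table):
    return lambda q: table.get(q, (None, 0.0))


def search_chunks(q):
    # Naive keyword overlap as a stand-in for a real chunk retriever.
    words = set(q.lower().rstrip("?").split())
    best = max(raw_chunks, key=lambda c: len(words & set(c.lower().split())))
    overlap = len(words & set(best.lower().split())) / max(len(words), 1)
    return best, overlap


layers = [
    ("ontology", lookup(ontology)),
    ("knowledge_graph", lookup(knowledge_graph)),
    ("raw_text", search_chunks),
]
```

For the figures quoted above, `relative_gain(0.1636, 0.8109)` is about 3.957, which matches the stated 395% when truncated to whole percent.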

Updated: 2024-07-31 01:51:24

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.21276v1

FreqTSF: Time Series Forecasting Via Simulating Frequency Kramer-Kronig Relations

Time series forecasting (TSF) is immensely important in extensive applications, such as electricity transformation, financial trade, medical monitoring, and smart agriculture. Although Transformer-based methods can handle time series data, their ability to predict long-term time series is limited due to the "anti-order" nature of the self-attention mechanism. To address this problem, we focus on the frequency domain to weaken the impact of order in TSF and propose the FreqBlock, where we first obtain frequency representations through the Frequency Transform Module. Subsequently, a newly designed Frequency Cross Attention is used to obtain enhanced frequency representations between the real and imaginary parts, thus establishing a link between the attention mechanism and the inherent Kramer-Kronig relations (KKRs). Our backbone network, FreqTSF, adopts a residual structure by concatenating multiple FreqBlocks to simulate KKRs in the frequency domain and avoid degradation problems. On a theoretical level, we demonstrate that the proposed two modules can significantly reduce the time and memory complexity from $\mathcal{O}(L^2)$ to $\mathcal{O}(L)$ for each FreqBlock computation. Empirical studies on four benchmark datasets show that FreqTSF achieves an overall relative MSE reduction of 15% and an overall relative MAE reduction of 11% compared to the state-of-the-art methods. The code will be available soon.
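The link between real and imaginary parts that the abstract builds on can be checked numerically. For a causal signal, the real part of the spectrum determines the imaginary part; this is the discrete counterpart of the relations usually spelled Kramers-Kronig. The sketch below recovers the imaginary part of a DFT from its real part for a real sequence whose second half is zero. It is a textbook signal-processing identity shown under those assumptions, not the Frequency Cross Attention module from the paper, and the function name is illustrative.

```python
import numpy as np


def imag_from_real(real_part):
    """Recover Im X[k] from Re X[k] for a real, causal sequence.

    Assumes an even length N and x[n] = 0 for n >= N/2, so the circular
    even part of x does not alias onto the causal half.
    """
    n = real_part.shape[0]
    # Re X is the DFT of the (circular) even part of x.
    even_part = np.fft.ifft(real_part).real
    # Causality lets us rebuild x from its even part: keep n = 0 once,
    # double the samples 1..N/2-1, and zero the rest.
    window = np.zeros(n)
    window[0] = 1.0
    window[1:n // 2] = 2.0
    causal = window * even_part
    return np.fft.fft(causal).imag


# Toy causal signal: random values in the first half, zeros afterwards.
rng = np.random.default_rng(0)
x = np.zeros(16)
x[:8] = rng.normal(size=8)
spectrum = np.fft.fft(x)
recovered_imag = imag_from_real(spectrum.real)
```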

Updated: 2024-07-31 01:50:39

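Title: FreqTSF: Time Series Forecasting by Simulating Frequency Kramer-Kronig Relations

Abstract: Time series forecasting (TSF) is very important in a wide range of applications such as electricity transformation, financial trading, medical monitoring, and smart agriculture. Although Transformer-based methods can handle time series data, their ability to forecast long-term series is limited by the "anti-order" nature of the self-attention mechanism. To address this, we turn to the frequency domain to weaken the influence of order in TSF and propose the FreqBlock, in which we first obtain frequency representations through the Frequency Transform Module. We then use a newly designed Frequency Cross Attention to obtain enhanced frequency representations between the real and imaginary parts, thereby linking the attention mechanism to the inherent Kramer-Kronig relations (KKRs). Our backbone network, FreqTSF, adopts a residual structure that concatenates multiple FreqBlocks to simulate KKRs in the frequency domain and avoid degradation problems. On the theoretical side, we show that the two proposed modules reduce the time and memory complexity of each FreqBlock computation from O(L^2) to O(L). Empirical studies on four benchmark datasets show that, compared with state-of-the-art methods, FreqTSF achieves an overall relative MSE reduction of 15% and an overall relative MAE reduction of 11%. The code will be available soon.

Updated: 2024-07-31 01:50:39

Categories: cs.AI

Download: http://arxiv.org/abs/2407.21275v1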
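
The abstract does not spell out how the Frequency Cross Attention works, so the following sketch only illustrates the underlying fact it appeals to: for a causal signal, the real and imaginary parts of the spectrum determine each other (a discrete analogue of the Kramers-Kronig relations). The function names and the toy signal are assumptions for illustration, and the plain O(M^2) DFT is used only to keep the example dependency-free.

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform (quadratic time, fine for a toy example)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def imag_from_real(real_part):
    """Recover Im(X) from Re(X) for a real signal that is zero in its second half."""
    m = len(real_part)
    # Re(X) is the transform of the even part of the signal.
    even = [v.real for v in idft(real_part)]
    # For a causal signal, the odd part is the even part with the sign
    # flipped on the "negative time" half (indices above m / 2).
    odd = [0.0] * m
    for t in range(1, m):
        if t < m / 2:
            odd[t] = even[t]
        elif t > m / 2:
            odd[t] = -even[t]
    # The transform of the odd part is purely imaginary and equals j * Im(X).
    return [v.imag for v in dft(odd)]


signal = [1.0, 0.5, 0.25, 0.125] + [0.0] * 4  # causal: second half is zero
spectrum = dft(signal)
recovered = imag_from_real([v.real for v in spectrum])
max_error = max(abs(a - b.imag) for a, b in zip(recovered, spectrum))
```

`max_error` is at the level of floating-point rounding, which shows that no information about the imaginary part is lost when only the real part is kept for such a signal.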

Enhanced Uncertainty Estimation in Ultrasound Image Segmentation with MSU-Net

Efficient intravascular access in trauma and critical care significantly impacts patient outcomes. However, the availability of skilled medical personnel in austere environments is often limited. Autonomous robotic ultrasound systems can aid in needle insertion for medication delivery and support non-experts in such tasks. Despite advances in autonomous needle insertion, inaccuracies in vessel segmentation predictions pose risks. Understanding the uncertainty of predictive models in ultrasound imaging is crucial for assessing their reliability. We introduce MSU-Net, a novel multistage approach for training an ensemble of U-Nets to yield accurate ultrasound image segmentation maps. We demonstrate substantial improvements, 18.1% over a single Monte Carlo U-Net, enhancing uncertainty evaluations, model transparency, and trustworthiness. By highlighting areas of model certainty, MSU-Net can guide safe needle insertions, empowering non-experts to accomplish such tasks.

Updated: 2024-07-31 01:36:47

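Title: Enhanced Uncertainty Estimation in Ultrasound Image Segmentation with MSU-Net

Abstract: In trauma and critical care, efficient intravascular access significantly affects patient outcomes. However, skilled medical personnel are often in short supply in austere environments. Autonomous robotic ultrasound systems can assist with needle insertion for medication delivery and support non-experts in such tasks. Despite progress in autonomous needle insertion, inaccurate vessel segmentation predictions in ultrasound images pose risks. Understanding the uncertainty of predictive models in ultrasound imaging is crucial for assessing their reliability. We introduce MSU-Net, a novel multistage approach for training an ensemble of U-Nets to produce accurate ultrasound image segmentation maps. We demonstrate substantial improvements, 18.1% over a single Monte Carlo U-Net, enhancing uncertainty evaluation, model transparency, and trustworthiness. By highlighting regions of model certainty, MSU-Net can guide safe needle insertion and enable non-experts to accomplish such tasks.

Updated: 2024-07-31 01:36:47

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.21273v1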
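
As a rough illustration of how an ensemble yields per-pixel uncertainty, the sketch below averages the foreground probabilities of several segmentation models and uses their variance as a disagreement score. This is a generic ensemble recipe, not the multistage MSU-Net training procedure; the probability maps and the 0.01 variance cutoff are made-up values.

```python
from statistics import mean, pvariance


def ensemble_uncertainty(prob_maps):
    """Return per-pixel mean and variance across ensemble members."""
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    mean_map = [[0.0] * width for _ in range(height)]
    var_map = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            votes = [member[r][c] for member in prob_maps]
            mean_map[r][c] = mean(votes)
            var_map[r][c] = pvariance(votes)
    return mean_map, var_map


# Three hypothetical ensemble members predicting vessel probability on a 2x2 image.
prob_maps = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.9, 0.5], [0.7, 0.2]],
    [[0.9, 0.9], [0.9, 0.2]],
]

mean_map, var_map = ensemble_uncertainty(prob_maps)
# Pixels where members agree (low variance) are treated as "certain".
certain = [[v < 0.01 for v in row] for row in var_map]
```

Here the top-right pixel is flagged as uncertain because the members disagree strongly, which is the kind of region a needle-guidance system would want to avoid relying on.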

Automated Quantification of Hyperreflective Foci in SD-OCT With Diabetic Retinopathy

The presence of hyperreflective foci (HFs) is related to retinal disease progression, and their quantity has proven to be a prognostic factor of visual and anatomical outcome in various retinal diseases. However, the lack of efficient quantitative tools for evaluating HFs has prevented ophthalmologists from assessing the volume of HFs. For this reason, we propose an automated quantification algorithm to segment and quantify HFs in spectral domain optical coherence tomography (SD-OCT). The proposed algorithm consists of two parallel processes, namely region of interest (ROI) generation and HFs estimation. To generate the ROI, we use morphological reconstruction to obtain the reconstructed image and a histogram constructed for data distributions and clustering. In parallel, we estimate the HFs by extracting the extremal regions from the connected regions obtained from a component tree. Finally, both the ROI and the HFs estimation process are merged to obtain the segmented HFs. The proposed algorithm was tested on 40 3D SD-OCT volumes from 40 patients diagnosed with non-proliferative diabetic retinopathy (NPDR), proliferative diabetic retinopathy (PDR), and diabetic macular edema (DME). The average dice similarity coefficient (DSC) and correlation coefficient (r) are 69.70% and 0.99 for NPDR, 70.31% and 0.99 for PDR, and 71.30% and 0.99 for DME, respectively. The proposed algorithm can provide ophthalmologists with good quantitative information on HFs, such as their volume, size, and location.

Updated: 2024-07-31 01:33:47

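Title: Automated Quantification of Hyperreflective Foci in SD-OCT With Diabetic Retinopathy

Abstract: The presence of hyperreflective foci (HFs) is related to the progression of retinal disease, and their quantity has been shown to be a prognostic factor for visual and anatomical outcomes in various retinal diseases. However, the lack of efficient quantitative tools for evaluating HFs has left ophthalmologists unable to assess the quantity of HFs. We therefore propose an automated quantification algorithm to segment and quantify HFs in spectral domain optical coherence tomography (SD-OCT). The proposed algorithm consists of two parallel processes: region of interest (ROI) generation and HFs estimation. To generate the ROI, we use morphological reconstruction to obtain the reconstructed image and a histogram built for data distributions and clustering. In parallel, we estimate the HFs by extracting extremal regions from the connected regions obtained from a component tree. Finally, the ROI and HFs estimation processes are merged to obtain the segmented HFs. The algorithm was tested on 40 3D SD-OCT volumes from 40 patients diagnosed with non-proliferative diabetic retinopathy (NPDR), proliferative diabetic retinopathy (PDR), and diabetic macular edema (DME). The average Dice similarity coefficient (DSC) and correlation coefficient (r) are 69.70% and 0.99 for NPDR, 70.31% and 0.99 for PDR, and 71.30% and 0.99 for DME. The proposed algorithm can provide ophthalmologists with good quantitative information on HFs, such as their volume, size, and location.

Updated: 2024-07-31 01:33:47

Categories: cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.21272v1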
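
For readers unfamiliar with the reported metric, the Dice similarity coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened binary masks and also shows how a volume could be derived from a voxel count; the masks and the `voxel_volume_mm3` value are hypothetical and unrelated to the paper's data.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient of two equally long binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def segmented_volume(mask, voxel_volume_mm3):
    """Volume of the segmented region: foreground voxel count times voxel volume."""
    return sum(1 for v in mask if v) * voxel_volume_mm3


predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]

dsc = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)
volume = segmented_volume(predicted, voxel_volume_mm3=0.5)
```

A DSC near 70%, as reported in the abstract, means that roughly this fraction of the combined foreground is shared between the algorithm's segmentation and the reference.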

DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations

We present DEF-oriCORN, a framework for language-directed manipulation tasks. By leveraging a novel object-based scene representation and diffusion-model-based state estimation algorithm, our framework enables efficient and robust manipulation planning in response to verbal commands, even in tightly packed environments with sparse camera views without any demonstrations. Unlike traditional representations, our representation affords efficient collision checking and language grounding. Compared to state-of-the-art baselines, our framework achieves superior estimation and motion planning performance from sparse RGB images and zero-shot generalizes to real-world scenarios with diverse materials, including transparent and reflective objects, despite being trained exclusively in simulation. Our code for data generation, training, inference, and pre-trained weights are publicly available at: https://sites.google.com/view/def-oricorn/home.

Updated: 2024-07-31 01:13:25

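Title: DEF-oriCORN: Efficient 3D Scene Understanding for Robust Language-Directed Manipulation Without Demonstrations

Abstract: We present DEF-oriCORN, a framework for language-directed manipulation tasks. By leveraging a novel object-based scene representation and a diffusion-model-based state estimation algorithm, our framework enables efficient and robust manipulation planning in response to verbal commands, even in tightly packed environments with sparse camera views, without any demonstrations. Unlike traditional representations, our representation affords efficient collision checking and language grounding. Compared with state-of-the-art baselines, our framework achieves superior estimation and motion planning performance from sparse RGB images and generalizes zero-shot to real-world scenarios with diverse materials, including transparent and reflective objects, despite being trained exclusively in simulation. Our code for data generation, training, and inference, together with pre-trained weights, is publicly available at: https://sites.google.com/view/def-oricorn/home.

Updated: 2024-07-31 01:13:25

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.21267v1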

DDU-Net: A Domain Decomposition-based CNN on Multiple GPUs

The segmentation of ultra-high resolution images poses challenges such as loss of spatial information or computational inefficiency. In this work, a novel approach that combines encoder-decoder architectures with domain decomposition strategies to address these challenges is proposed. Specifically, a domain decomposition-based U-Net (DDU-Net) architecture is introduced, which partitions input images into non-overlapping patches that can be processed independently on separate devices. A communication network is added to facilitate inter-patch information exchange to enhance the understanding of spatial context. Experimental validation is performed on a synthetic dataset that is designed to measure the effectiveness of the communication network. Then, the performance is tested on the DeepGlobe land cover classification dataset as a real-world benchmark data set. The results demonstrate that the approach, which includes inter-patch communication for images divided into $16\times16$ non-overlapping subimages, achieves a $2-3\,\%$ higher intersection over union (IoU) score compared to the same network without inter-patch communication. The performance of the network which includes communication is equivalent to that of a baseline U-Net trained on the full image, showing that our model provides an effective solution for segmenting ultra-high-resolution images while preserving spatial context. The code is available at https://github.com/corne00/HiRes-Seg-CNN.

Updated: 2024-07-31 01:07:21

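Title: DDU-Net: A Domain Decomposition-Based CNN on Multiple GPUs

Abstract: The segmentation of ultra-high-resolution images poses challenges such as loss of spatial information or computational inefficiency. This work proposes a novel approach that combines encoder-decoder architectures with domain decomposition strategies to address these challenges. Specifically, a domain decomposition-based U-Net (DDU-Net) architecture is introduced, which partitions input images into non-overlapping patches that can be processed independently on separate devices. A communication network is added to facilitate inter-patch information exchange and enhance the understanding of spatial context. Experimental validation is performed on a synthetic dataset designed to measure the effectiveness of the communication network. The performance is then tested on the DeepGlobe land cover classification dataset as a real-world benchmark. The results show that the approach with inter-patch communication, for images divided into $16\times16$ non-overlapping subimages, achieves a 2-3% higher intersection over union (IoU) score than the same network without inter-patch communication. The performance of the network with communication is equivalent to that of a baseline U-Net trained on the full image, showing that our model provides an effective solution for segmenting ultra-high-resolution images while preserving spatial context. The code is available at https://github.com/corne00/HiRes-Seg-CNN.

Updated: 2024-07-31 01:07:21

Categories: cs.CV,cs.DC,cs.LG,68T07, 68W10, 68W15, 65N55, 68U10,I.2.6; I.4.6

Download: http://arxiv.org/abs/2407.21266v1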
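
The domain decomposition step, splitting an image into non-overlapping patches that separate devices could process and then stitching the results back together, can be sketched as follows. This only covers the partitioning and the IoU metric; the communication network that exchanges information between patches is not modeled, and all function names are hypothetical.

```python
def split_into_patches(image, grid):
    """Split a 2D list into a grid x grid arrangement of non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    ph, pw = height // grid, width // grid
    return [[[row[c * pw:(c + 1) * pw] for row in image[r * ph:(r + 1) * ph]]
             for c in range(grid)]
            for r in range(grid)]


def merge_patches(patches):
    """Inverse of split_into_patches: stitch the patch grid back into one image."""
    image = []
    for patch_row in patches:
        for i in range(len(patch_row[0])):
            image.append([value for patch in patch_row for value in patch[i]])
    return image


def iou(pred, target):
    """Intersection over union of two binary masks given as 2D lists."""
    pairs = [(p, t) for prow, trow in zip(pred, target) for p, t in zip(prow, trow)]
    intersection = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    return 1.0 if union == 0 else intersection / union


image = [[4 * r + c for c in range(4)] for r in range(4)]  # 4x4 toy image
patches = split_into_patches(image, grid=2)
restored = merge_patches(patches)
score = iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])  # 1 shared pixel out of 3
```

Because each patch is an independent list, it could be handed to a different worker. The abstract's point is that doing so without any inter-patch exchange costs 2-3% IoU, which the added communication network recovers.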

Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation

Distributional reinforcement learning improves performance by effectively capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. In this paper, we present a regret analysis for distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. We first introduce a key notion of Bellman unbiasedness for a tractable and exactly learnable update via statistical functional dynamic programming. Our theoretical results show that approximating the infinite-dimensional return distribution with a finite number of moment functionals is the only method to learn the statistical information unbiasedly, including nonlinear statistical functionals. Second, we propose a provably efficient algorithm, $\texttt{SF-LSVI}$, achieving a regret bound of $\tilde{O}(d_E H^{\frac{3}{2}}\sqrt{K})$ where $H$ is the horizon, $K$ is the number of episodes, and $d_E$ is the eluder dimension of a function class.

Updated: 2024-07-31 00:43:51

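Title: Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation

Abstract: Distributional reinforcement learning improves performance by effectively capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. In this paper, we present a regret analysis for distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. We first introduce the key notion of Bellman unbiasedness, which allows a tractable and exactly learnable update via statistical functional dynamic programming. Our theoretical results show that approximating the infinite-dimensional return distribution with a finite number of moment functionals is the only way to learn the statistical information unbiasedly, including for nonlinear statistical functionals. Second, we propose a provably efficient algorithm, $\texttt{SF-LSVI}$, which achieves a regret bound of $\tilde{O}(d_E H^{\frac{3}{2}}\sqrt{K})$, where $H$ is the horizon, $K$ is the number of episodes, and $d_E$ is the eluder dimension of a function class.

Updated: 2024-07-31 00:43:51

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2407.21260v1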
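
One way to see why moment functionals are compatible with Bellman updates: for Z = r + γZ', the first two moments of Z follow from the first two moments of Z' alone, via E[Z] = r + γE[Z'] and E[Z²] = r² + 2rγE[Z'] + γ²E[Z'²]. The toy check below compares this moment-only update with an update on the full distribution. It is a tabular illustration under made-up numbers, not the $\texttt{SF-LSVI}$ algorithm.

```python
def moments(dist):
    """First and second raw moments of a discrete distribution [(value, prob), ...]."""
    m1 = sum(p * v for v, p in dist)
    m2 = sum(p * v * v for v, p in dist)
    return m1, m2


def bellman_moment_update(reward, gamma, successors):
    """Update moments using only successor moments: [(trans_prob, (m1, m2)), ...]."""
    m1 = sum(p * (reward + gamma * s1) for p, (s1, _) in successors)
    m2 = sum(p * (reward ** 2 + 2 * reward * gamma * s1 + gamma ** 2 * s2)
             for p, (s1, s2) in successors)
    return m1, m2


def bellman_full_update(reward, gamma, successors):
    """Update the whole return distribution: [(trans_prob, dist), ...]."""
    return [(reward + gamma * v, p * q) for p, dist in successors for v, q in dist]


# Hypothetical successor return distributions and transition probabilities.
dist_a = [(0.0, 0.5), (2.0, 0.5)]
dist_b = [(10.0, 1.0)]
reward, gamma = 1.0, 0.9

from_moments = bellman_moment_update(
    reward, gamma, [(0.25, moments(dist_a)), (0.75, moments(dist_b))])
from_distribution = moments(
    bellman_full_update(reward, gamma, [(0.25, dist_a), (0.75, dist_b)]))
```

Both routes give the same pair of moments (7.975 and 76.105 up to rounding), so nothing is lost by carrying only the moments forward. A statistic such as the median does not admit an analogous closed update, which is the contrast the abstract's unbiasedness claim draws.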

An Adaptive Gradient Regularization Method

The optimizer plays an important role in training neural networks efficiently and with good performance, and the gradient-based weight update is its central part. It has been shown that normalization and standardization operations on weights and gradients, such as weight standardization (WS), weight normalization (WN), gradient normalization (GN), and gradient centralization (GC), can accelerate training and improve performance. In this work, we introduce a new optimization technique based on the gradient magnitudes in a gradient vector, named adaptive gradient regularization (AGR), which normalizes the gradient vector in all dimensions into a coefficient vector and subtracts the product of the gradient and its coefficient vector from the vanilla gradient. It can be viewed as an adaptive gradient clipping method. We show that AGR can improve the Lipschitzness of the loss function, giving a more stable training process and better generalization performance. AGR is very simple to embed into vanilla optimizers such as Adan and AdamW, requiring only three lines of code. Our experiments are conducted on image generation, image classification, and language representation, and show that AGR improves the training results.

Updated: 2024-07-31 00:37:20

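Title: An Adaptive Gradient Regularization Method

Abstract: The optimizer plays an important role in neural network training, affecting both efficiency and performance. The gradient-based weight update is the core of the optimizer. It has been shown that normalization and standardization operations on weights and gradients can accelerate training and improve performance, for example weight standardization (WS), weight normalization (WN), and gradient normalization (GN); there is also gradient centralization (GC). In this work, we introduce a new optimization technique based on the gradient magnitudes in a gradient vector, named adaptive gradient regularization (AGR), which normalizes the gradient vector in all dimensions into a coefficient vector and subtracts the product of the gradient and its coefficient vector from the vanilla gradient. It can be viewed as an adaptive gradient clipping method. We show that AGR can improve the Lipschitzness of the loss function, with a more stable training process and better generalization performance. AGR is very simple and can be embedded into vanilla optimizers such as Adan and AdamW with only three lines of code. Our experiments cover image generation, image classification, and language representation, and the results show that AGR improves the training outcome.

Updated: 2024-07-31 00:37:20

Categories: cs.LG

Download: http://arxiv.org/abs/2407.16944v2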
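
Read literally, the abstract's description of AGR suggests a coefficient α_i = |g_i| / Σ_j |g_j| for each gradient entry and a regularized gradient g_i − α_i g_i. The sketch below implements that reading on a plain list followed by a vanilla SGD step. The exact normalization used in the paper may differ, so treat the formula, the function names, and the learning rate as assumptions.

```python
def adaptive_gradient_regularization(grad):
    """Shrink each gradient entry in proportion to its share of the total magnitude."""
    total = sum(abs(g) for g in grad)
    if total == 0.0:
        return list(grad)  # nothing to normalize
    # coefficient_i = |g_i| / total; regularized_i = g_i - coefficient_i * g_i
    return [g * (1.0 - abs(g) / total) for g in grad]


def sgd_step(weights, grad, lr=0.1):
    """Plain SGD update applied to the regularized gradient."""
    regularized = adaptive_gradient_regularization(grad)
    return [w - lr * g for w, g in zip(weights, regularized)]


gradient = [3.0, -1.0, 0.0]
regularized = adaptive_gradient_regularization(gradient)  # [0.75, -0.75, 0.0]
new_weights = sgd_step([1.0, 1.0, 1.0], gradient)
```

The largest entry is damped the most (3.0 becomes 0.75) while small entries are nearly untouched, which matches the abstract's remark that the method behaves like adaptive gradient clipping.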

UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation

We present Unified PDE Solvers (UPS), a data- and compute-efficient approach to developing unified neural operators for diverse families of spatiotemporal PDEs from various domains, dimensions, and resolutions. UPS embeds different PDEs into a shared representation space and processes them using an FNO-transformer architecture. Rather than training the network from scratch, which is data-demanding and computationally expensive, we warm-start the transformer from pretrained LLMs and perform explicit alignment to reduce the modality gap while improving data and compute efficiency. The cross-modal UPS achieves state-of-the-art results on a wide range of 1D and 2D PDE families from PDEBench, outperforming existing unified models using 4 times less data and 26 times less compute. Meanwhile, it is capable of few-shot transfer to unseen PDE families and coefficients.

Updated: 2024-07-31 00:37:11

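Title: UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation

Abstract: We present Unified PDE Solvers (UPS), a data- and compute-efficient approach to developing unified neural operators for families of spatiotemporal PDEs across various domains, dimensions, and resolutions. UPS embeds different PDEs into a shared representation space and processes them with an FNO-transformer architecture. Instead of training the network from scratch, which demands large amounts of data and compute, we warm-start the transformer from pretrained LLMs and perform explicit alignment to reduce the modality gap while improving data and compute efficiency. The cross-modal UPS achieves state-of-the-art results on a wide range of 1D and 2D PDE families from PDEBench, outperforming existing unified models while using 4 times less data and 26 times less compute. It is also capable of few-shot transfer to unseen PDE families and coefficients.

Updated: 2024-07-31 00:37:11

Categories: cs.LG

Download: http://arxiv.org/abs/2403.07187v3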

CAT: Interpretable Concept-based Taylor Additive Models

As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to train and scale. Additionally, in real-world datasets with many features, the interpretability of feature-based explanations diminishes for humans. To tackle these issues, recent research has shifted towards concept-based interpretable methods. These approaches try to integrate concept learning as an intermediate step before making predictions, explaining the predictions in terms of human-understandable concepts. However, these methods require domain experts to extensively label concepts with relevant names and their ground-truth values. In response, we propose CAT, a novel interpretable Concept-bAsed Taylor additive model to simplify this process. CAT does not require domain experts to annotate concepts and their ground-truth values. Instead, it only requires users to categorize input features into broad groups, which can be easily accomplished through a quick metadata review. Specifically, CAT first embeds each group of input features into a one-dimensional high-level concept representation, and then feeds the concept representations into a new white-box Taylor Neural Network (TaylorNet). The TaylorNet aims to learn the non-linear relationship between the inputs and outputs using polynomials. Evaluation results across multiple benchmarks demonstrate that CAT can outperform or compete with the baselines while reducing the need for extensive model parameters. Importantly, it can explain model predictions through high-level concepts that humans can understand.

Updated: 2024-07-31 00:31:45

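Title: CAT: Interpretable Concept-Based Taylor Additive Models

Abstract: As an emerging interpretable technique, Generalized Additive Models (GAMs) use neural networks to learn a non-linear function for each feature separately and then combine them through a linear model for the final prediction. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, which makes them hard to train and scale. In addition, in real-world datasets with many features, feature-based explanations become less interpretable for humans. To address these issues, recent research has shifted towards concept-based interpretable methods. These methods try to include concept learning as an intermediate step before prediction and explain predictions in terms of human-understandable concepts. However, they require domain experts to extensively label concepts and determine their ground-truth values. In response, we propose CAT, a novel interpretable concept-based Taylor additive model, to simplify this process. CAT does not require domain experts to annotate concepts and their ground-truth values. Instead, it only requires users to sort input features into broad groups, which can be done easily through a quick metadata review. Specifically, CAT first embeds each group of input features into a one-dimensional high-level concept representation and then feeds these concept representations into a new white-box Taylor Neural Network (TaylorNet). The TaylorNet aims to learn the non-linear relationship between inputs and outputs using polynomials. Evaluation results across multiple benchmarks show that CAT can outperform or compete with the baselines while reducing the need for large numbers of model parameters. Importantly, it can explain model predictions through high-level concepts that humans can understand.

Updated: 2024-07-31 00:31:45

Categories: cs.LG

Download: http://arxiv.org/abs/2406.17931v3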
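
The two-stage structure described in the CAT abstract, feature groups compressed into scalar concepts followed by a polynomial over those concepts, can be sketched in a few lines. In the paper the concept encoders and the TaylorNet are learned; here a fixed weighted sum and hand-picked second-order coefficients stand in for them, and all names and numbers are hypothetical.

```python
def concept_scores(features, groups, weights):
    """Collapse each feature group into one scalar concept (stand-in for a learned encoder)."""
    return [sum(weights[name] * features[name] for name in names)
            for names in groups.values()]


def taylor_predict(concepts, bias, linear, quadratic):
    """Second-order polynomial over concepts; returns the prediction and per-term contributions."""
    terms = {"bias": bias}
    for i, c in enumerate(concepts):
        terms[f"c{i}"] = linear[i] * c
    for (i, j), coeff in quadratic.items():
        terms[f"c{i}*c{j}"] = coeff * concepts[i] * concepts[j]
    return sum(terms.values()), terms


features = {"age": 2.0, "bmi": 1.0, "income": 3.0}
groups = {"health": ["age", "bmi"], "finance": ["income"]}
weights = {"age": 0.5, "bmi": 0.5, "income": 1.0}

concepts = concept_scores(features, groups, weights)  # [1.5, 3.0]
prediction, terms = taylor_predict(
    concepts, bias=0.1, linear=[1.0, -0.5], quadratic={(0, 1): 0.2})
```

Because the output is an explicit sum of polynomial terms, each concept's contribution (and each interaction's) can be read off directly from `terms`, which is the white-box property the abstract emphasizes.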

Lifelong Person Search

Person search is the task of localizing a query person in gallery datasets of scene images. Existing methods have mainly been developed to handle a single target dataset; however, diverse datasets are continuously given in practical applications of person search. In such cases, these methods suffer from catastrophic forgetting of the knowledge from the old datasets when trained on new datasets. In this paper, we first introduce a novel problem of lifelong person search (LPS), where the model is incrementally trained on the new datasets while preserving the knowledge learned in the old datasets. We propose an end-to-end LPS framework that facilitates knowledge distillation to enforce consistency learning between the old and new models by utilizing the prototype features of the foreground persons as well as the hard background proposals in the old domains. Moreover, we devise rehearsal-based instance matching to further improve the discrimination ability in the old domains by additionally using unlabeled person instances. Experimental results demonstrate that the proposed method achieves significantly better detection and re-identification performance than existing methods while preserving the knowledge learned in the old domains.

Updated: 2024-07-31 00:19:22

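Title: Lifelong Person Search

Abstract: Person search is the task of localizing a query person in a gallery of scene images. Existing methods have mainly been developed to handle a single target dataset, yet in practical applications of person search diverse datasets keep arriving. In such cases, these methods suffer from catastrophic forgetting of knowledge from the old datasets when trained on new datasets. This paper first introduces a novel problem of lifelong person search (LPS), in which the model is incrementally trained on new datasets while preserving the knowledge learned from the old datasets. We propose an end-to-end LPS framework that facilitates knowledge distillation, enforcing consistency learning between the old and new models by using the prototype features of foreground persons as well as hard background proposals in the old domains. In addition, we devise rehearsal-based instance matching, which additionally uses unlabeled person instances to further improve discrimination ability in the old domains. Experimental results show that, compared with existing methods, the proposed method achieves significantly better detection and re-identification performance while preserving the knowledge learned in the old domains.

Updated: 2024-07-31 00:19:22

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.21252v1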