Latent Guard、Tokenization in LLM、​3D Human Scan、FusionPortableV2

本文首发于公众号:机器感知

图片

https://mp.weixin.qq.com/s/HlVV3VnqocBI4XBOT6RFHg

A Multi-Level Framework for Accelerating Training Transformer Models

图片

The fast growing capabilities of large-scale deep learning models, such as Bert, GPT and ViT, are revolutionizing the landscape of NLP, CV and many other domains. Training such models, however, poses an unprecedented demand for computing power, which incurs exponentially increasing energy cost and carbon dioxide emissions. It is thus critical to develop efficient training solutions to reduce the training costs. Motivated by a set of key observations of inter- and intra-layer similarities among feature maps and attentions that can be identified from typical training processes, we propose a multi-level framework for training acceleration. Specifically, the framework is based on three basic operators, Coalescing, De-coalescing and Interpolation, which can be orchestrated to build a multi-level training framework. The framework consists of a V-cycle training process, which progressively down- and up-scales the model size and projects the parameters between adjacent levels of mode......

Latent Guard: a Safety Framework for Text-to-image Generation

图片

With the ability to generate high-quality images, text-to-image (T2I) models can be exploited for creating inappropriate content. To prevent misuse, existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification, requiring large datasets for training and offering low flexibility. Hence, we propose Latent Guard, a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts in the input text embeddings. Our proposed framework is composed of a data generation pipeline specific to the task using large language models, ad-hoc architectural components, and a contrastive learning strategy to benefit from the generated data. The effectiveness of our method is verified on three datasets and against four baselines. Code and data w......

Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic  Segmentation

图片

Despite the significant progress in deep learning for dense visual recognition problems, such as semantic segmentation, traditional methods are constrained by fixed class sets. Meanwhile, vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks, owing to their robust generalizability. Recently, a body of work has investigated utilizing these models in open-vocabulary semantic segmentation (OVSS). However, existing approaches often rely on impractical supervised pre-training or access to additional pre-trained networks. In this work, we propose a strong baseline for training-free OVSS, termed Neighbour-Aware CLIP (NACLIP), representing a straightforward adaptation of CLIP tailored for this scenario. Our method enforces localization of patches in the self-attention of CLIP's vision transformer which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature. By incorporati......

Toward a Theory of Tokenization in LLMs

图片

While there has been a large body of research attempting to circumvent tokenization for language modeling (Clark et al., 2022; Xue et al., 2022), the current consensus is that it is a necessary initial step for designing state-of-the-art performant language models. In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data generating processes. When trained on data drawn from certain simple $k^{\text{th}}$-order Markov processes for $k > 1$, transformers exhibit a surprising phenomenon - in the absence of tokenization, they empirically fail to learn the right distribution and predict characters according to a unigram model (Makkuva et al., 2024). With the addition of tokenization, however, we empirically observe that transformers break through this barrier and are able to model the probabilities of sequences drawn from the source near-optimally, achieving small cross-entropy loss. With this observation a......

OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering

图片

Rendering dynamic 3D human from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the people is in an unobstructed scene, while various objects may cause the occlusion of body parts in real-life scenarios. Previous method utilizing NeRF for surface rendering to recover the occluded areas, but it requiring more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform occlusion feature query at occluded regions, the aggregated pixel-align feature is extracted to compensate for the missing information. Then we use Gaussian Feature MLP to further process the fe......

3D Human Scan With A Moving Event Camera

图片

Capturing the 3D human body is one of the important tasks in computer vision with a wide range of applications such as virtual reality and sports analysis. However, conventional frame cameras are limited by their temporal resolution and dynamic range, which imposes constraints in real-world application setups. Event cameras have the advantages of high temporal resolution and high dynamic range (HDR), but the development of event-based methods is necessary to handle data with different characteristics. This paper proposes a novel event-based method for 3D pose estimation and human mesh recovery. Prior work on event-based human mesh recovery require frames (images) as well as event data. The proposed method solely relies on events; it carves 3D voxels by moving the event camera around a stationary body, reconstructs the human pose and mesh by attenuated rays, and fit statistical body models, preserving high-frequency details. The experimental results show that the proposed meth......

FusionPortableV2: A Unified Multi-Sensor Dataset for Generalized SLAM  Across Diverse Platforms and Scalable Environments

图片

Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM dataset featuring notable sensor diversity, varied motion patterns, and a wide range of environmental scenarios. Our dataset comprises $27$ sequences, spanning over $2.5$ hours and collected from four distinct platforms: a handheld suite, wheeled and legged robots, and vehicles. These sequences cover diverse settings, including buildings, campuses, and urban areas, with a total length of $38.7km$. Additionally, the dataset includes ground-truth (GT) trajectories and RGB point cloud maps covering approximately $0.3km^2$. To validate the utility of our dataset in advancing SLAM research, w......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mfbz.cn/a/552724.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

微软开源 WizardLM-2,70B优于GPT4-0613,7B持平阿里最新的Qwen1.5-32B

当地时间4月15号,微软发布了新一代大语言模型 WizardLM-2,新家族包括三个尖端型号:WizardLM-2 8x22B, WizardLM-2 70B,和WizardLM-2 7B,作为下一代最先进的大型语言模型,它在复杂聊天、多语言、推理和代理方面的性能有…

算法打卡day37

今日任务: 1)1049. 最后一块石头的重量 II 2)494. 目标和 3)474.一和零 4)复习day12 1049. 最后一块石头的重量 II 题目链接:1049. 最后一块石头的重量 II - 力扣(LeetCode) 题目难…

B1100 校庆

输入样例: 5 372928196906118710 610481197806202213 440684198612150417 13072819571002001X 150702193604190912 6 530125197901260019 150702193604190912 220221196701020034 610481197806202213 440684198612150417 370205198709275042 输出样例:…

LINUX中使用cron定时任务被隐藏,咋回事?

一、问题现象 线上服务器运行过程中,进程有莫名进程被启动,怀疑是有定时任务自动启动,当你用常规方法去查看,比如使用crontab去查看定时器任务,提示no crontab for root 或者使用cat到/var/spool/cron目录下去查看定时…

python使用uiautomator2操作真机(华为Honor 10)

环境: python3.8.10,华为手机Honor 10(6G,64g),版本android 9。 之前写过一篇文章: python使用uiautomator2操作真机_python uiautomator2 控制真机-CSDN博客 今天再拿另外一部手机测试。 一、将手机设置为开发者模式 1、设…

基于ssm冀中工程技师校园网站设计与实现论文

摘 要 使用旧方法对冀中工程技师学院网站的信息进行系统化管理已经不再让人们信赖了,把现在的网络信息技术运用在冀中工程技师学院网站的管理上面可以解决许多信息管理上面的难题,比如处理数据时间很长,数据存在错误不能及时纠正等问题。这次…

【数据恢复软件】:Magnet AXIOM V8.0

Magnet AXIOM V8.0重大更新 1、全新的UI设计 2、更快的相应速度 3、补全工件分析 4、支持亚马逊AWS云数据( 获取同一帐户或安全帐户上下文中的快照。 支持Windows实例、加密卷和超过1 TB的卷、具有多个卷的实例等等! ) 5、Bug修复 6、AI支持…

Promise模块化编程ES6新特性

文章目录 Promise&模块化编程1.Promise基本介绍2.快速入门1.需求分析2.原生ajax jQuery3.Promise使用模板 3.课后练习1.原生ajax jQuery2.promise 4.模块化编程基本介绍5.CommonJS基本介绍6.ES5模块化编程1.题目2.示意图3.代码实例—普通导入导出function.jsuse.js 4.代码…

JVM垃圾回收与算法

1. 如何确定垃圾 1.1 引用计数法 在 Java 中,引用和对象是有关联的。如果要操作对象则必须用引用进行。因此,很显然一个简单 的办法是通过引用计数来判断一个对象是否可以回收。简单说,即一个对象如果没有任何与之关 联的引用,即…

推荐系统综述

推荐系统研究综述 - 中国知网 传统推荐方法主要分类: 1)基于内容推荐方法 主要依据用户与项目之间的特征信息,用户之间的联系不会影响推荐结果,所以不存在冷启动和稀疏问题,但是基于内容推荐的结果新颖程度低并且面临特征提取的问题。 基于内容的推荐方法的思想非…

能源成果3D网络三维展厅越发主流化

在这个数字化飞速发展的时代,我们为您带来了全新的展览形式——线上3D虚拟展厅。借助VR虚拟现实制作和web3d开发技术,我们能够将物品、图片、视频和图文信息等完美融合,通过计算机技术和3D建模,为您呈现一个逼真、生动的数字化展览…

动态规划|1049.最后一块石头的重量II

力扣题目链接 class Solution { public:int lastStoneWeightII(vector<int>& stones) {vector<int> dp(15001, 0);int sum 0;for (int i 0; i < stones.size(); i) sum stones[i];int target sum / 2;for (int i 0; i < stones.size(); i) { // 遍…

开源项目one-api的k8s容器化部署(下)-- 部署至k8s

一、接着上文 本文讲述如何把上文制作好的docker镜像部署到K8S&#xff0c;会涉及以下部分&#xff1a; 健康检测应用程序的配置应用程序的端口日志路径 二、健康检测 1、健康状态 从官方的docker-compose.yml可以得知其健康检测方法 curl http://localhost:5175/api/statu…

03-JAVA设计模式-迭代器模式

迭代器模式 什么是迭代器模式 迭代器模式&#xff08;demo1.Iterator Pattern&#xff09;是Java中一种常用的设计模式&#xff0c;它提供了一种顺序访问一个聚合对象中各个元素&#xff0c;而又不需要暴露该对象的内部表示的方法。迭代器模式将遍历逻辑从聚合对象中分离出来…

Latex学习(从入门到入土)2

第一章 &#xff1a;插图 在LaTeX中插入插图可以通过graphicx宏包来实现&#xff0c;这个宏包提供了强大的图像处理功能。以下是如何使用graphicx宏包插入图像的基本步骤&#xff1a; ### 1. 加载宏包 在文档的序言部分&#xff08;\begin{document}之前&#xff09;&#x…

《C语言深度解剖》:(5)C语言操作符一网打尽

&#x1f921;博客主页&#xff1a;醉竺 &#x1f970;本文专栏&#xff1a;《C语言深度解剖》 &#x1f63b;欢迎关注&#xff1a;感谢大家的点赞评论关注&#xff0c;祝您学有所成&#xff01; ✨✨&#x1f49c;&#x1f49b;想要学习更多数据结构与算法点击专栏链接查看&am…

一些docker安装配置以及常见命令

​常用命令 docker 命令 //进去容器内部&#xff0c;找到需要拷贝的文件及目录 docker exec -it 2c2600fb60f8 /bin/bash ​ //将container id为4db8edd86202的容器内elasticsearch.yml文件拷贝到宿主机指定目录下&#xff1a; docker cp 4db8edd86202:/usr/share/elasticsea…

pytest系列——allure之在测试用例添加标题(@allure.title())

前言 通过使用装饰器allure.title可以为测试用例自定义一个更具有阅读性的易读的标题。 allure.title的三种使用方式&#xff1a; 直接使用allure.title为测试用例自定义标题&#xff1b;allure.title支持通过占位符的方式传递参数&#xff0c;可以实现测试用例标题参数化&a…

温度对射频电路性能的影响

对于射频电路,通常会有使用温度范围的要求,即在特定的温度范围内其性能变化不超出指标要求的值。对于工业级产品,一般要求使用温度范围为-40℃~+70℃,而军品要求使用温度范围为-55℃~+85℃。有一些其他特殊使用场景的产品会有不同的要求。 不同的温度对电路性能的影响,…

nginx安装在linux上

nginx主要用于反向代理和负载均衡&#xff0c;现在简单的说说如何在linux操作系统上安装nginx 第一步&#xff1a;安装依赖 yum install -y gcc-c pcre pcre-devel zlib zlib-devel openssl openssl-devel 第二步&#xff1a; 下载nginx&#xff0c;访问官网&#xff0c;ngin…
最新文章