Trying Deep Learning with the Java Deeplearning4j Library
[Analytics AI Service Team, Sohee Jeon] As AI technology evolves day by day, its use keeps growing across industries such as entertainment, media, e-commerce, healthcare, education, and manufacturing. As a web service developer I have long used Java as a backend language, and currently the most…
[Analytics AI Service Team, Hyunjung Lee] With more AI models and data to manage, and their volume growing, I would like to briefly introduce Ceph, which caught my interest. What is Ceph? Ceph implements object storage on a single distributed computer cluster as open…
[Advanced AI Technology Team, Yunhye Kim] The hottest issue sweeping the IT field in 2023 is, without question, ChatGPT. ChatGPT is a conversational large-language-model AI chatbot that anyone can easily use, and it has brought global society a huge impact and wave of enthusiasm for generative AI…
[Analytics AI Service Team, Sohee Jeon] In this post I will talk about how we composed alert message content with no code. This is the method we actually used to implement the alert-message delivery logic in the AI analytics portal. Requirements: the alert messages to be sent…
[Virtual Beings Research Team, Seokgyeom Kim] The topic of the project introduced in this post is "file translation". Before developing a translation model, I surveyed translation services already in operation. Among them, "file translation" caught my eye.…
[Analytics Intelligence Development Team, Hyoju Park] With advances in deep learning, AI models keep performing better. But in step with that, models are growing ever larger and inference ever slower. Using a better GPU helps, but…
[Virtual Beings Research Team, Seungmu Yang (Assistant Manager)] The era of ChatGPT is upon us. Its excellence and practicality have been recognized in the AI industry and across many other sectors, and many companies are pushing to adopt it. This trend is not limited to major players such as OpenAI…
[Virtual Beings Research Team, Junseon Hwang] Conversational large language models (LLMs) such as ChatGPT and Bard are appearing one after another these days. But with an LLM alone, appropriate sentences can be generated only within the data it was trained on. That is why Bard added the Google search engine, so that recent…
[New Media Service Team, Hyungjin Yoon (Principal)] This post explains how 3D modeling and texture generation can be done with ChatGPT and DreamTexture, and discusses the possibilities and limitations of the technique. 1. With Python code generated by ChatGPT…
[Analytics Intelligence Development Team, Hyoju Park] The lifecycle of an ML model divides into a Research phase of study and testing and a Production phase of putting it into service. The Research phase spans problem definition, model selection, and the many experiments run to raise performance…
[Analytics Intelligence Development Team, Changdae Lim] What is a Feature? ML (machine learning) predicts on new data using a model trained on past example data. When two-dimensional tabular data is used for ML model training, the rows are the examples and the columns describe each example…
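The row-and-column convention described above can be sketched in plain Python; the data and column meanings here are purely hypothetical:

```python
# Tabular training data: each row is one example,
# each column (except the label) is one feature describing that example.
rows = [
    # features: [age, height_cm], label: plays_basketball
    ([14, 160], 0),
    ([25, 190], 1),
    ([31, 185], 1),
]
features = [x for x, _ in rows]   # the feature matrix (examples x features)
labels = [y for _, y in rows]     # the target column
num_features = len(features[0])   # number of columns used as features
print(num_features)  # 2
```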
[Generative Intelligence Development Team, Taekhyun Jung] MobileFaceSwap is an open-source face-swap model presented at AAAI 2022. By applying distillation-based compression to the existing SimSwap and FaceShifter models, it is reported to achieve SOTA (state of the art) in inference speed. Indeed, while the original SimSwap has 107M parameters and…
[Generative Intelligence Development Team, Taekhyun Jung] The recently released YOLOv7 algorithm is drawing great attention in computer vision and related communities. According to the paper, YOLOv7 beats every object-detection technique to date in both speed and accuracy…
[Analytics Intelligence Development Team, Hyoju Park] Data scientists validate and deploy trained models through a variety of experiments. This validation is done numerically with appropriate metrics such as Accuracy, Precision, Recall, IoU, and PSNR, but from those numbers alone the actual…
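The first few metrics named above can be illustrated with a minimal sketch; the confusion counts are hypothetical:

```python
# Toy confusion counts for a binary classifier (hypothetical numbers):
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # fraction of correct predictions
precision = tp / (tp + fp)  # of predicted positives, how many are real
recall    = tp / (tp + fn)  # of real positives, how many were found

print(accuracy, precision, recall)  # 0.85 0.8 0.888...
```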
[AI Lab, Museong Kim] Videos of Stanford's CS25: Transformers United course were recently released. [1] The course [2] itself ran in the second half of last year, but only the slides had been public until now. This time the videos were shared via YouTube as well. The slides too…
[Generative Intelligence Development Team, Seonghyun Kim] Our center's AI research motto is "Human-like AI" & "Fun AI". So, going beyond a chatbot that merely reports the weather or the news, how can we build a friendly, "human-like" AI? Our approach to such elements…
[Analytics Intelligence Development Team, Hyoju Park] DeepMind has unveiled AlphaCode, capable of programming at a level usable in competitive programming contests. AlphaCode programs by using a Transformer-based language model to generate a large volume of code and then filtering for the most suitable solutions.…
[Analytics Intelligence Development Team, Hyunjung Lee] Microsoft has announced "Jigsaw", a tool that checks program code written by no-code-based AI. No-code means developing applications and programs without a complex, difficult coding process, using simple UI-based templates, and…
[Virtual Human Research Team, Junseon Hwang] NVIDIA NeMo is an open-source framework for building, training, and fine-tuning GPU-accelerated speech and natural-language-understanding models through a simple Python interface. With NeMo, real-time automatic speech recognition, natural language processing,…
[Service Development Team, Yongtaek Lim] In June 2015, a Black programmer in Brooklyn, USA got a shock while browsing photos taken with his girlfriend: Google Photos had auto-tagged their pictures as "gorillas". Google…
[Service Development Team, Jeon Jeon-jun] On the 28th of last month, Facebook AI unveiled Droidlet, a platform for robot development that can be used in both real and virtual environments.
[Prior Research Team, Jihyun Song] It has been over two years since I became interested in open-domain chatbots and came across the Blender 1.0 and Meena papers. At the time, consistency over long multi-turn conversations, which they said they would overcome in the future, and knowledge…
[Prior Research Team, Jeongwoo Lee] Games (Go, chess, Atari games, etc.) have long been used to verify the performance of reinforcement-learning algorithms. As the algorithms have developed, the reinforcement-learning field, like the image and natural-language fields, has...
[Service Development Team, Hwang Jun-sun] When training a supervised machine-learning model on a dataset whose label counts are unbalanced, you will simply find that samples belonging to the minority labels are not learned well…
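One common remedy for the imbalance described above is to reweight the loss by inverse class frequency; a minimal sketch with hypothetical labels:

```python
from collections import Counter

# Hypothetical imbalanced label list: class 0 heavily outnumbers class 1.
labels = [0] * 90 + [1] * 10

counts = Counter(labels)
n, k = len(labels), len(counts)
# A common reweighting: weight_c = n / (k * count_c), so rare classes
# contribute more to the loss than frequent ones.
weights = {c: n / (k * counts[c]) for c in counts}
print(weights)  # {0: 0.555..., 1: 5.0}
```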
[Service Development Team, Kyunghwan Lee] We usually encounter bundles of unlabeled data when training a model, and often run into data-annotation problems. Labeling all of the unlabeled data is too time-consuming and expensive...
[Service Development Team, Jeon Jeon-Jun] ML-Agents, unveiled by Unity, is an open-source tool for creating virtual characters in a game environment. You build a game environment and train the NPC characters (agents) that operate in it, through algorithms such as reinforcement learning.…
[Prior Research Team, Ji-Hyun Song] The TadGAN algorithm developed by an MIT research team is reported to outperform previously known models at detecting anomalies in time-series data. Many companies researching anomaly detection today work across various fields (financial…
[Service Development Team, Jeon Jeon Jeon] The digital transformation of companies, accelerated by COVID-19, keeps raising the value of data. The need for change in many industries, not just specialized IT companies, is driving up the market value of data specialists.
OpenAI's GPT-3 is a large language model with up to 175B parameters. Despite GPT-3's surprising results, it is not open source, so if you want to try it, do so through a site such as AI Dungeon (https://play.aidungeon.io/main/landing) or Philosopher AI (https://philosopherai.com/)...
As a deep-learning-based image generation method, GAN produces many amazing results. In particular, the latent space is not simply random: after training, changing the latent vector can produce a number of semantically meaningful changes.
Tensorflow Lite is a software package that contains tools that allow AI models trained with Tensorflow to run on mobile devices. It is said to be running on over 4 billion devices now. Basically, the trained model is converted to Tensorflow Lite…
Jina, open-sourced by Jina.AI, is a multimodal data search engine built on deep learning. It does not implement just a few search functions; it includes the entire system, so it can easily be applied to a service, and it handles not only text…
Avatarify is an open-source program that adds real-time avatar animation to video-communication programs such as Zoom, Teams, Hangouts, and Skype. It works by replacing the camera input of the video-communication program, and the algorithm uses a first-order motion model.
DensePhrases is an open-domain Q&A technology created by Jinhyuk Lee at Korea University, published as the paper "Learning Dense Representations of Phrases at Scale". Here is a link to the paper: Given a question, it finds the best match among Wikipedia's nearly 60 billion phrases…
HuggingFace, famous for its integrated natural-language-processing package, has added speech recognition. Here is the related link: Specifically, Wav2Vec 2.0, developed by Facebook, was added. Wav2Vec 2.0 first does unsupervised learning on a large amount of unlabeled data, and very…
A technology that creates 3D models from a single photo has been unveiled under the name MeInGame. Judging from the results uploaded to the public repository, it is not yet good enough to be applied to a service without a designer's touch-up, but it should significantly reduce the initial modeling effort…
Paperswithcode, which provides information on papers across AI together with linked open source and SOTA results, offers links to over 3,000 useful datasets. Of these, 851 datasets are for text; restricted to Korean…
Kakao Brain has released Pororo, an integrated natural-language framework that can handle a variety of natural-language tasks, as open source. Pororo stands for "Platform Of neuRal mOdels for natuRal language prOcessing", and you can think of it as serving a purpose similar to HuggingFace. Pororo...
Building a dialogue system that lets humans hold natural-looking conversations with virtual agents is a hard problem in natural language processing and the basis of much ongoing research. The Ubuntu Dialogue Corpus covers various Ubuntu-related issues…
Since the advent of AlexNet, built from multiple convolution layers, there have been many studies of deep-learning model structure. For example, Google's Inception applies convolution layers with different kernel sizes in parallel, such as 3×3, 5×5, and 7×7…
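The multi-kernel idea mentioned above can be sketched with a toy 1D convolution; the signal and kernels are hypothetical, and real Inception modules operate on 2D feature maps:

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution (cross-correlation, as in deep learning)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
# Inception-style idea: apply kernels of different sizes to the same input
# in parallel, then concatenate the resulting feature maps.
branches = [conv1d(signal, [1.0] * k) for k in (1, 3, 5)]
features = [v for branch in branches for v in branch]
print(features)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 9.0, 12.0, 15.0]
```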
As the parameter counts of deep-learning models grow sharply, so does the memory required for training. OpenAI's GPT-2 consists of 1.5B parameters, and Google's mT5 goes up to 13B. The parameter count of OpenAI's GPT-3...
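The memory these parameter counts imply can be estimated with simple arithmetic; this sketch counts only the weights themselves, while training also needs gradients and optimizer state on top:

```python
def param_memory_gb(num_params, bytes_per_param=4):
    """Rough memory just to store the weights (fp32 by default)."""
    return num_params * bytes_per_param / 1024**3

# GPT-2: ~1.5B parameters; mT5: up to ~13B (figures from the text above).
print(round(param_memory_gb(1.5e9), 2))  # 5.59 (GB, fp32 weights only)
print(round(param_memory_gb(13e9), 2))   # 48.43 (GB, fp32 weights only)
```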
As deep learning models grow exponentially in size, it is increasingly difficult to achieve usable training times on a single machine. The well-known GPT-2 language model has about 1.5B parameters and 8 million...
SuperGLUE is a challenge that evaluates AI systems on a variety of natural-language-understanding tasks. It consists of tasks with relatively high difficulty compared to the existing GLUE, and the DeBERTa model recently announced by Microsoft achieved SOTA (state of the art) and was evaluated...
KoChat is a Korean open source chatbot framework released by Hyunwoong Ko. Here's the KoChat github repository: When talking about chatbots, people often only think of a conversation model, but in fact, from a product point of view as a chatbot, machine learning algorithms occupy only a fraction of the…
FrankMocap, a technology released by Facebook AI Research (FAIR), extracts a pose for a 3D model from a single image or video. It is notable for estimating not only the body but also the shape of the hands...
Here is a set of Korean profanity data collected and labeled by Joonhee Jo. It was gathered from multiple communities and seems well suited for evaluation on real-world data. Below is a description of the dataset: in the data description, each sentence is classified as profanity...
QA tasks, which generate appropriate answers to a given question, have seen large performance gains from recent deep learning. The well-known SQuAD is one such task. However, a model is trained for each task...
What are commonly called Q&A tasks aim to learn from a dataset of question-answer pairs so that, given a question, an appropriate answer is produced. A chatbot may come to mind, but question generation is different: from a paragraph...
StudioGAN is a PyTorch-based open-source library released by Minguk Kang of POSTECH's CVLab, implementing a variety of GAN algorithms. The included algorithms cover many major ones such as DCGAN, LSGAN, WGAN, and so on.
It is no exaggeration to say that poker is half a game of psychology, which makes it different from Go or chess. ReBeL, just released by Facebook, is remarkable in this regard. In particular, it is characterized by using reinforcement learning and search together, like RAG...
Multimodal EmotionLines Dataset (MELD) is a multimodal extension of EmotionLines, an emotionally labeled dialogue data set. MELD contains the same dialog instances available in EmotionLines, but audio and visual forms along with text…
MindMeld is an open source interactive AI platform designed to ensure serviceable quality. It is written in Python and includes the latest NLP technology and knowledge-based Q&A engine. Here is the approximate architecture of the MindMeld platform:…
Here is a link to the GitHub of denoiser, Facebook's real-time noise-reduction technology announced at Interspeech 2020. It is implemented in PyTorch, and the original paper is titled "Real Time Speech Enhancement in the Waveform Domain". As the title suggests…
Typically, Q&A systems use text to answer questions. One such task is SQuAD, which gives you a paragraph stating facts, then asks a question and expects an appropriate answer. Visual QA, by contrast, instead of text…
To implement the visual side of Human-like AI, we need to think about how to create and move a 3D human model. Among various existing approaches, one is a CVPR 2020 work from the Max Planck ETH Center...
Many attempts are being made to expand the language model and translation model, which were previously studied mainly in English, into multiple languages. Google's mT5 is a study that extends the existing T5 (text-to-text transfer transformer) into a multilingual corpus, including a total of 101 languages...
Combining multiple network models into an ensemble increases performance, but in practice it is hard to apply because the total network size and inference time also grow. Multi-model Ensemble via Adversarial Learning (MEAL) solves this...
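A basic averaging ensemble of the kind described above can be sketched as follows; the per-model class probabilities are hypothetical:

```python
# Hypothetical per-class probabilities from three separately trained models
# for one input example.
preds = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.50, 0.10, 0.40],
]

# Simple averaging ensemble: mean probability per class, then argmax.
mean = [sum(p[c] for p in preds) / len(preds) for c in range(len(preds[0]))]
best = max(range(len(mean)), key=lambda c: mean[c])
print(mean, best)  # mean ≈ [0.6, 0.2, 0.2]; best class: 0
```

Note that every model must run at inference time, which is exactly the cost MEAL tries to avoid by distilling the ensemble into one network.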
Version 1.0, the first stable release of the HuggingFace Datasets library, is out, making it easy to use NLP datasets and evaluation metrics. It currently supports about 100 datasets and about 10 evaluation metrics.
Making virtual characters look and move plausibly under the laws of physics, that is, in a human-like way, has long been a research subject in gaming as well as computer graphics. A Facebook project with Jungdam Won as first author, "A Scalable…
LipGAN is a technology that generates the lip shapes of a face image from a voice signal; applied to actual video, it was somewhat disappointing in its visual artifacts and the naturalness of motion. To improve this, the discriminator takes not a single frame but multiple consecutive…
The performance gains shown by Transformer-based language models are striking, but as model sizes grow exponentially, concern over serving costs is becoming important too. BERT-base and GPT-2 have on the order of 100 million parameters, so model size, memory bandwidth,...
The National Institute of the Korean Language has released Korean-language materials for AI training at large scale (13 kinds, 1.8 billion words). They were built with copyright issues cleared, and reportedly anyone can download the files after completing and receiving approval for an online agreement on the "Everyone's Corpus" site. This time…
Scatterlab (https://scatterlab.co.kr/), which stands out in everyday-conversation research, posted this article on its Ping-Pong team blog. I still view GPT-3 with a skeptical eye, but seeing it again makes me curious...
bryandlee's GitHub shows the results, and the related research, of applying image translation with deep generative models to the webtoon style of the artist Chimchakman ("calm man", Lee Mal-nyeon). The study's title is, fittingly, "Chilled Generative Model Learner". I like this wit! Looking at the process, the webtoon…
There have been many attempts to convert code written in one programming language into another, and many commercial tools exist. The main purpose is compatibility, for example with FORTRAN or BASIC, or...
I had heard that with special training you can tell what someone is saying from the movements of their lips alone; the research in the link realizes this with AI.
For large-scale language models, the absence of a Korean model was always a difficulty. Following SKT's KoBERT, KcBERT has been released, trained from scratch on Naver comment data reflecting newly coined words. Not only the trained model…
Deep-learning-based super-resolution was adopted in NVIDIA's latest GPUs under the name DLSS (deep learning super sampling), becoming a real consumer-facing service technology. Mainly in the 4K gaming market, 2K…
LipGAN is a study of creating mouth shapes from speech signals. The technique can be useful for animating a virtual character's mouth, but in practice its limits are clear, because only the lips move while the character stands still. In fact, humans...
The Visual Dialog task is a multimodal task that adds an image to a Q&A task that consists of a question and answer. For example, if you give a picture of a white cat and a black dog together and ask, "What color is the animal next to the cat?", you answer "black"...
Here is a link to Adobe's Mixamo site, already widely used in game production. It hosts 121 3D characters and 2,484 character motions, downloadable in the (Autodesk) FBX 3D format. This format...
TensorflowTTS, a Tensorflow 2-based open-source project supporting several of the latest TTS models such as Tacotron 2, MelGAN, and FastSpeech, has finally begun supporting Microsoft's FastSpeech2. FastSpeech2 performs similarly to Transformer-style TTS but trains more than twice as fast…
Text-to-SQL is the task of automatically converting natural language into SQL. The post shared at the bottom, written by Aerin Kim of Microsoft, is a good overview of Text-to-SQL. A great deal of the world's data is stored in relational databases, and in these databases...
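As a toy illustration of the task (not how real Text-to-SQL systems work; they are learned models, and this single hand-written rule is purely hypothetical), one English pattern can be mapped to a SELECT statement:

```python
import re

def to_sql(question):
    """Map one illustrative question pattern to SQL; real systems
    learn this mapping from (question, SQL) pairs."""
    m = re.match(r"how many (\w+) are there\?", question.lower())
    if m:
        # The captured noun is used directly as the table name.
        return f"SELECT COUNT(*) FROM {m.group(1)};"
    raise ValueError("pattern not covered by this sketch")

print(to_sql("How many employees are there?"))
# SELECT COUNT(*) FROM employees;
```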
MIT's Speech2Face is a study that generates a speaker's face from a speech signal. It does not perform the speech-to-face transform with a single model; instead, it combines results from prior studies with different aims to create impressive results. (The first author is now...
A pre-trained model has been released for Facebook's wav2vec 2.0, which drew attention for producing a speech recognizer with only 10 minutes of labeled data after representation learning on 53,000 hours of unlabeled data. With no fine-tuning of the representation model,...
DriveSeg is a dataset created for research on road-scene awareness (used for self-driving cars, etc.). Every frame of the video carries pixel-by-pixel semantic labels over the whole image. The labels are "vehicle, pedestrian, road, sidewalk, bicycle, motorcycle, building,...
Many MRC models proposed so far post evaluation scores beyond human ability on various tasks and datasets, but can we easily answer YES when asked whether they understand a given context better than humans? I think that is hard. First,…
L.A. Noire, a 2011 game made by Rockstar, surprised many with facial animations far superior to other games. The technology used was called MotionScan; basically, the actor sits in a room where several cameras are precisely placed...
Here is the code repository of GANimation, a technology that creates animations of changing facial expressions from a single input image. It is basically a conditional GAN, and it uses FACS (facial action coding system), a methodology for describing the anatomical movements of the face. According to FACS, we…
Here is a link to the 2019 version of Danbooru, an anime-character image database. It holds about 3.7 million images, with about 29 tags per image. Example tags include "1girl", "solo", "long_hair", "highres", "smile", and "open_mouth"…
Human emotion perception and expression are complex (e.g., anger affects facial expressions, voice, and language). Here is an open dataset of paired audio-video clips labeled with emotions. The Ryerson…
The main tasks of AI chatbots are answering questions: explaining product information, reading off schedules, checking the weather. Perhaps, if these technologies are pushed to their limit, in some of the human domains we call 'knowledge', a so-called 'Super Human…