{"id":123799,"date":"2024-04-23T11:20:52","date_gmt":"2024-04-23T15:20:52","guid":{"rendered":"https:\/\/massive.io\/?p=123799"},"modified":"2026-02-20T11:30:07","modified_gmt":"2026-02-20T16:30:07","slug":"video-understanding","status":"publish","type":"post","link":"https:\/\/massive.io\/fr\/realisation-de-films\/comprehension-de-la-video\/","title":{"rendered":"L'IA multimodale et la fa\u00e7on dont la compr\u00e9hension de la vid\u00e9o va r\u00e9volutionner les m\u00e9dias"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; custom_padding_last_edited=&#8221;on|desktop&#8221; _builder_version=&#8221;4.14.7&#8243; _module_preset=&#8221;default&#8221; background_color=&#8221;#FFFFFF&#8221; custom_margin=&#8221;||||false|false&#8221; custom_padding=&#8221;2%|20%|2%|20%|false|true&#8221; custom_padding_tablet=&#8221;4%|0%|4%|0%|true|true&#8221; custom_padding_phone=&#8221;6%|0%|6%|0%|true|true&#8221; border_color_top=&#8221;#e1e1e1&#8243; locked=&#8221;off&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_row _builder_version=&#8221;4.14.7&#8243; _module_preset=&#8221;7b1bf5ad-cc2a-4448-981c-4963d88bd6e8&#8243; custom_margin=&#8221;||||false|false&#8221; custom_padding=&#8221;0px||0px||false|true&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.3&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text _builder_version=&#8221;4.14.7&#8243; _module_preset=&#8221;default&#8221; text_text_color=&#8221;#000000&#8243; text_line_height=&#8221;1.8em&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>It\u2019s not hyperbole to say that artificial intelligence (AI) is now everywhere. You can\u2019t turn around these days without seeing a new incarnation of or application for AI. Nowhere is that more true than in the world of video and film production.<\/p>\n<p>From script writing and location scouting in pre-production to object removal and scene stabilization in post, the AI and machine learning (ML) takeover is real. 
But perhaps one of the most innovative and powerful ways in which [AI is revolutionizing the video world](https://massive.io/filmmaking/6-ai-tools-for-filmmaking/) is through **video understanding**.

> This post was written in partnership with [Twelve Labs](https://www.twelvelabs.io/), a pioneer in multimodal AI for video understanding, and is also [featured on their blog](https://www.twelvelabs.io/blog/twelve-labs-and-masv).

**Table of Contents**

- [What is Video Understanding?](#video-understanding)
- [Applications of Video Understanding](#video-understanding-applications)
  - [Video search](#search)
  - [Video classification](#classification)
  - [Video description](#description)
- [The Technology Behind Video Understanding](#video-understanding-technology)
- [Enabling AI Video Workflows](#video-ai-workflows)
## What is Video Understanding?

[**Video understanding**](https://www.twelvelabs.io/blog/the-past-present-and-future-of-video-understanding-applications) models analyze, interpret, and comprehend video content, extracting information in such a way that the entire context of the video is understood.
It's not just about identifying objects frame by frame or parsing the audio components. AI-powered video understanding maps natural language to the actions within a video. To do that, a model has to perform a range of video understanding tasks, such as activity recognition and object detection, processing the visual, audio, and speech elements together to grasp the nuance of what's being communicated through this most fluid of media.

It's also different from large language models (LLMs) such as ChatGPT, which aren't trained specifically to understand video data.

Put simply, AI video understanding models comprehend video just as we do.

It's a huge challenge, but it's one that video understanding infrastructure company Twelve Labs is eagerly tackling.
> **Related: Artificial Intelligence & Filmmaking.** An exploration of artificial intelligence's impact on filmmaking, along with an analysis of the technology. [AI in filmmaking >](https://massive.io/filmmaking/artificial-intelligence-filmmaking/)

## Applications of Video Understanding in Media & Entertainment

Before diving into the technology behind deep video understanding, let's explore exactly how video understanding can streamline the work of M&E pros and video content creators.

### Video search

Imagine being able to find a particular video in a collection spanning petabytes of data simply by describing its visual elements in natural language. Or, as a [sports league or club](https://massive.io/industries/broadcast-industry/), asking your AI video understanding model to **assemble a highlight reel of all a player's goals in just a few seconds**.

All of this is possible using AI video understanding.

Traditional video search, on the other hand, has severe limitations in its approach and execution. Because it relies primarily on keyword matching to index and retrieve videos, it doesn't take advantage of multimodal AI techniques that provide a deeper understanding of video through visual and auditory cues.

By simultaneously integrating all available data types (including images, sounds, speech, and on-screen text), modern video understanding models capture the complex relationships among these elements to deliver a more nuanced, humanlike interpretation.

That results in much faster and far more accurate video search and retrieval from cloud object storage.
Instead of time-consuming and ineffective manual tagging, video editors can use natural language to quickly and accurately search vast media archives and unearth video moments and hidden gems that might otherwise go unnoticed.

Twelve Labs' [Search API](https://www.twelvelabs.io/product/video-search) takes around 15 minutes to index an hour of video, making indexed video semantically searchable in more than 100 languages.
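As a concrete illustration, here is a minimal Python sketch of what a semantic search call against an indexed library might look like. The endpoint path, field names, and placeholder values are assumptions modeled on Twelve Labs' public API reference, not a definitive implementation; verify them against the current docs.

```python
# Minimal sketch: natural-language search over an indexed video library.
# Endpoint and field names are assumptions based on Twelve Labs' public
# API reference -- verify against the current docs before relying on them.
import requests

API_KEY = "tlk_..."          # hypothetical API key placeholder
INDEX_ID = "your-index-id"   # an index you have already created and populated

resp = requests.post(
    "https://api.twelvelabs.io/v1.2/search",
    headers={"x-api-key": API_KEY},
    json={
        "index_id": INDEX_ID,
        "query": "the striker scores a goal and celebrates",
        "search_options": ["visual", "conversation"],  # search across modalities
    },
    timeout=30,
)
resp.raise_for_status()

# Each hit points at a moment inside a video, not just a whole file:
for hit in resp.json().get("data", []):
    print(f"video={hit['video_id']} {hit['start']:.1f}s-{hit['end']:.1f}s "
          f"score={hit['score']}")
```

Because results come back as timestamped segments rather than whole files, assembling that highlight reel is largely a matter of concatenating the returned clips in score or time order.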
### Video classification

AI-powered video understanding allows for automatic categorization of video into predefined classes or topics. Using Twelve Labs' [Classify API](https://www.twelvelabs.io/blog/effortless-video-classifiers-with-twelve-labs-api-no-ml-training-required), you can organize videos into sports, news, entertainment, or documentaries by analyzing semantic features, objects, actions, and other elements of the content.

The model can also [classify specific scenes](https://techcrunch.com/2023/10/24/twelve-labs-is-building-models-that-can-understand-videos-at-a-deep-level/), which can power practical applications in advertising or content moderation. The technology can identify a scene containing a weapon as educational, dramatic, or violent, for example, based on the context.

This benefits creators and video platforms, and it enhances the user experience by providing more accurate recommendations based on a viewer's interests and preferences. It also benefits post-production pros who need to quickly find and log items for editing, archiving, or other purposes.

While every video used within Twelve Labs' technology contains [standard metadata](https://massive.io/file-transfer/best-practices-for-metadata-management/), users also have the option of adding [custom metadata](https://massive.io/product/custom-metadata-fields/) to their videos to provide more detailed or context-specific information.

From surveillance and security to sports analysis, and from content moderation to contextualized advertising, video understanding has the potential to completely upend video classification.
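To make "no ML training required" concrete, here is a sketch of defining ad-hoc classes with natural-language prompts and asking the API to bucket indexed videos into them. The `/classify` request shape, class schema, and placeholders are assumptions modeled on Twelve Labs' documentation; treat them as illustrative only.

```python
# Sketch: zero-training video classification with prompt-defined classes.
# The /classify request shape is an assumption modeled on the public docs.
import requests

API_KEY = "tlk_..."          # hypothetical placeholder
INDEX_ID = "your-index-id"

# Classes are just names plus natural-language prompts -- no model training.
classes = [
    {"name": "sports",      "prompts": ["athletes competing", "game highlights"]},
    {"name": "news",        "prompts": ["anchor presenting news", "press conference"]},
    {"name": "documentary", "prompts": ["narrated factual footage", "expert interviews"]},
]

resp = requests.post(
    "https://api.twelvelabs.io/v1.2/classify",
    headers={"x-api-key": API_KEY},
    json={"index_id": INDEX_ID, "options": ["visual"], "classes": classes},
    timeout=30,
)
resp.raise_for_status()
for video in resp.json().get("data", []):
    print(video)  # per-video class scores
```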
### Video description

Video understanding can automatically summarize a video dataset through detailed descriptions generated in seconds. The technology improves comprehension and engagement by condensing long videos into concise representations that capture the most important content.

*(Embedded video: https://www.youtube.com/watch?v=ACmA1DLgY10)*

Such fast, detailed summaries can be a big help when enriching media with descriptive metadata and summaries, notably for people with physical disabilities or cognitive impairments that make video a less-than-ideal medium.

In the media and entertainment industry, video description and summarization can be used to create previews or trailers for movies, TV shows, and other video content. These previews provide a concise overview of the content and help viewers decide whether to watch the full video. And anything that improves the user experience is a good thing.

Twelve Labs' [Generate API suite](https://docs.twelvelabs.io/docs/generate-text-from-video) generates text based on your videos. It offers three distinct endpoints, each designed with a specific level of flexibility and customization to accommodate different needs (see the sketch after this list):

- The [Gist API](https://docs.twelvelabs.io/api-reference/analyze-videos/gist) produces concise text outputs like titles, topics, and lists of relevant hashtags.
- The [Summary API](https://docs.twelvelabs.io/docs/generate-summaries-chapters-highlights) generates video summaries, chapters, and highlights.
- For customized outputs, the [Generate API](https://docs.twelvelabs.io/docs/generate-open-ended-texts) lets users prompt for specific formats and styles, from bullet points to reports and even creative lyrics based on the content of the video.
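Here is a sketch of how those three endpoints might be called on a single indexed video. Paths and payloads mirror the public docs at the time of writing but should be treated as assumptions and checked against the current API reference.

```python
# Sketch: generating titles, summaries, and free-form text from one video.
# Paths and payloads are assumptions mirroring Twelve Labs' documentation.
import requests

API_KEY = "tlk_..."         # hypothetical placeholder
VIDEO_ID = "your-video-id"  # a video already indexed for text generation
BASE = "https://api.twelvelabs.io/v1.2"
HEADERS = {"x-api-key": API_KEY}

# Gist: short outputs such as a title, topics, and hashtags
gist = requests.post(f"{BASE}/gist", headers=HEADERS,
                     json={"video_id": VIDEO_ID,
                           "types": ["title", "topic", "hashtag"]})

# Summary: a condensed overview (also supports "chapter" and "highlight")
summary = requests.post(f"{BASE}/summarize", headers=HEADERS,
                        json={"video_id": VIDEO_ID, "type": "summary"})

# Generate: open-ended, prompt-driven output in whatever format you ask for
report = requests.post(f"{BASE}/generate", headers=HEADERS,
                       json={"video_id": VIDEO_ID,
                             "prompt": "Write a three-bullet synopsis "
                                       "for a trailer voiceover."})

for r in (gist, summary, report):
    r.raise_for_status()
    print(r.json())
```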
## The Technology Behind Video Understanding

> "AI can't understand 80% of the world's data because it's locked in video content," explained Twelve Labs CEO [Jae Lee](https://www.linkedin.com/in/leejae94) in an interview with MASV. "We build the keys to unlock it."

Indeed, legacy computer vision (CV) models, which use neural networks and ML to understand digital images, have always had trouble comprehending context within video. CV models are great at identifying objects and behaviors, but not the relationships between them. It's a gap that, until recently, limited our ability to accurately analyze video content using AI.

[Travis Couture](https://www.linkedin.com/in/traviscouture), founding solutions architect at Twelve Labs, framed the issue as content versus context.

"The traditional approach has been to break video content down into problems that are easier to solve, which would typically mean analyzing frame-by-frame as individual images, and breaking the audio channels out separately and doing transcription on those. Once those two processes are finished, you would bring it all back together and combine your findings.

> "When you do that—break down and rebuild—you might have the content but you don't have the context. And with video, context is king.

"The goal at Twelve Labs is to move away from that traditional computer vision approach and into the video understanding arena, and that means processing video like humans do, which is all together, all at once."
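To make that "break down and rebuild" pattern concrete, here is a minimal sketch of the traditional two-track pipeline Couture describes. The detection and transcription steps are deliberate stubs, since the point is the shape of the workflow, not any particular model.

```python
# Sketch of the traditional computer-vision pipeline: analyze frames and
# audio on separate tracks, then stitch the results together afterwards --
# which yields content, but loses cross-modal context.
import cv2  # pip install opencv-python

def sample_frames(video_path: str, every_n: int = 30) -> list:
    """Decode the video and keep one frame out of every `every_n`."""
    cap, frames, i = cv2.VideoCapture(video_path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            frames.append(frame)
        i += 1
    cap.release()
    return frames

def detect_objects(frame) -> list:
    """Stub: per-frame object labels from any image model."""
    return []

def transcribe_audio(video_path: str) -> str:
    """Stub: a separate speech-to-text pass over the audio channel."""
    return ""

def analyze(video_path: str) -> dict:
    frame_labels = [detect_objects(f) for f in sample_frames(video_path)]  # track 1
    transcript = transcribe_audio(video_path)                              # track 2
    # Only now are the modalities recombined -- too late to recover how
    # speech, sound, and action related to one another in time.
    return {"labels": frame_labels, "transcript": transcript}
```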
### Multimodal video understanding

*Source: [What is Multimodal AI?](https://www.twelvelabs.io/blog/what-is-multimodal-ai)*

Video is dynamic, layered, fluid: a combination of elements that, when broken down and separately analyzed, don't add up to the whole. This is the problem Twelve Labs set out to solve. How? By employing **multimodal AI**.

The term "modality" in this context refers to the way in which an event is experienced. With video, as in the real world, there are multiple modalities: aural, visual, temporal, language, and others.

"When you analyze those modalities separately and attempt to piece them back together," explained Twelve Labs co-founder and head of business development [Soyoung Lee](https://www.linkedin.com/in/soyoung-lee-1801a0120), "you'll never achieve holistic understanding and context."

Twelve Labs' multimodal approach has allowed it to build a model that replicates the way humans interpret video. "Our [Marengo](https://www.twelvelabs.io/blog/introducing-marengo-2-6) video foundation model feeds perceptual, semantic, and contextual information to [Pegasus](https://www.twelvelabs.io/blog/upgrading-pegasus-1), our generative model, mimicking the way humans go from perception to processing and logic," explained Couture.

Just as the human brain constantly receives, interprets, and arranges colossal amounts of information, Twelve Labs' multimodal AI synthesizes multiple stimuli into coherent understanding. It extracts data from video around variables such as time, objects, speech, text, people, and actions, and synthesizes that data into vectors: mathematical representations of the content. Tasks such as action recognition and detection, pattern recognition, object detection, and scene understanding make this happen.
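Once video moments live in vector form, "similar content" becomes a measurable quantity: the closer two embeddings sit, the more alike the moments they represent. A toy illustration with made-up vectors:

```python
# Toy illustration: once video moments are embedded as vectors, semantic
# similarity reduces to geometry. These 3-d vectors are invented for the
# example; real embeddings have hundreds or thousands of dimensions.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

goal_clip    = np.array([0.9, 0.1, 0.30])  # hypothetical embedding of a goal
another_goal = np.array([0.8, 0.2, 0.35])
interview    = np.array([0.1, 0.9, 0.20])

print(cosine_similarity(goal_clip, another_goal))  # high: similar moments
print(cosine_similarity(goal_clip, interview))     # low: unrelated content
```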
Because applications for holistic video understanding are so far-reaching in M&E and beyond, Twelve Labs provides a sandbox environment, called the [Playground](https://docs.twelvelabs.io/docs/playground), that lets users explore and test video understanding technology. The company also provides [documentation](https://docs.twelvelabs.io/docs/introduction) and a robust API that lets users embed video understanding capabilities into their own platforms in just a few API calls.

## Enable AI Video Workflows in the Cloud With MASV and Twelve Labs

As of December 2023, approximately [328.77 million terabytes](https://explodingtopics.com/blog/data-generated-per-day) of data were created globally every single day, with video responsible for 53.27 percent of that (roughly 175 million terabytes of video daily) and rising. It's this dramatic, ongoing shift toward video that makes Twelve Labs' video understanding technology so important.

MASV also understands the already-immense and growing potential of video. Our frictionless, fast large-file transfer service can ingest huge datasets into popular cloud environments for AI processing, including [Amazon S3](https://massive.io/integrations/amazon-s3/), using an automated and secure file uploader. This simplifies content ingest to support [AI workflows involving video](https://massive.io/industries/artificial-intelligence/) and other large datasets.

Users can configure MASV to automatically upload transferred files into their S3 bucket, and then use Twelve Labs to expedite AI video understanding tasks such as archive and content search or video summarization.
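As a sketch of how that hand-off might be wired up: once MASV drops a new file into S3, a small script can queue it for indexing. The bucket name, key prefix, and `/tasks` payload below are illustrative assumptions, not a documented integration.

```python
# Sketch: after MASV delivers files into S3, hand them to Twelve Labs for
# indexing. Bucket/prefix names and the /tasks payload are illustrative
# assumptions; in production this would more likely run from an S3 event
# trigger (e.g., a Lambda) than a polling script.
import boto3
import requests

API_KEY = "tlk_..."      # hypothetical placeholder
INDEX_ID = "your-index-id"
BUCKET = "masv-ingest"   # hypothetical bucket MASV uploads into

s3 = boto3.client("s3")

def index_new_uploads(prefix: str = "incoming/") -> None:
    for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix).get("Contents", []):
        # Short-lived URL so the indexing service can fetch the video
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": obj["Key"]},
            ExpiresIn=3600,
        )
        resp = requests.post(
            "https://api.twelvelabs.io/v1.2/tasks",
            headers={"x-api-key": API_KEY},
            data={"index_id": INDEX_ID, "video_url": url},
        )
        resp.raise_for_status()
        print(f"queued {obj['Key']} as task {resp.json().get('_id')}")
```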
Want to give MASV and Twelve Labs a test drive? [Sign up for MASV](https://app.massive.io/en/signup) for free today to try things out, and while you're at it, sign up for [Twelve Labs' Playground](https://playground.twelvelabs.io) to explore what the power of video understanding can do for you.

MASV migrates video training datasets into the cloud faster than ever, to kickstart your AI workflows.