A man demonstrates the capabilities of the AI-based video editing app Captions in a brief video. The app automatically generates bold subtitles for a video showing fajitas being prepared in an air fryer. Gaurav Misra, the CEO and co-founder of Captions, showcases the app’s translation tool, which can dub the entire video into Hindi. He also demonstrates features such as adjusting audio volume and background color, removing words, and adding transitions, each done with a few simple taps and toggles.
According to Misra, the demo shows how Captions helps video creators reach a broader audience. The startup has also announced a $25 million Series B funding round led by Silicon Valley VC firm Kleiner Perkins, with participation from Sequoia Capital, Andreessen Horowitz, and SV Angel. The round values Captions at $250 million and brings its total funding to $40 million. Kleiner Perkins has invested in the video communication space before, backing AI video startup Synthesia and video recording platform Loom.
Captions grew out of Misra’s time leading the design engineering team at Snap Inc. from 2016 to 2021. During that period, he watched social media video evolve from TikTok-style dance videos to Instagram Reels and YouTube Shorts. He also noticed the rise of a new category known as “talking videos,” in which creators speak directly to the camera. After leaving Snap, Misra co-founded Captions with his former colleague Dwight Churchill, who had left his job at Goldman Sachs.
Since launching, Captions has been used by about 3 million creators to automatically caption and edit videos in categories ranging from golf to real estate and aviation. The app has about 100,000 daily active users, and roughly 1 million videos are made with it each month.
However, Captions faces competition from established players like CapCut, the ByteDance-owned editing app, which reportedly has 200 million active users. Adobe has also introduced its own generative AI features through Firefly. And in recent years, other AI-based video and audio editing startups such as Descript have emerged and raised substantial venture funding.
Misra says Captions stands apart because it focuses specifically on editing talking videos. While most video editing tools prioritize aesthetics such as filters and colors, Captions prioritizes conveying ideas and experiences, he says.
The app offers a range of generative AI features spanning recording, editing, and distribution for a $10 monthly subscription. Most features are built on open-source models, but some were developed in-house by Captions’ 16-person team. The AI script writer lets creators draft scripts with ChatGPT, and the app uses OpenAI’s speech-to-text model Whisper to caption audio. Captions also has an in-house voice cloning tool, trained on licensed audio recordings, that lets users translate their audio into 28 other languages or add AI-generated voiceovers. Misra acknowledges that the software could be misused to create deepfakes, and stresses that for imported videos, users can only change the language of the existing audio; they cannot insert or create new audio recordings.
Other features include automatic zooming, detection and removal of filler words and offensive words, and adjusting the level of background audio. The app also uses an AI eye correction tool, originally developed by Nvidia for potential use in Zoom, to make it appear as if users are looking directly at the camera.
With the new funding, Captions plans to grow its team and improve existing features such as its AI music tool, which generates instrumental background music by rearranging recordings of musical instruments. Misra believes that adding more features will help level the playing field between everyday content creators and better-resourced competitors.
Misra sums up the company’s mission: “Our goal is to bring these technologies to everyday people. Half the battle is the technology.”