
Dec 12 - AI, ML and Computer Vision Meetup

Hosted by Jimmy Guerrero - Voxel51

Details

Register for the Zoom:

https://voxel51.com/computer-vision-events/ai-machine-learning-computer-vision-meetup-dec-12-2024/

How We Built CoTracker3: Simpler and Better Point Tracking by Pseudo-Labeling Real Videos

CoTracker3 is a state-of-the-art point tracking model that introduces significant improvements in tracking objects through video sequences. Its key innovations include:

  • Semi-supervised training on real videos, reducing reliance on synthetic data
  • Pseudo-labels generated by existing tracking models acting as teachers
  • A simplified architecture compared to previous trackers
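
To make the pseudo-labeling idea concrete, here is a minimal, self-contained sketch. It is not CoTracker3's training code: the two toy teachers, the random choice of one teacher per clip, and the L1 loss are all assumptions chosen for illustration, and a "video" is reduced to its frame count.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Track = List[Point]  # one (x, y) position per frame
Teacher = Callable[[int, List[Point]], List[Track]]


def constant_teacher(num_frames: int, queries: List[Point]) -> List[Track]:
    """Toy teacher (hypothetical): predicts that each query point never moves."""
    return [[q] * num_frames for q in queries]


def drift_teacher(num_frames: int, queries: List[Point]) -> List[Track]:
    """Toy teacher (hypothetical): predicts one pixel of rightward motion per frame."""
    return [[(x + t, y) for t in range(num_frames)] for (x, y) in queries]


def pseudo_label(
    num_frames: int,
    queries: List[Point],
    teachers: List[Teacher],
    rng: random.Random,
) -> List[Track]:
    """Pick one teacher at random and use its tracks as training targets."""
    teacher = rng.choice(teachers)
    return teacher(num_frames, queries)


def track_l1_loss(pred: List[Track], target: List[Track]) -> float:
    """Mean L1 distance between predicted and pseudo-labeled positions."""
    total, count = 0.0, 0
    for pred_track, target_track in zip(pred, target):
        for (px, py), (tx, ty) in zip(pred_track, target_track):
            total += abs(px - tx) + abs(py - ty)
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    queries = [(10.0, 20.0), (30.0, 40.0)]
    targets = pseudo_label(5, queries, [constant_teacher, drift_teacher], rng)
    student_guess = constant_teacher(5, queries)  # stand-in for a student model
    print("loss against pseudo-labels:", track_l1_loss(student_guess, targets))
```

In a real setup the teachers would be frozen pretrained trackers run on unlabeled real videos, and the loss would drive gradient updates of the student.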

About the Speaker

Nikita Karaev is currently doing a PhD at Meta AI and Oxford, where he’s working on dynamic reconstruction and motion estimation (CoTracker) with Andrea Vedaldi and Christian Rupprecht. Before that, he did his master’s at École Polytechnique (Paris), and undergrad in cold Siberia (Novosibirsk). He was also an early employee at two startups that got acquired by Snapchat and Farfetch.

Hands-On with Meta AI's CoTracker3: Parsing and Visualizing Point Tracking Output

In this presentation, Harpreet Sahota explores CoTracker3, a state-of-the-art point tracking model that effectively leverages real-world videos during training. He dives into the practical aspects of running inference with CoTracker3 and parsing its output into FiftyOne, a powerful open-source tool for dataset curation, analysis, and visualization. Through a hands-on demonstration, Harpreet shows how to prepare a video for inference, run the model, examine its output, and parse the model’s output into FiftyOne’s keypoint format for seamless integration and visualization within the FiftyOne app.
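
As a rough illustration of the parsing step, the sketch below converts per-frame pixel tracks into relative coordinates, since FiftyOne keypoints store points as (x, y) pairs in [0, 1]. It is not the code from the talk: the function name, the nested-list input layout, and the choice to mark occluded points with NaN are assumptions, and the result is a plain dictionary rather than FiftyOne objects.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def tracks_to_relative_keypoints(
    tracks: List[List[Point]],
    visibility: List[List[bool]],
    width: int,
    height: int,
) -> Dict[int, List[Point]]:
    """Convert pixel tracks to per-frame relative keypoints.

    tracks[t][n] is the (x, y) pixel position of point n in frame t and
    visibility[t][n] says whether that point is visible (assumed layout).
    Keys of the result are 1-based frame numbers.
    """
    frames: Dict[int, List[Point]] = {}
    for t, (frame_points, frame_vis) in enumerate(zip(tracks, visibility)):
        points: List[Point] = []
        for (x, y), visible in zip(frame_points, frame_vis):
            if visible:
                points.append((x / width, y / height))
            else:
                # Assumption: occluded points are marked as missing with NaN
                points.append((math.nan, math.nan))
        frames[t + 1] = points
    return frames


if __name__ == "__main__":
    tracks = [[(64.0, 48.0), (0.0, 0.0)], [(128.0, 96.0), (10.0, 10.0)]]
    visibility = [[True, False], [True, True]]
    for frame_number, points in tracks_to_relative_keypoints(
        tracks, visibility, width=256, height=192
    ).items():
        print(frame_number, points)
```

Each per-frame list could then be wrapped in a FiftyOne keypoint label and attached to the matching frame of a video sample; FiftyOne video frames are also numbered from 1.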

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.

Streamlined Retail Product Detection with YOLOv8 and FiftyOne

In the fast-paced retail environment, automation at checkout is increasingly essential to enhance operational efficiency and improve the customer experience.

This talk will demonstrate a streamlined approach to retail product detection using the Retail Product Checkout (RPC) dataset, which includes 200 SKUs across 17 meta-categories such as puffed food, dried food, and drinks.

By leveraging YOLOv8, renowned for its speed and accuracy in real-time object detection, and FiftyOne, an open-source toolset for computer vision, we can simplify data loading, training, evaluation, and visualization for effective product detection and classification. Attendees will gain insights into how these tools can be applied to optimize checkout automation.
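
One small piece of the data-loading step can be sketched without either library: YOLO-style label files describe each box as a class index plus a normalized center and size, while FiftyOne detections use a relative [top-left-x, top-left-y, width, height] box. The helper below is a hypothetical illustration of that conversion, not the speaker's pipeline; the class list and the dictionary output are assumptions.

```python
from typing import Dict, List, Union

Detection = Dict[str, Union[str, List[float]]]


def yolo_line_to_detection(line: str, class_names: List[str]) -> Detection:
    """Parse one YOLO label line: 'class_id cx cy w h' (all values normalized)."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    return {
        "label": class_names[class_id],
        # Shift from box center to top-left corner; width and height are unchanged
        "bounding_box": [cx - w / 2, cy - h / 2, w, h],
    }


if __name__ == "__main__":
    # Hypothetical class list borrowing three of the meta-category names
    class_names = ["puffed_food", "dried_food", "drinks"]
    print(yolo_line_to_detection("2 0.5 0.5 0.5 0.25", class_names))
```

Both formats are relative to image size, so no pixel dimensions are needed for the conversion.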

About the Speaker

Vanshika Jain is a Data Engineer Intern at UNAR Labs, a startup focused on making information accessible for the blind. She holds a Master’s degree in Machine Learning and Computer Vision from Northeastern University and is passionate about applying AI and computer vision to real-world problems, with a focus on automation and accessibility.

Chicago Computer Vision Meetup
FREE