What we’re about
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of data science, machine learning, AI and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
Contact the Meetup organizers!
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub.
Upcoming events (4+)
- ECCV Redux: Day 1 - Nov 19
Missed the European Conference on Computer Vision (ECCV) last month? Have no fear, we have collected some of the best research from the show into a series of online events.
Fast and Photo-realistic Novel View Synthesis from Sparse Images
Novel view synthesis generates new perspectives of a scene from a set of 2D images, enabling 3D applications like VR/AR, robotics, and autonomous driving. Current state-of-the-art methods produce high-fidelity results but require a lot of images, while sparse-view approaches often suffer from artifacts or slow inference. In this talk, I will present my research work focused on developing fast and photorealistic novel view synthesis techniques capable of handling extremely sparse input views.
ECCV 2024 Paper: CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
About the Speaker
Avinash Paliwal is a PhD Candidate in the Aggie Graphics Group at Texas A&M University. His research is focused on 3D Computer Vision and Computational Photography.
Robust Calibration of Large Vision-Language Adapters
We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the calibration capabilities of the zero-shot baseline in the presence of distributional drift. We identify the increase in logit ranges as the underlying cause of miscalibration of CLIP adaptation methods, contrasting with previous work on calibrating fully-supervised models. Motivated by these observations, we present a simple and model-agnostic solution to mitigate miscalibration, by scaling the logit range of each sample to its zero-shot prediction logits.
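The per-sample logit-range scaling described in the abstract can be illustrated with a minimal NumPy sketch (function and variable names are ours, not the paper's, and the paper's exact normalization may differ):

```python
import numpy as np

def scale_to_zero_shot_range(adapted_logits, zero_shot_logits):
    """Rescale each sample's adapted logits so their range (max - min)
    matches the range of that sample's zero-shot logits.

    Both inputs are (num_samples, num_classes) arrays.
    """
    adapted_range = adapted_logits.max(axis=1, keepdims=True) - adapted_logits.min(axis=1, keepdims=True)
    zero_shot_range = zero_shot_logits.max(axis=1, keepdims=True) - zero_shot_logits.min(axis=1, keepdims=True)
    return adapted_logits * (zero_shot_range / adapted_range)
```

Since softmax confidence grows with the spread of the logits, shrinking an adapted model's logit range back toward the zero-shot range tempers overconfident predictions without retraining.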
ECCV 2024 Paper: Robust Calibration of Large Vision-Language Adapters
About the Speaker
Balamurali Murugesan is currently pursuing his Ph.D., focusing on developing reliable deep learning models. Earlier, he completed his master’s thesis on accelerating MRI reconstruction. He has published 25+ research articles in renowned venues.
Tree-of-Life Meets AI: Knowledge-guided Generative Models for Understanding Species Evolution
A central challenge in biology is understanding how organisms evolve and adapt to their environment, acquiring variations in observable traits across the tree of life. However, measuring these traits is often subjective and labor-intensive, making trait discovery a highly label-scarce problem. With the advent of large-scale biological image repositories and advances in generative modeling, there is now an opportunity to accelerate the discovery of evolutionary traits. This talk focuses on using generative models to visualize evolutionary changes directly from images without relying on trait labels.
ECCV 2024 Paper: Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
About the Speaker
Mridul Khurana is a PhD student at Virginia Tech and a researcher with the NSF Imageomics Institute. His research focuses on AI4Science, leveraging multimodal generative modeling to drive discoveries across scientific domains.
- Nov 20 - Munich AI, ML and Computer Vision Meetup (Macromedia University, Munich)
Join us for an evening of networking, talks, food and beverages at Macromedia University!
Date and Time
Nov 20, 2024 from 5:30 PM to 8:30 PM
Location
The Meetup will take place at Macromedia University, Sandstraße 9, Auditorium 608 in Munich.
Robust and Efficient Coupling of Perception to Actuation with Metric and Non-Metric Scene Representations
We will present our work on robust and efficient ways to use camera information to control an autonomous car. While many metric approaches require repeated intrinsic (the sensor itself) and extrinsic (the position of the sensor in the vehicle) calibration, our methods control the system directly in the retina (image plane) of the sensor. We will discuss the rich information encoded in the geometric projection of the scene, which enables accurate navigation and provides the uncertainty information necessary for fusing diverse sensors, as well as a novel way to represent dynamic information in the scene for planning in dynamic environments.
About the Speaker
Prof. Burschka of the Technical University of Munich conducts research into sensor systems in robotics and human-machine interfaces. Video-based navigation is one of his particular interests. This involves simulation of complex sensor systems through the analysis and fusion of sensor properties of physical sensor units and 3D reconstruction from the fusion of multimodal sensor data.
Understanding Context in the Wild - AI Testing Automated Driving Systems
Detailed contextual understanding is crucial for testing Automated Driving Systems (ADS). To provide high-quality, safe ADS, unseen events and potential weak points of the system need to be identified in the target domain so they can be mitigated. This talk focuses on automated pipelines that identify context, make driving data searchable, and uncover potential weak points of driving systems automatically in the field.
About the Speaker
Tin Stribor Sohn is a doctoral candidate and tech lead for vehicle data analytics at Porsche AG, working on scenario search and failure cause analysis for automated driving. Tin holds an M.Sc. in computer science and is co-founder/CTO of an energy startup.
Strategies Towards Reliable Scene Understanding for Autonomous Driving and Beyond
Scene understanding is crucial for self-driving cars and autonomous agents to reliably perceive their surroundings in diverse, unpredictable conditions. This talk tackles these challenges by presenting a series of research papers that improve reliability through novel methods in 2D and 3D perception, focusing on robustness in extreme scenarios and generalization to unseen environments.
About the Speaker
Stefano Gasperini is the Co-Founder & CEO of Visualais, a Computer Vision startup enabling the creation of 3D renderings from smartphone images. Beyond numerous top-tier research papers and outstanding reviewer awards from his PhD at TUM, Stefano continues contributing to Computer Vision and AI as a postdoc, advising a team of PhD students at TUM.
How to Unlock More Value from Self-Driving Datasets
AV/ADAS is one of the most advanced fields in Visual AI. However, getting your hands on a high-quality dataset can be tough, let alone working with it to get a model to production. In this talk, I will show you the leading methods and tools to visualize these datasets and take them to the next level. I will demonstrate how to clean and curate AV datasets, as well as perform state-of-the-art augmentations using diffusion models to create synthetic data that can empower the self-driving car models of the future.
About the Speaker
Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data.
- Nov 21 - Munich Visual AI Workshop (Macromedia University, Munich)
Join us for an evening of FREE hands-on workshops that will explore the intersection of video, autonomous vehicles and computer vision. In the workshops you will also become familiar with the open source FiftyOne computer vision library and its data curation and model evaluation capabilities.
Bring your laptop. Food and beverages will be available.
Register to reserve your spot.
Date and Time
Nov 21, 2024 from 4:00 to 7:00 PM
Location
The workshop will take place at Macromedia University, Sandstraße 9, Room 310 in Munich.
Workshop: Mastering Video Analytics with Voxel51: A Hands-on Workshop with Real-World Customer Applications
In this hands-on workshop, participants will explore the full potential of Voxel51 for video analytics and computer vision applications. Starting with an introduction to Voxel51’s core capabilities, we’ll dive into real-time video understanding, data management, and annotation tools.
Through a series of guided exercises, you’ll learn how to integrate Voxel51 into your video processing pipelines, enhance AI models, and streamline video data workflows. This workshop goes beyond theory by showcasing real-world customer applications, giving participants practical insights into how Voxel51 is currently solving complex video analysis challenges in diverse industries.
Whether you’re a data scientist, AI engineer, or video technology professional, this workshop will equip you with the skills and knowledge to apply Voxel51 in both experimental and production settings.
About the Speaker
Dr. René Brunner consults companies on Python and has been active in programming and Data Science for 20 years. He has published numerous papers in journals and given talks at internationally renowned conferences in the field of Big Data Science. He is a professor at the Macromedia University of Applied Sciences in Munich.
Workshop: Building the Visual AI Datasets of Tomorrow: A Hands-on Workshop with Autonomous Vehicle Data
Visual AI datasets come in many shapes and sizes. In this workshop you will learn how to leverage FiftyOne, a leading open source Visual AI library, to build your Visual AI dataset of tomorrow. We will cover how to load, visualize, and manage labels and annotations on your images, videos, and 3D samples. At the end, we will do a live walkthrough of working with an Autonomous Vehicle dataset in FiftyOne.
About the Speaker
Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data.
- ECCV Redux: Day 3 - Nov 21
Missed the European Conference on Computer Vision (ECCV) last month? Have no fear, we have collected some of the best research from the show into a series of online events.
Closing the Gap Between Satellite and Street-View Imagery Using Generative Models
With the growing availability of satellite imagery (e.g., Google Earth), nearly every part of the world can be mapped, though street-view images remain limited. Creating street views from satellite data is crucial for applications like virtual model generation, media content enhancement, 3D gaming, and simulations. This task, known as satellite-to-ground cross-view synthesis, is tackled by our geometry-aware framework, which maintains geometric precision and relative geographical positioning using satellite information.
ECCV 2024 Paper
About the Speaker
Ningli Xu is a Ph.D. student at The Ohio State University, specializing in generative AI and computer vision, with a focus on addressing image and video generation challenges in the geospatial domain.
High-Efficiency 3D Scene Compression Using Self-Organizing Gaussians
In just over a year, 3D Gaussian Splatting (3DGS) has made waves in computer vision for its remarkable speed, simplicity, and visual quality. Yet, even scenes of a single room can exceed a gigabyte in size, making it difficult to scale up to larger environments, like city blocks. In this talk, we’ll explore compression techniques to reduce the 3DGS memory footprint. We’ll dive deeply into our novel approach, Self-Organizing Gaussians, which proposes to map splatting attributes into a 2D grid, using a high-performance parallel linear assignment sorting developed to reorganize the splats on the fly. This grid assignment allows us to leverage traditional 2D image compression techniques like JPEG to efficiently store 3D data. Our method is quick and easy to decompress and provides a surprisingly competitive compression ratio. The drastically reduced memory requirements make this method perfect for efficiently streaming 3D scenes at large scales, which is especially useful for AR, VR and gaming applications.
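As a rough illustration of the grid-mapping idea, here is a toy sketch (ours, not the paper's: the actual method uses a parallel linear assignment sort over all attribute channels, while this orders a single scalar channel):

```python
import numpy as np

def to_smooth_grid(attributes, grid_side):
    """Toy sketch: order splat attributes so similar values land in
    neighboring cells of a 2D grid, which 2D image codecs such as JPEG
    compress efficiently.
    """
    order = np.argsort(attributes)
    return attributes[order].reshape(grid_side, grid_side)

rng = np.random.default_rng(0)
splats = rng.random(64)              # e.g. one opacity value per splat
grid = to_smooth_grid(splats, 8)     # 8x8 grid, values smooth in row-major order
```

The smoother the grid, the fewer high-frequency components a 2D codec has to spend bits on, which is where the compression gain comes from.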
ECCV 2024 Paper: Compact 3D Scene Representation via Self-Organizing Gaussian Grids
About the Speaker
Wieland Morgenstern is a Research Associate at the Computer Vision & Graphics group at Fraunhofer HHI and is pursuing a PhD at Humboldt University Berlin. His research focuses on representing 3D scenes and virtual humans.
Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
We present Skeleton Recall Loss, a novel loss function for topologically accurate and efficient segmentation of thin, tubular structures, such as roads, nerves, or vessels. By circumventing expensive GPU-based operations, we reduce computational overheads by up to 90% compared to the current state-of-the-art, while achieving overall superior performance in segmentation accuracy and connectivity preservation. Additionally, it is the first multi-class capable loss function for thin structure segmentation.
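A minimal sketch of what a skeleton-based recall term might look like (our simplification, assuming a precomputed binary skeleton of the ground-truth mask; not the authors' exact formulation):

```python
import numpy as np

def skeleton_recall_loss(pred, skeleton, eps=1e-8):
    """Soft recall of the predicted probabilities on the skeleton of the
    ground-truth mask: the loss is low when the prediction covers the
    skeleton, encouraging connectivity of thin structures.

    pred: soft predictions in [0, 1]; skeleton: binary skeleton mask.
    """
    recall = (pred * skeleton).sum() / (skeleton.sum() + eps)
    return 1.0 - recall
```

Because the skeleton can be computed once on the CPU during preprocessing, the loss itself reduces to an elementwise product and two sums, which is where the savings over GPU-heavy topology-aware losses come from.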
ECCV 2024 Paper
About the Speakers
Maximilian Rokuss holds an M.Sc. in Physics from Heidelberg University and is now a PhD student in Medical Image Computing at the German Cancer Research Center (DKFZ) and Heidelberg University.
Yannick Kirchoff holds an M.Sc. in Physics from Heidelberg University and is now a PhD student in Medical Image Computing at the German Cancer Research Center (DKFZ) and the Helmholtz Information and Data Science School for Health.