Effective handling of large and diverse data sources

Hosted By
Amanda B. and 2 others

Details

This meetup will feature two talks from the data teams at Definitive Healthcare.

Modernizing our data stack: From home-brewed & closed-off to open-source & accessible

For the last year or so, Definitive Healthcare Sweden has been on a journey to migrate away from our legacy data storage layer in favor of a completely new setup. The old implementation was proving difficult to maintain, was a recurring impediment for developers, and did not enable the analytics and future use cases we had in mind.

The new solution builds on the great technological advancements that have been made in SQL-like big data processing on cloud object storage, using Apache Spark and open table formats such as Delta Lake & Apache Iceberg.

Please join us to hear about our challenges and wins along the way to get there.

Automated extraction of tabular data from diverse web sources

Definitive Healthcare relies on semi-manual collection of web resources as part of the general data collection approach.

A subset of these resources covers board participation in, for example, clinics, medical society boards, and journal editorial boards. This subset has been collected either in-house or by third parties, owing to lack of coverage, poor output data quality, and anti-scraping measures on the websites in question. To improve the scraping, extraction, and parsing of web content into structured data, we have compared different API providers for these tasks, alongside leveraging our in-house organization data.

Listen in to learn about this project's development and how we scrape, parse and enrich data to enable mapping of board participants to medical experts!
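
To give a feel for the kind of pipeline the talk covers, here is a minimal sketch in Python of the parse and enrich steps, using only the standard library. It is an illustration under stated assumptions, not the approach presented at the meetup: the sample HTML, the names in it, the `TITLES` set, the expert records, and the exact-match rule on normalized names are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical set of titles to ignore when comparing names.
TITLES = {"dr", "prof", "md", "phd"}


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_board(html):
    """Parse a table into a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def normalize(name):
    """Lowercase a name, drop punctuation and known titles."""
    tokens = name.lower().replace(".", "").replace(",", " ").split()
    return " ".join(t for t in tokens if t not in TITLES)


def map_to_experts(members, experts):
    """Attach an expert_id to each member; None when no match is found."""
    index = {normalize(e["name"]): e["id"] for e in experts}
    return [{**m, "expert_id": index.get(normalize(m["Name"]))} for m in members]


# Hypothetical scraped fragment and in-house expert records.
SAMPLE_HTML = (
    "<table><tr><th>Name</th><th>Role</th></tr>"
    "<tr><td>Dr. Anna Berg</td><td>Chair</td></tr>"
    "<tr><td>Lars Holm, MD</td><td>Member</td></tr></table>"
)
EXPERTS = [{"id": 101, "name": "Anna Berg"}]

members = extract_board(SAMPLE_HTML)
mapped = map_to_experts(members, EXPERTS)
```

Exact matching on normalized names is the simplest possible rule. Real board pages vary far more in layout and name formatting, so a production pipeline would likely need fuzzier matching and extra context such as the organization.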

Agenda for the evening

17:00 Welcome mingle, food, and drinks
18:00 Presentations
19:00 Mingle
20:00 End of the event

Machine Learning and Data Science (GBG)
Definitive Healthcare
Hvitfeldtsplatsen 7 · Göteborg
FREE
40 spots left