What we’re about
The Cleveland Big Data Meetup is a group for those interested in the distributed management, processing, and analytics of data, and other non-trivial problems, on any platform (local & cloud).
Topics range from infrastructure to storage frameworks to processing frameworks in a variety of domains. The meetup has a mix of classic user-group presentations (e.g., "I tried this out and here is what happened") and informative, technically focused vendor sessions.
For a list of some of the recent meetup recordings, see:
https://www.linkedin.com/pulse/cleveland-big-data-meetup-recordings-doug-meil
If you have an interest in presenting, or have a topic you would like to see presented, contact Doug.
For the history of the meetup read this:
http://themeildeal.blogspot.com/2016/08/how-to-create-successful-technical.html
The group was initially formed as the Cleveland Hadoop User Group in 2010, and the name changed around 2013-2014.
Upcoming events (2)
Cleveland Big Data Meetup
4910 Tiedeman Rd, Brooklyn, OH
Hybrid info: https://keybank.zoom.us/webinar/register/WN_AAyZCVrCQ0mM0PqpCR9Pfw
(however, no pizza is available for the hybrid modality)
** For those attending in person, please try to get to Key by 4:45pm because there is a security check-in **
5:00pm - pizza and networking
5:30pm - Presentation
Paco Nathan! Paco is an O'Reilly author on AI and Machine Learning.
"Catching Bad Guys using open data, open models in AI: a tour through anti-fraud use cases with graphs and entity resolution"
GraphRAG is a popular way to use knowledge graphs to ground AI apps in facts. Most GraphRAG tutorials use LLMs to build a graph automatically from unstructured data. However, what if you're working on use cases such as investigative journalism and sanctions compliance -- "catching bad guys" -- where transparency for decisions and evidence is required?
This talk explores how to leverage open data and open models for AI apps -- using entity resolution to build investigative graphs which are accountable, exploring otherwise hidden relations in the data that indicate fraud or corruption. Professionals who work in sanctions compliance, tax fraud, counter-terrorism, etc. -- which our team helps support -- generally don't present a lot in public. However, we can use open data and open source to illustrate where machine learning assists in these kinds of use cases.
For this talk we'll construct an investigative graph about potential money laundering, using entity resolution (ER) to merge open data from ICIJ Offshore Leaks, Open Ownership, and OpenSanctions. We'll explore techniques used in production use cases for anti-money laundering (AML), ultimate beneficial owner (UBO), rapid movement of funds (RMF), and other areas of sanctions compliance.
First we'll build a "backbone" for the graph in ways which preserve evidence and allow for audits. Next we'll use spaCy pipelines to parse related news articles, using `GLiNER` to extract entities, then the new `spacy-lancedb-linker` to link them into the graph. Finally, we'll show graph analytics that make use of the results -- tying into what's needed for use cases such as GraphRAG.
This approach uses Python open source libraries, and all of the code is provided on GitHub, organized in Jupyter notebooks. For each NLP task we use state-of-the-art open models (mostly not LLMs), emphasizing how to tune for a domain context: named entity recognition, relation extraction, textgraph, entity linking, as well as entity resolution to merge structured data and produce a semantic overlay that organizes the graph.
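For readers new to the idea of an evidence-preserving graph "backbone", here is a minimal sketch in plain Python. It is not the speaker's code and does not use the libraries named in the abstract. The records, field names, and the `match_key`, `resolve`, and `build_edges` helpers are all hypothetical, and the exact-match rule on a normalized name plus country is a deliberately naive stand-in for real entity resolution. The only point it illustrates is that each merged entity keeps the IDs of the source records it came from, so a merge can be audited later.

```python
from collections import defaultdict

# Toy records invented for illustration only; they are NOT taken from
# ICIJ Offshore Leaks, Open Ownership, or OpenSanctions.
RECORDS = [
    {"id": "leaks:1", "source": "offshore_leaks", "name": "ACME Holdings Ltd.", "country": "CY"},
    {"id": "own:7", "source": "open_ownership", "name": "Acme Holdings Ltd", "country": "CY"},
    {"id": "sanc:3", "source": "open_sanctions", "name": "ACME HOLDINGS LTD", "country": "CY"},
    {"id": "own:9", "source": "open_ownership", "name": "Borealis Trading LLC", "country": "LV"},
]


def match_key(record):
    """Naive matching key (an assumption): normalized name plus country."""
    kept = "".join(ch for ch in record["name"].lower() if ch.isalnum() or ch == " ")
    return (" ".join(kept.split()), record["country"])


def resolve(records):
    """Group records that share a key, keeping every source record ID as evidence."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    entities = []
    for n, (key, members) in enumerate(sorted(groups.items())):
        entities.append({
            "entity_id": f"entity:{n}",
            "key": key,
            "evidence": [m["id"] for m in members],
            "sources": sorted({m["source"] for m in members}),
        })
    return entities


def build_edges(entities):
    """Backbone edges of the form (entity_id, 'resolved_from', source record ID)."""
    return [
        (entity["entity_id"], "resolved_from", record_id)
        for entity in entities
        for record_id in entity["evidence"]
    ]


ENTITIES = resolve(RECORDS)
EDGES = build_edges(ENTITIES)

for edge in EDGES:
    print(edge)
```

Because every edge points back to a source record ID, an analyst could trace any merged entity to the rows that produced it. A production system would replace `match_key` with a proper entity resolution step and would likely store the result in a graph library or database.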