14- Data Mining / DBSCAN and Spectral Clustering
Institution: Pontifical Catholic University of São Paulo (PUC-SP)
School: Faculty of Interdisciplinary Studies
Program: Humanistic AI and Data Science
Semester: 2nd Semester 2025
Professor: Dr. Daniel Rodrigues da Silva (Doctor in Mathematics)
Tip
This repository is a review of the Statistics course from the undergraduate program Humanities, AI and Data Science at PUC-SP.
Important
- Projects and deliverables may be made publicly available whenever possible.
- The course emphasizes practical, hands-on experience with real datasets to simulate professional consulting scenarios in the fields of Data Analysis and Data Mining for partner organizations and institutions affiliated with the university.
- All activities comply with the academic and ethical guidelines of PUC-SP.
- Any content not authorized for public disclosure will remain confidential and securely stored in private repositories.
Welcome to the repository guide for Data Mining: DBSCAN and Spectral Clustering. This repo is written so that anyone, even kids, can understand two powerful clustering algorithms: DBSCAN and Spectral Clustering.
- What is Clustering?
- DBSCAN Algorithm
- Spectral Clustering
- Applications and When to Use Each
- References
Clustering is a way for computers to group things that are similar—like organizing marbles by color, or animals by species. The computer looks for natural groups in the data, so points in the same group are more like each other than points in other groups. Some points might not fit anywhere; finding them is important too!
DBSCAN stands for "Density-Based Spatial Clustering of Applications with Noise." It helps find groups in data where points are close together, based on how many neighbors each point has.
How DBSCAN Works (Step-by-Step)
- Draw a circle around a point: The size of the circle (called epsilon, $\varepsilon$) says what counts as "close."
- Count all the neighbors inside the circle.
  - If there are enough neighbors (at least MinPts), this is a core point, so start a new group!
  - If there are not enough, the point may be a border point or "noise."
- Grow the group: For each direct neighbor that is a core point, include its neighbors too, so the group grows!
- Repeat: Until every point is grouped or marked as noise.
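The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation. The function names (`region_query`, `dbscan`), the `NOISE` label value, and the sample points are made up for this example. It also assumes Euclidean distance and that a point counts as its own neighbor when comparing against MinPts.

```python
import math

NOISE = -1  # assumed label for points that end up in no group


def region_query(points, i, eps):
    """Indices of all points inside the circle of radius eps around points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Give every point a group id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still become a border point later
            continue
        cluster_id += 1  # enough neighbors: core point, start a new group
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:  # grow the group
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not grow the group
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # neighbor is also core: include its neighbors
                queue.extend(j_neighbors)
    return labels


# Two tight squares of points and one lonely point in the middle.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The two squares become groups 0 and 1, and the lonely point at (5, 5) is marked as noise because its circle contains nobody else.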
Every point ends up with one of three roles:
- Core point: Has lots of friends (enough neighbors within $\varepsilon$).
- Border point: Doesn't have enough direct neighbors, but is close to a core point.
- Noise: Too far from any busy area. Not in a group at all!
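To make the three roles concrete, here is a small sketch that labels each point as core, border, or noise. The function name `classify_points` and the sample points are hypothetical. As an assumption, "close" means Euclidean distance of at most `eps`, and a point counts itself among its neighbors.

```python
import math


def classify_points(points, eps, min_pts):
    """Return 'core', 'border' or 'noise' for each point."""
    n = len(points)
    # For every point, list the points inside its circle of radius eps (itself included).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    roles = []
    for i in range(n):
        if is_core[i]:
            roles.append("core")    # enough neighbors within eps
        elif any(is_core[j] for j in neighbors[i]):
            roles.append("border")  # too few neighbors, but close to a core point
        else:
            roles.append("noise")   # far from any busy area
    return roles


points = [(0, 0), (0, 1), (1, 0), (2, 0), (8, 8)]
roles = classify_points(points, eps=1.1, min_pts=3)
print(roles)  # ['core', 'border', 'core', 'border', 'noise']
```

Here (0, 0) and (1, 0) each have three points in their circle, so they are core points. (0, 1) and (2, 0) have only two, but each sits next to a core point, so they are border points. (8, 8) is alone and becomes noise.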
1. Abdi, H. & Williams, L. J. Principal Component Analysis. Wiley Interdisciplinary Reviews, 2010.
2. Castro, L. N. & Ferrari, D. G. (2016). Introdução à mineração de dados: conceitos básicos, algoritmos e aplicações. Saraiva.
3. Dunteman, J. Principal Component Analysis. SAGE Publications, 1989.
4. Ferreira, A. C. P. L. et al. (2024). Inteligência Artificial - Uma Abordagem de Aprendizado de Máquina. 2nd Ed. LTC.
5. Larson & Farber (2015). Estatística Aplicada. Pearson.
6. Liu, F.T. et al. Isolation Forest. IEEE ICDM, 2008.
Copyright 2025 Quantum Software Development. Code released under the MIT License.