ACADEMIC CASE STUDY // 02

Decoding Academic Pathways.

Framework: Django ORM & REST

NLP Engine: Scikit-learn (NLTK)

Algorithm: TF-IDF + Cosine Similarity

The Data Harvest

METU Portal: UNSTRUCTURED WEB DATA

Parsing Engine: REGEX & BS4 PIPELINE

Django DB: STRUCTURED COURSE ENTITIES

Automated parsing of 5000+ course descriptions, credits, and prerequisites from METU's official curriculum portals.
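The regex half of that pipeline might look roughly like the sketch below. It uses only the standard library, so the BS4 step that would extract text from the portal's HTML is left out. The one-line input layout, the `Course` dataclass, and the `parse_course` helper are all assumptions made for illustration; the real portal markup is not shown here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Course:
    code: str
    title: str
    credits: int
    prerequisites: list = field(default_factory=list)


# Hypothetical line layout: "<CODE> <Title> (<n> credits) Prerequisites: <codes>"
COURSE_RE = re.compile(
    r"^(?P<code>[A-Z]{2,4} \d{3})\s+(?P<title>.+?)\s+\((?P<credits>\d+) credits?\)"
    r"(?:\s+Prerequisites?: (?P<prereqs>.+))?$"
)
CODE_RE = re.compile(r"[A-Z]{2,4} \d{3}")


def parse_course(line):
    """Turn one text line into a Course, or return None if it does not match."""
    match = COURSE_RE.match(line.strip())
    if match is None:
        return None
    prereqs = match.group("prereqs")
    return Course(
        code=match.group("code"),
        title=match.group("title"),
        credits=int(match.group("credits")),
        prerequisites=CODE_RE.findall(prereqs) if prereqs else [],
    )
```

Returning `None` for lines that do not match lets a caller log and skip malformed entries instead of aborting a run over thousands of descriptions.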

The Semantic Core

STEP 01

Vector Space Transformation

Academic interests are rarely limited to exact keywords. To handle this, we transform course descriptions into a high-dimensional vector space. Each distinct term in the corpus becomes a dimension, and each course becomes a vector whose components record how strongly each term appears in its description.
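As a minimal sketch of that transformation, the snippet below builds a vocabulary from a set of descriptions and maps any text to a raw term-count vector over it. The `tokenize`, `build_vocabulary`, and `vectorize` helpers are hypothetical names used for illustration, and the simple lowercase letter-run tokenizer is an assumption; the project itself lists Scikit-learn and NLTK for this step.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(descriptions):
    """Assign every distinct term a fixed dimension index, in sorted order."""
    terms = sorted({term for text in descriptions for term in tokenize(text)})
    return {term: index for index, term in enumerate(terms)}


def vectorize(text, vocabulary):
    """Return a term-count vector; terms outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for term, count in Counter(tokenize(text)).items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector
```

Because the vocabulary is fixed up front, a student query and every course description end up in the same space and can be compared component by component.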

STEP 02

Cosine Similarity Matching

When a student enters a query, we calculate the angle between the query vector and every course vector in our database. A smaller angle indicates higher semantic relevance.
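The relationship between angle and relevance can be sketched as follows. The `cosine`, `angle_degrees`, and `rank` helpers are illustrative names, not the project's code, and the vectors are assumed to be plain lists of equal length.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def angle_degrees(u, v):
    """Angle between u and v in degrees, clamped against float rounding."""
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine(u, v)))))


def rank(query_vector, course_vectors):
    """Return (course_code, score) pairs ordered from most to least similar."""
    scored = [(code, cosine(query_vector, vec)) for code, vec in course_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A cosine of 1 corresponds to an angle of 0 degrees (same direction), while a cosine of 0 corresponds to 90 degrees (no shared terms), which is why sorting by descending cosine is the same as sorting by ascending angle.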

Similarity: 0.942

Result Accuracy

98% | CENG 443 | Introduction to HCI
Core principles of human-computer interaction, focused on user-centered design and iterative prototyping.

91% | CP 332 | Urban Planning Studio
Designing smart city environments using participatory mapping and spatial data analysis tools.

85% | ID 401 | Digital Design Research
Exploring the intersection of human behavior and digital environments through qualitative research.

From Keyword to Meaning

Traditional search engines rely on exact matches: if you search for "Human Interaction," you might miss a course titled "User Experience."

By implementing TF-IDF (Term Frequency-Inverse Document Frequency), we give more weight to distinctive academic terms and less weight to words that appear in most course descriptions. This helps turn a student's loosely worded interest into a ranked, data-driven list of recommendations.
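To make the weighting concrete, here is a small sketch of TF-IDF using only the standard library. The smoothed formula `ln((1 + N) / (1 + df)) + 1` is one common variant and is an assumption here; the exact variant used in the project is not stated. The helper names `idf_weights` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(descriptions):
    """Smoothed inverse document frequency for every term in the corpus."""
    total = len(descriptions)
    doc_freq = Counter()
    for text in descriptions:
        doc_freq.update(set(tokenize(text)))  # count each term once per document
    return {term: math.log((1 + total) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector as {term: weight}; unseen terms are dropped."""
    counts = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in counts.items() if term in idf}
```

With this weighting, a term that appears in every description gets the minimum weight of 1.0, while a term found in only one description scores higher, so rare subject words dominate the comparison.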

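# Calculating Relevance
import numpy as np

def cosine_similarity(v_query, v_course):
    # Dot product divided by the product of the two vector lengths
    return np.dot(v_query, v_course) / (np.linalg.norm(v_query) * np.linalg.norm(v_course))

# courses is assumed to hold one vector per course; rank them best match first
results = sorted(courses, key=lambda v_course: cosine_similarity(v_query, v_course), reverse=True)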
