YouTube Video Link: https://youtu.be/SwE4mxQxhEI
PPT Presentation: click here
This project is designed for admission counseling departments to analyze and generate customized lists of colleges and branches based on a candidate's percentage, location, and branch preferences.
We aim to reduce counselors' workload and improve student satisfaction and decision-making during the admission process by eliminating the need to go through multiple lengthy PDF files containing college cut-off details.
This project is developed exclusively for admission counseling departments in Maharashtra that handle student queries about admission cut-offs for engineering colleges in Maharashtra.
Technologies used in this project:
We used the pdfplumber and openpyxl Python libraries, combined with regular expressions, to extract cut-off information from the PDF files.
To prepare the data, we used the Power BI Power Query Editor for tasks such as data cleaning, transformation, and merging. We then used Power BI Reports to visualize the data.
Data extraction from PDF is the process of pulling relevant information out of PDF documents. PDF (Portable Document Format) is a widely used file format for storing and sharing documents, but it is built for layout rather than for structured data, which makes extraction challenging. Data extraction techniques automatically identify and extract specific elements, such as text, tables, or images. This usually relies on specialized software (RPA tools such as UiPath) or on scripts that analyze the PDF content, locate the desired data, and convert it into a structured format such as a spreadsheet or a database. Extraction is particularly useful when large amounts of data need to be processed and analyzed. In this project, we used the pdfplumber and openpyxl Python libraries, combined with regular expressions, to extract information from the PDF files.
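The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the line layout matched by `CUTOFF_LINE`, the column names, and the helper function names are all assumptions made for the example, since real cut-off PDFs will need their own pattern.

```python
import re

# Hypothetical line layout, assumed for illustration only:
#   <college code> <branch name> <category> <cut-off percentage>
CUTOFF_LINE = re.compile(
    r"^(?P<code>\d{4,10})\s+(?P<branch>.+?)\s+(?P<category>[A-Z]+)\s+"
    r"(?P<cutoff>\d{1,3}(?:\.\d+)?)$"
)


def parse_cutoff_lines(text):
    """Return one dict per line that matches the assumed cut-off layout."""
    rows = []
    for line in text.splitlines():
        match = CUTOFF_LINE.match(line.strip())
        if match:
            row = match.groupdict()
            row["cutoff"] = float(row["cutoff"])
            rows.append(row)
    return rows


def extract_pdf_text(pdf_path):
    """Concatenate the text of every page using pdfplumber."""
    import pdfplumber  # third-party; imported here so parsing works without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def write_rows_to_excel(rows, xlsx_path):
    """Write the parsed rows to a worksheet using openpyxl."""
    from openpyxl import Workbook  # third-party

    workbook = Workbook()
    sheet = workbook.active
    sheet.append(["code", "branch", "category", "cutoff"])
    for row in rows:
        sheet.append([row["code"], row["branch"], row["category"], row["cutoff"]])
    workbook.save(xlsx_path)
```

With hypothetical file names, the three helpers chain together as `write_rows_to_excel(parse_cutoff_lines(extract_pdf_text("cutoffs.pdf")), "cutoffs.xlsx")`. Lines that do not match the pattern, such as page headers, are skipped.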
Data preparation covers the extract, transform, and load (ETL) process. Before loading the data for visualization, we transformed it so that it is well-organized, user-friendly, properly formatted, and validated. This improves data quality and guards against issues such as unexpected duplicates, null values, incompatible formats, and incorrect indexing.
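The project performs these transformations in the Power Query Editor. Purely as an illustration of the same checks, here is a rough Python equivalent; the field names (`code`, `branch`, `cutoff`) and the function `clean_rows` are hypothetical and do not come from the project.

```python
def clean_rows(rows):
    """Drop incomplete rows, normalise formats, and remove duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        code = str(row.get("code") or "").strip()
        # Collapse repeated whitespace inside the branch name
        branch = " ".join(str(row.get("branch") or "").split())
        raw_cutoff = row.get("cutoff")
        # Null handling: skip rows missing any required field
        if not code or not branch or raw_cutoff in (None, ""):
            continue
        # Format handling: cut-offs may arrive as text such as "92.45%"
        try:
            cutoff = float(str(raw_cutoff).strip().rstrip("%"))
        except ValueError:
            continue
        # Validation: a percentage must lie between 0 and 100
        if not 0 <= cutoff <= 100:
            continue
        # Duplicate handling: one row per (college code, branch) pair
        key = (code, branch.lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"code": code, "branch": branch, "cutoff": cutoff})
    return cleaned
```

Keeping the first occurrence of each (code, branch) pair is a design choice for this sketch; a real pipeline might instead keep the most recent admission round.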
Data visualization is the process of representing information visually. It plays a vital role in data analysis because graphs and charts communicate complex data clearly and concisely, making it easier to understand and interpret. The resulting insights support better decision-making. In our case, we combined several visualizations into a Power BI report.
- Power BI Desktop
- Power Query Editor
- Python libraries:
  - pdfplumber
  - openpyxl