CSCI316 (SIM) 2022 Session 1 – Individual Assignment 2

CSCI316 – Big Data Mining Techniques and Implementation
Individual Assignment 2
2022 Session 1 (SIM)
15 Marks
Deadline: Refer to the submission link on Moodle

Two (2) tasks are included in this assignment. The specification of each task starts on a separate page.

You must implement and run all your Python code in Jupyter Notebook. The deliverables include one Jupyter Notebook source file (with the .ipynb extension) and one PDF document for each task.

Note: To generate a PDF file from a notebook source file, you can either (i) use the Web browser's PDF printing function, or (ii) click "File" at the top of the notebook, choose "Download as" and then "PDF via LaTeX".

All results of your implementation must be reproducible from your submitted Jupyter Notebook source files. In addition, the submission must include all execution outputs as well as a clear explanation of your implementation algorithms (e.g., in Markdown cells or as comments in your Python code).

Submission must be done online by using the submission link associated with assignment 1 for this subject on MOODLE. The size limit for all submitted materials is 20MB. DO NOT submit a zip file.

Submissions made after the due time will be assessed as late submissions. Late submissions are counted in full-day increments (i.e., 1 minute late counts as a 1-day late submission). There is a 25% penalty for each day after the due date, including weekends. The submission site closes four days after the due date. No submission will be accepted after the submission site has closed.

This is an individual assignment. Plagiarism of any part of the assignment will result in a mark of 0 for the assignment for all students involved.

Marking guidelines

Code: Your Python code will be assessed. The computers in the lab define the standard environment for code development and code execution.
Note that the correctness, completeness, efficiency, and results of your executed code will be assessed. Thus, code that produces no useful outputs will receive zero marks. This also means that code that does not run on a computer in the lab, or code where none of the core functions produce correct results, will be awarded zero marks.

Presentation and explanation: The correctness, completeness and clarity of your answers will be assessed.

Task 1 (7.5 marks)

Data set: The Abalone Data Set (Source: https://archive.ics.uci.edu/ml/datasets/abalone)

Data set information

The data consist of 4,177 observations of 9 attributes, detailed as follows.

Name / Data Type / Measurement Unit / Description
Sex / nominal / - / M, F, and I (infant)
Length / continuous / mm / Longest shell measurement
Diameter / continuous / mm / Perpendicular to length
Height / continuous / mm / With meat in shell
Whole weight / continuous / grams / Whole abalone
Shucked weight / continuous / grams / Weight of meat
Viscera weight / continuous / grams / Gut weight (after bleeding)
Shell weight / continuous / grams / After being dried
Rings / integer / - / Adding 1.5 gives the age in years

Objective

Implement a Naïve Bayesian classifier from scratch in Python to predict the age of abalone.

Task requirements

(1) Randomly separate the data into two subsets: ~70% for training and ~30% for test.
(2) The Naïve Bayesian classifier must implement techniques to overcome numerical underflows and zero counts.
(3) No ML library can be used in this task. The implementation must be developed from scratch.
However, scientific computing libraries such as NumPy and SciPy are allowed.

Deliverables
• A Jupyter Notebook source file named _task1.ipynb which contains your implementation source code in Python
• A PDF document named _task1.pdf which is generated from your Jupyter Notebook source file

Task 2 (7.5 marks)

Data set: MAGIC Gamma Telescope Dataset (Source: https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope)

Data set information

The data are Monte Carlo generated to simulate registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. The dataset contains 19,020 records.

Attribute information:
1. fLength: continuous # major axis of ellipse [mm]
2. fWidth: continuous # minor axis of ellipse [mm]
3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]
4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]
5. fConc1: continuous # ratio of highest pixel over fSize [ratio]
6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]
7. fM3Long: continuous # 3rd root of third moment along major axis [mm]
8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]
9. fAlpha: continuous # angle of major axis with vector to origin [deg]
10. fDist: continuous # distance from origin to center of ellipse [mm]
11. class: g, h # gamma (signal), hadron (background)

Class distribution:
g = gamma (signal): 12,332
h = hadron (background): 6,688

Objective

Develop an Artificial Neural Network (ANN) in TensorFlow/Keras to predict the signal class.

Requirements

(1) Randomly separate the data into two subsets: ~70% for training and ~30% for test.
(2) The training process includes a hyperparameter fine-tuning step.
Define a grid including at least three hyperparameters: (a) the number of hidden layers, (b) the number of neurons in each layer, and (c) the regularization parameters for L1 and L2. Each hyperparameter has at least two candidate values. All other parameters (e.g., activation functions and learning rates) are up to you. (Note: You can use Scikit-Learn for hyperparameter tuning, i.e., by using a Keras wrapper.)
(3) Report the learning curve and test accuracy.

Deliverables
• A Jupyter Notebook source file named _task2.ipynb which contains your implementation source code in Python
• A PDF document named _task2.pdf which is generated from your Jupyter Notebook source file
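The Task 2 workflow above can be sketched as follows, assuming TensorFlow 2.x with its bundled Keras and NumPy. The sketch performs the ~70/30 random split, loops over a grid of the three required hyperparameters (number of hidden layers, neurons per layer, and the L1/L2 regularization strengths) with a plain itertools.product loop instead of a Scikit-Learn wrapper, scores each combination on a validation split carved from the training data only, and evaluates the chosen model once on the test set. The helper names (train_test_split_70_30, build_model, grid_search, PARAM_GRID), the candidate values, the ReLU/Adam choices and the random placeholder data in the __main__ block are all hypothetical; the real MAGIC features and 0/1 labels (e.g., g mapped to 1, h to 0) would replace the random arrays.

```python
import itertools

import numpy as np
import tensorflow as tf

# Hypothetical grid: each required hyperparameter has two candidate values.
PARAM_GRID = {
    "n_hidden_layers": [1, 2],
    "n_neurons": [16, 32],
    "l1": [0.0, 1e-4],
    "l2": [0.0, 1e-3],
}


def train_test_split_70_30(X, y, seed=0):
    """Shuffle the rows, then put ~70% in the training set and ~30% in the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(round(0.7 * len(X)))
    return X[idx[:cut]], X[idx[cut:]], y[idx[:cut]], y[idx[cut:]]


def build_model(n_features, n_hidden_layers, n_neurons, l1, l2):
    """Feed-forward binary classifier; every hidden layer gets an L1 + L2 weight penalty."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for _ in range(n_hidden_layers):
        model.add(tf.keras.layers.Dense(
            n_neurons, activation="relu",
            kernel_regularizer=tf.keras.regularizers.L1L2(l1=l1, l2=l2)))
    # A single sigmoid unit outputs the probability of the positive class.
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def grid_search(X_train, y_train, grid, epochs=20):
    """Try every combination in `grid`; keep the one with the best final validation accuracy."""
    keys = list(grid)
    best_params, best_val_acc = None, -1.0
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = build_model(X_train.shape[1], **params)
        # The last 20% of the (already shuffled) training rows serve as validation
        # data, so the test set is never used for model selection.
        history = model.fit(X_train, y_train, epochs=epochs,
                            validation_split=0.2, verbose=0)
        val_acc = history.history["val_accuracy"][-1]
        if val_acc > best_val_acc:
            best_params, best_val_acc = params, val_acc
    return best_params, best_val_acc


if __name__ == "__main__":
    tf.keras.utils.set_random_seed(42)  # make weight init and shuffling repeatable
    # Placeholder data with MAGIC's 10 feature columns; replace with the real,
    # standardised features and 0/1 labels.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 10)).astype("float32")
    y = (X[:, 0] + X[:, 1] > 0).astype("float32")

    X_tr, X_te, y_tr, y_te = train_test_split_70_30(X, y)
    best, val_acc = grid_search(X_tr, y_tr, PARAM_GRID, epochs=10)

    # Retrain the best configuration; its History object holds the learning curve.
    final = build_model(X_tr.shape[1], **best)
    history = final.fit(X_tr, y_tr, epochs=10, validation_split=0.2, verbose=0)
    test_loss, test_acc = final.evaluate(X_te, y_te, verbose=0)
    print("best params:", best, "| validation accuracy:", round(val_acc, 3))
    print("test accuracy:", round(test_acc, 3))
```

The History object from the final fit call stores per-epoch loss, val_loss, accuracy and val_accuracy lists, which can be plotted (e.g., with Matplotlib) as the learning curve asked for in requirement (3). Keras takes the validation_split fraction from the end of the arrays before any shuffling, which is why the split helper shuffles the rows first.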