Thursday, June 6, 2019

C950 Data Structures and Algorithms II


I did it! I passed! And I'm not officially qualified as a candidate to Georgia Tech! I have 12 weeks left in my term so the following is just my reddit post. I'll come back to add other thoughts and details when I have more time to myself.  

WARNING: This is long. It was mostly written for those who learned how to program by completing Software I/II at WGU and need a little perspective by the time they reach this point. Remember, I just stared programming a few months ago so if you have a lot of experience, don't expect to be enlightened by my post.

This project tests your ability to take a very large set of constraints and requirements (some of which make very little practical sense), implement them into a design, and code that design to work within the scope of the project. I think I'm finally feeling confident enough to start a portfolio and would if my end of term wasn't approaching so fast. If you are a beginner, I do recommend taking Software II first to get more experience with OOP and putting a multi-component program together. I took this first because it was the last one I needed to qualify for Georgia Tech and I wanted to make sure I got it in before the end of my term but plan could have easily not worked out.

There are some students who built the project way beyond what I considered the scope. There's a lot of interpretation involved, but keep in mind the business aspect and remember they are a small company that delivers 40 packages per day and only have plans to expand in the future. There are things they need now and things they will need for expansion. Are they really going to have the money to invest in extra bells and whistles to accommodate expanding additional cities and whatnot? For a lot of things, it only needs to be designed to allow for future updates (especially for planned expansion), and these are points you'll argue in your paper.

This is the heart of BSCompSci for all CompSci programs. It's a matter of setting priorities and being organized: both in completing the project in general and in executing your solution of the problem and constraints. Break down the problem into some general segments, then break that down further into sub-segments and various tasks (similar to the check list in Software I).

If you are just starting to work with Python, you're going to want to read

Below, I flesh out how I broke down my project to demonstrate how it can be done. Some of my pieces might not make sense depending on how you perceive the problem and how you decide to design your program (the design itself is pretty open-ended which is part of the challenge). I read the scenario as a data problem and reduced it to data that can be stored, retrieved, sorted, calculated, and printed.

That said, if you follow what I did too closely you're going to have a bad time because some of my decisions came about because of how I interpreted the problem, how I coded my project, and the odd twists that lead me to the next part. Remember, I'm still a beginner programmer so there is likely a better way to get it done. I just wanted to write this out for those of you who can't figure out where to start ... break it down and simplify. Remember, your project is going to be built differently so if you use too much of this as a guide, you're going to get stuck as it may not align with your own thought process, flow, or coding techniques. That being said .... here is my breakdown:

Data Structure Stuff:
  1. Build hash table. Zybooks was really good for this part. Pick the one you like, keep in mind you'll need to justify your choice in the paper.
  2. Get csv data into the hash table
  3. Format distance data to suit the data structure you want. You have a few options. A lot of students build some kind of graph. https://www.youtube.com/watch?v=th00zpkPM9Q&list=PLC617CBC8385356FF
  4. Build data structure for packages and distances (the rubric tends to refer to both as package data so it's probably safe to write about both when asked though it probably wouldn't hurt if you picked one and talked about that)
Note: while Python implements dictionaries as hash tables, our assignment is to build a hash table from scratch for the package stuff. I used DictReader for the route stuff.

UI parts:
  1. Look up package data
  2. Insert new package/data
  3. Print all package data at a given time
  4. Print route with mileage data

Algorithm:
There was no breaking anything down for this one except picking one and implementing it. This was one of the times I just had to pull samples apart and put them back together until I could get one running. I found nearest neighbor easiest to understand and model. Dijkstra is really popular because there is an example in the book that makes it easier to implement and I know one person who implemented brute force.

I manually separated the packages into 3 sets (with the help of excel because I'm an excel junkie ... I switched back and forth between sorting by address, delivery constraints, and truck sort to decide what package was going on what route). After I had 3 lists of packages, I ran it through nearest neighbor. Most students opt to program this part into the core algorithm.

... I'm not sure how common it is, but the path determined by my presort and algorithm ended up meeting all requirements (about one of the only things that worked out unexpectedly well!). I saved the things I originally planned to do in terms of optimization for my 'what I'd do if I did it over again' part of the paper. I reasoned that since the company was using csv files, management (or whomever they use to add the data into Excel) can do the initial sort in Excel. I plan to go back to this project after graduation to create a sorting program to replicate what I did in Excel.

Because I worked out the performance of each route did in Excel (delivered time, etc.), I was able to take that experience and translate it into python. The final bits were creating truck objects, having them load the packages and run the algorithm for deliveries. I used those actions to determine the status of packages, start, and end times. This was my outline for figuring out those parts:

Other algorithm tips (not from me):
This info came from a student in chatter who ended up with 76.1 miles. If your presort doesn't match muster or if you think it's too much of a hassle to sort by hand/excel, this was HIS general method: "First part removes all non-deliverable packages. Second part loads all urgent packages. Third part randomly loads packages in several dozen sets. Fourth part recursively checks each set from the third part for lowest mileage, route path, and route weights. Fifth part takes the best data from the fourth part and uses that to cook the route/weights into the truck class. Finally, truck leaves the hub and begins deliveries. Optimization for each part includes loading addresses that share the same address as packages loaded onto truck, unloading non-urgent packages in truck if it exceeds capacity, and throwing out routes if any packages do not get delivered on time."

States, Actions, Limitations:
  1. Truck Limit - 16 packages per load
  2. Package status is either AT_HUB, IN_TRANSIT, or DELIVERED
  3. Truck
  4. Time:
  • Note: I'm pretty sure if truck 2 has 16 packages and leaves after 9: 05 AM, it will not return until after 10:20
  • Every package with a deadline needs to be delivered on time! It helps to have a list for the paper.

For the time component, I kept mapping out what the program was supposed to do and what I needed to get the things that would indicate those things. I started with travel distance and converted that to time. In Excel, I found the difference between the start and end time of each route. From there I was able to figure out how to iterate the time and mileage values for each individual package and during the route (need a cumulative sum to find a package's place in the route). This seems to be a simpler implementation than others who have passed the class (the program acted more like an observer in which events 'happen' linearly as in the 'other algorithm tips' above). I'll chalk off my interpretation to my beginner status. The jury is still out on whether my method is scalable for calculating delivery times and setting the status.

Again, try not to tackle more than one issue or component at a time. If you're getting overwhelmed, you're probably trying to do too much at one time. Take a step back and figure out what you need first, then limit yourself to working with that. Break it down until the only parts left are simple.

The Writing Part:
The writing component is incredibly ambiguous; you need to combine the instructions with the rubric guidelines to make sense of it. I pasted the instructions into a document and used the various sections as topic headers. Then under each header, I pasted the rubric stuff (in italics). I highlighted them a really ugly color so when I was working on it, I could easily tell if I were done with the section or not (removed highlighting when done).

Some of the sections are basically asking for the same thing; it's ok to copy something from one section and paste it into another. For example, there are two places that ask you to verify that each package was delivered on time. Yes, they want it again; no, you don't have to reword all the iterations of the same question. Some sections ask for things that aren't included in the scope of the project; for those I said: the current software requirements don't include ___. For example, there was one section that mentioned hosting of the environment. For brownies, I followed up with: implementation of that feature would require ___. It was helpful asking about sections on Slack and discussing how everyone else interpreted the question and how they answered it. There were variations and all were accepted.

Anyway, I was nervous about the complexity analysis because I wasn't sure how well I understood it (I would not be surprised if I got some of it wrong). I took my best guess for each function in each script (in a note), then created tables for each script in my paper where I listed the line #, space complexity, and time complexity. At the bottom of the table I added each line, then put it in Big O notation. Here's a short example:

buildhash.py
Line #
Space Complexity
Time Complexity
11
O(N)
O(N)
20
O(1)
O(1)
38
O(N)
O(N)
52
O(1)
O(1)
Total:
2N + 2 = O(N)
2N + 2 = O(N)

Once I did that for each script, I replaced line # with Script Name and listed the results. Then I added to get the total N's and converted it back to Big O notation ... I didn't go into that much detail for the core algorithm overview. For that one I just put the overall Time and Space complexity in a section of my overview which is in the sample the mentors give us. I think they would notice if you had a sort function and the complexity didn't include a log function and things of that sort so I wouldn't attempt to put in random values. Do your best.

Anyway, I'm down to 3 courses: Software II, Introduction to Artificial Intelligence, and Capstone. Woot!

No comments:

Post a Comment