I did it! I passed! And I'm not officially qualified as a candidate to Georgia Tech! I have 12 weeks left in my term so the following is just my reddit post. I'll come back to add other thoughts and details when I have more time to myself.
WARNING: This is long. It was
mostly written for those who learned how to program by completing Software I/II
at WGU and need a little perspective by the time they reach this point.
Remember, I just stared programming a few months ago so if you have a lot of
experience, don't expect to be enlightened by my post.
This project tests your ability
to take a very large set of constraints and requirements (some of which make
very little practical sense), implement them into a design, and code that
design to work within the scope of the project. I think I'm finally feeling
confident enough to start a portfolio and would if my end of term wasn't
approaching so fast. If you are a beginner, I do recommend taking Software II
first to get more experience with OOP and putting a multi-component program
together. I took this first because it was the last one I needed to qualify for
Georgia Tech and I wanted to make sure I got it in before the end of my term
but plan could have easily not worked out.
There are some students who built
the project way beyond what I considered the scope. There's a lot of
interpretation involved, but keep in mind the business aspect and remember they
are a small company that delivers 40 packages per day and only have plans to
expand in the future. There are things they need now and things they will need
for expansion. Are they really going to have the money to invest in extra bells
and whistles to accommodate expanding additional cities and whatnot? For a lot
of things, it only needs to be designed to allow for future updates (especially
for planned expansion), and these are points you'll argue in your paper.
This is the heart of BSCompSci
for all CompSci programs. It's a matter of setting priorities and being
organized: both in completing the project in general and in executing your
solution of the problem and constraints. Break down the problem into some general
segments, then break that down further into sub-segments and various tasks
(similar to the check list in Software I).
If you are just starting to work
with Python, you're going to want to read
- Think Python by Allen B. Downey - http://greenteapress.com/thinkpython/thinkpython.pdf
- Think Complexity by Allen B. Downey is where Think Python leaves off - http://greenteapress.com/complexity2/thinkcomplexity2.pdf (I only read the chapter on graphs, WGU doesn't really touch on the rest)
- A great list and summary of algorithm options: http://160592857366.free.fr/joe/ebooks/ShareData/Heuristics%20for%20the%20Traveling%20Salesman%20Problem%20By%20Christian%20Nillson.pdf
- I also recommend Barbara Hecker's Data Structures lectures. I watched these when I was putting the paper together and it helped articulate some of the ideas I had trouble putting into words after spending so many hours writing in code, lol. The code samples are in Java which you should know by now, but you can skip them if it's throwing you off. If I had to do it over again I would have watched this while Java was still fresh and then transitioned into Python. But it's all good, she does a great job at conceptualizing complex concepts. -https://www.youtube.com/watch?v=th00zpkPM9Q&list=PLC617CBC8385356FF
Below, I flesh out how I broke
down my project to demonstrate how it can be done. Some of my pieces might not
make sense depending on how you perceive the problem and how you decide to
design your program (the design itself is pretty open-ended which is part of
the challenge). I read the scenario as a data problem and reduced it to data
that can be stored, retrieved, sorted, calculated, and printed.
That said, if you follow what I
did too closely you're going to have a bad time because some of my decisions
came about because of how I interpreted the problem, how I coded my project,
and the odd twists that lead me to the next part. Remember, I'm still a
beginner programmer so there is likely a better way to get it done. I just
wanted to write this out for those of you who can't figure out where to start
... break it down and simplify. Remember, your project is going to be built
differently so if you use too much of this as a guide, you're going to get
stuck as it may not align with your own thought process, flow, or coding
techniques. That being said .... here is my breakdown:
Data Structure
Stuff:
- Build hash table. Zybooks was really good for this part. Pick the one you like, keep in mind you'll need to justify your choice in the paper.
- Get csv data into the hash table
- Format distance data to suit the data structure you want. You have a few options. A lot of students build some kind of graph. https://www.youtube.com/watch?v=th00zpkPM9Q&list=PLC617CBC8385356FF
- Build data structure for packages and distances (the rubric tends to refer to both as package data so it's probably safe to write about both when asked though it probably wouldn't hurt if you picked one and talked about that)
Note: while Python implements
dictionaries as hash tables, our assignment is to build a hash table from
scratch for the package stuff. I used DictReader for the route stuff.
UI parts:
- Look up package data
- Insert new package/data
- Print all package data at a given time
- Print route with mileage data
Algorithm:
There was no breaking anything
down for this one except picking one and implementing it. This was one of the
times I just had to pull samples apart and put them back together until I could
get one running. I found nearest neighbor easiest to understand and model.
Dijkstra is really popular because there is an example in the book that makes
it easier to implement and I know one person who implemented brute force.
I manually separated the packages
into 3 sets (with the help of excel because I'm an excel junkie ... I switched
back and forth between sorting by address, delivery constraints, and truck sort
to decide what package was going on what route). After I had 3 lists of
packages, I ran it through nearest neighbor. Most students opt to program this
part into the core algorithm.
... I'm not sure how common it
is, but the path determined by my presort and algorithm ended up meeting all
requirements (about one of the only things that worked out unexpectedly well!).
I saved the things I originally planned to do in terms of optimization for my
'what I'd do if I did it over again' part of the paper. I reasoned that since
the company was using csv files, management (or whomever they use to add the
data into Excel) can do the initial sort in Excel. I plan to go back to this
project after graduation to create a sorting program to replicate what I did in
Excel.
Because I worked out the
performance of each route did in Excel (delivered time, etc.), I was able to
take that experience and translate it into python. The final bits were creating
truck objects, having them load the packages and run the algorithm for deliveries.
I used those actions to determine the status of packages, start, and end times.
This was my outline for figuring out those parts:
Other algorithm
tips (not from me):
This info came from a student
in chatter who ended up with 76.1 miles. If your presort doesn't match muster
or if you think it's too much of a hassle to sort by hand/excel, this was HIS
general method: "First part removes all non-deliverable packages. Second
part loads all urgent packages. Third part randomly loads packages in several
dozen sets. Fourth part recursively checks each set from the third part for
lowest mileage, route path, and route weights. Fifth part takes the best data
from the fourth part and uses that to cook the route/weights into the truck
class. Finally, truck leaves the hub and begins deliveries. Optimization for
each part includes loading addresses that share the same address as packages
loaded onto truck, unloading non-urgent packages in truck if it exceeds
capacity, and throwing out routes if any packages do not get delivered on
time."
States, Actions,
Limitations:
- Truck Limit - 16 packages per load
- Package status is either AT_HUB, IN_TRANSIT, or DELIVERED
- Truck
- Time:
- Note: I'm pretty sure if truck 2 has 16 packages and leaves after 9: 05 AM, it will not return until after 10:20
- Every package with a deadline needs to be delivered on time! It helps to have a list for the paper.
For the time component, I kept
mapping out what the program was supposed to do and what I needed to get the
things that would indicate those things. I started with travel distance and
converted that to time. In Excel, I found the difference between the start and
end time of each route. From there I was able to figure out how to iterate the
time and mileage values for each individual package and during the route (need
a cumulative sum to find a package's place in the route). This seems to be a
simpler implementation than others who have passed the class (the program acted
more like an observer in which events 'happen' linearly as in the 'other
algorithm tips' above). I'll chalk off my interpretation to my beginner status.
The jury is still out on whether my method is scalable for calculating delivery
times and setting the status.
Again, try not to tackle more
than one issue or component at a time. If you're getting overwhelmed, you're
probably trying to do too much at one time. Take a step back and figure out
what you need first, then limit yourself to working with that. Break it down
until the only parts left are simple.
The Writing
Part:
The writing component is
incredibly ambiguous; you need to combine the instructions with the rubric
guidelines to make sense of it. I pasted the instructions into a document and
used the various sections as topic headers. Then under each header, I pasted
the rubric stuff (in italics). I highlighted them a really ugly color so when I
was working on it, I could easily tell if I were done with the section or not
(removed highlighting when done).
Some of the sections are
basically asking for the same thing; it's ok to copy something from one section
and paste it into another. For example, there are two places that ask you to
verify that each package was delivered on time. Yes, they want it again; no,
you don't have to reword all the iterations of the same question. Some sections
ask for things that aren't included in the scope of the project; for those I
said: the current software requirements don't include ___. For example, there
was one section that mentioned hosting of the environment. For brownies, I
followed up with: implementation of that feature would require ___. It was
helpful asking about sections on Slack and discussing how everyone else
interpreted the question and how they answered it. There were variations and
all were accepted.
Anyway, I was nervous about the
complexity analysis because I wasn't sure how well I understood it (I would not
be surprised if I got some of it wrong). I took my best guess for each function
in each script (in a note), then created tables for each script in my paper
where I listed the line #, space complexity, and time complexity. At the bottom
of the table I added each line, then put it in Big O notation. Here's a short
example:
buildhash.py
Line #
|
Space Complexity
|
Time Complexity
|
11
|
O(N)
|
O(N)
|
20
|
O(1)
|
O(1)
|
38
|
O(N)
|
O(N)
|
52
|
O(1)
|
O(1)
|
Total:
|
2N
+ 2 = O(N)
|
2N
+ 2 = O(N)
|
Once I did that for each script,
I replaced line # with Script Name and listed the results. Then I added to get
the total N's and converted it back to Big O notation ... I didn't go into that
much detail for the core algorithm overview. For that one I just put the
overall Time and Space complexity in a section of my overview which is in the
sample the mentors give us. I think they would notice if you had a sort
function and the complexity didn't include a log function and things of that
sort so I wouldn't attempt to put in random values. Do your best.
Anyway, I'm down to 3 courses:
Software II, Introduction to Artificial Intelligence, and Capstone. Woot!
No comments:
Post a Comment