Adventures in B.S. Computer Science: July 2019

Friday, July 26, 2019

Capstone Adventures: Part 8

No. Just no image projects for me. I don't have the time or computer power to figure this out.

Time to attempt a more classic problem that only uses a csv.

Monday, July 22, 2019

I have TensorFlow! After wrangling with environment builds and variations of TensorFlow and CUDA, I finally found a Stack Overflow post that says I can just install keras-gpu on Anaconda.

Next batch of things to learn:

Use TensorFlow to train my data
Dump the model into a pickle file
Design an application that fits all capstone requirements
Integrate the pickle in the application
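
A minimal sketch of the train, dump, and load steps in the list above, using only Python's standard library. The `train` and `predict` functions are hypothetical stand-ins for a real TensorFlow model (a toy threshold classifier), and the file name is an assumption. Keras models are normally saved with `model.save()` rather than pickled, so treat this as an illustration of the workflow only.

```python
import pickle
from statistics import mean


def train(samples):
    """Toy training step: put the threshold midway between the two class means.

    `samples` is a list of (value, label) pairs with labels 0 or 1.
    """
    negatives = [x for x, label in samples if label == 0]
    positives = [x for x, label in samples if label == 1]
    return {"threshold": (mean(negatives) + mean(positives)) / 2}


def save_model(model, path):
    """Dump the trained parameters into a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load the parameters back (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(model, x):
    """Classify a single value with the loaded parameters."""
    return 1 if x >= model["threshold"] else 0
```

The application would call `load_model("model.pkl")` once at startup and then `predict` on each new input, so training and serving stay separate.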

I am thankful for Keras-GPU and the guy who mentioned it on Stack Overflow. You're my hero!

Sunday, July 21, 2019

Capstone Adventures: Part 6

I think I've decided that I'm going for the Diabetic Retinopathy (leading cause of blindness) project. Getting this to work will involve either pushing my current system to run tensorflow-gpu or getting a new computer. I've been attempting to install it all day, going back and forth between CUDA9, CUDA10, and several variations of other parts of the installation.

After the first few failed attempts I figured that a new computer would likely not help, and I just realized at 11pm that TensorFlow absolutely cannot work with Python 3.7. As you can clearly see ... I'm really not that smart ... This experience has been great for my self-esteem. To Python 3.6 and the glimmer of hope that I'm a little closer to figuring this out than I was 12 hours ago.

Friday, July 19, 2019

Capstone Adventures: Part 5

Well, the data reappeared but then I deleted everything the next day after my bill spiked up to over $500. So there goes that idea.

Right now I'm bouncing between another image-based dataset with under 5k images and an e-commerce dataset for customer segmentation or a market basket analysis. I really want to do an image one because it's not something I could have done before BSCS and the other two I could have done after MSDA.

I'm exploring a few tools right now, trying to decide which one is more feasible. If I used a service like Clarifai, it would cut down on the number of things I need to build from scratch. I have another week before I really need to buckle down and get things moving.

Tuesday, July 16, 2019

Capstone Adventures: Part 4

I got through my dataset, had everything uploaded, then started training. My mistake was not knowing how much credit that was using, though I suppose that's Microsoft Azure's mistake because it didn't tell me how much it would cost. So I picked the 24 hour option because why not? And I ran out of credit, which I thought might be a thing, but it was the last part I needed before I could start programming it.

I upgraded my account and that is when this project died. It is no longer there. All my data and training is just gone.

Time for a new project.

Sunday, July 14, 2019

Capstone Adventures: Part 3

Well, it looks like this dataset might be too big for Computer Vision. I just realized that it has a 100k upload limit which is less than half my training dataset.

EDIT:

... and my dataset is gone. Apparently I built the last project on my "student" account which just expired. I'm going to continue classifying my files since the remaining data will fill up the 100k limit easily. But I'll wait to hear from my advisor before recreating the project on the "free" account.

I also have another project lined up just in case I end up nixing this idea altogether: Facial Expression Recognition. I would need to convert a .bib file into jpeg files to get it to work. But it's under 30k images so even my computer can handle this one.

RAWR!!!

On the upside, I am acquiring useful data munging and automation skills.

Saturday, July 13, 2019

Capstone Adventures: Part 2

It took a few tweaks to get the image search and sort scripts working. The process is finally streamlined and I was able to get through 3 of the 14 batches until I stopped to attempt to send some of those files to Azure.

Of course this didn't work, otherwise I wouldn't have had to add an extra post today. Apparently Azure doesn't like .tif images. After confirming that it prefers some other file format, I installed IrfanView to start batch converting them before upload.

The last hiccough of the day was having to stop that process because my computer was running out of space again. So I'm now uploading the files into Azure so I can make room to convert more files. I think I'll finish uploading the files I processed so far today, then tomorrow I can sort, convert, and upload at the same time (in theory) ... get some sort of assembly line going, lol.

Adventure!

Capstone Adventures: Part 1

This is more like Part 3 but I'll catch you up, lol.

My first choice for capstone project was Recursion Cellular Image Classification to disentangle experimental 'noise' from biological signals. This would help researchers understand how drugs interact with human cells which translates to decreasing the cost of treatments and the time it takes to bring new treatments to market. My computer couldn't handle the dataset. There were over 85GB worth of images and I shouldn't have attempted to download it, lol. It took a day to get the zipped file into my computer and partially unzipped before I ran out of room. Then it took another day to get the files off my computer.

My second choice is currently an image recognition application to identify metastatic tissue in histopathologic scans of lymph node sections. At 6 GB, my computer is currently struggling to do things with the dataset (220,006 image files). Right now it needs to parse a list and separate the 'cancer' images from the 'not cancer' images which were not-so-conveniently itemized in a csv file.

I spent the morning separating those files into 14 different folders in the hopes that my computer finds searching through under 20k files each more reasonable than the full dataset. Of course I'm using automation scripts but my computer is a dinosaur so it's still struggling to get by. But I just need to get the data separated and into Azure. After that I can get these images off my computer and handle everything else from the cloud.
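
A rough sketch of the kind of sorting script described above, not the exact one used here. It assumes the csv has `id` and `label` columns and that each image is named `<id>.tif`; those names, the batch size, and the `batch_NN` folder layout are all assumptions for illustration.

```python
import csv
import shutil
from pathlib import Path


def sort_images(csv_path, image_dir, out_dir, batch_size=20000, ext=".tif"):
    """Move each labelled image into out_dir/batch_NN/<label>/.

    Assumes the csv has 'id' and 'label' columns (hypothetical names).
    Returns the number of files moved.
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    moved = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            src = image_dir / (row["id"] + ext)
            if not src.exists():
                continue  # listed in the csv but not on disk; skip it
            # Start a new batch folder every `batch_size` moved files.
            batch = moved // batch_size + 1
            dest = out_dir / f"batch_{batch:02d}" / row["label"]
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest / src.name))
            moved += 1
    return moved
```

Reading the csv once and moving files as it goes avoids searching the full image folder for every row, which is the slow part on an older machine.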

The first file was still running by the time I posted this. Remember I have 14 to get through and once I do all that, I need to upload them into Azure.

I am thankful the cloud exists so I can work with datasets like this! I know I need to update my computer. It's on my list, especially because I REALLY want to attempt the Recursion Cellular Image problem once I have enough time to do it properly.

Monday, July 8, 2019

C951 Introduction to Artificial Intelligence

Both of these tasks ended up being largely an exercise in using buzz words for the paper. It's frustrating because it can easily be turned into something more to include machine learning which is required for capstone. I attempted to do that with Task 1 but it was taking too long to put a proper dataset and base case together. I'll have to attempt it again after graduation.

TASK 1: Start with the chapter in Ucertify called Focus: Chatbots between chapters 11 and 12. Find the exercises and hit the submit button to see examples on how to answer various parts of the rubric. For relevant data, I used O*NET which is what WGU uses for our career assessments. The Bureau of Labor Statistics can help find computer careers if you're not finding 5 distinct ones ... If you follow the rubric, this is a really primitive version of the assessments we get in the career center. My best advice is to play with different chat options. Decide your criteria that can be assessed with questions and answers, then build your conversation paths from there.

TASK 2: Start with the chapter in Ucertify called Focus: Robotics and Feature Engineering between chapters 25 and 26. Again, find the exercises and hit the submit button to see examples on how to answer various parts of the rubric. I started by building the environment to get used to the interface and object manipulation. The first thing I did was create an enclosed space to prevent Rob from falling off the testing area. Then I built a few obstacles and ran the simulation to see what Rob can do and how to improve it. I stuck with adding proximity sensors, which involved copying the nose sensor and modifying the sensor name and attributes to suit my needs. Then I modified the main Rob script (you'll see the nose sensor stuff in there and should be able to trace the code and use it to add the logic needed to make other sensors work). I ended up with three proximity/collision avoidance sensors that allowed Rob to make right and left turns (the default only turns left). I also tweaked the speed and other attributes to force it to cover wider areas instead of getting caught in circular paths or not being able to back out of certain corners. This ended up more interesting than Task 1 and I plan to make it more interesting in the future before showcasing it on GitHub.
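
A hedged sketch of the three-sensor avoidance logic described above, written in plain Python rather than the simulator's own scripting language. The function name, the boolean inputs, and the command strings are all hypothetical; the real Rob script reads its sensors through the simulator's API, which is not shown in this post.

```python
def choose_action(left, front, right):
    """Pick a drive command from three boolean proximity readings.

    Each argument is True when that sensor detects an obstacle.
    """
    if left and front and right:
        return "reverse"  # boxed in: back out of the corner
    if front:
        # Turn toward the open side; left stays the default turn.
        return "right" if left else "left"
    if left and not right:
        return "right"  # drifting toward an obstacle on the left
    if right and not left:
        return "left"  # drifting toward an obstacle on the right
    return "forward"  # path is clear, or a corridor with walls on both sides
```

For example, `choose_action(True, True, False)` returns `"right"`, the turn the default single-sensor behavior could not make.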

My Tenure at WGU

I did have my eye on two other degrees at WGU (B.S. Math Ed and M.S. Special Ed) but all things considered, I think this will be my last at WGU. Right now I plan to spend some time taking select advanced math courses at Brandman (through Westcott). From there I will either go for the B.S. Mathematics program at Louisiana State University or the M.S. Computer Science program at Georgia Tech ... I no longer have any interest in jumping through WGU hoops to get into the math ed program. Their loss.
