Over the last few days, I spent a bit of time programming. My initial goal was to see if I could write a program which could capture images from my webcam. I run Ubuntu, so the install process of all the packages I needed was exciting!
I initially got the configuration of one of the packages wrong, and when my test program wouldn’t work, I went and asked a friend for help in the stupidest way possible. In the end, I just reinstalled pretty much everything, and it worked because I didn’t get the configuration wrong a second time. So I guess there wasn’t too much harm done, just a slightly frustrated friend.
So my weapon of choice for tackling this project was OpenCV, along with the pile of dependencies it has. There are some nice tutorials on using OpenCV, which can be found here. They gave me a way to start quickly and see where I was going, they gave a bit of background maths, and they even linked to a paper which I’ve enjoyed reading.
The first program I wrote simply displayed a stream from the webcam. It allowed me to find the error in the configuration, and to verify that it was fixed after I reinstalled everything. There wasn’t much to it; the program is essentially supplied in the OpenCV documentation.
Next I figured out how to save a frame of the stream to an image file. This went fairly smoothly, taking only a few lines of code. So far everything was going well, but I didn’t really have any idea what I was going to do with my new-found knowledge of OpenCV.
I decided to see if I could detect a person in the frame. To do this, I took a frame of the stream without a person in it, saved it, and called it my “base” picture. I then compared the base against frames from the stream, and if they differed significantly I declared that a person was in the frame. The comparison I used is known as an SSIM (structural similarity) index, a method originally designed for testing how structurally similar an image remains after compression. Obviously what I’m doing is completely different to that, but so far I’ve had promising results using this method. I will look at different methods at a later date if I continue working on this project.
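The comparison can be sketched as a single whole-image SSIM computation. Note this is a simplification I’m making for illustration: the published SSIM method computes the index over a sliding window and averages the local scores, whereas this version treats each image as one big window.

```python
import numpy as np


def ssim(x, y, L=255):
    """Global SSIM between two same-sized grayscale images.

    Simplified whole-image version (no sliding window). L is the
    dynamic range of the pixel values (255 for 8-bit images).
    """
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * L) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )
```

Comparing an image with itself gives exactly 1; the more the stream frame differs from the base, the lower the index drops.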
I was able to get a true positive on about 95% of the frames that had a person in them, but I was also getting a lot of false positives, somewhere in the order of 30%, which is unacceptably high for me. The SSIM index is a number between 0 and 1, with 1 being a perfect match. I decided that 0.73 was a good boundary after calculating the SSIM index of a few images against my base: some with the light in the room off, some with the light on, some with a person, some without. Combining these, 0.73 seemed to be the appropriate mark to set. Unfortunately my results say otherwise.
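The boundary-picking step can be made systematic. This is a hypothetical helper, not something from my actual program: given SSIM scores for frames known to contain a person and frames known to be empty, it tries every score as a candidate cut-off and keeps the one with the fewest total errors, where “person present” means the score falls below the cut-off.

```python
def best_threshold(person_scores, empty_scores):
    """Pick the SSIM cut-off that best separates frames with a person
    (low SSIM vs. the base) from empty frames (high SSIM).

    A frame is classified as containing a person when its score < threshold.
    """
    candidates = sorted(set(person_scores) | set(empty_scores))
    best, best_err = None, None
    for t in candidates:
        fn = sum(s >= t for s in person_scores)  # missed detections
        fp = sum(s < t for s in empty_scores)    # false alarms
        err = fn + fp
        if best_err is None or err < best_err:
            best, best_err = t, err
    return best
```

With a small labelled sample like mine, a threshold chosen this way is only as good as the sample, which is exactly the problem I ran into.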
To rectify this, I wrote another program. Its purpose was to capture a frame of the stream every 20 seconds for an undetermined amount of time. I let it capture 4320 frames, or exactly 24hrs’ worth of images. These were all written to disk and can now be analysed to find a more accurate SSIM index. I have written a program to compare my base with these images (it’s the same one I used earlier, I just have a larger sample size now), and hopefully it will give me a much better SSIM index, one that keeps the same number of true positives but produces far fewer false positives. I have yet to analyse these images though, as I’ve only just finished the capturing process.
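The timed-capture loop can be sketched as below. I’ve written it as a sketch with the grab, save, and sleep steps passed in as functions (my own structuring choice, to keep it separate from the camera); in practice `grab` would wrap `cv2.VideoCapture.read` and `save` would call `cv2.imwrite`. 4320 frames at one every 20 seconds is exactly 24 hours.

```python
import time


def capture_series(grab, save, total=4320, interval=20, sleep=time.sleep):
    """Grab one frame every `interval` seconds, `total` times.

    grab: callable returning a frame (or None on failure).
    save: callable taking (index, frame) and persisting the frame.
    """
    for i in range(total):
        frame = grab()
        if frame is not None:
            save(i, frame)
        sleep(interval)

# Hypothetical real-world hookup (assumes webcam at device 0):
#   cap = cv2.VideoCapture(0)
#   capture_series(lambda: cap.read()[1],
#                  lambda i, f: cv2.imwrite(f"frame_{i:04d}.png", f))
```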
I also wrote another program while the capturing was taking place. It takes all the images that have been captured and creates a video out of them. Every image gets three consecutive frames in the video, and the video runs at 24fps. It compiled the video much quicker than I anticipated, taking only about a minute and sixteen seconds to compose the 12960 frames together. The video has a resolution of , and I’m tempted to make another video with a higher resolution. I’m reluctant to show anyone the video though, because it does show me, and I’m not entirely sure I want people to see the expressions that come across my face over the course of 24hrs. There is also a small stretch of the video which is entirely black, where I was asleep (the webcam is attached to my computer in my room, so there is not enough light to capture an image while I sleep).
It is a really odd sensation watching how you look in certain situations mere hours after you were in them, while they are still fresh in your mind. For example, I was working on some maths last night while the capturing was taking place: some very odd looks come over my face, my head is often in my hands while I think, and overall I look pretty silly. In other frames I’m texting, laughing at my phone, or wearing an odd expression while thinking about what to respond with; again, I look quite silly. From the whole video, I have learnt that I pull a lot of strange faces throughout the day. It also proves that I have done some work through my break from uni, albeit it makes doing the work look painful.
A few ideas I’ve had that relate to this: putting the capturing program onto a Raspberry Pi and sitting it outside to capture the sun at the same time each day over the course of several months, and doing more 24hr, 12960-frame videos in different locations (again, hopefully with a Raspberry Pi). I’d also like to do more on detecting a person. I skimmed a blog post on using OpenCV to detect faces and eyes, so that may be a possibility. I may also be able to capture every frame for about 300 seconds and then calculate the SSIM index over sequential frames, to see if I can detect motion rather than just people: a person is a large object and changes the SSIM index fairly significantly, whereas a tennis ball being introduced would produce a much smaller change. I will continue to fiddle, and if I get any exciting results, I’ll probably write a blog post about them!