
Investigate other architectures #4

Open
knkski opened this issue Oct 15, 2017 · 3 comments

@knkski
Owner

knkski commented Oct 15, 2017

Right now we're using a variant of VGGNet, which is giving decent results. However, we should investigate alternatives such as AlexNet. We should also investigate how well the actual VGGNet architecture works, although that is blocked by #1 due to GPU memory usage.
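
For reference, a minimal sketch of the kind of VGG-style stack we're talking about (layer counts, filter sizes, and input shape here are illustrative, not our exact model):

```python
# Minimal VGG-style sketch in Keras (illustrative only -- not our exact model).
# Assumes 28x28 grayscale inputs and 10 classes, as for MNIST-like data.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Stacked 3x3 convolutions followed by pooling, repeated -- the VGG pattern.
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```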

@ykq1004
Collaborator

ykq1004 commented Oct 21, 2017

https://cambridgespark.com/content/tutorials/neural-networks-tuning-techniques/index.html

This post mentions some of the things we were talking about last time: using the he_normal kernel initializer with ReLU activation, data augmentation, and so on.
Their model, trained on MNIST (at the end of the post), achieved 99.47% accuracy on the test data.
Maybe something we could try?
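
In case it helps, here's a rough sketch of those two ideas in Keras (he_normal initialization with ReLU, plus ImageDataGenerator augmentation). The layer sizes and augmentation ranges are just placeholders, not values from the post:

```python
# Rough sketch of the tuning ideas from the post (illustrative hyperparameters).
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

model = Sequential([
    # He-normal initialization pairs well with ReLU activations.
    Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal',
           padding='same', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu', kernel_initializer='he_normal'),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Light data augmentation: small shifts and rotations of the training images.
augmenter = ImageDataGenerator(rotation_range=10,
                               width_shift_range=0.1,
                               height_shift_range=0.1)
# model.fit_generator(augmenter.flow(x_train, y_train, batch_size=64),
#                     steps_per_epoch=len(x_train) // 64, epochs=10)
```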

@ykq1004
Collaborator

ykq1004 commented Nov 4, 2017

Found an architecture, SimpleNet:
https://github.com/Coderx7/SimpleNet

Their benchmarks show that it performs quite well, even better than many more complex architectures across different image recognition datasets (including MNIST), while using fewer parameters.

The corresponding paper, https://arxiv.org/pdf/1608.06037.pdf, introduces their design in detail and also includes some tips for fine-tuning CNNs. Good to read if you're interested.

Some interesting things stand out to me:

  1. Compared to what we have now, their CNN is still quite large: 13 layers plus the classification layer. We may need to work around that and reduce it a bit... or spend more time training it...
  2. They apply zero padding of (1,1) to each convolutional layer, which I don't quite understand yet.
  3. They use a (1,1) kernel instead of (3,3) in the 11th and 12th layers. While the (3,3) kernel helps preserve local correlation, the (1,1) kernel is good at picking up finer detail, so they place it near the end of the CNN.
  4. They do batch normalization before the activation (ReLU in their case), which I think we could incorporate in the future even if we decide not to use their architecture (see the sketch after this list).
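
To make points 2 and 4 concrete, here's a rough sketch of one SimpleNet-style block in Keras; this is my reading of the paper, so treat the exact filter counts and ordering as an assumption:

```python
# Sketch of one SimpleNet-style block (my interpretation -- check against the paper).
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Conv2D, BatchNormalization, Activation

model = Sequential()
# (1,1) zero padding keeps the feature map from shrinking at each convolution.
model.add(ZeroPadding2D((1, 1), input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3)))
# Batch normalization is applied before the activation, as in the paper.
model.add(BatchNormalization())
model.add(Activation('relu'))
# ...later blocks near the end of the network swap the (3, 3) kernel for (1, 1).
```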

Since they only offer a Caffe version, I "translated" it into Keras:
https://github.com/knkski/atai/blob/master/train_SimpleNet.py
However, I haven't tested it yet, so if any of you are able to run it (and debug it...) that would be great! Or we could just pick some pieces and transplant them into our model.

Thank you,
Yekun

@knkski
Owner Author

knkski commented Nov 6, 2017

I can answer the zero padding question. Basically, each layer downsamples the image (maxpooling in particular, but unpadded convolutions shrink it too). Since we don't have very large input images, they can quickly get downsampled to a 0x0 pixel image, which isn't useful. Zero padding helps prevent that.
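
To illustrate, here's a quick sketch of the standard output-size arithmetic, assuming 3x3 convolutions with stride 1 on a 28x28 input (the layer count is just for illustration):

```python
# Quick illustration of how spatial size shrinks without padding.
# With a 3x3 kernel and stride 1: out = in - 2 without padding, out = in with (1,1) padding.
def output_size(in_size, kernel=3, stride=1, padding=0):
    """Standard convolution output-size formula."""
    return (in_size - kernel + 2 * padding) // stride + 1

size = 28  # e.g. an MNIST-sized input
for layer in range(13):
    size = output_size(size, padding=0)  # unpadded convolutions
    print("after conv %2d: %dx%d" % (layer + 1, size, size))
# An unpadded 13-conv stack is down to 2x2 by the last layer, and any maxpooling
# shrinks it even faster. With padding=1, each convolution keeps the size at 28x28.
```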

Unfortunately, it looks like a naive implementation of SimpleNet doesn't perform as well as VGGNet:

[two screenshots of training results omitted]

It's not far off, though. I'll see if I can tweak the parameters to make it perform any better.
