Building a Streamlit app to identify Indian Food — Part III (Fine Tuning the EfficientNet-V2 network to detect Indian Food)

Pradeep Adhokshaja
Nov 12, 2022



In the previous post, we took a brief look at the EfficientNet-V2 architecture and presented a way to import it into our environment. Now, we will import the data and get on with training it with our network.

Function to Import Model Object

Before we go ahead with the training of the model, we will create a function that returns the model object.

The above code snippet is saved in a file called model.py.

In the 25th and 26th lines of code, we freeze the weights in all but the last three layers, leaving only those layers trainable ("unfreezing the last three layers"). This brings the number of trainable parameters down from 5,967,589 to 357,205, a ~94% drop in the number of parameters we need to train! This drastically reduces the time and computational effort needed to train the neural network.
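The snippet itself appears as an image in the original post. A minimal sketch of what model.py could look like, assuming the EfficientNetV2-B0 backbone from tf.keras.applications with a softmax head (the exact variant, head, layer indices, and therefore parameter counts may differ from the original file), is:

```python
import tensorflow as tf


def model_def(num_classes=85, n_trainable=3, weights="imagenet"):
    """EfficientNet-V2 backbone with a softmax head; only the last
    n_trainable backbone layers (plus the head) are left trainable."""
    base = tf.keras.applications.EfficientNetV2B0(
        include_top=False, weights=weights, pooling="avg",
        input_shape=(224, 224, 3))
    # Freeze every backbone layer first ...
    for layer in base.layers:
        layer.trainable = False
    # ... then unfreeze the last n_trainable layers
    for layer in base.layers[-n_trainable:]:
        layer.trainable = True
    return tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```

Freezing is what slashes the trainable-parameter count; exactly how far it drops depends on which layers happen to be the last three.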

Code used for training

train.py

The above code snippet is used to train the network. The code works as follows:

  • In line 12, the EfficientNetV2 model with the last three layers unfrozen is called through the model_def() function in the model.py file.
  • From lines 19 to 22, we set the preprocessing steps the images go through before being fed into the neural network: the pixel values are rescaled to the range [0, 1] and the images are resized to a height and width of 224 pixels. We will use 10% of the images for validation.
  • From lines 25 to 27, we create the data set, keeping in mind that 90% of it would be used for training, while the rest would be used for validation.
  • Since the neural network can only work with numerical values, the 85 classes are encoded as numbers from 0 to 84 (label encoding). Each number corresponds to one class. This class-to-number mapping is stored in a JSON file called class_mappings.json. (Lines 30 to 34)
  • To train the model on our data, we use the Adam optimizer, which adaptively estimates the first- and second-order moments of the gradients; this helps reach a good solution quickly. After training the neural network, we want to save the newly acquired model weights for future re-use. We use a callback provided by Keras (ModelCheckpoint) to do this; the callback is passed to the model.fit() call. The model weights are stored in training_1/cp.ckpt. These steps are carried out by the code from lines 49 to 60.
  • Finally, we train the model for 25 epochs; an epoch is one complete pass of the neural network over the entire training data. (Line 62)
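The training script was also shown as an image. A simplified sketch of train.py, assuming the dataset lives in an images/ folder with one sub-folder per class (the folder name and the use of tf.keras.utils.image_dataset_from_directory are assumptions; the original may use a different loading utility), could look like:

```python
import json

import tensorflow as tf

IMG_SIZE = (224, 224)


def build_datasets(data_dir, batch_size=32):
    """90/10 train/validation split; pixel values rescaled to [0, 1]."""
    common = dict(validation_split=0.1, seed=42,
                  image_size=IMG_SIZE, batch_size=batch_size)
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="training", **common)
    val_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="validation", **common)

    # Persist the label encoding (class name -> integer) for later inference
    with open("class_mappings.json", "w") as f:
        json.dump({name: i for i, name in enumerate(train_ds.class_names)}, f)

    # Rescale pixel values from [0, 255] to [0, 1]
    rescale = tf.keras.layers.Rescaling(1.0 / 255)
    return (train_ds.map(lambda x, y: (rescale(x), y)),
            val_ds.map(lambda x, y: (rescale(x), y)))


def train(data_dir="images/", epochs=25):
    from model import model_def  # model.py from the previous section
    train_ds, val_ds = build_datasets(data_dir)
    model = model_def()
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    # Save the learned weights after each epoch so they can be re-used later
    ckpt = tf.keras.callbacks.ModelCheckpoint(
        "training_1/cp.ckpt", save_weights_only=True, verbose=1)
    model.fit(train_ds, validation_data=val_ds,
              epochs=epochs, callbacks=[ckpt])


if __name__ == "__main__":
    train()
```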

Training Process

Before we go on with the training process, I want to take some time to revisit the model.py file we have created to initialize our EfficientNet-V2 model.

model.py

In the code above, we have "unfrozen" the last three layers. The number of unfrozen layers can be changed by editing line 25.

Training the neural network by unfreezing layers iteratively

Step 1: To train the network, we first unfreeze the last layer and run the training process for 25 epochs.

Results of the first training run

Step 2: We then unfreeze the last two layers and load the weights saved in Step 1.

Step 3: With these weights loaded and the last two layers unfrozen, we train the network again for 25 epochs.

Results of the second training run

Step 4: Load the weights saved in Step 3 into the network and unfreeze the last three layers.

Step 5: Train the network again for 25 epochs and save the weights.
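The five steps above amount to a loop over checkpoints. A sketch of that schedule, assuming a model_fn(n) that returns the model with the last n layers unfrozen (as model_def in model.py could be parameterized to do; the name model_fn is illustrative):

```python
import tensorflow as tf


def iterative_finetune(model_fn, train_ds, val_ds, stages=(1, 2, 3),
                       epochs=25, ckpt="training_1/cp.ckpt"):
    """Train in stages, unfreezing one more layer each time and
    warm-starting every stage from the previous stage's checkpoint."""
    histories = []
    for i, n in enumerate(stages):
        model = model_fn(n)  # model with the last n layers unfrozen
        if i > 0:
            model.load_weights(ckpt)  # resume from the previous stage
        model.compile(optimizer="adam",
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                      metrics=["accuracy"])
        histories.append(model.fit(train_ds, validation_data=val_ds,
                                   epochs=epochs))
        model.save_weights(ckpt)  # checkpoint for the next stage
    return histories
```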

Results of the third training run

The training accuracy is ~91% and the validation accuracy is ~87.5%. We could likely improve these further by unfreezing more layers.

How does the classifier perform across the 85 classes?

To assess this, we run the classifier on a batch of 25,000 images spanning the 85 classes. The top ten image classes by classification accuracy are as follows.

Top 10 food image classes by classification results

The bottom ten image classes by classification accuracy are as follows.

Lowest 10 food items by classification results

It looks like food items that are similar in color tend to be misclassified more often.

For example, double ka meetha, idli, and pitha are similar in color, although they are completely different dishes.

An example image of double ka meetha
An example image of idli
An example of pitha
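The rankings above come from per-class accuracy. A small helper to compute it from true and predicted labels (the function name and shapes here are illustrative, not from the original code) could be:

```python
import numpy as np


def per_class_accuracy(y_true, y_pred, num_classes=85):
    """Fraction of correctly classified images for each class.

    Classes with no images in the batch get NaN.
    """
    correct = np.zeros(num_classes)
    total = np.zeros(num_classes)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    with np.errstate(invalid="ignore"):
        return correct / total
```

The ten best and worst classes are then np.argsort(acc)[::-1][:10] and np.argsort(acc)[:10] respectively (NaN entries aside).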

Next Steps

In the next article in this series, we will use the Streamlit library to build a simple application that lets users classify their pictures into one of the 85 categories.

The next article can be found here.
