In this article, we will see how the epidemic evolved over the months and answer the following questions:
- State of the epidemic in Morocco
- How clusters affected each of the twelve regions in Morocco
- An evaluation of the measures and efforts taken by the Moroccan government from the beginning of the lockdown to date
Two datasets are used:
- covidma.csv: contains the confirmed, deaths, recovered and excluded cases over time for each region
- covid_coordinates.csv: contains the coordinates (latitude and longitude) for each region
Next, we do some exploratory data analysis on the first dataset:
- We look at the shape of the dataframe (56 rows and 17 columns):
- We call the info() function to get detailed information about each column:
- Then, we check how many NaN values there are in the dataframe:
- Since the dataframe contains NaN values and lacks other case statuses that are useful for our analysis (such as the numbers of tested and active cases), we handle the missing values and create the missing columns. Finally, we convert the Date column to datetime format to make it easier to manipulate:
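The exploration steps above can be sketched in pandas as follows. This is a minimal sketch on a tiny hypothetical frame standing in for covidma.csv; the column names, the forward-fill strategy for NaN values, and the formulas for Tested (assumed here to be Confirmed plus Excluded) and Active (Confirmed minus Recovered minus Deaths) are assumptions, not necessarily what the original notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for covidma.csv (the real file has 56 rows, 17 columns).
df = pd.DataFrame({
    "Date": ["2020-03-02", "2020-03-03", "2020-03-04"],
    "Confirmed": [1, 1, 2],
    "Deaths": [0, np.nan, 0],
    "Recovered": [0, 0, np.nan],
    "Excluded": [10, 25, 40],
})

print(df.shape)          # (rows, columns)
df.info()                # dtype and non-null count of each column
print(df.isna().sum())   # number of NaN values per column

# Assumption: counts are cumulative, so a missing value is filled with the
# previous day's value (and with 0 if there is no previous value).
df = df.ffill().fillna(0)

# Assumption: "Excluded" counts people who tested negative.
df["Tested"] = df["Confirmed"] + df["Excluded"]
df["Active"] = df["Confirmed"] - df["Recovered"] - df["Deaths"]

# Convert the Date strings to datetime64 for filtering and date arithmetic.
df["Date"] = pd.to_datetime(df["Date"])
```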
The dataframe’s structure above is not suitable for plotting with Plotly, because the date should serve as the index and all the regions should be grouped under a single “Region” column. I therefore followed these steps:
- I melted the dataframe (that is, unpivoted it from wide to long format)
- I added a “Date of Reference” column, with a datetime data type, holding the first value of the Date column
- I created a new “Difference Date” column, the difference between the “Date” column and the “Date of Reference” column, which keeps track of the number of days elapsed
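A minimal pandas sketch of these reshaping steps, assuming a wide frame with a Date column followed by one column per region (the region columns and sample values below are hypothetical):

```python
import pandas as pd

# Hypothetical wide-format sample: one column per region.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-02", "2020-03-03", "2020-03-04"]),
    "Casablanca-Settat": [1, 1, 2],
    "Marrakech-Safi": [0, 1, 1],
})

# Unpivot from wide to long: one row per (Date, Region) pair.
long_df = df.melt(id_vars="Date", var_name="Region", value_name="Cases")

# Reference date: the first value of the Date column (already datetime64).
long_df["Date of Reference"] = df["Date"].iloc[0]

# Whole days elapsed since the reference date.
long_df["Difference Date"] = (long_df["Date"] - long_df["Date of Reference"]).dt.days
```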
Using Plotly, I plotted the Confirmed, Recovered and Deaths statuses and added an animation to the plot:
The number of confirmed cases grew very slowly at first, then rose quickly over the past 54 days, along with the number of deaths.
The following plot shows a more detailed view of the three statuses:
Following the same principle and steps, I melted the dataframe again, this time with the Tested, Confirmed and Deaths columns as value variables and Date as the identifier variable, and plotted the results as a line chart for easier analysis:
This gives the following result:
The Moroccan government has made great efforts to increase the number of tests over time and to contain the virus as quickly and as fully as possible. While the number of tests grew almost exponentially, the number of confirmed cases increased slowly.
To better visualize the magnitude of the number of tests conducted in Morocco over time, compared with the numbers of confirmed cases and deaths, I melted the dataframe, assigning the list of regions as value variables and the different statuses as identifier variables.
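The melt described above can be sketched as follows. Keeping Date among the identifier variables is an assumption, as are the sample columns and values; the sketch stops at the reshaped frame.

```python
import pandas as pd

# Hypothetical wide frame: Date, national statuses, then one column per region.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-02", "2020-03-03"]),
    "Tested": [11, 26],
    "Confirmed": [1, 2],
    "Deaths": [0, 0],
    "Casablanca-Settat": [1, 1],
    "Marrakech-Safi": [0, 1],
})

regions = ["Casablanca-Settat", "Marrakech-Safi"]
identifiers = ["Date", "Tested", "Confirmed", "Deaths"]  # Date included by assumption

# Each output row pairs one region's count with that day's national statuses.
long_df = df.melt(
    id_vars=identifiers,
    value_vars=regions,
    var_name="Region",
    value_name="Region Cases",
)
```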
Naturally, as the number of tests increases, the number of confirmed cases increases as well.
Now, we will dig deeper into the specific regions of Morocco.
To determine the percentage and the total number of cases in each region, I followed these steps:
- Selected the columns of the regions that I wanted from the dataframe by index reference
- Created a new dataframe by grouping the columns, summing up the total cases and taking the maximum value of each region
- Visualized it using a pie chart
It is also possible to view the exact number of confirmed cases in each region by hovering over its slice of the pie.
To see how the number of confirmed cases developed in each region, I melted the dataframe with the regions as value variables and Date as the identifier variable:
I then plotted the results as a separate line chart for each region:
The charts show how the cumulative cases developed in each region. The sudden spikes in some regions in early April, such as Tanger and Marrakech, were due to COVID-19 clusters found in factories, which sent the number of cases in those regions soaring.
Given the structure of my dataframe, computing and ranking the total cases in each region required several modifications:
- I selected the last Date value of the dataframe and converted it to a date data type
- I created a dictionary holding the dataframe’s values for that last date as key-value pairs (column name and value)
- I created a list of the columns to remove
- I popped the entries in that list from the dictionary
- I sorted the dictionary by value in ascending order
- I created a temporary dataframe holding the result, with the dictionary’s keys in the Region column and its values in the Total Cases column
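The dictionary-based steps can be sketched in pandas and plain Python as follows. The column names, the list of columns to remove and the values are hypothetical, and sorting is assumed to be by total cases.

```python
import pandas as pd

# Hypothetical wide frame with national statuses and three regions.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-26", "2020-04-27"]),
    "Confirmed": [22, 26],
    "Deaths": [0, 1],
    "Casablanca-Settat": [10, 12],
    "Marrakech-Safi": [4, 5],
    "Fes-Meknes": [8, 9],
})

# Last Date value, as a datetime.date.
last_date = df["Date"].iloc[-1].date()

# Row for that date as a {column name: value} dictionary.
row = df.loc[df["Date"].dt.date == last_date].iloc[0].to_dict()

# Drop the non-region entries before ranking.
to_remove = ["Date", "Confirmed", "Deaths"]
for key in to_remove:
    row.pop(key, None)

# Sort regions by their totals, in ascending order.
sorted_totals = dict(sorted(row.items(), key=lambda item: item[1]))

temp = pd.DataFrame({
    "Region": list(sorted_totals.keys()),
    "Total Cases": list(sorted_totals.values()),
})
print(temp)
```

When the rows are sorted by date, `df.drop(columns=to_remove).iloc[-1].sort_values()` yields the same ranking as a Series, without the intermediate dictionary.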
The output of the temporary dataframe:
Finally, I plotted the temporary dataframe using a bar chart:
The last step is to make a heat map showing the severity of COVID-19 in each region.
The dataset used in this step is covidma_coordinates.csv, which holds the latitude and longitude of each region:
To get a better idea of the COVID-19 situation in each region, I used the Folium library.
The following steps were followed:
- I created a time range covering the last 15 days
- I selected the rows whose dates fall within that time range
- I created a new dataframe with the latitude and longitude from the dataset, plus the region and the total number of cases that I had just computed over the time range
- Next, I followed the Folium documentation to build a map with the relevant parameters (radius, color, columns to take into account, etc.)
The result of the heatmap plot:
Throughout this post, I analyzed the state of the COVID-19 epidemic in Morocco, both from an overall view and region by region, to better understand how each Moroccan region was affected.
The appearance of new clusters in different regions also had a massive impact on the total number of cases and deaths in each region of Morocco.
Overall, the Moroccan government has taken drastic measures, multiplying the number of tests and containing the virus by closing the roads between cities and regions.