We will use a webcam as the video input to our pose estimation model and show the output on our main page.
We are using two libraries here:
ml5.js for creating and running our ML model.
p5.js for getting the webcam video feed and displaying output in our browser.
I’ve commented the code extensively, explaining every single line. Here, we will discuss the core parts, which make up most of the code anyway.
Our code consists of two files:
index.html, the main page to show output.
p5.js runs two functions:
function setup(). This is the first function to be executed, and it runs only once. We will do our initial setup in it.
function draw(). This function is called in a loop forever (unless you close the browser or press the power button).
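As a rough illustration of this lifecycle, the sketch below shows the two functions p5.js looks for by name. The frameCounter variable is a hypothetical addition, used only to make the "once, then repeatedly" behavior visible.

```javascript
// Hypothetical counter, only here to illustrate how often each function runs.
let frameCounter = 0;

function setup() {
  // p5.js calls this once, when the sketch starts.
  frameCounter = 0;
}

function draw() {
  // p5.js calls this again and again after setup() has finished.
  frameCounter += 1;
}
```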
createCanvas(width, height) is provided by p5 to create a box in the browser to show our output. Here, the canvas has a width of 640px.
createCapture(VIDEO) captures the webcam feed and returns a p5 element object, which we will name webcam_output. We set the webcam video to the same width and height as our canvas.
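To make the canvas and capture calls concrete, here is a minimal sketch of what setup() might look like at this point. The 480px height is an assumption (only the 640px width is stated above), and CANVAS_WIDTH and CANVAS_HEIGHT are names introduced just for this example.

```javascript
// The 640px width comes from the text; the 480px height is an assumed value.
const CANVAS_WIDTH = 640;
const CANVAS_HEIGHT = 480;

let webcam_output; // p5 element that wraps the webcam video

function setup() {
  // Create the drawing area in the page.
  createCanvas(CANVAS_WIDTH, CANVAS_HEIGHT);

  // Ask the browser for the webcam feed; VIDEO is a constant provided by p5.
  webcam_output = createCapture(VIDEO);

  // Resize the video so it matches the canvas exactly.
  webcam_output.size(CANVAS_WIDTH, CANVAS_HEIGHT);
}
```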
ml5.poseNet() creates a new PoseNet model, taking as input:
- Our present webcam output.
- A callback function, which is called when the model is successfully loaded. Inside our index.html file, we have created an HTML paragraph with the ID status that shows the current status text to the user. We change that text to Model Loaded to let the user know, as the model takes a while to load.
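The model-loading step could be sketched as follows. This assumes an ml5.js release that still ships ml5.poseNet() and uses p5's select() to find the status paragraph. The loadPoseModel helper is a hypothetical name; in the actual sketch these lines would more likely sit directly inside setup().

```javascript
let poseNet; // the PoseNet model instance returned by ml5

// Callback that ml5 invokes once the model has finished loading.
function modelLoaded() {
  // "#status" targets the paragraph with the ID status in index.html.
  select("#status").html("Model Loaded");
}

// Hypothetical helper, meant to be called from setup() with the webcam capture.
function loadPoseModel(video) {
  poseNet = ml5.poseNet(video, modelLoaded);
  return poseNet;
}
```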
poseNet.on() is a trigger, or event listener. Whenever the webcam gives a new image, it is passed to the PoseNet model. The moment a pose is detected and the output is ready, it calls the callback function we registered, passing in results.
results is the final output of keypoints and scores given by the model.
We store this in our poses array, which is globally defined and can be used anywhere in our code.
webcam_output.hide() hides the webcam output for now, as we will modify the images and show the image with detected points and lines later.
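A minimal sketch of the listener and the hide() call is shown below. The listenForPoses helper and its parameter names are hypothetical; the "pose" event name is the one ml5's PoseNet emits.

```javascript
let poses = []; // latest detections, shared with the drawing functions

// Hypothetical helper, meant to be called from setup() once the model exists.
function listenForPoses(model, video) {
  // Every time PoseNet finishes a detection, keep the newest results.
  model.on("pose", function (results) {
    poses = results;
  });

  // Hide the raw video element; frames are drawn onto the canvas instead.
  video.hide();
}
```

Calling listenForPoses(poseNet, webcam_output) at the end of setup() would wire both pieces together.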
All we have left to do is show the image in the browser, along with all the detection results stored in poses.
As we know, the draw() function runs in a loop forever. Inside it, we call the image() function to display our image on the canvas (our video arrives image by image).
It takes five arguments:
input image. The image we want to display.
x position. The x-coordinate of the top-left corner of the image, relative to the canvas.
y position. The y-coordinate of the top-left corner of the image, relative to the canvas.
width. The width to draw the image.
height. The height to draw the image.
We then call drawSkeleton() to draw the dots and lines on the current image.
draw() does this in an infinite loop, showing the user a continuous output that looks like a video.
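Put together, the drawing loop might look like the sketch below. It assumes webcam_output was assigned in setup(), and that the point-drawing and line-drawing functions are defined elsewhere in the same file. The name drawKeypoints() is an assumption, since the text only names drawSkeleton().

```javascript
let webcam_output; // assigned in setup() from createCapture(VIDEO)

function draw() {
  // Paint the current webcam frame at the top-left corner, filling the canvas.
  // width and height are p5 globals that hold the canvas size.
  image(webcam_output, 0, 0, width, height);

  // Overlay the detections on top of the frame.
  drawKeypoints(); // assumed name for the function that draws the dots
  drawSkeleton();  // draws the lines between connected body parts
}
```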
Each entry in the PoseNet output has a pose key-value and a skeleton value, provided for each person in an image.
We have a function to draw detected points on the image. Remember, we saved all the results from the PoseNet output in the poses array. Here, we loop through every pose, or person, in an image and get its keypoints.
We loop through every point, each of which is a body part, in the keypoints array. Each point has:
part. The name of the part detected.
position. x and y values of a point in the image.
score. The model's confidence in the detection.
We only draw a point if its confidence score is greater than 0.2. We call fill(red, green, blue), which takes RGB intensity values ranging from 0 to 255, to decide the color of a point, and noStroke() to disable the outline that p5 draws by default.
Then, we call ellipse(x_value, y_value, width, height) to draw an ellipse at the desired position. We keep the width and height very small, which makes it look like a dot (exactly what we wanted).
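The point-drawing logic described above can be sketched as follows. The function name drawKeypoints(), the MIN_SCORE constant, the red color, and the 10px dot size are assumptions chosen for illustration. In the real sketch, poses is the global array declared once and filled by the PoseNet listener.

```javascript
const MIN_SCORE = 0.2; // threshold mentioned in the text

let poses = []; // filled in by the PoseNet "pose" listener

function drawKeypoints() {
  // One entry per detected person.
  for (const person of poses) {
    // One keypoint per body part: { part, position: { x, y }, score }.
    for (const keypoint of person.pose.keypoints) {
      if (keypoint.score > MIN_SCORE) {
        fill(255, 0, 0); // red dots (arbitrary color choice)
        noStroke();      // no outline around the dot
        // A small ellipse reads as a dot on the canvas.
        ellipse(keypoint.position.x, keypoint.position.y, 10, 10);
      }
    }
  }
}
```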
Similarly, just as our poses variable holds multiple poses, it also holds multiple skeleton values with their own key-value pairs. These are handled by drawSkeleton(), which draws lines instead of points.
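A minimal sketch of drawSkeleton() is shown below. It assumes each skeleton entry is a pair of connected keypoints, which is how ml5's PoseNet output is structured. The red stroke color is an arbitrary choice. In the real sketch, poses is the same global array used for the points.

```javascript
let poses = []; // filled in by the PoseNet "pose" listener

function drawSkeleton() {
  // One entry per detected person.
  for (const person of poses) {
    // Each skeleton entry is a pair of keypoints that should be joined.
    for (const [partA, partB] of person.skeleton) {
      stroke(255, 0, 0); // red lines (arbitrary color choice)
      line(
        partA.position.x, partA.position.y,
        partB.position.x, partB.position.y
      );
    }
  }
}
```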
This is the main page where we display our output. We add all our libraries using script tags.
We show a cute welcome intro to the user. As model loading takes time, we show the ‘Loading model…’ message. If you remember, we change it to ‘Model Loaded’ once our model is loaded, using a reference to the ID called status.
Lastly, we put our own JS code inside the body. Open the index.html file to see the output. Make sure you allow webcam access when prompted.
That’s it! You can always go to the ml5.js reference page, which has many more ready-to-use models and code snippets for various cool ML projects, dealing with a wide variety of things like text, images, and sound.
Kartik Nighania is a machine learning enthusiast who loves computer vision more than NLP. He previously worked in the field of robotics, especially drones, which haunts him to this day. In love with Kaggle.