
E2 - Video/Audio, your own streaming service


Building video and audio codecs in simple JSON.

In episode 2 of 8, we'll build an audio and video JSON codec encoding pipeline in Python, then we'll decode it in JavaScript on the frontend in the Chromium browser.

This was originally livecoded on Twitch.

Transcript:

All right, what is up everybody? I've already got the glove on, we're ready to draw. This is the second episode of CSGflix, which is really about building a streaming service from scratch. This is sort of an insane project I got started on when I was talking to a few people and they were asking how streaming works over the internet. We were hanging out on a stream and there was a lot of confusion. It was one of those things I'd been doing for a long time when I was working at Netflix, and you don't think about it when you're really close to it; but when you step a little further away and try to explain it to someone, you realize you've been deep in the ditches for a while and it's more complicated than you thought.

In this series I'm trying to stick mostly to the basics, so we're not going to hit super advanced topics. The goal is to build something that gives you a sense of how it works. Any of the topics we cover you could go way deeper on, but we'll keep things at a fairly high level. For those of you who caught the intro session last week: we're simplifying. We're not doing anything as complicated as what Netflix does; we're only supporting one device — the browser — so you have to watch video in the browser, and even then we're only going to support one type of browser. That's a little unrealistic, but having one platform makes things a lot easier, and suffice it to say it pretty much works this way for everybody. We're going to build a simple version of the servers and go from there.

For those of you who checked out the tools we were building yesterday and the day before: I'm going to make that a regular feature, so you'll be able to code with me during the session. Gather your points — we're going to make them useful in a future stream. In the meantime, let's get started with CSGflix.
We did a bit of an overview in the first session — this is episode two, and tonight we're talking about video. This is where we left things at the end of the first session, after the overview. This time we're focusing on video, and depending on how the session goes we might get to audio too; it's a similar case to video, with a few differences, but the same general idea. Today's session is really about how we pack files up and send them to the browser. We'll get into the other topics in the future. I'm hoping to do maybe two of these a week — the UI was basically the original Friday session I had scheduled anyway — so we'll see where we get, but it'll be at least every Wednesday until we get through this, and probably a lot of the Fridays as well.

All right, without further ado, I whipped up a quick little intro for all of you, based on the last session — the episode one session — so let's take a look at that.
Okay, a little bit shorter than the normal intro video, but I'm working for you — that's all you need to know. So we're in episode 2, and hopefully, unlike the other thing that's referred to as Episode 2, this will be terribly interesting.

We're talking about video today, so what is video at a high level? Video hasn't really changed much in a very long time. Video is basically just a set of pictures — let me make that a little bigger — and if you play these pictures fast enough, so this is picture 1, 2, 3, and you keep swapping them on top of each other really fast, then it blurs into something we perceive as motion. So if we start off with our dude kind of chilling out here, then he raises his hand a little, and then he goes — and I know, I really went to art school; no, I didn't go to art school — if we get this little hand animation going and we play these pictures fast enough, right on top of each other, then at some point it blurs and the human eye gets caught by persistence of vision. We still see the previous frame; it's changing faster than our brain can capture it. They call that flicker fusion.

That's what we're aiming for with video: we throw these pictures, which we're going to call frames, at the user fast enough to maintain the illusion of flicker fusion. As long as that happens, we get a nice seamless sense that something is moving, even though it's really just a set of pictures. So our first view into the world of video is to talk about groups of pictures.
Okay, that's all fine. Audio, on the other hand, is a little different. We can't really see audio, but we can imagine it as a frequency graph — who knows what this looks like — and what it really is, is a series of pulses over time. Even though there are little gaps in here, if these pulses are played fast enough, it creates the equivalent of flicker fusion — a sort of aural fusion. I'm not actually sure that's the real term for it, but if not, you heard it here first. As long as we take all of these pulses — here's pulse one, two, three, and we just keep going — and play them fast enough, they blend into a nice seamless waveform, which tricks us into thinking we're hearing somebody speaking, or some sound effects, or a car crash, or whatever happens to be playing.

So on the picture side we were talking about groups of pictures; down here we're talking about all of these pulses, and this is pulse-code modulation (PCM). As long as we play the pulses really fast and at a consistent speed — the gaps between them have to be really regular, otherwise the illusion breaks — they blend into one nice aural experience, and we get the impression that we're hearing the thing live, even though, well, who knows what's really live.
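To make the PCM idea concrete, here's a minimal Python sketch (mine, not from the stream) that generates one second of 8-bit, mono, 8 kHz samples — the same "telephone quality" parameters we end up using later — and writes them out as a WAV file:

```python
import math
import wave

SAMPLE_RATE = 8000   # pulses (samples) per second
FREQ = 440           # tone frequency in Hz

# One second of 8-bit unsigned PCM: each pulse is a single byte, 0..255,
# centered around 128. Played back at a steady 8000/sec, the pulses fuse
# into a continuous tone, just like frames fuse into motion.
samples = bytes(
    int(128 + 100 * math.sin(2 * math.pi * FREQ * t / SAMPLE_RATE))
    for t in range(SAMPLE_RATE)
)

with wave.open("tone_8khz.wav", "wb") as w:
    w.setnchannels(1)            # mono
    w.setsampwidth(1)            # 1 byte per sample (8-bit)
    w.setframerate(SAMPLE_RATE)
    w.writeframes(samples)
```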
All right, so that's basically our target. When we want to work with a video that we're going to play over the internet, it's really packaged into these two parts. We're going to have to make some sort of box for all this stuff, turn it into a whole pile of bits — ones and zeros — and get those bits from wherever they're hosted. That might be a CDN, or it could be just a single web server, which is what we're doing today. Then we need to get them shipped off to the browser — I'm running out of space there — but if we do it right, the bits get over to the browser as soon as the user requests them. The user says "I want movie 1" and movie 1's bits get delivered; if they want movie 2, I go to a different pile of stuff and send that back; and this other person over here could be watching movie 3. That's pretty much the experience we want to create.

Today we're focused on how we put these things together, and I'm going to keep the packaging format of this box in something familiar to all of you. This is a little bit insane — in a real production system you would never pack video and audio in JSON like we're going to do today — but the nice thing about JSON is that most people watching this stream are probably familiar with it, and it's text-readable, so it should be very easy to see what this stuff looks like.
Okay, that's really neat — so what do we do as we move along in this motion-picture world? We have a frame — and I really like that color, I think we're going to go more with this color; chat, let me know if it's too dark to see. We take these frames — here's frame one, here's frame two, and so on — and we notice, as we put together our simple animation (which I can actually draw), that the majority of the picture doesn't change. Very little of this entire image changes from frame to frame; it's really just a single image. The only part that changed was this little bit — let me highlight that better — this little part here is the only thing that actually changed between frames one and two. If I extracted just that, it would literally be: delete the arm being here, and instead go with the arm being here. This is image one to image two, so I can just say: give me the delta part. We're going to call this a delta, so this would be a delta frame — I'm only sending the part that changed.

The delta by itself is totally meaningless: I need frame one in order to build frame two. But if I have frame one and I add frame two's delta back onto it, I end up with frame two again. That's wonderful, because very little data needs to be transferred — almost all of the delta is just going to be zeros. If we really don't want to send a lot of bits, it's very nice to be able to calculate how much of the image is actually different. I could explode the entire thing out and send a full image every single frame, but that gets very expensive: I'm sending all of the bits in the image every single time, regardless of whether they changed, and on a typical image that's quite a lot of data. In the delta case I don't have to send much at all.

To borrow some terms we'll keep using: this one is an image-type frame, and this one is a delta-type frame. The standard term for the delta one is a P-frame, but I'm going to call it a delta frame because I think that's a little easier. The P refers to prediction — when you reassemble the whole thing, you're predicting what the next frame will be based on this difference. You can use different ways of calculating that, and that's the wealth of video codecs out there, but for our JSON example — we're not trying to build a production video format — we're just going to call these delta frames. If you hear me say P-frame, that's what I'm referring to.
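As a rough illustration of the idea (my sketch, not the code from the stream), here's how you could compute and re-apply a delta frame with numpy, treating each frame as a flat array of RGB bytes:

```python
import numpy as np

def make_delta(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    # Signed difference between two frames; pixels that didn't change become 0.
    return next_frame.astype(np.int16) - prev_frame.astype(np.int16)

def apply_delta(prev_frame: np.ndarray, delta: np.ndarray) -> np.ndarray:
    # Rebuild the next frame by adding the delta back onto the previous one.
    return (prev_frame.astype(np.int16) + delta).clip(0, 255).astype(np.uint8)

# Two tiny fake frames (four RGB pixels each) where only one pixel changes
frame1 = np.array([17, 17, 17,  8, 8, 8,  9, 9, 9,  6, 6, 6], dtype=np.uint8)
frame2 = frame1.copy()
frame2[3:6] = [200, 150, 100]           # the "arm moved" pixel

delta = make_delta(frame1, frame2)       # mostly zeros
assert np.array_equal(apply_delta(frame1, delta), frame2)
```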
There's a more advanced version of this that we'll get into in the next class, when we start talking about compression. Compression is really about how little we have to send down to the browser in order to keep the visual experience nice, fluid, and high quality.
There's only one other attribute we have to worry about as we do all this: we need a sense of how many bits we're going to be sending. If this is a small picture and this is the same thing at a much larger scale, there's a lot more data — as you might know from cameras, more megapixels, which refers to how many dots there are in the image. The small one will transfer really quickly, but the big one will be slower and take a lot more memory. So it's very important that we know how many bits are going across: how wide is this image and how high is it — I'm going to call this one a big H and this one a big W.

There are some standard sizes, and most of them use the H dimension — the height — to refer to the resolution, ignoring the width, because the ratio between the two is roughly constant. You've probably heard these — the computer people here definitely know this stuff: 480 if we're talking about television-type formats, 720, 240, 1080 — and the "p" refers to progressive scan; it's just the number of pixels going top to bottom. Then something strange happens when you get to 4K: they switch over to the horizontal dimension and drop the vertical one, because that number is a lot bigger, so it sounds like it's four times better — but it isn't.
We're going to be using these kinds of sizes in this series, and I'm going to focus today mostly on the small one, because we're not doing much compression and there's going to be a lot of data — an uncompressed image is actually quite large. In these images we need to spell out every single dot, from the top left all the way down to the bottom right. How do we actually spell those dots out? (I'm not talking about a delta frame here — I really shouldn't be drawing on top of this one; I should be drawing on top of this guy.) How do we format this stuff? This very quickly gets into a complicated discussion that isn't that important for today, but what we're going to do is pack all of the bits in the image into groups of three, and those groups of three are your standard RGB. We're using the RGB color space, which refers to red, green, and blue, on the assumption that you can mix these together — let's pick a nice red here and a nice blue — and if I mix all of these colors together I end up with the intersection of them. Obviously it's not actually a Venn diagram like this, but if I mix these colors I get different combinations, and between all of them I get a pretty decent color space to work with.

We do need to be careful about how many bits we specify for each of these — that's our bit format. We're going to take full-color images made up of equal parts red, green, and blue, and each of those is going to be 8 bits wide, which is just a byte in modern times.
So we have eight bits of red, eight bits of green, and eight bits of blue. In video this is referred to as RGB 8:8:8; in regular computer terms we just add the eights together and call it 24-bit color. Now, 24 bits is kind of an awkward size for a computer, because computers like to deal in things like 32 bits — historically processors were 8 bits wide, then 16 bits wide, then you put two 16-bit words together and you get a nice 32-bit number, which they like a lot more than the 24 bits we'd need for these three eights. So what do you do with a fourth eight? It turns out the browser very much likes that format: it treats pixels as RGBA.
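To get a feel for the numbers, here's a quick back-of-the-envelope calculation (my own, using the frame sizes from later in the session):

```python
# Uncompressed frame sizes at 3 bytes per pixel (RGB 8:8:8)
# and 4 bytes per pixel (RGBA, as the browser stores it).
for name, w, h in [("240p", 426, 240), ("720p", 1280, 720)]:
    rgb = w * h * 3
    rgba = w * h * 4
    print(f"{name}: {rgb:,} bytes RGB, {rgba:,} bytes RGBA per frame")

# 240p: 306,720 bytes RGB, 408,960 bytes RGBA per frame
# 720p: 2,764,800 bytes RGB, 3,686,400 bytes RGBA per frame
```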
The A is kind of a fake dimension. Unlike red, green, and blue, we don't just mix the A in. A refers to alpha, and alpha is really about how light or dark the pixel is. I can take the alpha and pre-mix it into the RGB values, and then I don't need it. Alpha is more useful if you're programming — if you're building something like an image manipulation program, it's nice to be able to quickly set the alpha and ignore R, G, and B. But for our purposes tonight we're not building an image editor; we're just trying to display stuff on the screen as fast as possible, and alpha communicates no additional information, so we're not going to send an alpha channel at all.

That means we have to be careful: on the server we're going to work in this 24-bit, 8:8:8 color space, but on the client we'll need to expand that to RGBA, and that's going to cause some JavaScript gymnastics. If we leave the alpha alone, it starts off as zero, which means the pixel is totally dark — not seen — so everything will just show up as black. That bit me when I was prepping this last week. So we're going to be careful to write the maximum value in there when we expand — for eight bits the range is 0 to 255, so that's 255. As long as we remember that, we shouldn't see just black; and if we do see just black after expanding, remind me in chat: "hey, you probably forgot that alpha value." We'll see how far we get with that.
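Conceptually, the expansion we'll do later in JavaScript looks like this numpy sketch (mine, for illustration): take a flat RGB array and interleave a 255 alpha after every third byte.

```python
import numpy as np

def rgb_to_rgba(flat_rgb: np.ndarray) -> np.ndarray:
    """Expand a flat [R, G, B, R, G, B, ...] byte array to RGBA with alpha=255."""
    pixels = flat_rgb.reshape(-1, 3)                      # one row per pixel
    alpha = np.full((pixels.shape[0], 1), 255, np.uint8)  # fully opaque alpha column
    return np.hstack([pixels, alpha]).ravel()

flat = np.array([17, 17, 17, 8, 8, 8], dtype=np.uint8)
print(rgb_to_rgba(flat))   # [ 17  17  17 255   8   8   8 255]
```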
All right, let me know if there are any questions on that. I'm going to keep going, but I'll try to answer them as we move along. That's really the important stuff; there's a lot more that comes up when we start talking about compression, but that's another topic, so I'll stay light on compression today — if JSON is our image format, we're clearly not too worried about compression anyway.

Great, that's our theory: if we get the pictures to show up in the browser fast enough, we're good to go. Here's one picture; we throw a bunch of them past that visual perception threshold I mentioned, which kicks in at around 8 to 15 frames per second depending on the person — that's where flicker fusion starts to blend. For video formats we usually want to be safely above that: 30 frames a second is typical, old film was 24 frames a second, and modern games use 60 frames a second so they don't cause eye strain. We're going to stay with 30 frames a second, because I think that's a reasonable amount of data for us to be packing and unpacking as JSON.
All right, so what have we got? We have the directory we created last time, with episode 1 in it, where we did our overview and started building a player, possibly using Emscripten. I'm now of the opinion that we can do this with just JavaScript, so we're going to try that. Let's start our episode 2 directory, and in here we're going to make our Flask app.

Actually, there's one thing I totally forgot to mention, and I should say it before we cut back over to the code: we do have a plan. Let me cut right back to that for a split second. That color's too dark, so let's do something a little more gray.
All right, so this is going to be our plan — and it has a line under it, so it's a serious plan. We're going to deal with our video as images, so our first goal is to get a set of images that we can work with, images we know will make a video when we put them together right. Our next step is to get one of those images to load in the browser — we're going to start very simply. Assuming that all goes fine, we'll take a look at deltas; we might have to come back to this depending on how much time we've got — I'm a little worried about the amount of content to get through — so that's the delta calculation. Then we take the delta and put it back together: the apply, where we merge the delta image with the image it refers to. Those two are a bit more of a stretch. Then we play this on a timer, and if we hit 30 frames a second we'll be doing pretty well. The reach goal, if we get there, is a little bit of audio — the same idea, but stitched together for PCM. That's basically our plan, so I'll leave it up as a reminder. Oh, and I totally forgot to show it to you while I was drawing it, sorry — the plan is over here on the top left: video, images, maybe some delta stuff, play on a timer. Cool.

Today we're effectively not bandwidth constrained, because we have basically infinite bandwidth from the computer to itself. When we get to the real internet — which I think is episode 5 — we'll have to start worrying about how much data we're transferring, but today it will pretty much work no matter how many bits and bytes we cram around, as long as the CPU can keep up.
Okay, so let's try our pure JavaScript player. First there's a little bug I have to fix in my Flask app creator tool — this path is wrong — oh, you know what, I've got the red screen here, so let me switch screens and just fix pip real fast and set up the virtual environment in Python. This just creates our virtual environment and makes sure everything's good. With that right I should be able to install things — and we're getting an error saying we didn't specify anything to install. Tonight we need two additional tools: we need to work with images in their raw format, so we're going to use Pillow (in place of PIL), and we need numpy, the Python numeric library, because we're going to treat each image as basically a bunch of bits sitting together in an array. Those get installed and we're good to go. If we want to check that everything's fine, bug aside, we can load it over here — and we have our basic templates waiting for us to get to work. Wonderful, it works.
So before I cut right into the code: our first issue is that we need a video to work with. What I did last week for the overview is I recorded a video and dropped it in our assets directory. I've actually got two videos here — it's the same piece of video, but this one is 30 seconds long and the other is 10 minutes, so if we run out of video to work with we can always switch up to the bigger one. One key thing I mentioned in the overview last week is that streaming is different from downloading. With downloading you wait for the entire file to come down and then start playing it, which, as I explained, could take forever at a really high resolution. With streaming we want to send the bits about as fast as the user's internet connection can handle — within reason, and up to the resolution — so that playback can start as quickly as possible. So it matters how many bits we send, and we're going to start with a low-resolution version. This video itself is 720p, but we'll work with smaller formats so we don't have to send too much JSON. Let me copy that over: I'll come into the static directory and copy movie 1, the smaller one.
Now, this MP4 is packed in a normal industry format, and that's exactly what we want to talk about today: how these things get packed. MP4 works on principles similar to what I was describing, with a lot more complexity, but we're going to step back from that. We're going to take this MP4 and unroll it, and to do that we'll use the Swiss Army knife of movie stuff, ffmpeg — I didn't want to have to deal with MP4 internals myself tonight. We're going to take this movie and knock it down to 426 by 240 — that's 240p — using the same aspect ratio as the original video. If we look at the original, it's 30 frames a second at 1280 by 720: there's our 720, and 1280 is the width. From a ratio perspective, 1280 divided by 720 is about 1.78, and 426 divided by 240 is close enough. One constraint with ffmpeg is that the dimensions have to be round numbers, so I had to push it slightly on one side, but the width-to-height ratio is pretty much the same. So let's take that video and knock it down to 426 by 240 — that should give us a nice small file — and label it as the 240p version.
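The literal command isn't captured in the transcript, but the scale-down step described is roughly this (a sketch, with file names that are my own guesses):

```python
import subprocess

# Roughly the downscale step described above.
subprocess.run([
    "ffmpeg",
    "-i", "static/movie1.mp4",        # 1280x720 source
    "-vf", "scale=426:240",           # knock it down to 240p, same aspect ratio
    "static/movie1_240p.mp4",
], check=True)
```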
ffmpeg does its thing, and if we look at our file sizes, movie 1 is kind of big at 9.1 megs, but the 240p version is only 560 K. So resolution affects file size drastically: the biggest "compression" we get is simply scaling the image down and not sending as many bits. We can always scale it back up in the browser — it'll start to look pixelated, since we're losing information — but if we don't insist on starting at the highest fidelity, and we do want to start quickly, that's far fewer bytes to transfer. Depending on people's internet connections, that's what lets playback start very fast. We're going to work with 30-second chunks of video, and if we need more chunks we'll just grab them from the rest of the video.
That's my source. Now I'm going to take this MP4 — because we're not delivering MP4 — and unroll it. But first, one part I didn't mention: I also want to extract the audio. I take the same source movie, but this time I set the video codec to none, so it drops the video, and I take the audio exactly as it is in the original file — just copy it as is — and extract it into another file, which we'll call movie1.aac. It turns out the audio in that MP4 is encoded as AAC, which is basically the audio format that goes along with MP4. So now we have our AAC file, which is just the audio, and we have our smaller MP4, which is just the video — well, actually that one still has both video and audio, but for now we don't need to worry about that, because we're going to be making up our own image format.
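Again, the exact invocation isn't in the transcript, but the audio-extraction step described would look something like this (file names assumed):

```python
import subprocess

# Drop the video stream (-vn) and copy the AAC audio out untouched.
subprocess.run([
    "ffmpeg",
    "-i", "static/movie1.mp4",
    "-vn",                      # no video
    "-acodec", "copy",          # keep the audio exactly as encoded (AAC)
    "static/movie1.aac",
], check=True)
```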
Now, I can't work directly with this AAC, so we're going to turn it into a format that's much more commonly used and easier to work with in Python: a normal WAV file, which is a very old audio format. Oh, and hey, what's up 715209, how are you doing? And ssuni — oh my god, there's a clip I need to show you from yesterday's session where I said ssuni would be having a heart attack if she saw this; we were coding, somebody from the stream jumped in on the code session, and they were like "ssuni loves programming in Python." Anyway, we're getting far afield: we're going to take this AAC file and roll it back out to a normal WAV file. Okay, cool — and yeah, better watch the VOD; I'm probably going to try to get that up on YouTube later, so we'll see how we do.
So we take this WAV that we just dumped out, and we're going to knock it down. Right now, if we look at it — this movie1.wav — it's RIFF, the WAV file standard, Microsoft PCM audio; that's our pulse-code modulation. It's 16-bit, which is two bytes per sample, it's stereo — left and right — and it's at 44,100 pulses per second, so it's basically standard CD format. We're going to knock this down to a much smaller size, just like we knocked the video down: single-byte samples instead of 16-bit, throw away the stereo and go to a mono stream, and readjust the sample rate down to 8,000 per second. This is going to sound like a telephone call back in the old days. To do that we're going to use SoX, the Swiss Army knife of audio stuff (and something ffmpeg itself uses). We take the WAV file and specify everything I just said: instead of 16 bits, 8 bits per sample; instead of left-right two-channel audio, one channel; and instead of 44,100 samples per second, 8,000. When we do all that and make our audio sound nice and crappy, we save it as movie 1's 8 kHz version, as a WAV.
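The SoX invocation described would be along these lines (a sketch; the output name is my assumption):

```python
import subprocess

# Downsample the CD-quality WAV to 8-bit, mono, 8 kHz "telephone" quality.
subprocess.run([
    "sox", "static/movie1.wav",
    "-b", "8",                  # 8 bits per sample instead of 16
    "-c", "1",                  # mono instead of stereo
    "-r", "8000",               # 8,000 samples per second instead of 44,100
    "static/movie1_8khz.wav",
], check=True)
```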
What's nice about that — and SoX does it very quickly — is that the file just got a lot smaller. Our full CD-quality audio, essentially uncompressed, was 5.1 megabytes; I didn't compress it at all, I only dropped the quality, and it turns into 235 K. That's already a lot less data, and it's going to sound like it. If I play it for you — you might want to cover your ears or mute the audio — you can hear the hiss, the static. I might even be able to boost the gain a bit. In a production system we would never deliver this, but if we're packing the thing into JSON, it's a lot fewer bits. Okay, so that's our source material.
Now, I'd be cheating if I used the MP4 format for our video, because the format for packing video is exactly what we're talking about today. So I'm going to go one step further — and this is where we get into the magic. I'm going to create an image directory, take this 240p video, and extract its 30 frames a second into 30 pictures per second, for the entire 30 seconds. Math channel: 30 times 30 — I should end up with about 900 pictures. Let's take a look; I don't think it lands exactly on that frame count, but we're going to use ffmpeg again — and hey, ssuni, nice new name. We give ffmpeg an input file, the 240p version, tell it to synchronize the video frames so that every picture it drops is a full frame, and store the frames in this image directory using the movie name, "image", and a three-digit number, followed by .png.
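Pieced together from that description (input is the 240p file, frames go to an image directory with a three-digit counter), the command was presumably something like the following — paths and names are my reconstruction:

```python
import subprocess

# Unroll the 240p video into one PNG per frame, numbered 001, 002, ...
# "-vsync 1" asks ffmpeg to keep frame timing consistent so each output
# picture is a complete frame.
subprocess.run([
    "ffmpeg",
    "-i", "static/movie1_240p.mp4",
    "-vsync", "1",
    "static/image/movie1_%03d.png",
], check=True)
```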
So I run that, it unrolls my movie, and now I've got a whole bunch of images. If I go and look at them — and we are going to look at exactly these images — we see every single frame we have: here's frame 1, here's frame 2, here's frame 3, and you'll notice very little is changing, because I'm actually pausing during that part of the video while I talk. If I jump up to, say, the 70th image, I can see this little piece just filled in — I'm drawing across at that point — so there's image 71 and here's image 72. Yep, there we go — no more mispronunciation, ssuni. What does the Korean say? I guess we'll have to get an audio sample in this format. Okay, so now we have unrolled into our directory all of the images we're going to need, in PNG format, which is something we can read in Python, so we can quickly open any of these images and see what's going on. Yeah, please do — actually, ssuni, if you're in the Discord I could just mux you onto the stream and you could tell everybody live, or you can send a recording; it's up to you, depending on how live you want to be. Okay.
We'll play a little as we go. So we end up with, hopefully, about 900 images — exactly 30 seconds of video — and we want to be able to pack those. We're not going to use the PNG format, because that's cheating — it's another form of compression — we're going to turn them into plain JSON. The way we'll do that is to come into this file and build a new route. This is going to be the image-as-an-array route; I'll just call it image_a. For now it returns nothing — well, it has to return something, because an empty function body made my server fail. Okay, so now I've got my nice, useless image_a function. Let's start by picking one of these images — we could take image 862, or let's take 70 like we did before, because I know there's motion between 70 and 71, so let's use that. So this is going to be the image-70 PNG, and that's our file — it lives in static/image. This is something we want to load using PIL, so we're going to import PIL — and I know, ssuni, you love it when I program in Python; actually most of tonight is going to be JavaScript, I think, or the whiteboard, which Nightshade doesn't like me calling a whiteboard.
So we take PIL and load this image: Image.open on the PNG file, and that gives us back the image — in this case, image 70. "Why did you change it into PNG?" asks 715209 — I don't specifically care that it's PNG; I just want single images, one per frame, in a format Python can read. We could turn them into BMPs, raw bitmaps, if we wanted to read the bytes directly, but then we'd have to deal with the header and so on, and I'm not sure how good PIL's BMP support is, so I'm using PNG purely as a convenience, because I know PIL can read it. Okay, so that loads up image 70. If I print out i70 — I was doing some C before, hence the naming — and go hit this route... it's not /image, it's /image_a. If I hit it, I've loaded a PNG image file and it knows the right size: 426 by 240. We're only working with a single resolution tonight; we'll worry about up- and down-scaling later.

Now I want i70 as a JSON array, because I don't want to serve PNG — that'd be cheating for building our own codec. So I import numpy as np and call np.array on it: take this image and turn it into an array of values, and that's image 70 as an array. I could just return that to the browser as JSON — it would be a normal array — so let's dump the image out. Since i70a is a numpy array and not a normal list, I have to turn it back into a Python list before json.dumps, otherwise we get a TypeError. Let me think about the naming of that variable for a second... okay, there we go.
There's our image, in sets of three: that's our R, G, B — this one is 17, 17, 17 — and it's grouped into sub-arrays of three, so each one corresponds to one pixel in the image. If I look at the actual image — image 70, which of course I stopped serving, so I'd have to go back in and turn the web server on again — I can see there's not much up at the top, it's pretty dark, so 17, 17, 17 seems reasonable for that area. It's probably not pure black because of imprecision in the original MP4 — we didn't capture raw video, unfortunately; I don't have the hard drive space for it anyway. All right, great, job done: we've taken our image and turned it into JSON. Now I want to flatten it so we don't have the sub-arrays — we'll just take the convention that three entries in a row make one pixel. We're building a stupid image format in JSON. So we take this thing and flatten it, and now I get my three values followed by the next three, but they're not in sub-arrays anymore. By the way, image-wise I could probably throw all of these away and turn them into flat black and the user wouldn't really care, but whatever — there are some quirks in the encode of this MP4, and there's some actual image data hiding down here.

So, great: we have a server that serves our image, and we can forget about the MP4 from here on — we're building our own video format. We can now take a single frame and send it to the client as JSON. A little bit crazy, but bear with me.
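Pulling the steps above together, the server route probably looked roughly like this — a sketch based on the description, with the file layout and names being my assumptions:

```python
import json
from flask import Flask
from PIL import Image
import numpy as np

app = Flask(__name__)

@app.route("/image_a")
def image_a():
    # Load one PNG frame, turn it into a flat [R, G, B, R, G, B, ...] list,
    # and return it as JSON text. No header, no compression: the client has
    # to know the width/height (426x240) by convention.
    img = Image.open("static/image/movie1_070.png")
    flat = np.array(img).flatten()       # drop the per-pixel sub-arrays
    return json.dumps(flat.tolist())     # numpy arrays aren't JSON-serializable
```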
Okay, so now on the client, what do we want to do? We want to take this thing and turn it back into a picture. What do we have on our web page right now? Nothing. So we're going into JavaScript land, and I'm going to edit this home page — yes, ssuni, good, we're editing the right file. We're going to take the JSON I just showed you and try to turn it back into the image. We want it to show up, but we want to manipulate it directly in JavaScript, so we need some JavaScript to work with and a place to draw, which means we need a canvas. We're doing raw bits — no cheating, no using the image formats that already exist in the browser; we're making our own codec. So let's tag in this canvas with a width and a height, and stick with 426 and 240. Keep in mind that in the image format I just sent down there's no header telling you the width and the height — you simply have to know that you read 426 pixels before you're on the next line. (Sorry, ssuni, no matter what I do I keep mangling the name — I don't mean it negatively.) Okay, so we have a nice little canvas tag here, and we should be able to load it if I remember some JavaScript. We're going to use some newer JavaScript — though go watch Vape Juice Jordan if you want to see the newest stuff; I think he's on the standards committee, he could show you a thing or two about the latest spec. All right, let's do a document.getElementById so we have something to work with.
Now, just so I don't have to keep typing out those numbers, let's define our width as 426 and our height as 240 — these stay constant for the rest of tonight; we'll change sizes in later episodes. I'm also going to need a place to draw that doesn't show up on the screen, but for now I can just take this canvas and do a clearRect — oh, sorry, I don't want the canvas itself, I need a context on the canvas; I have to say I want this canvas as a 2D drawing surface. So we take the context we just created and clear a rectangle from the top-left corner to the bottom-right. We clear our canvas, and if I style it black, we should be able to see it in the right spot. Great — it's black, we've cleared it, and we know there are no pixels on the surface other than what we put there; everything is set to zero right now.

So what do we do next? We go get some JSON. Let's fetch this image array, and when we get it, do the JSON parsing, and when the JSON.parse result comes back — because everything's async — we'll just dump it to the console to make sure we're getting the right thing. If I look at the console (I don't need to view the source; we can see the source over here), I just got an array back from the server with all of our image data: here's the R, G, and B of the first pixel at the top left, here's the R, G, and B of the second pixel, and it goes left to right, top to bottom.
So there's all of our raw image data; now all we have to do is take these pixels and get them onto the screen. How do we do that? Let's give ourselves a function — this fetch over here was our getting-started bit, so I'll move it up — and make a showImageArray that draws onto the context directly. (That's a little naughty; really we should be drawing off-screen, but fine.) It takes an image array that I've already parsed — it came down as JSON, and JSON.parse turns it into an array — so looking at it, I've got a normal flat array. Well, how do I do this? I need the context, which I can just reach into global scope for, because we're not doing proper programming right now, we're hacking around. What I need to do is take the data and manually put it back together again. There are a lot of ways to do this — I could hit the pixels directly using the context's draw calls — or I can say: give me a new ImageData object with a given width and a given height (hopefully these all match; normally I'd read them from an image header). We'll call it our image data, and it's going to be in the format I need in order to display it on the screen.

If I dump this thing out, you'll notice it's a new kind of object — oops, I can't have my W doing that; W isn't defined in that scope, so let me move those definitions up to the top, since they're basically constants. With that done — cool, we're good. This is the array we got back, and I'm going to pass it into showImageArray, my own function, so I'll just call it with that array as the first argument. If I do that, I see I now have an ImageData object — its data is a typed array, backed by an array buffer, if I'm using the right JavaScript terminology — and it's 426 by 240. But it's pretty much empty: if I look at the data, there's nothing in there.
Now, if I look at the size of this, I notice it's a little bigger than the size of the JSON array I got back — Firefox will show me that. Even though they're both exactly the same dimensions, the JSON array has 306,720 values but the ImageData has 408,960. The difference is the alpha value: there's an extra byte per pixel in JavaScript, so we have to be careful about alpha when we pack this thing back in. What we need to do is scan across the image that came in. So let's just do it: we'll let i be 0, and j — I was going to use i for row and j for column, but we'll just use them as plain counters. And to answer the chat question: yes, alpha is the A in red, green, blue, alpha. Alpha is kind of fictitious, because it's really something that gets applied to red, green, and blue — it's not a color itself. It's useful in JavaScript because I can dial it up and down to fade something, but when we're talking about raw image data it doesn't matter. So we don't pack alpha in when we send the image, but when we unpack it we have to remember to put the alpha back — that's why the destination array is a little bigger than the source array. So we start with our counter and basically walk through all the pixels, writing them across. If the two sides were exactly the same format — RGB to RGB — I could just write a plain loop:
i equals zero; while i is less than however long the image data is — its length — keep incrementing i, and copy every value from one side to the other: take imageArray[i] and write it into iData — actually iData.data, because the ImageData object has a data element — at index i, and just assign one to the other. The problem is that the destination side is RGBA and the source side is RGB, so after every three RGB values I have to slap in a 255 for the alpha. I can't do a straight copy the way the data is packed, so let's keep track of that.

Sorry — before, I said I'd use row and column; that's not what we're doing at all. I'm going to use j as my counter in one array and i as my counter in the other. i just always goes up — that's my destination counter. Every time the destination counter mod 4 (divide by 4, look at the remainder) equals 3 — so when I'm on the last slot before it cycles back: 0 1 2 3, 0 1 2 3 — then I ignore the source entirely and just write 255, because that's the alpha value; otherwise I copy the red, green, and blue. So this branch is the red, green, blue; that one is the alpha. If I do that, I should move everything across. These values are all bytes, zero to 255, so I'm staying within the Uint8ClampedArray range, and that's fine.

We do that and look at the image data again — it was all zeros before; now there should be a whole bunch of stuff in it. In my source where I had 17, 17, 17, that turned into 17, 17, 17, 255 — there's our alpha value — then 8, 8, 8, which turned into 8, 8... oh, I didn't quite get my indexing right; that's because I'm not incrementing j. What I really want is to copy the value at j and, every time I copy one, add one to j. So j is my counter in the source data, the image array, and i is my counter in iData, my destination. I do that — this is why we debug and test — okay, "j is not defined," because I deleted it earlier while I was showing you; we say j equals zero, but we don't increment it up there. Cool, now we get 17 17 17 255, 8 8 8 255, 9 9 9 255, 6 6 6 255. So we did our copy, except we threw in a 255 for the alpha each time. Hopefully that made sense — a little confusing, but it's just moving from RGB to RGBA.
A question from chat: what would the image look like if you did a straight copy of the RGB, without inserting the alpha? Basically, it would interpret every fourth value as alpha even though it's actually the next red value, and then everything shifts — off by one, then off by two, then off by three — so it would look wrong; it'd be a different image. And wherever the alpha lands, if it's not 255 it darkens that pixel, no matter what you put into the R, G, and B values. I'll actually show you that in a moment.

Right now we're just dumping this to the console, and that's not the proper way to look at an image — we don't look at images on a console, we look at them as images; we're human beings (ssuni is a human being, I believe). So no more console.log. I'm going to take the context we had before and put the image data into it: take the image data from this iData I created and write it in at the top-left corner. Look at that! If you all saw the prep session — it was not like this. "Not like this" — that's the line from The Matrix.
Yes, 715209, I'm blown away here too. So, what would it look like without the alpha value? When I first built this I didn't have the alpha correct — I was writing zero in for it — and everything was black. It doesn't matter what data comes back, it all shows up as black, because an alpha of zero says "darken this pixel entirely, no matter what its color is." We want it to be 255; do that, and we get the image at full color. If I set it halfway, like 127, you'll notice it gets a little darker — it's dimming it. That's what happens when you play with the alpha value, and if you write red, green, and blue values into the alpha slot it's going to look like crap. Alpha is a nice way to do fade-in and fade-out — it has its uses — but it's not useful to us, since I can pre-compute the alpha into the R, G, and B and not transfer it at all. I'm worried about how many bytes I'm sending to the browser — I know this is JSON, but eventually we're going to care a lot about how many bytes we send, because it affects how fast playback starts, and we're building a streaming service: we care how long it takes. (You can watch the first episode if you want to hear about the iron triangle of streaming — and thanks for the comment, Anthony, I really liked it.)
Okay, so we know how to take an image from JSON, which looked like this, and turn it into this, so that it looks exactly like the original image. And if I change the image — say to 75, where you can see this line is mostly formed — then all we have to do is come down here and pass back image 75, and boom, we get our line completed. So we know this is working. Cool, let's make that a parameter: extract it and pass along an image ID. Is it missing the bottom-right logo? Oh my god, it is. Yeah, we'll have to come back to that — I may need a better source image with more data, because I have a feeling I might be cutting it off; it could also have happened in decompression, and we'd have to dig a bit more to figure that out. Good eye, 715209 — I'd award you points if I could; we really need to build that points feature.

So let's take an image ID parameter to specify which image we want, and instead of always serving image 75, we mix the number in here, padding it to three digits even if it comes in shorter. Just for safety, let's also ensure it's an integer — that makes it throw if it isn't, and it converts the value to an integer. I know, you should never trust user input; we're keeping it quick. Okay, so the route is no longer just image_a; it's image_a plus whatever frame we want. If I ask for frame 75 I actually get an error — that's because I need to pad it: it's not 75, it's 075. There you go. Now I get the right image, and if I load it here — yep, cool, we get the right thing.
specify which frame we want which is
cool right I could say I want frame 278
ok 78 let's take frame 120
yeah there's starting to be this be over
yes let's let's go far in the future
let's go to frame 700 okay so this is
working and something going wrong down
here is seven point five two and nine
point it out okay cool so now we can
directly manipulate the image surface I
just I mean if you've worked with canvas
before but that's that's kind of the
nuts and bolts of how you can directly
write the image based on this JSON data
so we didn't talk about images we talked
about video and video is about lots of
images so we want these images to be
kind of cycling you know we want we want
to be rolling through and showing these
fast enough that our eyes are tricked
into thinking that this is a regular
video and you know it's not really just
a bunch of images that are being shown
so now this source that we're working
with we know that it's 30 frames a
second so basically what I need to do is
fetch this thing I'm gonna go like the
slow way which is we're just gonna fetch
this thing 30 frames a second okay so we
can get there by just settling and let's
build ourselves a little mini function
that that's this I'm gonna remember to
turn a variable so we're in a constant
thing this will be our video timer and
there's my one line that's gonna be in
my function and I know this code is a
bit sloppy and we're gonna say do this
I'm gonna slow it way down so let's do
one per second okay now right now it's
just gonna keep requesting the same
frame over and over which is not what I
want so let's count let's let's have a
variable called I sorry about these eyes
and stuff old habits and we're gonna use
new style syntax because I don't want to
mix this thing and we're gonna say put I
in right here so it's it should be
requesting one image a second forever
so that's eventually going to overflow
and run out of images but okay so this
is how it's starting from zero we don't
want to do all step one and we also want
to remember to think the Menai each time
we do this okay okay that looks like
it's working so we haven't seen the the
image change because I think in the
first part of this video we don't really
do much but we know that there's some
action happening by frame 70 so let's
let's jump over there yeah there we go
we have crappy video so and you oh
you're right 75 - and I see there's two
points two points dude okay so this is
now playing really crappy video I
wouldn't call it video it's really just
changing this slow enough that we don't
see it so let's pull that console.log
out and let's speed it up let's look
let's go every half a second
yeah it's starting to get a little
smoother right starting to look like a
video okay
now let's let's keep going let's let's
start this all the way at the beginning
and now who here knows what 30 frames a
second is in terms of milliseconds how
many milliseconds is each frame if it's
30 frames a second good guess
it's yes that's barely qualified that is
exactly correct so yeah we can actually
calculate so and that is the right thing
to do because we are this this parameter
here is in milliseconds for the set to
set interval so if we take a thousand
milliseconds which would be one second
we divide it by 30 frames that have to
happen in a second we would get how long
each frame is now I'm not doing
any preloading here so we're allowing for
Network time and we get we'll get
fancier later but let's let's start
there all right now when I do that it
suddenly turns into regular looking
video that's it everybody job done this
is amazing here we go we just rolled our
own video codec bad now you'll notice
we're getting a little bit of flicker I
don't know how much that's going to come
through on the stream and I'm seeing a
little bit of flicker and the reason for
that is we're drawing directly on this
canvas context so there we go we ran out
of frames and it starts it starts
erroring out so we're gonna need to cap
it at 900 so we're gonna have to say
something like if i is greater than or
equal to 900 then clearTimeout, er, clearInterval
I'm gonna take advantage of the closure
all right so we do that and it should be
fine so in 30 seconds we'll know
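For reference, a minimal sketch of the polling loop being described here; the /ia/&lt;n&gt; endpoint name, the showImageArray helper, and the 900-frame cap are assumptions based on the stream, not a definitive implementation:

```javascript
// Poll one JSON frame every 1000/30 ms (~33 ms), draw it, stop after frame 900.
let i = 1;
const videoTimer = setInterval(() => {
  fetch(`/ia/${i}`)                          // hypothetical frame endpoint
    .then(r => r.json())
    .then(frame => showImageArray(frame));   // drawing helper, sketched further down
  i += 1;
  if (i >= 900) clearInterval(videoTimer);   // we only have 900 frames
}, 1000 / 30);
```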
the next thing I'm gonna want to do is
smooth that out I don't want to draw
directly on the surface because drawing
directly on the surface means that I'm
racing the beam so I'm drawing on it but
I'm also the computer scanning on it and
that's that's sort of bad so the typical
game pattern for this (video uses it too)
is you don't draw directly
there you draw into another surface and
then you swap the surface okay so in the
case of this of the browser we don't
have to directly worry about swapping
the surface all we're gonna do is just
basically tell it to copy it which you
know will let the browser figure out how
to handle that under the hood I'm not
I'm not even sure if that's exposed in
normal 2d context I'd have to go do some
research on that but so by the way this
thousand divided by 30 we're just gonna
unroll it because it's actually going to
turn into a fraction and it turns out
that it's about 33 if I do that here
thousand divided by 30 then I get 33
point all this stuff so we're gonna call
this 33 milliseconds there's some
JavaScript time something not quite
exact no notice that time I didn't get
any errors so it's stopping cool so we
are clearing our interval in that state
let's let's just remove this that's
inside our function and cool well how do
I draw this thing somewhere else well
right now I'm drawing directly on the
canvas let's not do that so let's draw
we could draw in another canvas and move
it off the screen which we have in
JavaScript it's called an off-screen
canvas what do you know so we can
actually just initialize an off-screen
canvas directly this is something I
think is relatively new to the
standard I'm not sure
exactly when it was added that'd be a
good question for Dave but so we can
basically just new one of these things up
and it won't
actually bother to put the thing in the DOM
because it doesn't make
sense to put the thing in the DOM
when it's never gonna be drawn on the
screen so it'll just give me the memory
and the data structure and I can go
manipulate the bits and then I can just
swap it on so that'll get rid of some of
that video tear so let's let's go and do
that let's we need it to be the same
size the width the height this we can
use later for scaling cuz we could
actually put this thing one size and
then when we copy it to the canvas we
can make the canvas a different size and
JavaScript will take care of scaling it
for us we don't have to do the bit
manipulation for that which would be
another class but anyway let's let's do
let's let's call this the off campus and
the off camera we're gonna got an off
context as well so let's let's take a
yeah I think yeah we're gonna need to
get using exactly the same trick that we
have at the off canvas with the regular
canvas we're gonna say get context 2d
and store that in our off context okay
so we still have a surface that we can
draw on and we can clear this and we can
clear the off context as well
all right so we run that should get no
errors yeah our videos playing whatever
so instead of showing the image array
directly on the context I'm now gonna
draw it onto the off context and off
context is going to have this create
image data and it's gonna put the image
data on the context at the end so I
think that'll handle the copy for me so
there you go all right
bugs aside I think I think that's the
right pattern yeah at the end I
basically just say put image data and I
take this image data which was actually
drawn on different context and it throws
it straight on on the context which is
attached to that canvas as this
technique been used in the past 75209
I'm not really sure that it's definitely
been used at lower levels than browser
and JavaScript I mean like there's this
this is basically how bits end up on the
screen I mean modern days this is
actually implemented in a chip so
there's a decoder chip typically in
hardware that will you could pass it the
the the movie format directly but we're
actually building a software decoder
sort of like in the old days and yes it
used to be just directly managing the
RGB like we are nobody would actually
pack it in JSON because that's a lot of
overhead for no reason I mean that's
that there's no reason to parse this but
okay so how does the off canvas make it
clear is it choosing only to re-render
pixels that change or sorry I didn't
fall good question
barely qualified so that's a great
nickname by the way so the way that
works is basically we're going to create
the image that we're going to be drawing
on the off context which is the
off-screen canvas so now we have an
off-screen canvas which is not in the
DOM so it doesn't participate in the DOM
and on that we're getting an off-screen
canvas context that we can draw on
this is just sort of a web-ism
so I have this off-screen context that I
can create an image on that's not
attached to the screen I then do my
image stuff which is based on this iData
which is now sitting on the
off-screen context and I draw the whole
thing down and when I'm done with it
I basically just copy it all over so now
there's no drawing while it's drawing
you know so if the browser is refreshing
the screen I'm no longer racing the beam
you know so I don't get that visual
tearing this is this is called
double buffering it's it's a little bit
weird, euphemistically, in JavaScript,
because there isn't actually a buffer
dot swap, which would be how this is
implemented in C or C++; like with SDL,
on an SDL surface you would actually
just say, you know, swap: you're
pointing to this memory
address for the image
right now, now point to this memory address
which is the off-screen context
equivalent here and then all it does is
just update that and then the video
memory basically just starts drawing the
bytes directly from that location on so
that that's what removes the tearing
because you're no longer drawing while
the video hardware is drawing it on the
screen ok let me know if that made sense
it's a little bit complicated but
basically the idea is that you draw it
somewhere else you get your image right
and then you just quickly move it and
you kind of keep switching between the
two yeah that's exactly what it is this
is called double buffering actually 7 1
5 2 or not that's that's exactly what it
is that it is we're drawing into it as a
buffer and then we're saying ok display
this and then we're drawing into the
next one and we're saying ok display
that okay then we're drawing in the next
one we're saying ok display that now in
this case I'm always drawing into the
because it's JavaScript I'm always
drawing off the screen and copying it
over and I'm drawing off the screen and
copy that draw you know and and so we're
leaving that up to the browser but
basically it makes it so that the
video hardware isn't racing against the
image decode, against me padding out the RGBA
on each line cool
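A compact sketch of the double-buffering idea just described, assuming a canvas element with id "canvas"; OffscreenCanvas is the API mentioned above, and presentFrame is a hypothetical helper name:

```javascript
// Build the frame off screen, then copy the finished buffer to the visible
// canvas in one call, so the screen never shows a half-drawn frame.
const canvas = document.getElementById('canvas');   // assumed element id
const ctx = canvas.getContext('2d');

const offCanvas = new OffscreenCanvas(canvas.width, canvas.height);
const offCtx = offCanvas.getContext('2d');

function presentFrame(imageData) {
  offCtx.putImageData(imageData, 0, 0);   // draw into the off-screen buffer
  ctx.drawImage(offCanvas, 0, 0);         // blit the whole buffer on screen
}
```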
we have video that's great so we have
two things we can do here we can get
into Delta because Delta is basically
this is right now we're downloading an
entire image for every image but most of
those images are not that different from
each other they're actually just like
you know when as I'm drawing on this
board it's like just this line moving
along and you know this other stuff so I
can create the Delta and just apply the
Delta I can take the image and just keep
slapping the Delta on top of it until I
get
you know smooth video and then show that
and that's that's a little bit more CPU
intensive which is why I definitely want
to be doing it off the screen or we can
jump over to audio so I don't know which
which one do you want to see or hear
barely qualified because you can't tell
the browser when to re-render really so
you do the math first for the whole
frame and then tell it to render rather
than letting it possibly rerender
halfway that's exactly it barely barely
qualified that's a great nickname a lot
of talking I got a drink okay
just water tonight this is a little too
complicated to be having anything else
okay so we have video let's um let's
let's play with some Delta stuff yeah I
really want to see - both - as well
Brandon let's see how far we can get
okay I mean this is already it's not
it's not an hour session I mean none of
these sessions are really gonna be just
an hour but okay so let's start with
Delta it's really nice this is part of
why I'm picking Python, which C Uni loves,
and that's mainly because I get to use
the numpy library, I also get to use
the PIL image library, so I get quite a
lot for free. now again, there's stuff on
npm I could get to do the equivalent, so it's
not like one language is better
than the other, but it is really easy to
do this type of numeric manipulation
in Python, and that's part of
the reason Python is really common for
data science, so this is where it's
gonna really be useful so by the way
where can I find your YouTube I think
the bot is online today let's let's uh
of course, the bot...
bot, what are you doing, I gotta give it
the right command, there you go, that's
where you can find the YouTubes. cool, not
used to having the bot on stream, that's
kinda neat oh and of course I left a bug
in there but okay so these are these are
all going to end up the CSG flix is
gonna end up on youtube so I think it's
a little tough to follow some of this
stuff on Twitch but live but you know
hey you know let's we'll keep rolling
we'll figure this out okay so we have
this image thing what we really want to
do is build another version of this that
does Delta so we want to do image array
Delta image rady mm dad i'ma get him get
image dad alright so we're gonna call
this in a em dad okay now it's funny
that this is still called i-70 because
that's totally wrong isn't it it's no
longer i-70 this is image one and just
the first image we're going to work with
this is image one right this is image
one I'm gonna fix this
real fast before because I'm about to
get burned as I copy pasta okay so that
is gonna cause my program to fail we're
gonna get an image ad and that's gonna
be image good okay that thing and that's
what the part let's restart this thing
yeah okay so if we don't look at this we
didn't break anything and if we look at
image get we're getting right now we're
getting exactly the same thing okay so
let's do a delta so basically we have
all of this stuff and we're gonna have
two images now we're gonna have like say
75 and so for simplicity I'm just gonna
assume that we're gonna do the Delta
from the first image for every one of
these so Matt no matter what you request
I'm just gonna say take image 1 take
whatever image is specified and
calculate the difference between these
two now that's not really needed for an
optimal video encoder we really
need the Delta from the previous frame
but for now just because I don't want to
deal with more parameter part and we're
just gonna assume that it's we're always
going to take image 1 as image 1 so in
this case I'm gonna take image 1 not as
this it's actually going to be really
image 1 now that'll show me the array
for image 1 yes it does ok cool we don't
know that because this part is the same
anyway we can see here that it's it's
loading the image all right cool
now we need image to and image 2 which
is really gonna be this is our
difference is going to be the difference
between these two we're gonna say image
2 is whatever was passed in so image 1
is always going to be the first image
from our video image 2 is gonna be
whatever this URL parameter is so if
it's 75 I'm gonna get the 75th image
okay I run that there shouldn't really
be any error there because I'm not doing
anything with image 2 yet now what I'd
like to do is say image 1 and I'd
subtract image 2 from actually in which
2's in the future so I could say well if
I want to recompose it I could say my my
Delta image is really like image 1 minus
image - except the problem is that
that's destructive so if I subtract all
of those array elements from each other
I might lose some information like if I
if I say like this thing - that thing
then you know it'll I don't actually
want a subtract because what I'm gonna
end up doing is I want something that
works whether it's subtracting adding or
whatever it turns out I have a logical
operation that does exactly that and
that is XOR and in that case I want to
basically say if the bid is in one or
the other
send it through but not both
because if it's in both I don't want to
reapply it on the other side like if
it's already there
so I'd be wasting bits that I'm sending
basically like so if
just super quick review for those you
haven't looked at logic in a little
while truth table so this is like image
one and this is image two and this is
the result here's my truth table this
thing is true, true, false...
that's a terrible F, let's try it again:
true, true, false, false, true, false. okay, so
these are all my cases: both are true;
one's true, the other's false;
one's false, the other's true; both are
false. and for an XOR, which is really, i1, there I go,
sloppy, fine: i1 XOR i2, that's gonna be
when they're both true it's false when
one or the other is true it's true and
when they're both false it's false okay
so this is neat because if image 1 has a
pixel like if this T is like a pixel is
on and image to the same pixels on I'm
not gonna send anything right I'm gonna
end up basically saying yeah just send a
0 for that and if one or the other
changes worry about it but otherwise
don't ok so I end up with my XOR operation
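The XOR trick in miniature; this is an illustrative sketch on plain byte arrays, not the exact code from the stream:

```javascript
// XOR delta: the delta keeps only the bits that changed, and XOR-ing the same
// delta back onto the base frame reconstructs the new frame exactly.
function makeDelta(base, next) {
  return base.map((b, i) => b ^ next[i]);
}
function applyDelta(base, delta) {
  return base.map((b, i) => b ^ delta[i]);
}

const frame1 = [200, 10, 10, 10];
const frame2 = [200, 10, 99, 10];
const delta = makeDelta(frame1, frame2);   // mostly zeros: [0, 0, 105, 0]
applyDelta(frame1, delta);                 // back to [200, 10, 99, 10]
```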
and let's go back here: there
is an XOR compositing option in canvas, I
had a lot of trouble working with it, so
we're just gonna do it manually, we've
already built our little pixel blitter
anyway, so we're gonna do the XOR and
that's gonna create our delta image now
the delta image is except in order to be
able to do that i need to be working
with arrays so i'm gonna not flatten
this thing anymore i'm gonna move this
over here image 1 is going to get turned
into an array and image 2 is gonna get
turned into an array
okay so I've got the image image one I
open it up I turned it into a numpy
array image two I open it up I turn it
into a numpy array and now I'm going to
say take image one array numpy array XOR
it with image two numpy array that's
your Delta okay so if I do that that's
important because we're gonna have a lot
of bit values and these are jeebies then
send that thing back so this is really
Delta I turn it back into a list okay
most of the image doesn't change look at
that big surprise now this is cool
because we're only seeing the XOR the
two so we're only seeing the places
where the bits changed and that's cool I
think it's I love that how little code
it is to do because of Python thank you
try to do this in JavaScript, there's all
this counting and stuff. okay, so, of course...
all right, so we're gonna take
this thing and we're gonna flatten it,
because right now we're just calling tolist on
it, but let's flatten it first
because right now we still have these
RGB values as a sub array so I'm gonna
turn that back into a flattened array I
do this cool there we go now we get rid
of all those sub arrays and we're just
left with RGB values for the stuff
that's flipped alright so if I try to
just display that as a straight image
and to do so we're gonna go back over
here and we're gonna go back over here
I'm going to reuse that same context so
let's let's
let's so let's comment all this stuff
out that we just spent so much work
putting together and let's basically
take this line here but we're gonna
we're gonna work with it a little bit so
let's take this is image a right now
we're gonna turn it into image a a delta
and we want to take the Delta all the
way out let's let's take like 70 so this
would be the Delta from image 1 all the
way to image 70 okay now instead of
showing it I'm going to just dump it on
console to get started
except that's gonna... fine, then,
we're gonna say x, and it's a function
that calls console.log okay cool do that
and we should end up with our values so
if I jump down here somewhere
deeper in the array some of these values
should not be 0 but I could be hunting
all day this is actually really good for
compression because all of these zeros
we can run-length encode okay yeah so
here's some stuff that actually changed
down here in the 37-thousand so these
these values are just bits that are
gonna be different so if I take the
image if I take image 1 and I apply this
to it I should end up with image 70 but
the nice thing about this it's very
compressible because the majority of
what I'm sending in the Delta is zeros
so I can actually display the Delta
itself so why don't we just do that
first so let's let's go back into our I
hate to change it because it's already
working
all right, let's just copy it, we're gonna
copy-paste it. so, show image array... naming
is hard, so I'm just gonna be
unimaginative, so we have show image array
delta, and that's gonna take the image
array, we're gonna use our off-screen
context, we're gonna do our same RGB-to-RGBA
unwind,
that's our blit, and we're gonna drop it
onto the context so that we can see it
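For reference, a sketch of the RGB-to-RGBA unwind that keeps coming up; the function and parameter names here are assumptions, the idea is the stream's:

```javascript
// The JSON frame is a flat [r, g, b, r, g, b, ...] list; canvas ImageData
// wants 4 bytes per pixel (RGBA), so expand and set alpha to opaque.
function showImageArray(ctx, rgb, width, height) {
  const imageData = ctx.createImageData(width, height);
  const pixels = imageData.data;                   // Uint8ClampedArray
  for (let i = 0, j = 0; i < rgb.length; i += 3, j += 4) {
    pixels[j] = rgb[i];                            // R
    pixels[j + 1] = rgb[i + 1];                    // G
    pixels[j + 2] = rgb[i + 2];                    // B
    pixels[j + 3] = 255;                           // opaque alpha
  }
  ctx.putImageData(imageData, 0, 0);
}
```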
okay so let's image add now this thing
does take an ID so it's gonna be this
one so this should be the Delta between
image 1 and image 70 first try that
calls for a drink I think that's a big
deal I'm very happy about that if I take
this thing and I go all the way to the
last image in the series like image 900
I've drawn the entire image so here's
the part now this this coloring stuff
this is really related to digital noise
that was that these are compression
artifacts so I can actually see exactly
where like the frame is changing and it
shouldn't be changing those should have
just stayed black but that was before
like when it noticed that some of this
black wasn't really black and there's
some video compression that's yeah threw
away some bits but that's what this
color stuff is but you can see the main
stuff is here so aside from a little bit
of digital noise this is actually the
Delta so if I this is the Delta all the
way to the end so if I apply this Delta
to the beginning I should get the last
frame of the video. so this black... this
frame really has just what's
changed if I start from what was already
there and I apply the XOR on top of it I
should end up with the last frame
exactly okay
so let's roll the dice I mean literally
oh 10 actually that's binary that's a
really good roll I like that time if you
if you want to see that I'm gonna show
you the hardware cam: this is 10, 1 0,
really like that, it's neat, I rolled the
only thing that's binary on this die ok
cool so great let's go do this well show
image Delta right now we're actually
just showing the Delta itself but we're
not applying it to the original image so
to do that we're gonna need
two pieces of information right now
we're just fetching the Delta but we're
gonna have had to have known the
original image to be able to apply the
Delta to it so let's let's just do that
which is pretty similar to what we were
already doing except we're not gonna do
the image Delta we're just gonna say
give me image 1 and if I I could just
reuse this routine because the Delta the
Delta part is actually happening on the
server at the moment it's not happening
in the client ok so this will this will
make sure that I have an image array
which corresponds to the first image so
now we're gonna expand this function for
what we're gonna do in this case this is
gonna be image 1 gonna be a little more
careful about naming at this point and
inside that we're gonna run another
fetch where we're gonna try and fetch
the last frame of this thing and we're
gonna say ok take the last frame which
yeah it's ok this X is only contained in
here and it's gonna call show image
Delta with the value that's come back
which is really X but I don't need to do
that I need to do it with both of them
so since I want to apply both of them
I'm actually gonna turn this this is
gonna be why I'm just gonna call it Y
now which is actually let's call it
image 2 so this is this is the Delta
part image 2 Delta and call show image
array Delta with image 1 an image to
Delta ok so what I want is I want this
thing to start with image 1 on the
canvas and then apply the Delta on top
of it oh hey what's up f2o how you doing
wouldn't 1, 10 and 11 be binary options
on a d20? points to f2o. well, you are
absolutely right I misspoke earlier but
I guess I was just so excited that I
rolled some binary on a d20 apologies
but that is absolutely correct okay so
now I've got the image array which is
really image 1 and I've got image array
of the Delta I'm gonna spell them out
here, so this thing is really the image array
of image 1, and iData is a local, so
that's all fine, and that's gonna show up,
so that'll actually work, it's just at the
moment I'm not using the image array delta.
okay, so we run that and we get an error:
image array is not defined, because on
line 41 we're using image array, which is,
image 1. okay, not okay, so it's showing
image array one no this is the first
image image one now what I want to do is
I want to reapply that Delta to it so
instead of just showing the bytes
directly I'm going to XOR it again and
I'm gonna say and again I don't have to
worry about the Alpha components we
don't care but we know that we're at
exactly the same offset here because I
haven't moved J forward yet because I'm
gonna move J forward on this okay so I
run that there it is the last frame of
the video, now, that I composited by
taking the first image in the entire
sequence and XOR-ing in the delta,
just the delta, and it worked
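Putting the two fetches together, a hedged sketch of that composition step; the /ia/1 and /iad/900 endpoint names and the canvas handles are assumptions:

```javascript
// Fetch the base frame and the delta, XOR them together while unpacking the
// flat RGB list into RGBA, then put the reconstructed frame on the canvas.
const canvas = document.getElementById('canvas');   // assumed element id
const ctx = canvas.getContext('2d');

fetch('/ia/1')                                      // base frame (image 1)
  .then(r => r.json())
  .then(image1 =>
    fetch('/iad/900')                               // delta from frame 1 to 900
      .then(r => r.json())
      .then(delta => {
        const imageData = ctx.createImageData(canvas.width, canvas.height);
        const pixels = imageData.data;
        for (let i = 0, j = 0; i < image1.length; i += 3, j += 4) {
          pixels[j] = image1[i] ^ delta[i];
          pixels[j + 1] = image1[i + 1] ^ delta[i + 1];
          pixels[j + 2] = image1[i + 2] ^ delta[i + 2];
          pixels[j + 3] = 255;
        }
        ctx.putImageData(imageData, 0, 0);          // the reconstructed frame
      }));
```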
sweet totally magic okay so just for
reference for those of you who are
curious I'm gonna put that image under
this context so right now I've got my my
canvas here, so why don't we, under it,
just throw a few BRs in, sloppy
web programming, and let's
load the image, or load it directly. so
right now we're coming through the
Python processor to get this JSON-formatted
thing, but we could actually
just hit static movie 1 image...
okay, except... except that's in
the images subdirectory, that was a pretty good
guess, yeah. okay, so this is actually
image 900 but this is actually the
composition so let's let's take a look
at the Delta of that and which I don't
have an image format do I have them JSON
image format so I'd need another canvas
to do that I kind of want to see it so
let's just let's play a little bit that
that was working well so hey why not
we're gonna call this the delta canvas and
we're gonna take Delta canvas which is
gonna be an onscreen canvas like this
one, and I'm gonna show... when it does this
show image delta, show image array delta... do I
have the delta handy? yeah, that's a
bummer, isn't it. so what I could do
instead is pass in the context to use.
okay, so I do that and I say use this
context, and down here where I just
commented out, I fixed that, and now I
would have to actually turn this into a
function, which, I don't need to use that
notation, sorry, I could just say, like, x
pass along show image array but now it's
going to be the normal context with X
okay so that should repair that but now
I can reuse it because in this dude over
here I can call it we have a different
context so I can say you
let's copy our canvas grabber
again and let's put this here as our
other canvas. what did I call it? I called it the
delta canvas, and this is a lot cooler... the
delta canvas was referred to as delta
canvas, okay, so it's called deltaCanvas,
so that, sorry, yes, I want to grab the
element for the Delta canvas and from
the Delta canvas I want to get the Delta
context I'll spell it out a little bit
more and let's let's let's go and show
that so now I should be able to do show
image array of Delta context and alright
so we roll the die again so this is the
Delta that we're downloading and we are
getting the first image and we're
applying it and we end up with that cool
Brandon I'm with you I let's not jinx it
let's let's keep going let's let's just
keep going I feel like this is good now
I should quit while I'm ahead
maybe I shouldn't quit... all right, yeah,
whatever, you're all here, you're hanging
out, should we go a little bit more? I
mean, you know, the real goal here is we
try for audio too
I think in order to do that I'm going to
have to arm myself with the power glove
let's do it we have the power let's
let's go for the audio
let's see, we're done with video, the
video's working, we got this JSON-encoded
custom codec, and it's nice and
compressible, because in this delta
format we have large runs of zeros; we
can just take all those zeros and run-length
encode them, which is totally
something we're gonna do in the next class.
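Since run-length encoding keeps being promised for next time, here is a toy sketch of the idea on a flat delta array; this is the concept only, not the stream's code:

```javascript
// Collapse runs of repeated bytes into [value, count] pairs; a delta frame
// that is mostly zeros shrinks dramatically. Assumes a non-empty input array.
function rleEncode(bytes) {
  const out = [];
  let run = 1;
  for (let i = 1; i <= bytes.length; i++) {
    if (i < bytes.length && bytes[i] === bytes[i - 1]) {
      run += 1;
    } else {
      out.push([bytes[i - 1], run]);
      run = 1;
    }
  }
  return out;
}

rleEncode([0, 0, 0, 0, 7, 0, 0]);   // [[0, 4], [7, 1], [0, 2]]
```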
we could roll the d20, but I'm feeling it,
I put the power glove on, we're
doing it, I think, we're doing it,
hell or high water, as long as I can say
C Uni right. we're gonna do it. did I say
C Uni right? I don't know if C Uni needs me to,
but I did appreciate C Uni spelling out
C Uni phonetically, thank you. so, great,
we're gonna leave all that image stuff
alone we have Delta images we have
regular images we have video frames
we're missing audio so let's give
ourselves some audio how do we give
ourselves some audio well we start out
with a route okay so first step just
like we did with the video image is we
took our we're gonna we're gonna add an
audio route at this point, and our audio
route, audio array, let's call it a a,
that's gonna be... no, that's it, we're
just gonna pass the entire file back
we're not gonna worry about chunking or
individual frames or anything like that
so hey ya don't want to get pick on
those people so so this is just gonna
return
nothing at the moment and let's take a
look at that so that's our static let's
go to our audio data so right now that's
our webpage and our audio data is
currently Wow sorry go back there
browser okay our audio data is currently
empty alright I'm gonna close some of
this so we have our audio data actually
we have nothing we begin with nothing we
are thrown into the programming world
with nothing but we will make something
ok so in order to do this we're gonna
import the default wave library which is
built into Python Thank You Python and
sorry SUNY and we're gonna basically
wave follows standard Python semantics
for files so it's it's open just the
same as like normal file opens this is
I'm really impressed forgot this for all
right no I mean I meant I knew we were
gonna do that so so what are we gonna
open for our wave well we want to take
our nice crappy audio, which, as you may
recall from earlier, sounded like... and I'm
gonna boost the volume again...
[the loud clip plays and the captions pick it up:
"...on the servers just because I
want to try to keep them simple, it'll
probably be doing a bit of Python..."] I
overdid it, I won't use 300, got a little
bit excited. okay, so, all right, so what do
we have to do we have to open up this
audio file which is in static and movie
1 8 kilohertz wave alright so static
movie 1 8 kilohertz so we open that
thing nothing should happen yeah because
I'm not actually doing anything with the
data yet so I'm gonna be nice and Python
I can say with this thing as f do
something which we're gonna end up
having to do and now I have a way to get
audio data now wave gives you a bunch of
nice features off the bat including read
frames which basically is gonna read
every single frame that's in this file
so keep in mind in audio pulse coded
modulation I'm looking at pulses and how
high that pulse is at each instant how
many instance per second sort of like we
had our frames per second we have our
samples per second here alright that's
our sample rate which this file is 8
kilohertz so we have 8,000 samples every
second so question for the audience chat
let me know how many milliseconds apart
is each sample and while you're working
on that I'll work on this and I'm gonna
read frames and this is basically
specify how many frames I need some math
come on people
give me some math and if you're not in
the mood for math don't worry I'll do
the math for you so this is just gonna
be we're gonna call this our audio data
we can we could actually call it audio
data there could be pythonic about it
too so cuz we're in Python so this will
read one frame of audio data and if I
print that frame out
gonna be very boring but I can come over
here reload this
and there's my binary data that's the
first frame now if I read five of these
frames ten of these frames actually 8000
of these frames that would be one second
of audio err math not found nice there
you go this is this is one second of
audio from that really loud sample sorry
about your ears but that would be the
equivalent of eight thousand bytes now
this is in binary hex representation and
Python so this is /x this that's not
really in the data the backslash X
that's just escaping this eighty is hex
that's in there so it's 80 80 80 7f 80
80 81 all this stuff now this just to
let you know is an unsigned format
because 8-bit audio
according to waves must always be in
unsigned format now you're not gonna see
you're gonna see a little bit of wave
perturbation around these things why is
most of it 80
that's because 80 corresponds to 128
which out of 256 is exactly in the
middle so they didn't have signed math
they didn't have a standard on sign math
in the wave format at the time so it was
added a little bit later so 128
corresponds to 0 255 corresponds to a 1
in terms of like like full pulse and 0
corresponds to negative 1 and JavaScript
of course is going to use that negative
1 to 1 range it's it's float32 array so
but this audio file that we have in
character it's in single byte format is
unsigned, so I'm gonna have to
remember to subtract 128 to get it to be
between negative 128 and 127, and then we
have to divide by 128 to get it on the
range from negative 1 to 1, and I don't
know why I had to do that with my hands
when I have a perfectly good whiteboard
over here
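In code, the conversion being mimed here is one line; a minimal sketch, assuming unsigned 8-bit samples:

```javascript
// Unsigned 8-bit WAV samples live on 0..255 with silence at 128; the Web Audio
// API wants floats on -1..1, so shift by 128 and scale by 128.
const toFloat = (u8) => (u8 - 128) / 128;

toFloat(0);     // -1         (full negative swing)
toFloat(128);   //  0         (silence)
toFloat(255);   //  0.9921875 (just under full positive swing)
```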
0.125? I don't know, let's check
your math, I'm not really sure, I was
asking the question but I didn't
know the answer to it. so the
question was 8,000 hertz, which is 8,000
samples per second, so if I want to do
that, it's 1/8000, which, in
milliseconds, that would be... which is
probably what Brandon is responding to, 0.12...
all right, so you're close, but actually I
think you're probably right, Brandon, cuz
I think I only did 5 digits of precision,
yeah, I only did 5 digits of precision, so
let's pump that up and see if
Brandon's totally right, because I
have a feeling Brandon is. yeah, it is 0.125;
if I multiply that out
from seconds to milliseconds I
get 0.125, you are absolutely
correct. what are the tildes? the
tildes that you saw there, 75209,
they're not really tildes, it just
turns out that those hex digits are
printable so Python prints them; they're
not really tildes, it's actually
whatever the ASCII value is. but this
will actually make a little bit more
sense if we don't look at it with
the hex
escaping or if we look at it fully with
hex like like let's say we actually look
at the hex I don't know if I should be
showing this on a JavaScript stream yeah
what the hell you live once okay so this
is actually a hex dump of the raw WAV
file that I just was playing around with
and there's a little bit of magic stuff
up here this is the header right and
that's RIFF, which means this is a WAV file;
the WAV format is RIFF-formatted
audio, and there's a bunch of stuff
describing how many channels there are
how many samples per second all that
stuff that's our header and then down
here we have our actual audio data and
you'll notice that most of it's at 80
which is zero, and there's no tildes
you'll see here; if you saw tildes
they would be over here on the right,
that's the ASCII representation of
stuff that's happening over here. so in
this case, this is your
tilde, and I think if we look at an ASCII
table
and we look for our tilde, then we're
gonna see that it's down there, yeah,
okay, so that's it right there, it's
7E. so if I look for 7Es, yeah, I
have a 7E on the, 1 2 3, or 0 1 2, on the
3rd byte on this line, and on the 3rd
byte of that line there's a tilde, so it's not
really a tilde, that's just the ASCII
representation of some hex. okay
I'm gonna have to convert that back but
that's basically this is the hex this is
the raw hex but because we're in Python
we're not gonna want to look at raw hex
because we can work with numbers because
it's Python hey all right how many
frames we want to read we don't want to
read just a second of audio data let's
read the entire file and it turns out
that you can actually query this thing
and say how many frames are in the file
it's gonna read that header that I just
showed you over here and I should point
so in this header is actually some
information about how many frames are in
this file and I can access that by
calling get the number of frames that
are in this file it'll read the header
and tell you so if I read in the number
of frames and I say read frames as many
frames as in the file the audio data
will now be filled with all of the audio
data
all right so I go and I run this program
and now I get all the audio data not
just a second of the data and I take
that thing and I can start getting
crafty crafty is fun so we can take that
and convert it into we don't have to
deal with it in hex we can actually just
treat it as numbers because Python so I
can just take this thing and say turn it
into a normal Python list and alright I
can't run this program I have to put it
on the webpage and it actually turns
those it takes the hex and it turns them
into normal numbers so I can look at
them as normal ordinary decimal numbers
that were used to seeing cool point to
the header on your screen
C Uni, for you I will do it, that's me
pointing to the header that's that's it
over there
but if I was going to point to it on the
screen I could actually just select it
so this this part up to well actually
see I clipped off the top one so let me
let me take that from here this part up
to here is all header so I could
actually parse that header if we wanted
to write our own wave parser but Python
has a perfectly good wave file parser so
I'm just gonna use it because why right
you run for that myself oh sorry see I'm
losing points with CNE you should get
something to draw every way except one
five to nine all right you know what
just for you oh you know what that's
totally not gonna work is it ooh
that's not the thing that I wanted to
hit is it yeah you know what sorry I
broke my streaming tool a long time ago
so we can't have that we can't have nice
things
okay so let me just pull that file back
up and we're good to go okay so we have
our stuff and okay so this is JavaScript
we're down here in Python land and we've
pulled the thing in we've pulled all the
audio frames in we've turned them into
normal numbers we know how to work now
we could convert them in Python but
that's not really the point of this we
won't we sort of want to do the
unpacking of this file in JavaScript
sort of like I did before
so let's not print the list out here
let's actually send it all the way down
to the client how do we do that well
since I have a width I have to you know
use a kind of standard pattern this is
going to be JSON so I'm gonna do
actually you know what we're gonna we're
gonna do this so actually this is gonna
be an array and let's return JSON dumps
so so right now that's going to return
an empty array cool and if I go and not
print this out but set this to
result is now going to be this thing all
right I get my audio data in the browser
which is kind of where we want it now
it's in normal numeric format so I don't
have to worry about changing it from
hexadecimal or not but I do have to
worry about converting it to the right
audio format for the browser which I'm
gonna have to take this unsigned thing
and I'm gonna have to put it into a
float 32 which is how the browser wants
the audio data to be so that's web
standards talk to the vape juice Jordan
if you want to know why I don't know I
guess it's the most neutral thing they
could do was to use a float 32 they
already had a float 32 array so this is
this is the float 32 array that we're
gonna be using we could new one of these
up and basically construct the whole
thing from scratch say we're gonna need
this thing to be 8,000 long that would
be one second so and it'll give me lots
of zeros now this is gonna be where my
audio data is stored except these
numbers need to be from negative 1 to 1
not from negative 128 to positive 127, or
in this case from 0 to 255, so
we're gonna transform them back great
how do we do that well we go back to
JavaScript and we stop playing around
with images, right, no more of this image
nonsense, this image voodoo, we're done
with images. so if we reload, this thing
should be black, okay, it's black, let's
let's deal with some audio stuff so how
do we begin working with audio stuff in
JavaScript well we could use the audio
tag you know but again that's
downloading that's not streaming I want
to be able to manage bytes directly I
want to be uncompressing on the fly and
I want to be playing and I want to make
sure that I'm synchronizing with the
video so let's do that I'm gonna whip up
a new function we're gonna call this do
the audio do the audio all right and
doing the audio is going to wrap my
audio stuff, so we're just gonna call do
the audio down here, so that we don't
put everything in one spot, and let's
take this thing now in order to do this
the first thing we're gonna have to do
is we have to fetch this we're no longer
we're gonna have to fetch this audio
data that we just created okay now
normally we would like you know if this
was really like a streaming service
would be like fetch the audio from you
know movie one but for now we only have
a single movie so we'd have to worry
about them so and we're not we're just
pulling this chunk that corresponds to
the chunk of video that we're we're
getting so we can we can pump chunks as
we want to go so great let's fetch this
thing, so we're gonna fetch aa, which is
fine because it's a thenable promise, and
that's thenable, so we can say this thing
is really JSON, so parse it for me, all
right
neat so then after we've parsed that
JSON we can just drop that thing on
console just to make sure that we got
the right data and that's a nice place
to start okay so let's run this and we
drop it on there, there's our 128s, 127s,
all that other stuff okay do you not
like to use a sink no we're gonna want
to so a sink is a whole other story I
don't want to focus on how you would do
background fetching here mainly because
I'm frame timing myself the bad way
because I'm trying to keep this simple
so I'm trying to keep it simple we can
as we get into more advanced versions
like I said earlier we can go into any
one of these things and we can build a
very sophisticated version of it but if
we're gonna get through the whole series
and explain how you build an entire
streaming service from scratch I'm gonna
have to try to keep it simple but you
know let's let's spend our complexity on
you know the meat that's probably
unfamiliar to everyone like I think
there's well-known patterns about how
you fetch stuff and whatever but there's
not as well known patterns about how you
deal with audio data or video data this
there are there's a large community I
shouldn't say large there's a community
of video audio compression people but
it's a very small community so let's
let's let's do this oh my god the alt f4
is here hello
glasses, I can read the matrix. I'm sorry,
I'm sorry, yeah, I don't even
see the matrix anymore, I
don't see the code anymore, all I see is blonde,
brunette, redhead... you know, Cypher. okay, so
let's roll. how you
doing, AltF4, how did the rest of the stream
go? I did promise you I'd be doing
JavaScript look at this this is split
50-50 this is all JavaScript it's
comment this was our image stuff but
which is actually quite cool I can show
it to you because you dropped by and yes
special dudes so this was we're building
a video codec from scratch actually
already built the video part of it and
we took the first frame of video and we
applied a delta to it so that we can use
Delta compression we only have to do the
part of the scene that changed and we're
applying that we're getting it to work
this is the correct image okay so this
switch was definitely the coolest name
in this area I'm gonna I'm gonna have to
go ahead and agree with you but you know
there's just so many good lines okay all
right let's not get lost let's not go
down that right all right we're doing
the audio because we're going for the
reach
I feel yes and the Nebuchadnezzar the
Nebuchadnezzar
come on all right so we have this thing
on console but we don't want it on
console because we're gonna be
JavaScript masters and what a JavaScript
masters do like alt f4 is totally a
master he does like socket IO like
nobody's business so we're gonna go and
we're gonna take this buffer we're gonna
start working with it so right now we
have a normal array because we read the
thing in as JSON we parsed it like we
just saw over here we let we let
JavaScript do the parsing work right
here and now we need to start turning it
into the right format to be able to play
the audio alt f4 definitely have it a
day sorry about that I really shouldn't
have brought apologies alt f4 last thing
I heard he forgot oh come on
all that for is awesome and I really
appreciate the shout out and by the way
for anybody here who didn't come from
alt f4 check out all that for he's
totally awesome I don't
if I have a !so command... I should be
that cool... !so altf4stream, yeah, look at
that, okay, we do have a !so command, I
didn't remember if we coded it that
night when I was having wine, okay so
but definitely check out his stream he's
he he's gonna do JavaScript right I'm
gonna do JavaScript wrong I'm gonna do
it all the wrong way okay so but anyway
thanks a lot for shouting out earlier so
we're processing audio we got audio it's
coming down and we need to turn it into
something that's set we don't look at
audio in an array we have to hear the
audio and that's why I'm wearing the
headphones tonight come on let's let's
hear the audio I want to hear it okay so
the way we're gonna do this similar to
what we were working with the off-screen
context because audio wouldn't really
participate in the Dom would it it's not
something that draws audio controls draw
but audio itself does not draw on screen
so it's not normally part of the Dom so
how do we get it well we start with a
audio context which is something we can
pick up from window I think it's hanging
off window okay so we need to get an
audio context which gives us a place
where we can write audio that's you know
gonna be played by our speakers and
we're gonna call that the audio context
because I was feeling lazy naming and I
was using CTX so now I'm using a CTX
okay and we're gonna need a buffer
because we're not gonna write directly
into the audio context that would
probably sound not good
so with that audio context I want to
create an audio buffer because it's an
audio context so it only does audio
buffer and how big is that audio buffer
gonna be well this X thing that just
came in which oh sorry the first I have
to specify how many channels so we're
doing single channel mono audio and I
it's going to be as long as the data
that was passed in right it's going to
be that many bytes and the sample rate I
have to
I have to specify the sample rate. by the
way, I spent a lot of time looking at the
Mozilla docs, thank you MDN, but the
Mozilla Docs are pretty solid for
looking up current standards and stuff
like that without having to read the
spec so that'll create a buffer an audio
buffer that we can write to now it's not
anything really magical like if I just
dump this thing out on console it's just
gonna be a bunch of zeros okay so it's
not it has a little bit just like that
image thing it has a little bit of extra
stuff so here I'm specifying the length
which is you know it's 30 seconds of
audio of corresponding to our 30 you
know seconds a video it's this is how
many samples that is at 8000 samples per
second and I have one channel great now
this you'll notice it doesn't have a
data member and that's because it
doesn't know which channel you want so I
can't just take this thing and go give
me the buffer itself I have to specify
in JavaScript
give me the channel we're gonna use left
I mean we could use right if we wanted
to but give me the left channel so if I
do that, get chan... get... you know what am
I talking about,
let's get... you want the data, okay,
getChannelData, and there it is,
there's my audio array as a typed array
it's a float32 just like I showed you
before, before AltF4 showed up. okay, 8000?
8000 is the sample rate, 75209,
so that's 8,000 pulses, pulse-coded
modulation,
pulses per second so that's actually
that's really low for audio data
well, audio normally, like CD quality, is
44,100; you know, if you get like a FLAC
file it's usually 48,000, and that's
typically what WebRTC is gonna use. so
8,000 is like a crappy phone call, but it's
as low as I could get it that we could
still kind of like hear it and it was I
don't want to be too painful in stream
okay so we have right now we have all of
these things this is eight thousand
times thirty eight thousand times thirty
24 240 thousand it's about that much
because it wasn't quite 30 seconds it's
29 points something seconds and that's
the size of this array okay so two
hundred forty thousand samples we got to
fill out we parse that in JSON because
we're doing JavaScript little commentary
on JavaScript there sorry you don't have
to there are better serialization
mechanisms you can use so this is gonna
be our I'm calling this the buff the
buffer of the channel so this is
basically gonna be the left channel we
can also do the right channel if we
wanted to but you know you change the 0
to a 1 and you get the right channel so
buff chin we could actually write into
both channels if we felt like it and
this is gonna be a really simple copy
except just like we did with the image
data we have to do a slight
transformation now before on the image
data we had to go from RGB a to RGB
space here we have to go from this 1
byte value space 0 to 255 to float32
space with negative 1 to positive 1
because that's what the standard says we
have to do. so we're gonna say let i be 0,
go until i is the entire length of
this JSON audio data that we got back,
and increment. okay, so we use a three-part for loop,
and we just say, okay, buffChan sub i
is gonna get... what we'd want to do on a
straight copy is just copy those, except
we can't just copy it over because
that's the wrong format right if I just
copy this thing over and then I write
out
consul and then I'm gonna end up with
this thing in the wrong format I need it
to be that 128 corresponds to zero so
let's move it to zero all right there
you go now it's that now it's around
zero but it's still the wrong size so
what we're gonna do is we're gonna take
it we're gonna subtract 128 so that's
gonna make it negative now we're gonna
make it signed we went from unsigned
zero to 255 to sign negative 128 to
positive 128 and we're gonna divide it
by 128 we're gonna force it to be
floating-point that's gonna put it on to
the right interval it wants to be some
decimal from zero to one or zero to
negative one okay so I do that I look at
my buffer and we're in the right range
okay so now for the moment of truth we
need to hook this thing up to our actual
audio machinery because audio is not
meant to be looked at in the console log
that's not where the audio goes the
audio goes right here put the audio in
the ears this won't be super loud don't
worry, apologies for boosting the gain
before. so what we need to do is we're
gonna need to get another variable; from
the audio context we need to
create a buffer source, and that means
that we're gonna be filling out the
audio data as we're going so that
that'll allow us to manage the buffer
directly in JavaScript which allows us
to do the kind of thing that seven one
five 209 was mentioning that we can
handle async we can do whatever we want
in order to be able to manage our audio
data as we want we can watch the
framerate we can watch how many frames
are being displayed we can make sure the
audio stays in sync with the video
that's an advanced topic so we're not
gonna do that today today we're just
gonna pray they stay in sync so let's go
ahead and create this buffer source so
I'm gonna call this to the Buffs source
very creatively and you know that's just
yeah so buffer source the the buffer
source buffer this this this this thing
if I drop it out
this is probably a part of the standard
you haven't seen I'm just guessing maybe
some of you have seen it you're gonna
get this audio buffer source node which
has kind of most of the machinery that
we're gonna need in order to be able to
play a file but it has nothing in the
buffer so you'll notice there's no audio
data loaded to this thing
so let's attach that audio data buff
source dot buffer is gonna get the
buffer that we created earlier so that's
this buff that we were working with okay
so that's gonna be our audio buffer
doesn't seem right there yeah put that
in 32 huh
yeah, because this buffer source, I'm not
getting this from the right thing, this
buffer source, so my notes are off
here, because this buffer source... okay,
so basically I'm filling out buffChan,
oh, buffChan, that's right, buffChan is
actually just one side that's one audio
channel in the buffer so when I it's
really like a view so when I'm writing
into buff Chan I'm really writing into
the concrete data that's attached to
buffer so as long as I take the concrete
buffer and I attach it to buff source
buffer now I have something that I can
work with. okay, so I can take buffSource,
right, okay, so this node, buffSource,
I'm gonna connect it
up to our audio context, which is already
connected to our audio speakers in the
right way; it's gonna do resampling and
all the other stuff that it needs to do
to get it in the right format, and we're gonna play.
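For reference, the whole playback path just walked through, condensed into one hedged sketch; the /aa endpoint and the doTheAudio name come from the stream, the rest is the standard Web Audio API:

```javascript
// Fetch the JSON samples, fill a mono 8 kHz AudioBuffer with normalized
// floats, and wire a buffer source to the speakers.
async function doTheAudio() {
  const samples = await (await fetch('/aa')).json();   // assumed audio route

  const audioCtx = new AudioContext();
  const buffer = audioCtx.createBuffer(1, samples.length, 8000); // mono, 8 kHz
  const channel = buffer.getChannelData(0);             // Float32Array view

  for (let i = 0; i < samples.length; i++) {
    channel[i] = (samples[i] - 128) / 128;              // unsigned byte -> [-1, 1]
  }

  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.start();   // must be triggered from a user gesture (see below)
}
```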
now, there's a little bit of a
standards issue here, because you're not
allowed to do this: you can't
just play audio on a web page, it has to
come out of a gesture, right, you know,
because that's that's like you know
click to unmute you know all different
websites use different tricks to get you
to play audio, but you have to have a
gesture that triggers this, otherwise, you
know, it could be just playing in the
background there were all sorts of
annoying webpages that did that oh I'm
sorry I I missed a little bit of chat
there. that was: is Netflix doing all of
this in JavaScript? yes, but I can tell
you that Netflix is doing something a
bit more advanced than this we don't we
don't write a full streaming service
from scratch every day I'll tell you
that act like this is like back in the
alright I'll answer that question in
more detail later if you're if you're
still interested but it is something
that we're gonna get to I can tell you
that like you would need to do this if
you want to be able to manage your own
audio buffering in your own video
buffering. will Chrome block
it? yes, well, it turns out Chrome did
block it. are negative decimals okay, or
do they need to be absolute value?
absolutely need the negatives so you
need to be on the range from negative
one to one for all of your audio samples
since I'm using one-byte 8-bit audio I'm
on the range of 0 to 255 which is what
the WAV format requires so I'm
normalizing it in this line here to be
in the right range for JavaScript for
the browser so all I have to do to fix
this problem is I need to make it a
click thing so we're gonna build a
little play button for us to use so
instead of calling this thing directly
let's just add a button here let's let's
call this the play button spell it out
I'm not gonna be fancy and find the
Unicode for the play icon, but you know,
that would be the right thing to do
okay there's a nice play button now let
me let me hide my like Delta here
okay so we're gonna we're gonna zoom
this out a little bit. and "that pausing
it and playing it worked?" it does work, no,
so, 75209, this is
really a permission thing: you want the
user to opt in to playing audio, so the
browser doesn't really care what button
they pressed, but they had to have pressed
something on the page. that was
to try to deal with websites that were
annoying and they would leave they would
pop under and then the browser would
cover you know back in the day of crappy
web well they're still out there I mean
there's websites like this and they
would start playing audio in the
background they would wait a while and
then sneak up on you and play audio and
do some advertising it's really
obnoxious and you never knew when audio
was gonna happen so now it's like it's
required gesture from specification yes
I know but I thought there was a
loophole there could be a loophole I
don't know the specs that well that's
really a vape juice Jordan kind of
question or alt f4 not anymore was
patched yeah I'd believe alt f4 on that
he's more of an expert on this stuff
some sites still auto play video you're
gonna play video like I just did before
but not audio and that's the reason they
usually make you wait so you have to
like click play because it says you
could play you opt it into a gesture so
that's kind of a trick it's UI trick
yeah that's that that's what we had
before we were playing video I mean you
saw our own video okay so let's just do
an onclick handler for this and we'll
say do the audio. all right, so if we do
that...
do you hear it? [the played clip gets
picked up by the captions: "...on the server
just because I want to try to keep
things simple, it'll probably be..."] I
totally hear it, I totally hear it, there you go,
audio and webpage video in a web page
video Delta compression mission
accomplished
it's so happy I need to go have a drink
okay oh yeah it should sound like I'm in
space all that for this is 8-bit audio
this is like crappy phone call stuff
it's like houston we have a problem
yeah totally seven one five two or nine
I'm digging this is this is cool stuff I
mean so to get full video I mean if
we're gonna go for the gold let's do it
all together are we ready for this I
don't even know if we're ready for this
I mean let's do it
let's be crazy let's let's play the
video let's just take this chunk here
this is our image player right this is
this is the non Delta version of the
image player so why don't we just take
this whole thing wrap it up in a
function and just say this this is a do
the video that's my bad and let's let's
do the video let's do the audio let's
let's let's get rid of this Delta player
for the moment I'm not talking about
compression
are we ready for this are we really
ready for I don't know if I'm ready for
this. is it gonna work? is anything
there... [the clip plays back with its audio:
"...up on the server just because I
want to try to keep things simple, it'll
probably be Python, but I don't want
the main focus to be about..."] I mean,
we got it, audio and video, this is really,
really... oh my god,
custom audio video codec from scratch
you saw it here that's it we did the
audio we did the video we just built
their own codec in JSON JSON this is our
own video codec this is awesome that's
it I'm done that done you know you know
what's really funny about that
Brandon it's not so much that they're
powerful actually they've gone way back
to what it used to be
we are just directly programming, we're managing the bits and bytes ourselves. We're counting on almost nothing from the browser, we're basically saying just give me a video surface to draw bits on, just give me an audio context and let me pulse the speaker directly, and we're doing all of that ourselves. We're doing our own compression, well, we'll really get into that next class, but we're doing all of it, we are managing it entirely ourselves in the JavaScript. This is amazing. Yeah, we're actually going back to the basics here. Delta Force Jim: "I hate on JavaScript sometimes, but then other times I remember how badass it is for making cool stuff." I agree with you a hundred percent. This is awesome: we have our own
custom video codec; we basically just built an MP4. I mean, there's a bit more in an MP4 than what I'm doing, but not a lot more, surprisingly. It's really just a mix of image frames, which are called I-frames, and delta frames, which are referred to as P-frames. They allow for macroblocks, so you can update just part of the image instead of having to redo the whole image like I just did, and they also allow more than just deltas. Right now I'm just doing a delta, which is like, okay, which pixels changed, but they allow things like motion: for example, in a moving-camera scene most of the frame is the same, it's just shifted a little bit and maybe blurred, so P-frames allow for that too. And they're also allowed to use frames in either direction, that's called a B-frame: you can use frames that are in the past, you can use frames that are in the future, but again it's the same trick, it's basically an I-frame plus or minus some stuff, and B-frames can even reference other frames once they've been decoded. But that's it, that's video compression in a nutshell.
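To make the I-frame versus delta-frame idea concrete, here's a rough sketch of the decode side on raw pixel arrays. The JSON shape (frame.type, frame.changes, and so on) is invented for illustration; the pattern is what matters: full frames reset the picture, delta frames patch the previous one.

```js
// Rough sketch of the I-frame / delta-frame pattern on raw RGBA pixel arrays.
// A full "image" frame resets the picture; a delta frame only patches the
// pixels that changed since the previous frame. The chunk layout is hypothetical.
function applyDelta(previousPixels, deltaFrame, width) {
  const next = previousPixels.slice();                // copy the last decoded frame
  for (const change of deltaFrame.changes) {
    const offset = (change.y * width + change.x) * 4; // RGBA, 4 bytes per pixel
    next.set(change.rgba, offset);                    // overwrite just that pixel
  }
  return next;
}

function decodeFrames(chunk) {
  const decoded = [];
  let current = null;
  for (const frame of chunk.frames) {
    current = frame.type === 'image'
      ? Uint8ClampedArray.from(frame.pixels)          // like an I-frame
      : applyDelta(current, frame, chunk.width);      // like a P-frame
    decoded.push(current);
  }
  return decoded;                                     // ready to draw onto a canvas
}
```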
And they also apply entropy compression on top of all that, the same family of tricks as Lempel-Ziv, which we can do with the pako library, but that's the next session.
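For our JSON version, that could look like deflating each chunk on the server and inflating it in the browser with pako. A minimal sketch, with a made-up chunk URL:

```js
// Sketch: the server deflates each JSON chunk (e.g. Python's zlib.compress),
// and the browser inflates it with pako before parsing. The URL is hypothetical.
import { inflate } from 'pako';

async function fetchChunk(url) {
  const response = await fetch(url);
  const compressed = new Uint8Array(await response.arrayBuffer());
  const json = inflate(compressed, { to: 'string' }); // Lempel-Ziv style decompression
  return JSON.parse(json);
}

// Usage: fetchChunk('/chunks/0.json.z').then(chunk => console.log(chunk));
```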
You want to build a video streaming service from scratch, you start by being able to stream video. We can manage the buffers directly, we can put any streaming logic in, we can make our own streaming engine at this point, so this is pretty sweet, we've got the basics right here. We're not doing anything around synchronization, and I don't want to overbuild this, but it could easily get out of sync: if the files I'm pulling down are coming down at the wrong rate, I need to add some pre-caching, and I really need to change this format. Just like this audio is really a 30-second chunk, I need to do the same thing with the video: take a 30-second chunk of video, with my I-frames and P-frames, my delta frames and image frames in our case, pack them into a single file, and decode them at the same time that I'm decoding the audio so that I can keep sync. It's very important to keep them in sync.
like the odd like person talking and
it's behind that's probably what's
happening in my stream right now I guess
we could just send this up there I
really love learning how stuff actually
works totally seven one five two and
that's how this started I was in the alt
f4 stream and people were just talking
about basically how Netflix
works and I was like listening to Emily
no that's no these are all open
standards now like back so this is a
little bit more onto alt F force
comment when Netflix built a lot of
stuff the web standards were not there
the browser standards were not there to
do this I mean vide Netflix didn't
invent like video of the internet there
was IPTV stuff for years but to do this
in a browser and to do DRM I'm skipping
over DRM because you know basically you
take this stuff and you add some
encryption on top of it you get DRM so
an encryption that's really just a
different topic it's higher anyway I I'm
putting my video for free on the
internet I don't really mind
I'm not trying to DRM this audio, this crappy 8-bit audio. So, you know, the browser didn't used to support just having a surface that you could draw bits on directly, and it didn't support having an audio context that you could just write into directly. That stuff was added later in HTML5, I don't even think it was in the first version, they had to do a bit of work to get these buffers and channels in. Even though the funny thing is, this is really just exposing what the operating system does: they're really just cleverly passing it through and checking that you don't have security violations, but they're not adding much, they're basically just giving you access to the hardware. The only thing is it does do resampling, so I didn't have to implement the resampling myself, but just like everything else I'm doing here, there's a way you could do resampling in software, and that's fine, there's enough CPU available to do it. But yeah, I mean, we had to use Silverlight.
You know, it used to be that you had to download Silverlight, or we'd do it in Flash, and we would do some of the DRM there. There were a lot of restrictions from studios about content, it couldn't just be put in the clear on the internet, so yeah, it was difficult, and getting that stuff through the standards bodies so it's available in HTML5, that was a long fight, let me tell ya. That took forever. In fact, there's a guy I'm gonna be interviewing who was the head of the web team for quite a while at Netflix. He's an awesome dude and a badass hacker, and I'm gonna be bringing him on the stream a little bit later, not in this series, we're gonna be doing some hack stuff, he's somebody super fun. But he was so happy the day we finally had a pure HTML5 browser with no Silverlight extension, we could stop installing that thing. I mean, Microsoft had long since end-of-lifed it, they were keeping it on life support, but those were long days. Anyway, dude's awesome, I'll try to get him in here.
I hope you all enjoyed this; we're coming up on two and a half hours. A lot of the magic, I mean, this is how you do the video image decoding and the audio surface itself, we covered all that in this stream. We even covered a bit about deltas, which is really starting to get into compression, because when you start getting these runs of the same number it's very easy to compress, you don't have to use very advanced techniques to compress this. I don't want to get too far into that topic, but the fact is this is kind of what you need, and then you add that stuff on top, and some of the synchronization stuff you can manage yourself as well. But you can get more advanced, like I said, on any one of these topics, I mean, you could do an entire PhD in video codecs, it's a very deep field, there's been a lot of research. I'm trying to give you something that's high-level but pretty practical, like, this is actually kind of how it does work. A lot of the magic is in how you compress it and how you deliver this stuff, and I'm not gonna get into all of that, but
we're trying to build a basic CSG flix service here, and these are the tools you need to do it: we have the ability to write audio, we have the ability to write video. So next time we're going to talk a bit about compression, we're gonna try to cut the number of bits down. I'm not gonna get too detailed on that, because that's really a tough topic, but enough to, you know, work on run-length encoding and similar simple compression techniques, and then we might do a little bit with codebooks.
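Just as a teaser for next time, run-length encoding is about as simple as compression gets. A sketch like this (purely illustrative, not necessarily the format we'll end up using) is the whole idea:

```js
// Tiny run-length encoding sketch: long runs of the same value (very common in
// delta frames where most pixels didn't change) collapse to [value, count] pairs.
function rleEncode(values) {
  const out = [];
  for (const v of values) {
    const last = out[out.length - 1];
    if (last && last[0] === v) last[1]++;
    else out.push([v, 1]);
  }
  return out;
}

function rleDecode(pairs) {
  const out = [];
  for (const [v, count] of pairs) {
    for (let i = 0; i < count; i++) out.push(v);
  }
  return out;
}

// e.g. rleEncode([0, 0, 0, 0, 7, 7, 0])  ->  [[0, 4], [7, 2], [0, 1]]
```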
Then we're gonna get into building a UI, because now we have the ability to play a movie, but we've got to select which movie we play, right? That's first: we have to add some more movies. And then we're gonna get to the real internet, which is actually how this started, I was talking about that in the alt f4 stream. The real internet adds challenges, because I can't just count on these things downloading instantaneously: I have to buffer a little bit, and how much do I buffer? How do I watch how the connectivity is flowing? How do I collect statistics on how different ISPs behave? How do I figure out whether the user is on Wi-Fi or not? Who knows. Personalization... I'm still a little bit iffy on some of these topics, I think we're probably gonna collapse them into just one session, or maybe do some A/B stuff, and then we're gonna close by getting more advanced on whichever of these topics you're all interested in.
So, what's x264? That's an open-source encoder for the H.264 codec, which is actually what this MP4 is. I'm being a little bit nonspecific with this file here, because MP4 is really a container format, it's not a video codec. In this file, in my raw movie file that I started with, I had a video stream and I had an audio stream, and they were both packed together into this one thing. Now, I could do that with my JSON too, I could just say okay, the video is array element zero and the audio is array element one, and that's similar to how this works; they actually interleave the two so you can keep them together, but basically that's kind of how it works in here. The video part of this particular container, this particular file, is H.264.
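In our JSON world that container idea might look something like the sketch below. The field names are invented; the second form gestures at the time-ordered interleaving a real MP4 does so a player never has to read far ahead to find the matching audio.

```js
// Simplest "container": one file, video as stream 0, audio as stream 1.
// `videoFrames` and `audioSamples` are assumed to already exist.
const simpleContainer = {
  streams: [videoFrames, audioSamples],        // [0] = video, [1] = audio
};

// Closer to what MP4 actually does: packets from both streams interleaved
// in time order, each tagged with the stream it belongs to and a timestamp.
const interleavedContainer = {
  packets: [
    { stream: 'video', time: 0.000, data: videoFrames[0] },
    { stream: 'audio', time: 0.000, data: audioSamples.slice(0, 8000) }, // first second at 8 kHz
    { stream: 'video', time: 0.033, data: videoFrames[1] },
    // ...and so on, ordered by timestamp
  ],
};
```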
So x264 is a free encoder for that, but the thing is, H.264 is covered by patents, so we're not using H.264; the patent situation around H.264 has a really long Wikipedia article on it. The patents usually get paid by the device makers when they put it in silicon, and for software decoders there are all sorts of gray areas in the legalese about whether the patents are enforceable or not; Google, Apple, there have been years of arguing about this kind of thing. So we're making our own video format, we're not dealing with any patented tech, and I can open-source this later. As for x264: when I unrolled this video earlier in the stream into a series of images (PNG, by the way, is patent-free, which is one of the reasons I chose it), I actually used ffmpeg. ffmpeg's codec libraries did the H.264 decode and turned it into the individual frames, so we did lean on an existing H.264 implementation, we just didn't use it directly, it came in via a library. I guess we could have done that part ourselves, but we used ffmpeg because I didn't want to deal with it. All right, well, thank you everybody
with it alright well thank you everybody
for dropping by I think this was a very
productive stream I hope you got
something out of it and yeah we're gonna
move on from there just like I said
before so I guess next up is gonna be
compression um I might skip around a
little bit on or I might I might go to
UI next just because this was already a
dense CS topic so maybe a little have a
little more fun with UI and users stuff
a little more familiar with for some of
you compression is really about doing
this better so it's about doing it with
less bits
but yeah thank you very much can you
make it look a little harder next yeah
let me tell you during the prep session
a bunch of this stuff I had to figure
out I was like oh my god go through some
of these standards acts like trying to
get 8 kilohertz audio to play in a
browser I was fun it's trying to figure
out that format of float32 see you just
get all of it you know but I had to work
pretty hard to get this stream together
thanks to stream this is really cool I
appreciate that brand and I appreciate
all of you who caught me on various
things during the stream to keep me
honest you know it's fun and I hope to
see you all around spread the news tell
your friends they want to learn how to
build a streaming service come to the
rest of the show