Updates: Tyler on PowerHacking CSS to cutestrap your app in this latest episode!

Topic 8 - Allocators, dynamic structs, and the Heap

Outline:

  • binary I/O with fread/fwrite
    • padded vs. unpadded structs
    • gcc directive __attribute__((packed))
  • no semantics around void *, creating arbitrary binary formats
    • data description is not implicitly included
  • recap dynamic memory basics: malloc and free
    • addressing pointers to structs *(ptr). vs. ptr->
    • reference with &, dereference with *
    • stack / function memory model
    • local references (aside from return) are lost
  • introduction to malloc + free, allocator basics
    • handing out chunks of memory
    • making malloc from a byte array
    • tracing allocations and bookkeeping
    • calloc is easy, realloc is harder
    • lots of implementations available, C just defines the interface

Transcript:

hello
everybody how are all of you we are back
for
topic eight of c as an additional
language with the Enceladosaurus who's
on the line
you there hi everyone hello
maybe i got the accent in the right
place too so hello chat
uh we are just doing a little warm-up
here
uh we went over some we we missed last
week
although a bunch of you came by for
adding manual pages i don't have that
version of the code deployed tonight
because we didn't have time to look at
websockets and all sorts of other boring
details but
we're moving beyond that we are just
getting down and dirty with c we are
going to get back to dynamic memory
allocation
and one of the things we sort of briefly
went over
um previously was how to parse
and read text stuff in uh
ascii text nothing nothing too crazy but
um
from from the file system and today
we're gonna kind of go we're gonna use
that same data we're gonna go a little
deeper
um so for those of you who've seen
the web page i think you all know
what's there but i'll find
this thing and yay all right cool there
it is
uh so this has uh we're doing topic
eight here
and i'm going to not i'm going to just
ignore this data so i i
took some dry boring dull census data
which is very important for your country
fill out the census if it's been mailed
to you but
all that psa stuff but we're we're going
to just like
we're going to go back to this planetary
data and we're going to
start messing around with that in binary
light so
got the glove we're ready to draw things
and i think jesse is ready to learn
stuff so oh yeah i have our shared
editor up
and let me just pull up our whiteboard
sweet okay and by the way for those of
you who don't know
that was a real 20. i rolled that 20.
i jesse is my witness she was on while i
rolled it
the camera was live so you can go check
that's a real 20. it's legit
all right and because it's a real 20
that means we're going to do
hard topics today because the luck is
with us so
let's oh no get ourselves another color
and let's let's just leave off oh that's
fun
all right so yeah that stands out yeah i
should use that color more often so
binary io um so
that's uh and then over here let's
choose another fun color this is going
to be
our allocate oh that's too dark dark and
dismal
no fun allocators
so this is dynamic memory so
this is kind of like where c really like
the training wheels come off so
this is there there's nothing harder i
think to talk about and see than
dynamic memory and all of the fun
associated with it but
we're going to get to that a little bit
later so for right now
um we did a little bit of binary io
and how did we do you remember how we
opened files and
got references to them yes so
it's been that was not last week i think
that was the week
before yeah it's been a little while so
it's been a little bit
but um i'm trying to remember how we did
that
so this was with like f open
yes exactly um exactly that's
f get s and things like that s for
reading things
this would read a line at a time
assuming you had enough buffer space for
it and then we had fclose
to sort of wrap that file up and that
would
this actually inherently causes the file
to flush its stream so if you have some
data that hasn't been written
this kind of guarantees that it gets
written so
when we did f open we specified some
stuff
and the calling stuff that we did we
sort of said like okay f open
and then we set a file name you know so
there's some
file dot text or something like that
and um and we specified
a open specifier and this is the same as
it is in
python and most music systems so we just
specified we wanted it
open for reading and this returned for
us
something very unpythonic we got back a
file pointer and we can just call it
fp for the moment um and this is sort of
valid
so actually it's not sort of valid it
should be totally valid
now assuming your file system doesn't
have a problem with uppercase letters
but you know
c doesn't really get into that kind of
thing um
great well this is sort of the
the part we're going to mess around with
today so when we opened this up we
intentionally did not specify anything
else we just said we want this thing
open for reading
and that's cool because we can say this
is reading
we can also open something for writing
which we also did the other time
last week or the week before sorry the
week before the week before that
um and one that we didn't really touch
on but it's there anyway so for
completeness i'll just mention there's a
for appending and this is sort of the
same thing as writing but instead of
writing will if the file already exists
it'll just
delete it it's like delete and open
um and that'll be like you know so it's
like you're opening this thing for
writing like you
fully expect to start from scratch and
appending is really
if it's already if there's nothing there
then it's really the same as
opening it for writing but if there is
something already there
then this will just stick it at the end
so this will start at the end and
the only real difference between these
two is
the position in the file so in this case
like it's actually the position is zero
it's starting right at the beginning
and it's making sure that there's
nothing after it in this case it's
whatever the end of the file
is and it's going to say all right let's
just
start adding stuff on there um
so if you want to be careful generally
open things with the a permission if you
really want to make sure that there's
nothing there then open it with the w
permission
but that's all pretty straightforward
and i think it's pretty much the same in
python
yep pretty much okay now the other
specifier we can add which
also it's funny python borrowed it as
well is we can add some other stuff here
which we can say
we can tack on a b have you seen that in
python
okay cool so this this implies binary so
we do not want you to interpret
so this is basically do not assume
anything
about the data that's in there so just
assume that it is just
raw data and we're going to operate on
it as raw data
which means we are not going to do
things like f get s
we are not going to do things like f put
s these these imply that i'm working
with
strings and text stuff but
instead we're going to be using
binary operators and those are f read
and f right and they're basically
reciprocal
and again you'll get permission
violations if you try to do an f
right on something that's open for
reading and same thing if you do f read
on something that's open for writing
bad stuff usually happens so you know
try to open with one permission there is
a
further modifier you can add which is
you can tack a plus on the end and that
means
reading and writing yep
but we're not in python yeah we're not
well yeah python stole it
um which was the right thing to oh and
hey what's up loaf bone sorry i just saw
you
i guess nc already saw you um
okay so f read and f right well what do
they do
i mean in a nutshell they read it's sort
of like
f get s in that you need to say how much
but you also have to say where and keep
in mind that this is no longer
structured
right so this this is not a character
string like with with fgts we can kind
of always assume we're dealing with some
sort of care pointer
and with this we could make it a care
pointer
i think the actual specification is for
a void pointer
um and this just basically means i'm not
telling you what type it is it's
it's a type and it's definitely a
pointer but you know if you want to get
into that you're going to have to cast
it to something else
so you're going to have to say this is a
stream of ins or this is a bunch of
floats or whatever you want to cast it
to but um
the we better keep that pointer there as
well so quick question actually because
like
we we've used void before and my
understanding of it is just based kind
of situationally
on using it to define functions like the
return of a function being a void
meaning
kind of nothing yeah but in this case
having a void pointer that doesn't quite
track for me just because
yeah how do you point to nothing well so
it's not in this case it's it
we're overloading the use a little bit
but it doesn't mean
nothing it means no semantics
so if you're operating with
a character you know like for example in
here when you're dealing with like a
care
if you're doing like an unsigned care or
um pretty much or a bite on some
operating system well
let's see i was just working on
nintendo's stuff so
bites over there but um these all
have semantics kind of associated with
them like i i plan on dealing with this
as sort of
textual data but and if i if i throw an
int
specifier or a float specifier on it i'm
i'm sort of hinting to the compiler like
you know on most systems that we're
dealing with nowadays these are going to
be four byte chunks
and you know each one of them is going
to be you know
follow float rules so i'm going to have
like you know uh
you know the significant
and the uh mantissa
um and i'm going to follow float sorts
of semantics uh with an integer
it's just numeric
numeric round numbers and
in the case of a void pointer i'm
telling you
no semantics so if i try to do any sort
of operation that
implies anything about the type the
compiler's going to scream at you
so what i'm still kind of having trouble
conceptualizing that so if if you don't
have any semantics and if you can't make
any assumptions about data types yeah
what's left well you have to make the
assumptions you have to explicitly cast
it
so you can say like this is a void
pointer
but it's really
a care pointer so i'm going to cast it
out so
this is kind of like up to you as the
programmer to say like
i am now putting semantics on top of
that i want you to treat this as
character data
but why wouldn't you just because i
don't know what was written to that file
i see okay so in the case of like f read
or f right
i don't know what kind of data was
written there so i can't
give you any semantics until you
explicitly say this is really
integer data or whatever i
i follow you there but i guess you know
i'm i'm i'm getting really used to this
kind of c
aspect of needing to know some of that
like specifying how much memory and
whatnot how do you point
because you know pointer being to a
memory address right
yeah so how do you point to an
address in memory if you don't know how
much memory
you need because you don't know there
you go
how big is a pointer or how i mean in
this case like
for having avoid anything like how does
how does the computer know how much
memory to allocate for them
wait wait how big is a pointer well
pointers not
pointers as big as any address in memory
exactly
it's not related to the underlying type
so but but so the way that we've dealt
with it so far right is that
you're pointing to a memory address for
something that exists
and so right the thing that's kind of
tripping me up here is like
it's almost like a placeholder yeah is
what it feels like and that's fine but
then how do you put a plate like
the thing that i've learned in c right
you can't really have placeholders
because you need to know how much memory
you need
so how do we know how much memory we
need every
do exactly that they say how much do you
need you have to explicitly say this is
this size because there's no semantics
there on the void pointer the void
pointer itself has no idea what the size
it doesn't know how to
deal with any of the types in fact it's
intentionally untyped
the only thing avoid is not a standalone
type at all i can't declare something as
void
yeah that wouldn't make sense i can't
say like void x i mean it's like
how big is x like
what okay so you have specified how much
memory you need
and it's just that we're leaving it kind
of data type neutral or
exactly except all i can't define
a void of a thing but i can define
a void pointer of a thing that makes
yeah because
this is the same size no matter what the
type is
right so because it's just it depends on
your memory yeah it's really just saying
this is an address in memory it's not
saying anything about the semantics of
that memory
okay yes follow me that makes perfect
sense that's kind of important because
when we're dealing with a file
how did we store it so when we were
using you know
f get s and f put s
the semantics are clear we're talking
about text so
this is character data and in particular
it's going to be character arrays
so that's that's like very clear to the
compiler but
in the case of void in the case of
binary data
i'm the c standard doesn't know all the
different ways you're going to be
writing and reading binary data in fact
the most common case is you're going to
be
writing and reading structs and your
file format they don't have any
you know idea of what you're going to do
with that file format you might decide
to write like
a character that's you know a string
that's 10 bytes long and then you write
a struct that's like five bytes long and
then you write
that's up to you you know it's up to you
how you want to serialize your data or
deserialize it
so and yeah strophium's going void
pointer is like photo of a ufo i don't
know what it is but here's where it is
that's that's i mean that's a very
interesting metaphor i kind of like that
i like that yeah and
and that's and that really like focuses
on the fact that it's a pointer
it's not a thing so i i don't know
what is going to be there until i take
the time to cast it
so casting is your job as the coder like
if you want semantics and you want to
start dealing with this data
you should probably cast in fact when
you have a void pointer you should treat
i wouldn't treat it like a ufo
i would treat it like you're holding
plutonium like
don't do that you know like it's as
quick as possible
know what you're holding you know i mean
like this this could be radioactive you
know like so
so type that thing as quick as you can
in your code
now that's kind of guy editorial it's
not required by the c
standard but you know good practice is
you're probably going to want to type it
as quickly as possible so that the
compiler will
prevent you from doing bad things with
it
that makes sense okay so well what do we
do with this binary data we're going to
start the same way we did last week and
we're going to write
something out first which is going to be
the thing that we want to
you know operate on when we're going to
be reading it back in so
right i don't really want to write out
text data with this i mean i could use
f right on text right i could say i
could take my care pointer which is
pointing at something and i could
cast it to void pointer and say all
right you know write this thing out as
binary like
that's totally fine i could do that but
we already have a whole bunch of text
operations for it so we're not usually
going to use this we're typically going
to be using
things like structs so today we'll be
focusing mostly on
using it for a struct so in our case we
were doing struct
planet and we had i'm gonna keep the
same thing sort of as what we were doing
before so
in this case we were doing you know
characters and the name
was 10 big
and the period
the orbital period is just a normal
float
now this structure is something that i
can write out
as binary because this this has some
representation in memory
and i'm partially setting you up here
and
it's because of a fact we very lightly
touched on so i apologize in advance but
we're going to work through that in the
actual code but um the important thing
that we need to know
here is that this has a definite size
right this this particular struct planet
has a definite size and what operator do
we use to figure out what the compiler
assigned it
um was that malloc well matlab is when
we were doing it dynamically but i mean
how do we tell how big a type is oh um
just size of exactly so we can say size
of
and this thing right and i remember this
being a little confusing last time just
that
the memory that it allocates was not
what was intuitive
yeah that's exactly the setup
yeah so i remember kind of us looking at
and we're going to look at the
assumptions that we have about memory
yeah because it was you know assuming
that
if you have kind of like a certain type
a data type that has a certain size and
a different data type that has a
different size
like we would think maybe just add them
together or take the bigger size or
something and it was
i remember it not being quite as
intuitive as i had hoped yep
it's not and so we're you're right to
think about that and that's exactly what
we're going to look at before we go mess
it around with this thing because
i told you before it's going to matter
well this is where it really matters
so okay um in the in our case
uh excuse me so it's kind of whatever
today
so f right i didn't fully tell you yet
but f right and f read are basically the
same thing and they're going to work the
way you just sort of said
earlier so f write takes um
and we can look this up in man pages but
basically you got to give it some sort
of void pointer
which is this is going to be the thing
and
then you have to pass along how big is
it
well how big is each element so let's
let's call this
each element
size and then there's sort of a
how many
elements
and there's
oh and then obviously where you want to
write it to so there's a there's a file
pointer which is
you know you know where
okay so that's f right f read is
very similar and we'll we'll get to that
a little bit later after we get through
our f writing
but f read basically is going to take
another void pointer which is this is
where you're going to get your result
when you're done reading it
and it's going to take you know assa
each element size again
in this case they were very consistent
in the library so you don't have to
worry about like kind of screwy
parameter order
and again how many elements this is the
same and
again a file pointer which is where
so this is all basically identical
when you're reading and writing and
that's a little different from
some of the string operations but that's
okay
so does that kind of make sense this is
the high level yes i think it just
and the f right requirements make a lot
of sense to me but at least with f
read i think the difficulty there
is it it requires you to
know so much up front when
i can imagine there's no semantics yeah
and i can imagine there are
situations where you don't know a lot
about
the data file that you want to look at
and so i wonder how you would handle
situations
like that that's exactly the obfuscating
detail a lot of c programmers use
intentionally that's i mean like
games would hide data like this you know
like they they intentionally did not
want their data structures to be obvious
to read
you know because they were worried about
it so i mean really that's kind of the
wrong
approach because you really should be
encrypting things and you know that
but anyway like you know back in the c
era they were like ah if i write the
file format to be all screwy then no one
will figure it out and
you know i'll be hack proof and that was
ridiculous
what fake donate what i don't
i don't know but somebody is fake
donating you thank you dollars
uh for 10 grand a tour pepsi of fake
donate
i appreciate it we're rich
and yeah nathaniel's right this was sort
of that area that they refer to as kind
of
well actually i shouldn't say that era
it's still the case but they did believe
that this is
this is a case have you heard the term
security through obscurity
yep okay so that's basically that and i
don't like it so yes
you are right it does need more stuff
because the the myriad combinations you
can use these things
is vast i mean you can you can go f
writing
you know a struct planet one minute and
then you could f
write a care the next moment and then
you could
f write another struct and then you
could you know whatever you want to do
like it's it's all your choice with this
but you're gonna have to undo
whatever you did when you read it back
in i think it's interesting because
and and this is just maybe era
differences or whatnot is that there's
such
and i think rightly so it's such a such
a focus nowadays at least on you know
truly like you're you're an okay
programmer if you write code that you
can understand
but like you you kind of level up if you
will
if you write code that is understandable
by many people
and so it's interesting because
it's a different mentality and there's a
lot more focus especially
i think on a lot of development teams
that i've worked on where
you you're there to make your code as
readable and explainable and accessible
as possible
and so it's interesting that so many of
these features as we talk about them
are kind of designed for the opposite
designed to kind of like obscure things
or to hide things
and understandably so for like
proprietary stuff but it's just
interesting yeah i'm going to tell you a
trade secret you know back in the day
you they used to try to make your code
as unreadable as possible because that
meant they could never fire you
that was a prevalent mentality
like there were there were a lot of
companies where that like
exactly the opposite of what you're
talking about like the open source
movement hadn't started yet things were
very proprietary and a trade secret
and you know anyway different that's
really interesting though too because i
think
i just it wasn't the open source era i
can tell you that such a shift in
mentality is that
you know i the family makes it work
like thinking a lot about the next
person that's going to
improve this or need to read it or
understand it so trying to have
as as thorough and readable
documentation as possible
and you know have very intuitive
readable code and so it's
it's just it's interesting it's a very
different mindset this standard was
written
a long time before that so the standard
that you often talk about us using was
written before i was born yep so
yes yep computers were different digital
was a thing
you know that deck was a company so
uh rip deck long live deck
deck is dead okay so we still have octal
to deal with anyway we'll get to that
later
so um deck left us a few gifts which
we're gonna talk about in future
episodes
i'll let last miles get into that
actually because that's really his cup
of tea
so anyway um great let's set up
a structure let's let's let's assign uh
a couple planets
if you let's do you remember how to make
an array
i know we did arrays kind of quickly but
it's pretty much the same as python so
you know how to make an array
so in main give me an array of planets
i don't know call it p
okay an array of planets yeah
okay so oh and i'm sorry you said make
it a main so
an array of hmm hold on
yes okay fine so are these going to be i
should know are these going to be planet
names or like planet
oh no i want to use this struct planet
thing we got they're going to be
how do you make yeah how do you make a
okay well start with a single one how do
you make an array of structs okay start
with
start with a single one a single what
a single planet okay so
let's just say that we have we're gonna
call this planet bob bob
nice okay okay and array operator
oh yep uh well
it goes in the usual place okay
now we have the bobs they're bobs
okay so yeah nice
and we can pluralize it if we want okay
now c is going to require a size because
it doesn't know how to figure out from
this code
how many planets to allocate space for
so
okay let's give it three
and so the three there it's going to
know to take whatever the
like yes okay nope ignore me i'm good no
i'm not ignoring you because that's
exactly the next thing we're going to do
is how
yeah allocate what but i think so since
we have how much what defined
this struct planet right it's going to
work just like
care creating a string which is an array
of cares right that
that data type struct planet is already
defined
and so that size is then already also
already defined
so let's get that size size of let's get
it
um what would you like me to do with
print it out on the output
okay
very nice
oh and new lines we got to get something
to remember
yeah there's no python here um
and we just want planet so can i just
reference it as planet or do i need to
say struct
struct you always have to say struct
okay until i teach you how not to say
struct but
there's a there's a dirty trick we're
going to get to a little bit later but
i don't like dirty but that dirty tricks
is basically the next episode
that that's topical dirty tricks
all right so we're trying to do the pure
sea up up until like the last episode of
that
we're okay now the rest of it
you paid your dues yes you've paid your
dues you've written enough
good c so now it's time for the dark
side
okay um so we know that the size of this
is 16
and that's a little funny because if we
knew that a float was the same size as
an
int
doing some math in my head um
that seems awfully
because we can run that and it's still
the same size because the float is the
same size as an in
and it was how big on this system 32
you got it so so four
yeah so four sizeof gives us fights
bits bytes there we go whatever the
eight one is exactly
and so this is technically six times
eight bits right
six times eight no no
what am i thinking of wait wait we have
uh the in wait
take this apart so the o period here is
how big
32 right well use bytes
bytes yeah or four yeah it's four okay
so that's four bytes
and how big is a character
six no eight one yep yes
yes correct it's one bite so ten of them
is ten ten so
ten plus four is sixteen
because c rounding up in multiples of
eight
um it has to do with padding which we
touched very briefly
and because of how like the memory is
organized
yes that's correct yeah okay yep i
remember that
so that's important here because when we
write this thing out
our file file bytes are different from
like ram bytes and that like you know
everything we can do
sorry really quick am i remembering this
correctly that we can do like oh i'm
gonna butcher this it's like
i remember having underscores like
attribute and then we can say
like not stacked it's like yes
yeah and we can say that usually we put
a space here but that's perfect
by the way like and that's going to
prevent the memory padding run that
okay oh oh
nope maybe hold on huh i think there's
supposed to be two maybe yeah
there we go all right and so that's what
gets rid of the padding and that changes
yes that gets rid of the padding so now
ram is going to be matching kind of like
what we expect
now that's really important when we're
talking about disk because disk storage
we don't waste bytes on padding
on disk right that would be a crappy use
of memory that
you know back in the day was like punch
cards or something you know i mean so
like we we really don't want to like
throw away memory on persistent storage
we don't really want to throw it away in
ram either but there's sort of a
trade-off
padding gives you a trade-off between
speed of access and
you know which is why it's aligned for
the microprocessor architecture
versus like it takes a little bit just a
tiny bit more so why
why does padding allow for greater speed
is it just because
i mean so horrible metaphor incoming
but thinking of kind of pointers and
memory locations as
ad like physical addresses right like
each address points to a house
and that houses on a plot of land and
like
let's say we have oh god this is gonna
get bad um like
500 units of furniture or something and
maybe that's enough
to fit like a full house
and then like maybe a little bit more
but it would make sense
for like the memory to point to like two
separate houses to make it easier to
access
versus to like start a new house's
furniture in the second house
does that make any sense maybe
i'm not 100 on that one in my head that
kind of makes sense is like
you you made me think of one that might
be related to that which is like have
you ever played that game with like
like you're coming up on like an
escalator and you know you're like kind
of timing how long your stride has to be
so that you
hit the escalator when you're like you
know like you want to be like a flowing
smooth motion right you know you kind of
get on the escalator with like a natural
stride as opposed to being one of those
people that like
stops at the escalator and waits for the
stare and then like steps on it like
yeah like you don't want to be them
right well what is so so you
you kind of said that it has to do with
the the microprocessor
architecture process or something i
forget what you said but like something
about the architecture has to
means that if we keep things padded like
in my head almost like new lines
it's easier or it's faster to to read
but then or to access but why is it
why is it faster to kind of access
a new line or new like you you always
use this sort of picture of
of little horizontal rectangles stacked
so why is it so much faster
for a computer to start at the beginning
of the horizontal rectangle versus start
in the middle
it seems to me that it wouldn't make a
difference it was designed to
but like why that has to do with
reasons that we're going to have to just
get into hardware to to do no worries
but it literally it comes down to like
kind of like some
architecture choices that they made
specifically for
like how much data they were typically
accessing for speed
like so for example like as as the
standards started to sort of settle and
you know typically we were pulling
blocks of four for ins from memory that
was a very common operation
then it made sense to make processors
that were really fast at pulling a block
of four
so it was just like you know at the
start when building these processors
this one kind of unit of data and as
data types diversified that became less
efficient no there was no standard in
the beginning
no no i don't mean like like standard i
just mean that no no i'm saying there
was no
agreement on what a reasonable amount of
data was
is kind of the problem like so so you
know some companies are like units of
the data
yeah right so like if if the units of
the data
like its base base eight or base four
if you will right and so then it doesn't
really matter
how wide your your stack is per se as
long as it's in units of
of four or eight or whatever your data
yes but i feel like
we should really talk about this at the
talk session
okay because this is gonna this is gonna
be a long one i have a feeling
but um so i'll just i'll just call it
data access
and 31-bit architect okay hold on i
don't know
yeah there's 12-bit architectures
there's
a lot of the common ones were like you
know bits or 16
bits uh that makes sense sometimes there
were 24
bit you know people were like oh well
rgb each thing is eight bits so we're
going to line up
again it you have different trades and
it kind of comes down to cost at the end
of the day i mean you could make
something that could read everything
equally fast and then alignment isn't an
issue
so alignment strongly has to do with
your chip architecture because it has to
do with how the hardware designers
design it but we're going to go into
more detail on that later no worries i
i don't mean to derail i think like
sometimes knowing those details of why
something's done helps me understand how
to use it
but i will i'll put a pin in that for
later pinned
pin i'm not calling this talk topic so
i'm calling this pinned
topics okay so josh all right so we were
we to get back to this we were talking
about the attribute packed which helps
get this of the padding
and the reason why it's 16 not 14 is
because there was that padding to help
it
align better for speed in our stack it
does matter in architecture
this is kind of like one of those like
sort of leaky details of the compiler
because c
has no real opinion on this like c was
trying to be
something that was portable to all
architectures so they didn't want to get
into like what the chip designers were
going to do so you could make up your
own c compiler and align this however
you want
you could have a c compiler that never
pads if you want
okay you'll just you'll have to
implement it but
um all right so theory aside we now know
that the size of a planet is
14 when we're sort of dropping our
padding
uh and that's gcc that was our first
compiler directive so
how do we fill out a couple planets here
in this array
how do we get some bobs
so um in that case
we could so this is where
the idea of having an
no no i got this i think i got this hold
on so
so you could kind of like
start here right we can set
the first element of our array of
structs
um actually um hold hold hold your
hold your horses let's say that we've
got
a bob bob one a singular bob
and so we can say that this is bob one
right uh no
well there goes my understanding all
right i'm still kind of struggling
mentally with this idea of having
an array of structures like from a
python perspective perspective it makes
sense but from a c perspective
it doesn't so how to interact with it
and assign elements
of the array of structures is okay
so how do we this refers to the first
bob right
and when i'm in a particular like when
i've just got a normal
forget about the array for a moment can
we do it i'm sorry
we do like yes exactly
gotcha okay except it's another array
because it's a care type
so how do we deal with putting stuff in
arrays
into character arrays can i can i cheat
will you let me cheat what what's
cheating oh like leaving the brackets
because i remember when we first learned
that no no no it's already been declared
that it doesn't work
that only works at definition time like
like when you when you when you declare
the thing
you don't get to do that later on
you can try because name is already 10
right
so when you said that it's it's another
array
yep what were you getting at because
this would really be like
m exactly right so
do i can i just put a colon
what do you mean colon
cause that just that's that's absurd
which is why we have a string copy
routine right we don't we don't normally
do this with strings
right okay so thank you
i it's that
i've been coding in python all day and
so we gotta break
sometimes well i can't break that that's
my job
but the the nighttime switch to see
sometimes like my intuition is still
um in python land okay so string copy
that's what i want to copy it into and
then we'll call it
cool bob i'm gonna make a new planet
okay it's gonna be awesome
all right let's try it okay
well so this isn't going to unless it
throws an
error it's not going to change the
output because string copy doesn't have
an output
that's okay so do you want me to print
to make sure that we are finding it
appropriately okay
so in this case we want
the bob planet is
and then string
new line and then bob's
zero dot name
all right
nice hey look at that
okay let's go to real planets for a
moment
that that are in this solar system so
we'll just limit the domain a little bit
chat stop with the function pointers we
briefly touched on that last time but
that is still
like here between ignore what they're
saying because they're oh i'm ignoring
it and i'm telling them to stop
we're getting to what they're talking
about actually but we'll develop
um so very confusing but yes okay that's
okay
um we're not doing function pointers
today don't worry
all right just just the idea of a
pointer that points to a function
bothers me so
yeah that doesn't make sense that's a
separate thing okay so okay
so yes all right well how do we so you
haven't fully initialized
this first bob's because we forgot the o
period right
yes okay and so how do we assign o
period for it
because it's an integer i don't have to
worry about this ridiculous string copy
stuff
and so i think i can just do oh
sorry let me switch that back to uh what
it should be
this is going to be hmm you know what
bob bob zero should be a planet that is
very close to its star
okay so let's talk about i'm gonna go
i'm sorry
20 0.5 nice
okay and if we wanted to add that to our
output
float so all right chat no manually
implementing v tables
knock it off oh lord words i don't
understand
yeah it's c plus plus stuff
that's another language i don't know
okay um oh period
well you're getting to know it a lot of
c plus plus is c
clear this output and nice
look at that okay now
this um if we were going to start doing
the second planet
we would have to do it doing it the way
you're doing it
but with one instead we would do the
same thing but with one and then we do
the same thing again with two for the
last one
all right well nathaniel stole a little
bit from what i was going to tell you
but
there is a kind of nice way to
initialize
us an array of things and
we already know how to do it with
character strings right because we do
something like this we say like um
you know some label and then we can you
know
do one of these and say some label
and c kind of just does the right thing
with that right
yes and that is because that is a nice
uh feature that you told me was
implemented in more
recent c but not in the original c
because you made me do it the hard way
the first time yes
yes i remember that it does
i'm grateful for the empty brackets yes
yes it
it is nice it's a nice little compile a
little bit of syntactic sugar
not the salt that we were talking about
last time but um so
but using that same concept we're
basically assigning an array
in this one line right because we're
assigning s and then
o and then m and then e and then space
right we're we're going one by one and
assigning that to each of the elements
of some label right so
we can do kind of the same thing with a
struct so
i can do something like planet bob
and i can use this notation which
you might notice from python because
they
stole it but it has a slightly different
meaning in python but
originally it kind of came from this so
i can use this brace
and i can sort of say something like
you know not cool bob
and i can also give it an orbital period
that's a little bit longer
so if i run this i clear my output i run
it
we get a little bit of complaint except
all right not cool bob is a little bit
too long for this
for this uh for the ten characters that
we have so c's
this is a comp modern compilot bad bob
what bad bob bad bob would
probably fit oh bad bob
you're such a bad bob so this is a
way of doing a fast initialization with
a struct
okay so that implies though that
you know the
i'm sorry i don't know the right term
but like the
attributes if you will of the struct so
like
for instance if i were writing a code
and defining a struct myself that would
be fine but if i was using some
predefined structs i would need to know
exactly how or or will it accept
a partially initialized struct so if we
only initialized it with a name will it
throw an
error or does it need all of the
attributes filled out
one way to find out right okay never
mind
all right just just curious like i'm i'm
mapping this and i know this has
probably got a lot of like
it isn't a one-to-one comparison but i'm
always gonna fall back on sort of
references that i'm familiar with uh to
help me understand and so
it's very similar to an object in python
and so i keep kind of coming back to
that a little bit in my mind yep
it's not an object though that's c plus
plus but
look at that yeah it's just that's kind
of the the
understanding that i'm pulling from
so um and they're using language you
should not use by the way so don't don't
read about that t word in there
um i explicitly mentioned that as
something in the dark side that we're
going to get to in that last class but
okay so that's my reward yeah you guys
stop stop spoiling my rewards
okay she gets to write out struct every
time for now
okay so um gotta build the habit all
right great so this is how we initialize
one
well guess what this also works for
arrays as well
so i can do this and say this is going
to be an array of planet bobs
and we can start specifying this
one at a time because the c initializer
is going to be nice to us and it's
basically going to say all right
you mean that's going to go into the
first
it's a nested struct yes well because
it's an array
it's an array of structs
so this this is kind of a short notation
of
being able to do this
all at once um excuse me no okay yeah
you got it right
bad bob bad bob should be much much give
it like 127.
okay so the astronomer in me i'm sorry
i'm like oh no these are too close we're
talking like collisions okay we're good
you are totally right so if we clean up
that stuff then
we should have done the fast
initialization of a couple planets
and we execute and it works kind of the
way we think it's gonna
which is neat so there's your fast
syntax for it
okay now that's all beating around the
bush because
what we really wanted to get to was
serializing this out to a fine
file now this is binary data because
while the name
is text the float definitely is not
well the name is text the float yes okay
yeah the flow is stored in memory as
a float well ieee 754 but i mean like if
i was using an int or some other numeric
type
like if it's like a seven like like some
control character it's going to be
the equivalent of some control care it's
not in some representation of it so it's
not like
a string of the number it's actually the
number itself
i'm sorry you've lost me just a touch so
when we
let's let's think about this as inst for
a moment because we have a little more
intuition around in so
um when if we just change these types to
int real quick
when i store this thing in memory it
this doesn't store it as ascii right
this doesn't store it as the ascii
number
two and then the ascii number zero okay
right it stores it as the number 20.
right so if i look at the bit
representation of it it's going to be
you know
the the 16 20 and binary plus the four
bit
plus yeah yeah so it's going to be 20 in
binary so um
and that's different right because the
string is stored in ascii representation
where a
starts at the you know 65 and b
is 66 and so on and so forth so um
the numbers have ascii representations
too but
20s you know down in kind of the control
code area
okay so that's um
this this all makes sense right
yes now if i wanted to save this thing
and by the way we're going to go back to
the float version of it
you know those are both the same type
but if i wanted to write this out to a
file
sort of thinking in your python world
what would you have to do at this point
okay so i can't just write an object
right oh
yeah i feel like you're you're you're
setting me up for something but um
okay so i'm thinking about this if i
want to write
the the features or the attributes or
whatever the right word is for structs
if i want to write these parameters to a
file
i mean one i will say that so far when
we have been dealing with
reading and writing data all of the file
in the data has been for the most part
we've been able to treat it as the same
type
so we haven't really had to deal with
mixed type data
like even when there were numbers in the
first um
the data with like companies they were
textual numbers
yes everything every we read that in as
as characters yeah
we had to turn it into we used f or
you know f-scan and we had to actually
read it back into a numeric format
but let's say we want to skip that and
we want to just write the number out to
disk and then read the number back from
disk
well so if we're only storing the
numbers
i'm guessing that's just like f right
yeah
right and because we know that we know
the data type we know the size
so we know that parameter we know how
many of them will need
we have all of the inputs or the
arguments for f right that we need
correct
yes but that works exactly the same for
a struct as it does for the components
of a struct
i mean the struct is really just this 14
bytes of memory right
yes because we packed it so if we
write out those 14 bytes to disk and
then we read those 14 bytes back in
we should get back the same thing that
we wrote right
yes okay so what i was dealing with
right there i was kind of thinking
through this
makes perfect sense given that we've
set aside care name right we've said our
max here is going to be 10
characters so that makes perfect sense
yep
the if you're about to talk about
dynamic that's the second part of the
talk i'll i'll hold back because i think
that might be what i'm getting at is
that
yes because i i love one of the things i
love about c is that it's
it's largely from what you've explained
to me so much of this is about making
things more efficient which is why we
specify
how large something is where like
whether or not we want this memory
padded or whatever is that it's
it's gives a lot of control so i'm
wondering
if we want to say that you know okay
50 let's say we've got an array of 50
planets
and 40 of them have names that are three
characters or
smaller yeah and the and there's just
like a handful that have some longer
longer names how do we make sure that
our memory is being
allocated
like the you know small amounts of
memory for the smaller names larger
amounts of memory for the larger names
so that we're really trying to make use
of back in the day
as you've told me when we had no memory
for anything
you know i imagine like every little bit
of memory matters
and so the idea of just kind of saying
well
i've got 50 planets and 40 of them
have small names but oh crap because 10
of them have longer names i need to set
aside more memory than i actually need
i just that seems so counter c i i just
wrote that down because that's exactly a
kind of access pattern you're going to
have to deal with but that's
getting a little bit more advanced so
for right now we're starting with fixed
record sizes
so i pinned it so it's there for a topic
no worries
all right just making sure like that
there's not some deep fundamental
misunderstanding
no no no and you're thinking your
instincts are right but it's gonna be
let me let me let me cut to the chase
it's gonna be harder to deal with it the
way you just described
okay um the easier thing to do is just
know whatever the biggest one is and
just allocate every record to be that
way
okay and that's how that memory but what
if what what if all i have is
is is is 200 you get efficient later
right now you
inefficiently write this out
where i have no memory whatsoever so i'm
trying to be
trying to you know oh by the 80s we had
tons we had 640k
which was enough memory for anybody
supposed to be born in the 90s all right
that's how i got around you know floppy
disks for all right
all right okay so how do we write these
out let's let's let's let's get to our
f reading our f right so let's get this
thing out to a file
oh boy so we're gonna start with f right
and i'm just gonna
drop up here the part of the manual page
that we have in the newer version
that we don't have tonight but if you
could look it up you would see that this
is there as the description of how
fright works
and it follows basically what i wrote on
the whiteboard so it's got
that void pointer thing don't worry
about this const that's
that doesn't we're not worried about
that right now so
um so we have this p we have the ptr
which is going to be the pointer to the
data
we have a size we have number of
elements and we have
a stream so we have to start with the
the end of that which is the stream
so how do we open up our stream so
i'm actually going to
just briefly i know this is not the code
i'm just putting it there as a
placeholder for myself
um so we need f open and so for
that and i'm going to remember this give
me a second so for that let me let me
think back it's been a little bit so
this is where we have the like
hold on so file right because it's a
file type
and we need a pointer to
that file and so then i can define this
as this is the open and then
except it's not text anymore oh uh what
are we doing
we're doing dot bin we're going to make
up a new format
oh boy okay um and then
i am writing it with great
power comes great responsibility sort in
reference to strophium's
comment here but um yeah so just
spider just now this is
w is not a variable you declared right
oh oh yes i'm so sorry that needs to be
that's fine that's a character
okay yep so it's a string now string
because it can possibly be
because double quotes not single quotes
we're not in python they are different
here
um yeah because the reason because it
could have
in fact three technically it does have
another specifier if we're dealing with
binary
oh the b yeah so i need to actually put
yep this is not hypothetical
okay okay so now we have our good so far
yep
good so far okay let's not check our
return type let's just assume it always
worked
because nothing could go wrong opening a
file and
um so this is this is the fun version to
see
like early on and so let's see there's a
size that this outputs
so we're going to need
i'm going to get rid of your line
because you have it as a comment just
above
and i'm not going to get rid of it it's
right up there for you
on line nine but i wanted to go one by
one and replace but no that's fine
um so let's think here when we did
like f get and f put
we often had to specify like a buffer
and like write into the buffer and then
copy the buffer but
but we know exactly the series yeah so
okay
so we still need though we still need
because size underscore t i'm guessing
is an integer
it's a big integer but don't it's a size
t
it it's exactly the same type that's
returned by
if i want to know how big the struct was
size sizeof yeah it's exactly that same
type
okay in fact i think it was added to the
spec for that reason because they needed
to say
this is not necessarily an interest
okay we know what file is yep
number we have two two bobs size
can i cheat what's cheating yeah
it's not cheating that's being smart
okay there's times that you told me i
couldn't do that
oh no this this is the time okay all
right you should pretty much
always be writing that there if you can
if you can manage it
the void pointer yeah well what is it
i mean start with what is it well so
like what's missing from this line don't
worry about the return type by the way
so let's let's simplify a little bit oh
i don't have to yeah where did we we
checked return types last time you know
how to check return types
okay um so not like any good c
programmer you cannot check the return
time
i'm learning all my shortcuts i like
this okay so
my intuition here but void okay
it's a void because it doesn't but we
give it so
the library doesn't know what type
you're passing in
and it doesn't need to know because you
you specified the size
no no no no don't listen stop don't look
um
no but it's an array an array is
always inherently a pointer because
yep an array is just a pointer to the
first element in the set so i can
actually do that and if i want to be
really explicit i can do
uh-huh except you're supposed to cast it
because it's expecting what type
a pointer not just any pointer
avoid pointers exactly no no that was
right you can leave that you can leave
that
but you need to cast it so we haven't
done much casting
no no that's okay that no no no no no
hold on hold on so wait
wait wait wait it's expecting a pointer
if it's expected it's expecting a void
pointer what are you passing void
a real point a struct point pointer
planet so this is a pointer
to the first element of the array hey
hey where did all that
fun stuff you just did put that i'll
leave that leave that leave that that
parts that's good you did you did the
right thing
we haven't really done much casting i'm
not sure how to do that that's okay it
won't get better by removing that part
i'll tell you that okay all right so to
cast something we basically just put the
type we want to
have it become in front of it so we say
you should become a void pointer
that's it that's it that's what i'm
saying don't don't worry about this
stuff this this is okay it doesn't
matter if you wrote it that way or
this way those those both refer to the
same thing because the first element of
bob's is
also its pointer because of the name and
that that handy syntax we learned
but um this this part is the cast and
i'm just saying that bob's i want you to
treat that now as a void pointer
okay so listen c i don't want you to
know what type this is you don't care
okay you just care that this thing
points to the right chunk of memory
and each element is this big
and there's this many elements
okay and send it over here to this fp
thing
okay because and let me let me see if
this this
logic is correct but if we have a
pointer let's say to something much
easier like an
int it's kind of because we
into so clearly defined that pointer
we know how much memory that address
it's pointing to an address but we also
know how much memory because
it's defined it's a pointer to an end um
to like the memory address where that
int is stored
but because here it takes a void
because we're going to explicitly tell
it how much memory
because and we don't want to write a
version of f right for every type
right and so that gives it the
flexibility to apply to kind of
any amount that's why we're going to
treat this that they wrote this
intentionally as generic and to make it
generic they said this is a void pointer
it's actually it's not your first void
pointer that you've seen because it was
actually used in mem copy
and some of the other binary routines
that we did really quickly
they it's been a bit yeah that's okay
they they also take void pointers
because they don't
care like they're designed to operate
regardless what the underlying type is
they're just dealing with memory
in an unstructured no semantics kind of
way
so it's it's they're treating it sort of
as a bite stream they're saying this
byte then this byte and this
bay and you'd better specify how many
bytes are here because otherwise i'll
just read the wrong memory okay
hey what's up foffy how you doing
and nathaniel's right you're basically
telling the compiler that you're smarter
than x so go ahead and treat this as a
void point that's what casting is about
casting is saying i don't want you to
type check this i want you to just deal
with this as it is now
smarter compilers come along and they
they nowadays say like
you know what you're probably doing the
wrong thing here i'll let you do it
anyway because you explicitly cast it
but
this wasn't really a pointer to begin
with so are you sure that's what you
want to do
so okay um anyway um let's try this out
so what are we missing what do we do
whenever we open a file
we close it okay
oh actually it's been a while uh yes
chat's cracking me up
f bite barf
bark you can kind of put anything you
want in there you're just you're just
barfing it out as far as the compiler
see if i ever made a coding language
you better you you you bet that
it would be hilariously named functions
all right well you can f bytebarf this
out so
we close it we run the yep and things
happen right
okay now we see no output because this
doesn't show us the output right i mean
we ran this thing and it just
writes that file theoretically we don't
we don't really know what happened
because we didn't check the return codes
but
theoretically this worked and there was
plenty of disk space
and there were access there was no
access permission problem and all the
other things that could ever happen when
we f right none of them ever happen in
practice so we never check our return
types okay
great how do we put this back into
memory how do we resuscitate it
read it yeah okay
so now so let's comment actually i'm
gonna i'm gonna i'm gonna
copy this all right you can copy for my
notes so you can copy
and i'm going to comment
yeah you can you can keep the final
pointer
yeah that makes sense so now so
let's start down here and now we don't
have bob's
specified so let's
right now we're dealing with the data
there's already been data
and we don't know what kind of data it
is so actually i'm not going to comment
it because you've never seen it so
this is something that was just passed
to you on disk
alright so this is going to be we know
that it's a planet
and we know that there's two of them
because i'm keeping the example simple
for now
but that's all we know
aren't we kind of lucky that we know
that much well we're gonna get unlucky
in a minute that's the second half of
the lecture
oh no okay all right so
gracious all right oh my gosh my fingers
are getting tired from all the typing
today
so same thing here but we're going to
read it
instead uh-huh
and so we assumed that worked because it
could have
man page for f read or is it very
similar yeah it's actually exactly the
same but
um i'll i will just put that here so we
don't have to refer anywhere else
okay
well so we don't have struct nope hang
on
nope we're fine i'm good all right oh we
still have planet to find
yeah yeah there was just a rainfart
moment where i was like wait a minute
um yeah we'd better know what structure
this thing is because otherwise he's
gonna just write it however you want
so now so like previously when we've
done this we've been able to use like
f get s and things like that but
how do we
how do we read this you just read it
well
right no read
but so the output of that if i were to
print it for example
yeah no but it's just an integer
what's an integer f read returns
size under 14. yeah that's how many
bytes i want to access the data
that f read has just where did it right
where did it write it to
where did it read it to i guess i should
say
um
bob's exactly you wrote it to bob's you
read it to bob's
okay so let's print out the first bob's
that looks pretty good
yep floating point
magic the second bob
nice so i have a question uh-huh
a lot of the data that i use for example
for work
i'm often not always
sure until i i personally like to read
it in
right off the bat to kind of see it's
easier for me to open say
something easy like a csv or whatever in
python and look at it
in that format versus just opening up
say excel or something
what do you think excel's written in the
thing though here is that it assumes we
know the
attributes of the struct and it assumes
we know
like the data type of those attributes
yeah so there's like a lot of assumed
knowledge whereas like so much of my job
is dealing with data where i'm not even
sure kind of what format it's in until i
take a look at it
um like this seems like a very
a cool example to learn how to read and
write stuff but it also kind of seems
like
i'm really lucky that this is i'm
reading data from a file in which i've
defined the struct that was written to
it
if that makes sense yeah this is real
c i mean this what what you're kind of
grasping at is the notion of a data
description layer
and intuitive semantics because the
formats you're sort of talking about
like take like json for example it's
designed to be
human inspectable so you can just read
it in and it's designed that
parsing it is unrelated to the data type
that's in there
parsing is unrelated
that bothers me wait so json is sort of
a standard in the sense that
like numbers are wrote are written in
like this
you know unquoted string form and
strings are written in this quoted
string form and
you put commas between entries and an
array and you put
uh commas between braces all right sorry
between
object you know like for hashes so this
this notion
is data descriptive because all of that
like
quotes and commas and all that other
stuff that's sort of metadata that's not
directly data itself
so for example in in your example here
if i wrote this out as
json it would look something like
um yeah i've used json before
bad bob and you know this thing was what
27.5 and actually i guess these would be
entries like this in some sort of array
and this would be
good cool bob and this thing would be
something like 20.5
and what's neat about this is and
actually this would all be an array
sorry this is not a
not a hash so what's great about this
is that it sort of has all of the
information there for you to be able to
parse it
because you know where bad bob and it
starts and ends it ends in between these
quotes
and you know that i'm following the same
pattern for each of these entries
because i've got
a string name and then i've got some
sort of number now i don't know what the
number
is because it's not specified here but i
could change my format to do that right
i could say like you know
name is this thing and then i could also
say you know
op is this thing
and this is kind of nice because this is
data descriptive
you know this includes the description
of what the data is
sort of in the data
well c can do that too in fact most of
these parsers are written in c but
the fact is that this is designed for
structured data that's designed to be
human readable and i can make my format
it's sort of non-specific about what the
format is there's different types of
json depending on
what i want is a very loose open format
but that's not really that efficient
because when i want to read this thing
in you know there could be for example
any number of returns here
this is valid
this is also valid this is valid
this is valid oops that's not valid
this is valid this is okay i can put a
return here
this is all still exactly the same data
in some sense
okay because json specifies that this
white space that i'm adding right now
is meaningless so it doesn't really
matter if you put it there or not
sound good i think so tentatively yes
okay well this is also a format that's
designed to sort of carry along
all of the information you need to read
it in
but yeah what's the butt so
let's say that you write it in that
format
how do i is kind of like the ignorant
reader in of that data access
the descriptions to know
what format and attributes the data has
well that's up to the person who wrote
that
data format i mean they would have had
to give you the specifications for this
data
okay so it's literally it's it's all of
this is predicated on the fact that i
have this knowledge of what structure
this data is in
and that that is necessary knowledge
there's no way around that yep
okay yeah c deals both with the case
where you want people to be reading the
data when you sort of don't want people
to be reading the data so
you may or may not include a data
description
and there's different ways of specifying
data descriptions right you could
specify it in ebnf you could specify it
like an rfc
you know you could specify it in as a
protobuf you could specify it as
you know there's like a million
different options there's there's a long
history i
could serialize the data into a csv but
then you know i get into like what
happens if there's a quote in the csv
because the field has quotes and you
know various answers for how to escape
it because it's kind of a non-standard
and then json sort of a non-standard
xml very well specified standard very
messy
lots of different cases but again these
formats all have different uses
but one of the things that's important
is that
binary data does not inherently have
any description associated with it it's
just binary data
so it's up to you as the programmer to
either you know draft specifications or
put the specifications in the file you
know which is kind of
you know there's different ways of doing
that there's
and and that all gets into like
structuring data that's like a totally
different topic we're just
we're getting into how to read and write
binary data and that's
kind of this okay like the options
you're going to have
architecturally for how to store data
for to answer your
questions about like this is not the
data that i want to parse
you know like i i want to in fact the
more parsing you do generally the bad
but
you know what's nice about this it's
super fast you know
and by the way all of those other
formats can be built on top of this
okay it's just a question of how much
work you want to do parsing it and
specifying it and all that other stuff
yeah fair enough okay so we've done that
we've got our planet bobs
written out to a binary file we read it
back in and we saw we
managed to recover the same contents
that we wrote at
you good with this topic yes i think so
you inherently don't like it because
this data is lacking description
it doesn't have a lot of descriptive
power associated with it it's
it's just the raw data itself
okay um and you're right so like you
know you'll see in a lot of older
formats things like you know
version and you know you'll see like
okay so if you're parsing this you
should be using this version and not
that version and then
you'll see some magic maybe at the front
you know so you'll see something like
uh some sort of like marker to say that
this is this particular format
and you know there's sometimes you'll
have a variable header
you know so you might have this thing
which is really a struct of a header
for this data in here we're going to
include this thing
and this thing and you know all this
other magic wonder
that we can get to but um that
we can we can make our data format sort
of whatever we want for right now we're
just dealing with a uniform
single structure very simplistic format
packed and unpacked the right way so we
know exactly what it's going to do
on different architectures because we
packed it
so we're sort of agnostic to padding now
this does get into the format like
because there's multi-byte stuff in here
so it's actually not going to be super
portable but that's
that's an advanced topic i guess that
would go into the uh
more advanced mess but um
and yes actually that's uh nathaniel is
talking about different magic that
you'll find at the front
magic refers to like a little marker you
tend to put at the beginning of a file
which tells you what you're reading
okay all right
so you can put a little bit of magic
there to say like oh this is a png
or this is a jpeg or you know this is a
jpeg version three
you know or this is a whatever so you
can put that up front
which sort of gives you a hint about
what the specification is going to be
but you know you never really know if
there's bad data or not
and a lot of programs have broken and
been hacked because of that sort of
thing
okay that's probably
enough i'm not going to go into the file
command
but that's that's a program that like is
designed to read magic
from different formats but and it has a
magic database on it
on on your system so how much magic
okay that's um that's kind of the first
half
so how are we we're a little bit long on
time for that so
should we get into the next thing or
should we save it
um so the next thing you mentioned was
kind of the hard thing
so my only concern well binary data is
not easy i mean
um just because i have that like super
early shift tomorrow
i have to be more conservative with my
time but
um if you want i think you had mentioned
before if you want to take a couple
minutes just to do a preview
or like a next time on kind of thing
just to give a taste
we've already i doubt we'll have the
dive into it
okay well all right you know what let's
let's start with a simple version of it
so i was going to go into a more detail
but let's
let's save the the harder one for i
guess kind of next time
um do you already copy this should i
blast it away yes i'm good
okay um all right so
what we really wanted to get to on this
was allocators
so we already met our first allocator it
was called
malloc yeah the memory allocator all
right and
mallocs yeah no that's that's
a good guess you're right so um
malloc does some magical stuff
and malloc was defined in standard
length so we've got to pull in our
standard library
to start talking about mala okay okay
so how much roughly do you remember i
mean should i just walk you through a
quick version of malloc or should we
so i remember from last time that we
we had like a pointer we were defining a
pointer and we used
malloc to set aside a certain amount of
memory but that's kind of
all i remember
it wasn't we we didn't discuss it i
think in depth and
all i really remember is it
being able to kind of we had a pointer
and i think we used a data type and then
we were able to set aside with malloc
and then like the argument of malloc was
like a certain amount of memory
yep okay and that's that's all i
remember yeah that's good
so if we want to allocate something like
the space we're gonna need for bob's
then
well for let's start with just a planet
so how many bytes is planet
well we can find out because we can do
our little trick that we know
we can say this is a size of this thing
malloc is the marvel villain whose lair
is the heap
yes so
this is um we know now that this
particular data type that we defined is
14 bytes
now when we ask for the memory from the
heap which
is kind of part of the lecture so i was
going to kind of dig deep into that but
we'll
we'll do kind of the quicker version and
save a bit of that for next time but
um just to kind of recap we were
basically
allocating with matlock and we would ask
for
like you said a certain amount of memory
so how much money do we need for a
planet
14. yeah so we could directly ask for
14.
and we could return that as a void
pointer because you love those now
my new best friends yes and if we run
this it's fine we just leak some memory
right so we're gonna assume that that
allocation always works and we're gonna
remember to free it
and we're good to go we wrote our
program we have 14 in there
now we can start overlaying semantics on
top of this because we know how to cast
things
so we can say something like i want you
to treat this thing
as a planet pointer
and this is basically going to kind of
force it to be
that like nathaniel told you before this
is you telling the compiler i know
better than you
so i'm telling what happens do you have
a misalignment or some kind of error
if you try to cast something where the
data types don't matter
you get to keep both pieces you're lucky
if it crashes
that's that that's one of those things
about c but um dynamic memory does get
kind of
disastrous quickly but you know we can
come along and we can say
let's let's make a proper struct planet
pointer
and we're going to call this the proper
pointer
uh wait let's be more more like that
okay so i can do that and i can kind of
get it to be sort of the right
type okay
so if i execute this yeah we're good we
got a reference
proper pointer knows about the semantics
of planet
but pointer itself is really just a void
pointer so it knows nothing
so this is how we can start telling it
you know some semantic stuff so we can
say like okay i plan on treating this
like a proper pointer
and if i come here and i say proper
pointer
and we talk about how do we do you
remember the uh the fancy operator for
the pupil operator yes the pew pew
operator do you remember this
this is uh i don't actually remember
that's the proper name this
um so i can actually use pew pew here
and i could say
like all right let's let's copy in a
planet and
now we're going to use some regular
planets so let's let's just drop mercury
in there
and we can also copy in
we can pew pew in our oh period
we could say this thing is gonna be well
i don't know what i did with my data but
87.97
all right so i can do that and then i
free the pointer
and now we have something we start
having ownership issues but
why didn't need to free hold on well we
allocated
we have to give it back but
okay all right i know to a python
programmer this is totally foreign
you're like
i just take the memory no no no that's
not what was throwing me it was that
like
to me we should
so we have the pointer but really what
we want to do is we want to free up the
memory that the pointer points to but
then
it occurred to me that probably free
just takes a pointer
uh-huh so yeah i saw i answered my own
question yeah i'm good
yeah because we're going to use the
pointer to operate on the chunk
yes like so we have a chunk of memory
which is doled out but we don't directly
refer to the memory we use a pointer to
the memory to refer to
all right so i run this it works we're
good to go
now we don't know that it really worked
because we didn't actually try printing
it out
so we print out
all right we're still good up up to here
right
now using the proper pointer
i'm not really allocating more memory
underneath i'm still using the same
chunk of memory i'm just sort of this is
kind of like a memory view
um from python world but it's a little
bit uncommon to use in
python but yeah this is um
yeah i can run this thing and you know
we're good so
i have semantics now related to that
matlock
all right so there's a little quick
recap of what malloc and free is
now i can also screw myself up and
double free by
freeing proper pointer as well but how
many allocations were there
one yeah how many times the proper
pointer is just a pointer
that's like point
it's it's equivalent or equal to the
pointer to
for the the struct planet that you use
yeah and so
it it's like i already freed it
yeah so you don't it's referring to the
same thing so you don't need to free it
twice
i don't know what that would even do
you're lucky if it crashes
i mean what what would it do if you told
it to free something that you had
already freed
ah it's a double free so what does that
mean
it's kind of undefined okay
sometimes it could crash sometimes it
could work we don't really know
undefined is really bad um so
pointer like allocation ownership gets
very tricky kind of really quickly but
um that is actually one of it meaning
what
and what memory that you have allocated
yeah
who's not a permissions thing but like
what is writing to them no
not not just a permissions thing because
anybody can write to it that has a
reference to it but
who is responsible for deallocating it
um okay and that's where the ownership
really matters that's actually by the
way
a lot of why they made c plus because
this turns out to be
kind of nightmarish to deal with but
and that's because of the next part
which is really what we're going to get
to in the meet of the next topic but
how did we write malloc
right now what do you mean i mean we
have this magical function here that
just sort of
grants us memory when we ask it but that
had to be written in c
at some point so how are we i mean isn't
it just like malloc
what's malloc really doing
is it
that wouldn't make sense so
i have sort of a vague hunch
but is it kind of it's almost like a
void where it's first going to like
set a a memory address
and then it's casting
any memory addressing 14 to
that pointer no the 14 is
how big so in this case yeah so it's
like that's that's the casting like in
scare quotes is that
it's it's first create like pointing to
a memory address
and then it's like as before what we
could do is we could cast a specific
type but in this case i'm not sure if
type to me just is a placeholder
for size and some semantics
yes so could you not cast a size to
something
yeah or no so is that just what it's
doing that's one way to write malloc now
the c
standard is not very specific about this
intention so this is sort of up to the
implementation
um traditionally there were wildly
different implementations of malloc
in particular the bsd world and the
linux world both handled this very
differently
later on there's a lot of like kind of a
renaissance around allocators because
you can build better and worse
allocators you can build them domain
specific you can build allocators that
are really good at certain sizes and
really bad at others
and that that's there's kind of a world
of programming there you know facebook
made their own allocator
you know ea games made their own
allocator you know there's there's all
sorts of like kind of
famous allocators that have been written
um so
one of the things that we were gonna do
today which
is probably not a good time to start it
now is to do exactly what you just said
so that's kind of a
a very simple allocator and we can do it
by just basically giving ourselves
a nice big chunk of memory so we can
pre-allocate a nice big chunk of memory
and we can start divvying it up
as we start allocating
and that's sort of like what you're
saying is that the heap
no so the heap
is a slightly different chunk of memory
so
it's uh it it's when said c quickly
evolves into juggling loaded revolvers
what he really means is that c quickly
devolves
into juggling loaded chunks of memory
yep so that's what it feels like as
these chunks of memory are loaded
revolving
so we can write all different chunk
handling and loaded revolver handling
like we can we can for example a really
simple allocator to build if you're ever
under the gun is you can just build
a version of malloc that just keeps
handing back the next chunk of memory
and when the person frees it you just
throw it away
what do you mean by throw it away you
never reallocate it
that seems wasteful it's very wasteful
but it's very fast
why because it doesn't have to reclaim
anything so there's no
sizing there's no copying stuff around
there's no
it's one of the fastest allocators you
could write it's called the null
allocator
or you know it has different names it's
called the trivial allocator
it's very very simple to write it takes
very few well i shouldn't say simple i
mean
compared to other allocators it's very
simple
we can get fancy with it we can say okay
we have you know different
pages of memory and we can say this is
this is for chunks that are about this
size take it from this page
and this is for because this is one of
those like fragmentation problems if you
start
reusing the memory you you kind of want
to like give the user a chunk of memory
that's big enough but
you know where is there a big enough
chunk if you've been managing it badly
and you know like
this there's all sorts of like nasty
games you get into here
okay i mean i think i think that makes
sense was because you said that
when you asked me how you might
implement or write malloc
was the intuition that i had
you had said it's kind of one way to do
it but then yeah it almost sounded like
it's like
right okay it's it's wonderful i know
it's probably a little hand wavy because
i'm missing some some elements kind of
the details but
that's what we're gonna do for this part
of the class but
i think we're going to save that for
next time
so okay um because i know you're on a
schedule so
uh oh and thank you last month i
appreciate it
last miles memory memory that stuff that
all goes away eventually
and we all swap yes there's a lot of
different
so again one of those things it's like
it's kind of easy to write
a simple allocator it's hard to write a
good allocator
and it's not a one-size-fits-all kind of
problem you know different allocators
are
really aimed at different types of
spaces like you know you can get into
arena
allocation which is pretty popular with
the game developers that are
around here on twitch so um arena
allocation is
you know zone allocation there's uh we
can use ropes we can
there's there's look there's there's
like you could teach
graduate courses on like all the
different allocators that exist
but um there is not a one size that fits
all
i can tell you that um it's a strophe
i'm saying
so i historically have memory pools so
that they yes consistent size objects
and defragmentations
but this is this is true it's also true
because they would you know ea was
famous about this that they would charge
the allocation to a specific part of the
game
so when you were allocating me like you
had a fixed budget of memory
and that's all you got so like the team
that was working on pathfinding
this is your chunk of memory and that's
all you're gonna get and when you when
you use that up you're
you're in trouble and this is one way to
deal with the finite memory problem of
like you know how do we build a game
that's going to fit into like you know
256 megs of ram or whatever they were
targeting for
their low end platform and that yeah
those days i mean these days
it's a lot more like last miles kind of
tongue-in-cheek mentioned
which is that we just sort of like keep
on using memory when we run out of ram
we just
go right into the swap space and then
that's that's using the hard drive as
ram
and it's um
so and then line up that's you uh so
well we all just malloc from the heap
and be happy but
damn it i like my memory clean and shiny
so i always catalog usually yeah
i like cadlock too um but we're
we're just starting with um the heap
allocation so i did want to start with a
fixed size stack allocator but we're
gonna we're gonna save that i think for
the next
next go so um is that the hand wavy fast
enough version
i think so yeah and it's a good kind of
preview of what we'll be doing next time
okay so for now we're gonna do just
malloc
all day till we run out
um that'll be kind of the memory the
encephalothesaurus early c memory
allocator you know we'll just
go allocating and then have fun um
no no you're you're good last mile i'm a
fan of cadlock
um we didn't get it we just haven't
gotten that far yet it's
they're actually they're relatively
simple compared to malloc there's some
slightly different semantics that matter
um i'm gonna focus on malloc mainly
because um i want to work on
theory here you know like it's kind of
like general c stuff and then
the other ones are are really important
reallock in particular is very important
um os and performance wise but it this
is kind of getting into it
more complicated than probably you know
intro c stuff