Lynne Teaches Tech: How does HTTPS keep you safe online?
forgot to tag this! #LynneTeachesTech
Lynne Teaches Tech: What are protocols and formats? How do they come to be? (long, serious)
View this post with its original formatting and images here: https://bune.city/2019/05/lynne-teaches-tech-protocols-and-formats/
This question was originally submitted via the survey.
This question, like our last reader-submitted question, is in three parts:
What exactly is a protocol?
What exactly is a format?
Can you give a short rundown of how standardisation in IT works?
I’ll answer them from top to bottom.
=== Protocols ===
Protocols are used to facilitate communication between computers. You need rules for what things mean and how to behave, otherwise your computer won’t know what to do with all the data it’s receiving.
For the sake of the explanation, we’ll create our own fake protocol, ETP: Emoji Transfer Protocol. You can use this protocol to send an emoji from one networked computer to another.
A proper ETP communication will have two pieces of information: An emoji, and a date the emoji was sent. When you connect your ETP client to an ETP server, you might receive this:
Date: 2019-05-23 10:31:45
Your ETP client can then format this raw data into something more readable.
Received on May 23rd at 10:31 AM from etp://bune.city
This isn’t a very useful protocol, but it gets the point across: A protocol allows computers to communicate using a predefined set of rules.
The client can treat this ETP data however it wants. It can show just the emoji, or just the date, or maybe it tells your printer to make 100 copies of the emoji. The ETP server doesn’t tell the client what to do with the data, it just sends the data. Your client might even send a response back.
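To make that concrete, here’s a minimal sketch of what an ETP client might do with the raw data, written in Python. ETP isn’t real, so everything here is an assumption: I’m assuming the emoji arrives on the first line and the date on the second, and parse_etp and describe are names made up for this example.

```python
from datetime import datetime

def parse_etp(raw):
    """Split a two-line ETP message into (emoji, sent_at).

    Assumed wire format: the emoji on line one, 'Date: ...' on line two.
    """
    emoji_line, date_line = raw.strip().split("\n")
    prefix = "Date: "
    if not date_line.startswith(prefix):
        raise ValueError("second line must start with 'Date: '")
    sent_at = datetime.strptime(date_line[len(prefix):], "%Y-%m-%d %H:%M:%S")
    return emoji_line, sent_at

def describe(emoji, sent_at, server):
    """One possible way for a client to present the data to a human."""
    return f"{emoji} sent {sent_at:%Y-%m-%d %H:%M} from etp://{server}"

emoji, sent_at = parse_etp("🙂\nDate: 2019-05-23 10:31:45\n")
print(describe(emoji, sent_at, "bune.city"))
```

A different client could skip describe entirely and do something else with the parsed values. The protocol only defines what gets sent, not what happens next.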
A real world example is HyperText Transfer Protocol, or HTTP. To view this blog post, your computer sent a HTTP request, specifying that you want to view this particular page. The server then sent the page back, including some additional info, such as the size of the response, the content type (such as text/html), the date it was last modified, and so on. The only information you end up seeing is the content – your web browser handled the rest.
HTTPS works the same way as HTTP in this regard. It’s the same protocol with encryption added, and otherwise behaves the same.
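Here’s a rough Python sketch of the “additional info” a browser deals with. The response text below is a made-up example (the header values are placeholders, not what this site really sends), and parse_response is a hypothetical helper that only handles this simple case.

```python
# A made-up HTTP response: a status line, some headers, a blank line, then the content.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 30\r\n"
    "Last-Modified: Fri, 24 May 2019 00:45:18 GMT\r\n"
    "\r\n"
    "<h1>Protocols and formats</h1>"
)

def parse_response(raw):
    """Split a simple HTTP/1.1 response into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split()[1])
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_code, headers, body

status, headers, body = parse_response(RAW_RESPONSE)
print(status, headers["Content-Type"])
print(body)  # the only part you'd actually see rendered
```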
=== Formats ===
Formats are ways to store types of data on a computer. There are formats for documents (.docx, .pdf, .odt…), audio (.mp3, .ogg, .flac…) and so on. Formats are sort of similar to protocols in that they describe data to be interpreted. The main difference is that protocols work over a network, and formats are files. You can transfer files between computers using protocols, but you can’t use a format to transfer a file. You also can’t store a protocol as a file, because protocols are conversational – your computer needs something to talk to, and it can’t talk to a HTTP file. You can, however, run a server on your computer to make and respond to HTTP requests.
Let’s invent a simple format, a .face file. A .face file looks like this:
Face file version 1<<<
The first line lets the face file reader know that it’s definitely reading a face file. The second line is the actual face. The first line appears in every single face file, and the second line changes between files. Another example would be:
Face file version 1<<<
Most files are unreadable by us humans. (Okay, so a face file is far from unreadable, but try opening an MP3 in a text editor and you’ll see what I mean.) Because we can’t read this raw data on our own, we need a viewer to interpret it.
Let’s call our face file viewer FacesTime, because I’m uncreative. When you tell FacesTime to open a file, it first checks to make sure the first line is there. If it is, it assumes it’s looking at a genuine face file, and reads on.
It then gets to the second line, which is the data portion. FacesTime takes this previously unreadable :) and converts it into a human readable version, as demonstrated by the highly realistic mockup below.
[Image: A very poorly drawn sketch of FacesTime, displaying our .face file.]
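If FacesTime were real, its core might look something like this Python sketch. The header line is copied from the example above, and the emoticon-to-emoji table, read_face and render are all assumptions invented for illustration.

```python
HEADER = "Face file version 1<<<"  # first line of every .face file, as in the example

# Hypothetical lookup table for turning the raw face into something prettier.
EMOJI = {":)": "🙂", ":(": "🙁"}

def read_face(text):
    """Check the header line, then return the data portion (the second line)."""
    lines = text.splitlines()
    if len(lines) < 2 or lines[0] != HEADER:
        raise ValueError("this doesn't look like a .face file")
    return lines[1]

def render(face):
    """Show an emoji if we know one, otherwise fall back to the raw face."""
    return EMOJI.get(face, face)

print(render(read_face("Face file version 1<<<\n:)")))
```

Another viewer could swap render for something that draws a picture instead. The format stays the same either way.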
Some formats are more flexible than others. Our .face format could be displayed as an emoji, or a drawing, or even a photo. When you have something like an audio file, however, there’s not much room for interpretation. You can change things like the volume and equalisation, but you obviously want the actual sound to remain unchanged. On the other hand, you could have a program that opens the file and displays a visualised version instead of playing the song, or a program that lets you edit the song title and album art without actually playing it. You can do what you want with formats, but most of the time, you just want to display the content in a human-readable (or listenable!) way.
A .txt file specifies nothing but the raw text inside it, meaning that you’re free to interpret it however you want. Your viewer can choose the font size, colour, and so on. However, a Word document is a different story. You could, of course, create a Word document viewer that always displayed text as Comic Sans, but then you wouldn’t have a very good document viewer. It’s usually best to just follow the standard, because everyone else is following it, too.
=== How are standards created? ===
Anyone can create a standard. I created two just then: The ETP protocol, and the .face format. I didn’t really define them very well, but I did define them to some extent. Someone could create an ETP server right now with the information provided. However, what actually happens with my standards is beyond my control. Someone could create an ETP server that allows for more than one emoji at a time. A text format could have a standard saying “all text must be displayed as hot pink on neon green”, but nobody actually has to follow that. It’s different if there’s a certification process, however.
To display a DVD Video label, a DVD player must comply with the DVD standard. If it does anything differently (such as playing all the video upside down), it doesn’t get approved, and can’t display the label. Again, there’s nothing stopping you from creating such a player, and if you don’t care about having the DVD Video label applied to your product (or being allowed to call your product a “DVD player”), then you’re good to go.
In the end, standards are just a guideline. Things will typically go badly if you don’t follow them, but you don’t have to, and sometimes you might want to violate them on purpose. An example is our .face format – you might want to violate the standard by including multiple faces. However, any standard .face file viewer wouldn’t be able to understand your modified version, so unless everyone agrees on your standard, then things probably won’t work out too well.
=== Summary ===
A protocol defines a method of communication for two computers to “talk” to each other. A format defines a method for one computer to read and display data. Standards can be created by anyone, and are occasionally enforced.
Thanks for reading!
Date: 2019-05-24 00:45:18 AEST
Lynne Teaches Tech – What is the Linux kernel? (long, serious)
You can view the original post with inline images and better formatting
You can submit a question to be answered on LTT here:
This post contains inline links.
Our first survey question! The submitted question asked:
What is the Linux kernel?
Why does apt make me install a new one every month or so?
Why do the old ones stick around and waste space?
We’ll be answering all three of these questions. Let’s start with the
first.
=== What is the Linux kernel? ===
The kernel is the underlying part of a computer that makes all the low
level stuff work. Things like reading files, managing which programs
are running, and keeping track of memory usage are handled by the
kernel.
Almost every operating system has a kernel – Windows uses the NT
kernel, macOS uses the XNU kernel (part of Darwin), and Linux distributions
(including Android!) use the Linux kernel.
The kernel handles communication between the software running on your
computer and the hardware your computer is made with, so you won’t get
far without it. This also means you can’t swap it out while your
computer’s running, so if you update the kernel using apt (or
otherwise), you’ll need to restart afterwards, even if the computer
doesn’t tell you to (or force you to).
Note: There are methods of replacing the Linux kernel while the system
is still running, but that is beyond the scope of this post.
When the kernel encounters a critical failure that it can’t recover
from, your computer will stop working. On Windows computers, this is
known as a stop error, and when a stop error occurs, Windows will
display… [Image 1: A Blue Screen of Death!]
This happens (much less frequently) on Linux systems too, with a much
less visually appealing (although more informative, if you’re able to
understand what it’s talking about) error screen.
[Image 2: A kernel panic on a very old version of Ubuntu. They still
look like this, though! Jpangamarca [CC BY 3.0], via Wikimedia Commons]
So now we know what a kernel is and does – it’s a piece of software
that allows other software to communicate with your computer’s
hardware, and it’s critical to everything working.
=== Why do I need to update the kernel so often? ===
This is because the Linux kernel is updated very often. As of the time
of writing, the latest kernel version is 5.x. You can see a list of
update logs here –
notice how short the times between each changelog are.
Most Linux distributions (including Ubuntu) won’t receive every single
one of these versions. There’s so many of them, and many only introduce
minor changes. Additionally, the Ubuntu developers need to make sure
that everything works properly with the new kernel version, so it won’t
be available right away.
You can check the Linux kernel you’re running with the command uname
-r, and you can check the latest version at kernel.org.
This doesn’t answer why the kernel is updated so frequently, though.
This is because there are many, many changes made to the Linux kernel every single day, by a wide range of
contributors across the world. Not every single update gets its own
version number, though. With the high frequency of changes made, there
are understandably many updates released in a month. Every now and
then, the Ubuntu developers will pick one of these versions, work on
testing it to ensure it’s compatible with the rest of Ubuntu, and then
release it for you to download.
=== Why do the old versions stick around? ===
This requires a more in-depth explanation of how your system handles
kernel updates. I’ll be talking about Ubuntu specifically here, but
almost all of this applies to other Debian-based distributions too.
Even with the testing done by Ubuntu, it’s impossible to know that a
new kernel release will work with every single Ubuntu user on Earth. To
make sure you still have a working computer at the end of the day,
Ubuntu will keep the previous version of the kernel installed. If you
restart, and the new kernel doesn’t work properly, you can switch back
to the old kernel to have a working PC.
The Ubuntu developers tend to keep older versions of the kernel around
as separate packages. When your computer tries to install the Linux
kernel image (linux-image-generic), apt will tell it which particular
version it needs to install. [Image 3: Ubuntu Bionic’s
linux-image-generic package currently depends on the
linux-image-4.15.0-50-generic package.]
This means that you only ever have to install linux-image-generic and
the system will automatically install the correct kernel version for
you.
You can see how many kernel versions Ubuntu Bionic currently has
available by checking this page.
apt will never remove a package without your permission. If a package
is no longer required, you need to remove it manually by running sudo
apt autoremove. This will clean up any packages that aren’t currently
in use, including older Linux kernels.
You’ll notice that uname -r will give you the same result before and
after updating the kernel. Even though the new kernel update was
installed, it’s not active yet. You need to restart your computer to
start using the new kernel version.
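As a rough illustration, here’s a Python sketch that compares the kernel you’re running against a list of installed versions. platform.release() returns the same string as uname -r on Linux. The list of installed kernels is hard-coded here as an assumption (a real tool would ask the package manager), and version_key and reboot_needed are made-up helper names.

```python
import platform
import re

def version_key(release):
    """Turn a release like '4.15.0-50-generic' into (4, 15, 0, 50) for comparing."""
    return tuple(int(number) for number in re.findall(r"\d+", release))

def reboot_needed(running, installed):
    """True if any installed kernel is newer than the one currently running."""
    newest = max(installed, key=version_key)
    return version_key(newest) > version_key(running)

# Hypothetical example: an older kernel is running, a newer one is installed.
installed = ["4.15.0-48-generic", "4.15.0-50-generic"]
print(reboot_needed("4.15.0-48-generic", installed))

# On a real Linux system, this is the string `uname -r` would print.
print(platform.release())
```

This only looks at the numbers in the version string, so it’s a simplification and may not order unusual release names correctly.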
So, in short: to get rid of those old, unused kernel versions, try
rebooting (to make sure you’re running the latest kernel), then run
sudo apt autoremove.
=== Summary ===
The Linux kernel is a critical piece of your computer’s software.
Without it, nothing will work, and if it crashes, the whole system
comes down with it. It can’t be updated without a reboot. New versions
are frequently released, and the Ubuntu team tests and releases a new
version once every month or so. Ubuntu keeps old versions around as a
fallback in case a new one doesn’t work, but you can get rid of the
ones you aren’t using with sudo apt autoremove.
want to submit your own #LynneTeachesTech question(s)? there's now a simple little survey form for doing just that!
bear in mind that LTT questions should be more like "how does thing work" or "why does thing do X" and not "why does microsoft do this" or "how do i change the wallpaper on android".
here's the survey: https://l.bune.city/ltt-survey
feel free to submit multiple questions!
Lynne Teaches Tech: Does “dark mode” really save battery life? (long, serious)
View the post with its original formatting here: https://bune.city/2019/05/lynne-teaches-tech-dark-mode/
Right now, there are two common types of consumer-oriented display technologies: LCD, and OLED (including AMOLED). LCD screens won’t save any battery life by using dark themes, while OLED ones will.
=== LCD ===
Consumer electronics with LCD displays use LED backlights to give the image brightness. You may sometimes see LCD displays referred to as “LED-LCD”, or (mistakenly) “LED”. The LED provides the backlight, while the LCD displays the picture. It’s possible to have an LCD display without a backlight (the Game Boy and Game Boy Color had one), but uncommon.
The LED backlight provides brightness to the whole LCD display at once. To light up even one pixel, the whole display needs to be lit up. There’s no way to have only some areas bright and others dark, it’s all or nothing. Issues like backlight bleeding are inherent to this approach, and “true black” can never be displayed, as the display is always emitting light. [Image 1: An example of backlight bleeding (light “leaking” from the edges of the screen). Image source in blog post.]
Colour is irrelevant to power saving on an LCD display. The only thing that helps is turning down the brightness. Displaying black at full brightness uses much more power than displaying white at low brightness, even though the white screen would provide much more light. The LCD filters out the backlight to provide black, but the backlight’s still on.
However, LCD displays tend to be more power efficient when displaying bright images than OLED displays. As soon as you start turning down the brightness, though, OLED quickly pulls ahead.
=== OLED ===
OLED displays don’t have backlights. Every pixel is individually lit, so making half the screen black would use (about) half the power. Because of this, dark themes really do save power on OLED displays. Darker colours mean less power usage, and if a pixel is completely black (pure black, no light at all), it will turn off. A dark theme with a pure black background saves much, much more power than a dark grey background, because it allows the pixels to turn off entirely.
There’s a downside to this, however – the pixels can’t be turned on instantly. If you’re scrolling quickly down a page with a pure black background on an OLED device, you might notice subtle “smears” caused by the pixels taking some time to turn on. This effect is hard to notice in most cases, but is still something to consider.
The colour can also affect the battery usage. Google gave a real world example of battery usage with various colours on their 2016 Pixel phone: [Image 2: Blue uses much more power than red or green, and white uses the most.]
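To put rough numbers on the OLED idea, here’s a toy Python model. It assumes power use is simply proportional to how brightly each pixel is lit, and it treats red, green and blue as equally expensive, which (as the chart above shows) isn’t really true. relative_power is a made-up name, and this is nowhere near a real measurement.

```python
def relative_power(pixels):
    """Average brightness of a list of (r, g, b) pixels.

    Returns 0.0 when every pixel is off and 1.0 when every pixel is full white.
    """
    total = sum(r + g + b for r, g, b in pixels)
    return total / (len(pixels) * 3 * 255)

white = (255, 255, 255)
black = (0, 0, 0)
dark_grey = (34, 34, 34)

print(relative_power([white] * 100))               # an all-white screen
print(relative_power([white] * 50 + [black] * 50))  # half the screen black
print(relative_power([dark_grey] * 100))            # dark grey is low, but not zero
print(relative_power([black] * 100))                # pure black: every pixel off
```

In this model an LCD would be a constant: the backlight is on no matter what the pixels show.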
=== Summary ===
Dark themes don’t make a difference to power consumption on an LCD display, although you may find them easier on the eyes. While LCD displays light up the entire display at once with a backlight, OLED displays can set the brightness individually for each pixel, meaning that mostly black images use less power.
Lynne Teaches Tech: Why does Windows need to restart to install updates so often? (longish, serious)
There are many different system files that Windows needs in order to function. These files are often updated using Windows Update to add new features, fix bugs, patch security issues, and more.
Windows is unable to replace these files while they’re in use. This is for numerous reasons, both technical and practical.
If an update specifies that file X needs to be replaced, and file X is being used by a critical system process, then we have an issue. Closing the critical process would cause Windows to stop working, which is obviously unacceptable behaviour. There’s no way to switch the old version of file X with the new one while the critical process is running, but Windows needs that process.
The only way to safely stop the critical process is to restart Windows. Updates can be applied either before or after you restart. Windows will ensure that the critical process isn’t running yet, replace file X, and then start up normally. Windows can’t start up properly until the new version of file X is in place, so you can’t use Windows while it’s updating.
Not all updates require a restart. Windows Vista introduced a feature that allows system files to be replaced while the system is running under certain conditions. If an update replaces a non-critical file, or a file your computer isn’t using, it’s not necessary to restart.
Unix-based systems like macOS and Linux don’t need to reboot as often, due to the way they handle loading files. In short, Unix systems load critical files in memory and don’t depend on the version on the hard drive, while Windows depends on the files on the hard drive staying the same. If a critical file on Linux needs to be updated, the system will simply swap the old version with the new one while running. Currently running processes will keep using the old version, but new processes will use the new file. This does, however, pose a security risk, as those running processes may still be using outdated code, so rebooting to ensure the new files are loaded is still recommended, even if it’s not technically required. Replacing certain files, such as the kernel, still requires rebooting.
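You can see the Unix behaviour for yourself with a small Python experiment. This sketch assumes a Unix-like system (on Windows, the os.replace call would likely fail because the file is still open), and the file names are placeholders.

```python
import os
import tempfile

def replace_while_open():
    """Swap a file for a new version while a 'running process' still has it open."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "library.txt")
        staged = os.path.join(tmp, "library.txt.new")
        with open(target, "w") as f:
            f.write("old version")
        with open(staged, "w") as f:
            f.write("new version")

        running = open(target)  # stands in for a process that loaded the old file
        try:
            os.replace(staged, target)  # the "update": swap in the new file
            seen_by_running = running.read()
        finally:
            running.close()

        with open(target) as fresh:  # stands in for a process started after the update
            seen_by_new = fresh.read()
    return seen_by_running, seen_by_new

print(replace_while_open())
```

The already-open handle keeps reading the old contents, while anything that opens the file afterwards gets the new ones. That’s also why a long-running process can keep using outdated code until it’s restarted.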
Lynne Teaches Tech: What are all the different types and versions of USB about? (long, serious)
View the original post with better formatting and embedded images here: https://bune.city/2019/05/lynne-teaches-tech-usb/
Describing something like an Ethernet port is easy. You have one number to worry about: the speed rating. 100Mbps, 1Gbps, 10Gbps… It’s very simple.
USB, on the other hand, has two defining attributes: the type, and the version. Calling a USB port “USB 3.1” or “USB-C” doesn’t tell you the whole story.
Note: I’ll be using the term “port” to mean “bus” at times throughout this post for simplicity’s sake.
=== The letters ===
At first, there really was only one USB port type.
When people describe a port as a “USB port” without any additional info, they’re talking about this one. USB-A ports can be found everywhere, and are (as of writing) still more common than USB-C.
If this is USB-A, and we’re up to USB-C, then where’s USB-B? You might have used one without knowing.
USB-B cables are often referred to as “printer cables”, as USB-B ports are most commonly found on printers.
A more commonly seen port is USB Micro-B, which was used on most Android smartphones, before manufacturers adopted USB-C.
There are many more varieties of USB, such as USB Mini-A, USB SuperSpeed B, and, of course, USB-C.
[Image: A USB-C port.]
USB 1.0 offered transfer rates of up to 12 Mbits per second (or 1.5 Mbits per second in its low speed mode), and future versions improved upon that speed. A USB 3.0 port can (theoretically) reach speeds of up to 5 Gbits/s. A USB-C port can implement USB 2.0, USB 3.0, or later, which is where the issue lies.
Describing a given port as USB-C doesn’t tell you the speed it operates at, and describing a port as USB 3.1 doesn’t tell you what type it is. The original USB port with the lowest version – the original port that debuted in ’96 – is not just a USB port, it’s a USB-A 1.0 port.
=== To make matters worse ===
Thunderbolt 3 is a standard that uses the USB-C port. It allows for (among other things) communicating with an external GPU and connecting external displays. Thus, when describing a USB-C port, it is also pertinent to note whether or not it has Thunderbolt support. This doesn’t apply to non-C ports (yet!).
The letter (USB-A, USB-C) refers to the type of connector. You cannot mix and match types (a USB-C device will not connect to a USB-B port).
The number (USB 3.1, USB 2.0) refers to the version number. You can usually mix and match version numbers, but you may encounter reduced performance or functionality.
Thunderbolt is an optional extension for USB-C ports. A Thunderbolt device cannot work with a non-Thunderbolt port, but a non-Thunderbolt device will work with a Thunderbolt port.
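Those three rules can be sketched as a small Python data structure. This is a simplification for illustration only: the Usb class and link_speed function are made up, and the speed table lists nominal maximum rates for just a few versions.

```python
from dataclasses import dataclass

# Nominal maximum speeds in Mbit/s (theoretical, not what you'd see in practice).
SPEED_MBITS = {"2.0": 480, "3.0": 5000, "3.1": 10000}

@dataclass(frozen=True)
class Usb:
    connector: str             # the letter, e.g. "A", "B", "C"
    version: str               # the number, e.g. "2.0", "3.1"
    thunderbolt: bool = False  # optional extra, only found on USB-C

def link_speed(device, port):
    """Usable speed in Mbit/s, or None if the two can't work together."""
    if device.connector != port.connector:
        return None  # letters can't be mixed and matched
    if device.thunderbolt and not port.thunderbolt:
        return None  # a Thunderbolt device needs a Thunderbolt port
    # Versions can be mixed, but you only get the slower of the two.
    return min(SPEED_MBITS[device.version], SPEED_MBITS[port.version])

phone = Usb("C", "2.0")
laptop_port = Usb("C", "3.1", thunderbolt=True)
print(link_speed(phone, laptop_port))
```

Describing a port properly takes all three fields, which is the whole problem: “USB-C” or “USB 3.1” on its own only fills in one of them.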
There’s more that I didn’t go into in this post, but if I wanted to describe the ins and outs of every USB specification, we’d be here forever!
Some of the images used in this article were created by Wikipedia user Fred the Oyster, under the CC-BY-SA 4.0 license.
Lynne Teaches Tech: Why did everyone’s Firefox add-ons get disabled around May 4th?
View the original post here: https://bune.city/2019/05/lynne-teaches-tech-firefox-addon-bug/
Mozilla, the company behind Firefox, have implemented a number of security checks in their browser related to extensions. One such check is a digital certificate that all add-ons must be signed with. This certificate is like a HTTPS certificate – the thing that gives you a green padlock in your browser’s URL bar.
You’ve probably seen a HTTPS error before. This happens when a site’s certificate is invalid for one reason or another. One such reason is that the certificate has expired.
HTTPS certificates are only valid for a certain amount of time. When that time runs out, they need to be renewed. This is done to ensure that the person with the certificate is still running the website, and is still interested in keeping the certificate.
When a certificate expires, your browser will refuse to connect to the website. A similar issue happened with Firefox – their own add-on signing certificate expired on the 4th of May, 00:09 UTC, causing everyone’s add-ons to be disabled after that time passed.
One would think that Firefox wouldn’t disable an add-on that had been signed while the certificate was still valid, but apparently it does. Even so, this could have been avoided if anybody had remembered to renew the certificate, which nobody did. This is a particularly embarrassing issue for Firefox, especially considering both how easily it could have been avoided and the fact that it really shouldn’t have been possible for this to happen in the first place. It also raises the question: What happens if Mozilla disappears, and people keep using Firefox? Thankfully, there are ways to disable extension signing, which means that you can protect yourself from this ever happening again, but note that doing this is a minor security risk.
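The difference between the two ways of checking can be shown with a short Python sketch. This is not Firefox’s actual code. The expiry time comes from this post, while the start date, the signing date and the function names are placeholders made up for the example.

```python
from datetime import datetime, timezone

# The expiry time is from the post; the start date is a made-up placeholder.
NOT_BEFORE = datetime(2017, 1, 1, tzinfo=timezone.utc)
NOT_AFTER = datetime(2019, 5, 4, 0, 9, tzinfo=timezone.utc)

def cert_valid_at(moment):
    """Was the signing certificate inside its validity window at this moment?"""
    return NOT_BEFORE <= moment < NOT_AFTER

def enabled_checking_now(now):
    """Strict approach: the certificate must be valid right now."""
    return cert_valid_at(now)

def enabled_checking_signing_time(signed_at):
    """Lenient approach: the certificate only had to be valid when the add-on was signed."""
    return cert_valid_at(signed_at)

signed_at = datetime(2019, 1, 1, tzinfo=timezone.utc)  # hypothetical signing date
now = datetime(2019, 5, 4, 1, 0, tzinfo=timezone.utc)  # shortly after the expiry

print(enabled_checking_now(now))                 # strict check, after expiry
print(enabled_checking_signing_time(signed_at))  # lenient check
```

The lenient check only makes sense if the signing time itself can be trusted, so it isn’t a free fix.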
One could argue that by remotely disabling some of the functionality of your browser, intentionally or not, Mozilla is violating the four essential freedoms, specifically, the right to unlimited use for any purpose.
why does windows install to C:\ by default? why not A:\? and why use letters at all? (medium-long, serious)
CP/M, a very old operating system for very old computers, used drive letters to distinguish between each drive on a computer. the first drive would be drive A, then drive B, and so on. CP/M computers typically had two floppy disk drives (drive A and B). when a CP/M machine was given a hard drive, it would assign it the letter C, so A and B could be claimed by the two floppy disk drives.
microsoft's MS-DOS operating system aimed for some backwards compatibility with CP/M, to make things more convenient to port and to provide users migrating from CP/M with some familiarity. they kept letters A and B reserved for floppy drives, and labelled the first hard drive as drive C.
this behaviour was never changed as MS-DOS became windows. windows still labels the first hard drive as drive C, holding letters A and B aside for when you connect two floppy drives. windows still treats drives A and B differently from the other letters, even if you manually assign a hard drive (or optical drive, or USB stick...) to use it - it's still running some compatibility code to make sure floppy drives work just right. even if what you connect isn't a floppy drive.
also if you're wondering why so many files use three letter extensions, like how JPEGs are often .JPG: it's because of MS-DOS. DOS allowed for filenames with a maximum length of 8.3 - that is, eight letters with a three letter extension. so FILENAME.JPG is okay, but FILE_NAME.JPEG is too long on both counts. everyone just kinda agreed that 3 is a good number for file extensions and here we are still using it for pretty much everything even though it hasn't been required for decades
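here's the 8.3 rule as a quick python sketch. it only checks lengths (real DOS also had rules about which characters were allowed, which this ignores), and fits_8_3 is a made-up name.

```python
def fits_8_3(filename):
    """Check the DOS length limits: up to 8 characters, then up to 3 for the extension."""
    name, dot, ext = filename.rpartition(".")
    if not dot:
        # no dot at all, so the whole thing is the name and there's no extension
        name, ext = filename, ""
    return 1 <= len(name) <= 8 and len(ext) <= 3 and "." not in name

print(fits_8_3("FILENAME.JPG"))
print(fits_8_3("FILE_NAME.JPEG"))
```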
why does text on a webpage stay sharp when you zoom in, even though images get blurry? (long, serious)
images like PNG and JPEG files get blurry when zoomed in beyond 100% of their size. this is true of video files, too, and many other methods of representing graphics. this is because these files contain an exact description of what to show. they tell the computer what colour each point on the image is, but they only list a certain number of points (or picture elements - pixels!). if a photo is 800x600, it's 800 pixels wide, and 600 pixels tall. if you ask the computer to show it any bigger than that, it has to guess what's between those pixels. it doesn't know what the image contains - to a computer, a photo of the sky is just a bunch of blue pixels with some patches of white thrown in. there are many algorithms that a computer can use to fill in those blanks, but in the end, it's just an estimation. it won't be able to show you any more detail than the regular version could.
a font, however, is different. almost all fonts on a modern computer are described in a vector format. rather than saying "this pixel is black, this one is white", they say "draw a line from here to here". a list of instructions can be done at any size. if you ask a computer to show an image of a triangle, it'll get blurry when you zoom in. but if you teach it how to draw a triangle, and then tell it to make it bigger, it can "zoom in" forever without getting blurry. there's no pixels or resolution to worry about.
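here's a tiny python sketch of the difference. the "image" is a made-up 2x2 grid of pixels and the "vector" is a triangle described by its three corners. upscale_nearest uses the simplest possible guessing method (just repeat each pixel), and both function names are invented for this example.

```python
def upscale_nearest(pixels, factor):
    """Make a pixel grid bigger by repeating each pixel. no new detail appears."""
    return [
        [value for value in row for _ in range(factor)]
        for row in pixels
        for _ in range(factor)
    ]

def scale_shape(points, factor):
    """Make a vector shape bigger by moving its corners. the instructions stay exact."""
    return [(x * factor, y * factor) for x, y in points]

tiny_image = [[0, 1],
              [1, 0]]
for row in upscale_nearest(tiny_image, 2):
    print(row)

triangle = [(0, 0), (4, 0), (2, 3)]
print(scale_shape(triangle, 10))
```

the upscaled image is just bigger blocks of the same four pixels, while the triangle is still exactly three corners that can be drawn crisply at the new size.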
vectors can also be used for images, such as the SVG format. here's an example of one on wikipedia: https://upload.wikimedia.org/wikipedia/commons/0/02/SVG_logo.svg
even though it's an image, you can zoom in without it ever getting blurry!
even though you can resize an SVG to your heart's content, it'll never reveal more detail than what it was created with. so you can't "zoom and enhance" a vector image either.
there are always exceptions to the rule. not all fonts use vector graphics - some use bitmap graphics, and they get blurry like PNGs and JPEGs do too.
so if vector graphics don't get blurry, why don't we use them for photos? to put it simply, making a vector image is hard. you need to describe every stroke and shape and colour that goes into replicating the drawing. this gets out of hand very quickly when you want to save images of complex scenery (or even just faces). cameras simply can't do this on the fly, and even if they could, the resultant file would be an enormous mess of assumptions and imperfections. the current method of doing things is our best option.
what's keybase? why are so many people talking about it right now? what's with all those "it is proven!" posts? (long, serious)
-----BEGIN PGP SIGNED MESSAGE-----
keybase is a website that allows you to prove that a given account or website is owned by you. to explain how this works, we'll need to briefly cover public key cryptography.
there are many ways to encrypt a file. one such way involves using a password to encrypt the file, which can then be decrypted using the same password. this is known as a symmetrical method, because the way it's encrypted is the same as the way it's decrypted - using a password. the underlying methods of encryption and decryption may be different, but the password remains the same. how these algorithms work is outside the scope of this post - i might make a future post about encryption.
public key encryption is asymmetrical. this means the way you encrypt it is different from the way you decrypt it. a password protected file can be opened by anyone who knows the password, but a file encrypted using this method can only be decrypted by the person you're sending it to (unless their private key has been stolen). if you encrypt a file using someone's public key, the only way to decrypt it is with their private key. since i'm the only one with access to my private key, i'm the only person who can decrypt any files that are encrypted using my public key.
my private key can also be used to "sign" a file or message to prove that i said it. anyone can verify that i was the one who signed it by using my public key. comparing the signature to any other public key won't return a match, and changing even one letter of the text will mean that the signature no longer works.
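here's a toy version of signing and verifying, as a rough python sketch (it needs python 3.8 or newer). it uses textbook RSA with small, publicly known primes, which is nowhere near secure and is not how PGP or keybase actually do it. the names digest, sign and verify are made up for this example.

```python
import hashlib

# toy key: two well-known primes. a real key is far bigger and generated secretly.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                           # part of both keys
E = 65537                           # the public key: anyone can have this
D = pow(E, -1, (P - 1) * (Q - 1))   # the private key: only the signer has this

def digest(message):
    """Boil the message down to a single number smaller than N."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % N

def sign(message, private_key=D):
    """Only someone holding the private key can produce this number."""
    return pow(digest(message), private_key, N)

def verify(message, signature, public_key=E):
    """Anyone with the public key can check the signature matches the message."""
    return pow(signature, public_key, N) == digest(message)

signature = sign("this is lynne")
print(verify("this is lynne", signature))  # the original message
print(verify("this is lynnf", signature))  # one letter changed
```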
as the signing process can be used to guarantee that i said something, this means that i can use it to prove that i own, say, a particular facebook account. i could make a post saying "this is lynne" with my signature attached, and anyone could verify it using my public key. this is where keybase comes in.
the process of signing a post is rather technical, and everyone who wants to verify it will need to know where to get your public key. there are "keyservers" that contain people's public keys, but the average person won't know that, or what the long, jumbled mess of characters at the end of a message even means. keybase does this for you. after you create an account, it generates a public and private key for you to use. you don't even need to access these, it's all managed automatically. you can then verify that you own a given twitter, reddit, mastodon, etc. account by following the steps they provide to you. you just need to make a single post, which keybase will check for, compare against your public key, verify that it's you, and add to your profile. users can also download your public key and verify it themselves.
support for mastodon was only added recently and isn't quite complete yet, but it's ready to use and works well. this is why you might have noticed a lot of people talking about it recently. support for keybase is new in mastodon 2.8.
keybase can also be used to prove that you own a given website, again by making a public, signed statement. i've proven that i own lynnesbian.space with a statement here: https://lynnesbian.space/keybase.txt
it also provides a UI to more easily verify someone's signed message, without having to find and download their public key yourself.
keybase is built on existing and tested standards and technologies, and everything that it does can also be done yourself by hand. it just exists to make this kind of thing more accessible to the general public.
keybase also offers encrypted chat and file storage, but its main feature is that you can easily verify and confirm that you are who you say you are. so if you see a website claiming to be owned by me, and you don't see it in my keybase profile, you should be suspicious!
finally, this post itself is digitally signed by me! you probably noticed that weird "begin signed message" thing at the top! you can verify that it's me simply by pasting the whole post, top to bottom, including the weird bits at the start and end, but *not* including the content warning, into this page here: https://keybase.io/verify
-----BEGIN PGP SIGNATURE-----
Version: Keybase OpenPGP v2.1.0
-----END PGP SIGNATURE-----
what's an integer overflow, or underflow? what happens when a computer runs out of room to count with? why do games like the original pac-man and dig dug break on level 256? (long, serious)
we count using a decimal system. the lowest digit is 0, followed by 1, 2, and so on, through to 9. when you're counting up and you reach nine, you need to add another digit. there's no way to express ten with only one digit, so you need two digits to write 10.
computers use a binary number system instead. the lowest digit is 0, followed by 1, and that's it! so when you count to one, you need to add another digit to get to two, which is written as 10 in binary.
in decimal, from right to left, the digits mean ones, tens, hundreds, thousands, ten thousands... this means that a 4 in the third position from the right (followed by two zeroes) means four hundred. from right to left, binary digits mean ones, twos, fours, eights... a 1 in the fourth position from the right means eight, which is written as 1000. decimal uses powers of 10, binary uses powers of 2.
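here's a minimal python sketch for checking those place values yourself. the function name `place_values` is made up for this example, and it only handles plain digit strings.

```python
def place_values(digits: str, base: int) -> list[int]:
    """Return how much each digit contributes, leftmost digit first."""
    n = len(digits)
    return [int(d, base) * base ** (n - 1 - i) for i, d in enumerate(digits)]

print(place_values("400", 10))  # [400, 0, 0]
print(place_values("1000", 2))  # [8, 0, 0, 0]
print(int("1000", 2))           # 8
print(format(10, "b"))          # 1010
```

adding up the contributions gives the value of the whole number, which is all that python's built-in `int(text, base)` is doing.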
let's say you can only write two digits on a piece of paper. you can easily write numbers like 12 and 8 and 74, but what about 100? there's nothing you can do. but let's assume you aren't aware of that, and you're a computer following a simple algorithm calculating 99+1. first, you increment the least significant digit, which is the rightmost one. this leaves you with 90, and you need to carry the one. so you increment the other nine, and carry the one, leaving you with 00. normally, you'd just write the one you've been carrying and end up with 100, which is the correct answer. however, you only have space for two digits, so you can't continue. thus, you end up saying that 99 plus 1 equals 0.
an eight bit number has room for eight binary digits. this means that if the computer is at 11111111 (255 in decimal) and tries to add 1 again, it ends up with zero. this is called an integer overflow error - the one that the computer has been carrying has "overflowed" and spilled out, and is now lost. the number has wrapped around from 255 to 0, as if the numbers were on a loop of paper. underflow is the opposite of this problem - zero minus one is 11111111.
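python's own integers never run out of room, so to see the wraparound you have to throw away the extra digits yourself. this is a rough sketch of that idea. the helper names `add_two_digit` and `add_8bit` are invented for illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0b11111111, which is 255

def add_two_digit(a: int, b: int) -> int:
    """Decimal version: keep only the last two digits of the result."""
    return (a + b) % 100

def add_8bit(a: int, b: int) -> int:
    """Binary version: keep only the lowest eight bits of the result."""
    return (a + b) & MASK

print(add_two_digit(99, 1))            # 0
print(add_8bit(255, 1))                # 0
print(add_8bit(0, -1))                 # 255
print(format(add_8bit(0, -1), "08b"))  # 11111111
```

real hardware doesn't do a separate "chop off the top" step like this. the carried digit simply has nowhere to go, but the end result is the same.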
so if adding to the highest number possible should create zero, why does it sometimes give a negative number instead? this is due to signed integers. an unsigned integer looks like this:
a signed integer looks like +628 or -216. a computer doesn't have anywhere special to put that negative (or positive) sign, so it has to use one of the bits in the number. 1111 might mean -111, for example.
(n.b. the method of signing integers described below is "offset binary". there are other methods of doing this as well, but we'll focus on this one because it's intuitive.)
if we want to represent negative numbers, we can't start at zero, because we need to be able to go lower than that. in binary, there are sixteen different possible combinations of four digits/bits, from 0000 to 1111 - zero to fifteen. instead of treating 0000 as zero, we can move zero to the halfway point between 0000 and 1111. since there are sixteen positions between these two numbers, there's no middle. (the middle between one and three is two, but there's no whole number middle between one and four.) we'll settle for choosing 1000 to be our zero, which means there are eight numbers below zero and seven numbers above it. if we treat zero as positive, we have eight negative and eight positive numbers to work with. our number range has now gone from 0 to 15, to -8 to 7. we can't count as high, but we can count lower.
in such a system, 1111 would be 7 instead of 15, just as 1000 is 0 instead of 8. when adding one to 1111, it overflows to 0000, which means -8 with our system. this is why adding to a high, positive number can produce a low, negative number. numbers that wrap around from positive to negative like this are signed integers.
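the four-bit offset binary scheme described above can be sketched in a few lines of python. the function names here are made up, and this only models the "1000 means zero" system from this post, not the other ways of storing negative numbers.

```python
BITS = 4
OFFSET = 1 << (BITS - 1)  # 8: the bit pattern 1000 stands for zero
MASK = (1 << BITS) - 1    # 0b1111

def decode(pattern: str) -> int:
    """Turn a four-bit pattern into the signed number it stands for."""
    return int(pattern, 2) - OFFSET

def encode(value: int) -> str:
    """Turn a signed number (-8 to 7) into its four-bit pattern."""
    return format((value + OFFSET) & MASK, "04b")

def increment(pattern: str) -> str:
    """Add one to the raw bits, losing any carry past the fourth bit."""
    return format((int(pattern, 2) + 1) & MASK, "04b")

print(decode("1000"))             # 0
print(decode("1111"))             # 7
print(increment("1111"))          # 0000
print(decode(increment("1111")))  # -8
```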
overflow and underflow bugs are the root of many software issues, ranging from fascinating to dangerous. according to a popular story (one the game's designer has since disputed), in the first game in sid meier's civilization series, gandhi had an aggressiveness score of 1, the lowest possible. certain political actions reduced that score by 2, which caused it to underflow and become 255 instead - far beyond the intended maximum - which gave him a very strong tendency to use nuclear weaponry. the story was so well-known and accidentally hilarious that the company decided to intentionally make gandhi have a strong affinity for nukes in almost all the following games. some arcade games relied on the level number to generate the level, and broke when the number went above what it was expecting. (the reason behind the pac-man "kill screen" is particularly interesting!) for a more serious and worrying example of integer overflow, see this article: https://en.wikipedia.org/wiki/Year_2038_problem (unlike Y2K, this one is an actual issue, and has already caused numerous problems)
the first image is a chart explaining two methods of representing negative numbers with four bits (the one used in this post is on the left). the second is a real-world example of an "overflow".
thanks so much for reading! #LynneTeachesTech
what do all the parts of a URL or hyperlink mean? why do some sites start with www, but not others? why do some URLs end with a question mark and a bunch of weird stuff? (long, serious)
when you access a website in your browser, its URL (Uniform Resource Locator) will almost always start with either https:// or http://. this is known as the scheme, and tells the browser what type of connection it's going to be using, and how it needs to talk to the server. these aren't the only schemes - another common one is ftp, which is natively supported by most browsers. try opening this link: ftp://ftp.iinet.net.au/pub/
a hyperlink (or just link) is a name for a clickable or otherwise interactive way to access a URL, typically represented by blue underlined text. clicking a link opens a URL and not the other way around.
the next part of a URL is the domain name, which may be preceded by one or more subdomains. for example:
google.com - no subdomain
docs.google.com - subdomain "docs"
a common subdomain is "www", for example, www.google.com. using www isn't necessary and contains no special meaning, but many websites use or used it to indicate that the subdomain was intended to be accessed by a web browser. for example, typical webpages could be at www.example.com, with FTP files stored at ftp.example.com.
the ".com" at the end of a URL is a TLD, or Top Level Domain. these carry no special meaning to the computer, but are used to indicate to the user what type of website they're accessing. government websites use .gov, while university sites use .edu. you can have multiple TLDs - for example, https://australia.gov.au
in "example.com/one/two", "/one/two" is the path. this is the path to the file or page you want to access, similarly to how the path to your user folder on your computer might be "C:\Users\Person".
some URLs end with a question mark and some other things, like "example.com?page=welcome". this is the query portion of the URL, which can be used to give the server some instructions. what this means varies from site to site.
finally, some URLs might end with a hash followed by a word, like example.com#content. this is called a fragment or anchor, and tells the browser to scroll to a certain part on the page.
technically, a URL is only a URL if it includes the scheme. this means that https://example.com is a URL, but example.com isn't - it's a URI, the I standing for Identifier. however, people will understand what you mean if you use them interchangeably.
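if you'd like to pull a URL apart yourself, python's standard library can do it. this is a small sketch using `urllib.parse`. the URL is made up for the example, and splitting the hostname on dots is a naive shortcut that won't handle every case (like .gov.au).

```python
from urllib.parse import urlsplit, parse_qs

url = "https://docs.example.com/one/two?page=welcome#content"
parts = urlsplit(url)

print(parts.scheme)            # https
print(parts.hostname)          # docs.example.com
print(parts.path)              # /one/two
print(parts.query)             # page=welcome
print(parts.fragment)          # content
print(parse_qs(parts.query))   # {'page': ['welcome']}

# naive split of the hostname into subdomain, name, and TLD
labels = parts.hostname.split(".")
print(labels)                  # ['docs', 'example', 'com']
```

note that python calls the "https" part the scheme.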
why are there so few web browsers? why do some browsers display websites differently to others? why are some browsers, like opera and vivaldi, just based on chrome instead of being their own thing? (long, serious)
a web browser is a program that displays a HTML document, in the same way that a text editor is a program that displays a txt file, or a video player displays MP4s, AVIs, etc. HTML has been around for a while, first appearing around 1990.
due to the difficulty of creating and maintaining a web browser, and keeping up with the ever evolving standards, bugs and oddities appear quite frequently. they often manifest in weird and obscure cases, and occasionally get reported on if they're major enough. an old version of internet explorer rather famously had a bug that caused it to render certain elements of websites completely incorrectly. by the time it was fixed, some websites were already relying on it. microsoft decided to implement a "quirks mode" feature that, when enabled, would simulate the old, buggy behaviour in order to get the websites to work right. most modern browsers also implement a similar feature. this just adds yet another layer of difficulty to creating a browser.
the complexity of creating a web browser combined with the pre-existing market share domination of google chrome makes creating a new web browser difficult. this has had the effect of further consolidating chrome's market share. chrome is currently sitting at about 71.5% market share⁴, with opera (which is based on chrome) adding a further 2.4%. this means that google has a lot of say over the direction the web is headed in. google can create and implement new ways of doing things and force others to either adopt or disappear. monopolies are never a good thing, especially not over something as fundamental and universal as a web browser. at its peak, internet explorer had over 90% market share. some outdated websites still require internet explorer, which is one of the main reasons why windows 10 still includes it, despite also having edge.
1. before CSS, styling was done directly through HTML itself. while this way of doing things is considered outdated and deprecated, both firefox and chrome still support it for legacy compatibility. having to support deprecated standards is yet another hurdle in creating a browser.
2. chrome is the non-free (as in freedom) version of the open source browser "chromium", also created by google. compared to chromium, chrome adds some proprietary features like adobe flash and MP3 playback.
3. the engine safari uses is called webkit. chrome used to use this engine, but switched to a new engine based on webkit called blink. safari therefore shares at least some code with chrome.
4. based on http://gs.statcounter.com/browser-market-share/desktop/worldwide. note that it's impossible to perfectly measure browser market share, but this is good for getting an estimate.
what does "defrag" mean? why do you have to defrag old PCs? did it even do anything? (long, serious)
"defrag" is short for "defragment". when a file is saved to a hard drive, it is physically written to the device by a needle. when you need to read the file, the needle moves along the portion of the disk that contains the data to read it.
let's say you have three files on a hard drive, on a computer running windows. when you created them, windows automatically put them in order. as soon as the data for file A ends, the data for file B begins. but now you want to make file A bigger. file B is in the way, so you can't just add more to the end of file A. your computer can either:
a) move file A somewhere else on the drive with more free space. this is a slow operation, and may be impossible if you don't have much free space remaining.
b) leave a note saying "the rest of the file is over here" and put the new data somewhere else.
option B is what windows will generally do. this means that over time, as you delete and create and change the size of files on your PC, they end up getting split into lots of small pieces. this is bad for performance, because the hard drive platter and needle need to do a lot more moving around to be able to read the data. if a file is in ten pieces, the needle has to move somewhere else ten times just to read that one file.
to fix this, you need to defragment your hard drive. when you do this, windows will take some time to scan the hard drive and look for the files that are split up into multiple pieces, and then it'll do its best to ensure that they end up split into as few pieces as possible. it also ensures (to some extent) that related files are next to each other - for example, it might put five files it needs to boot up right next to each other to ensure they can be found and read faster. by default, windows 10 defragments your computer weekly at 3am, but you can do it manually at any time. doing it weekly ensures that things never get too messy. leaving a hard drive's files to get fragmented for several months or even years can cause the defragmentation process to take hours.
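to make the idea concrete, here's a toy python model of a drive as a row of blocks. everything in it is invented for illustration (the block layout, the function names, using "." for free space), and a real defragmenter is far more careful than just sorting the blocks.

```python
FREE = "."

def count_fragments(disk: list[str], name: str) -> int:
    """Count how many separate contiguous pieces a file is split into."""
    fragments = 0
    previous = None
    for block in disk:
        if block == name and previous != name:
            fragments += 1
        previous = block
    return fragments

def defragment(disk: list[str]) -> list[str]:
    """Toy defrag: group each file's blocks together, free space last."""
    return sorted(disk, key=lambda block: (block == FREE, block))

# files A, B and C written in order, with two free blocks at the end
disk = ["A", "A", "B", "B", "C", "C", FREE, FREE]

# file A grows by two blocks; B is in the way, so they land in the free space
disk[6:8] = ["A", "A"]
print(count_fragments(disk, "A"))   # 2

tidy = defragment(disk)
print(tidy)                         # ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C']
print(count_fragments(tidy, "A"))   # 1
```

each extra fragment is one more place the drive has to travel to before it can keep reading the file.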
many modern file systems use techniques to avoid fragmentation, but no hard drive based system is truly immune to this issue. this isn't a problem with SSDs, however, as they work completely differently to a hard drive.
a heavily fragmented file system will work, it'll just be slower, and might wear down the mechanisms of the hard drive a little faster. defragmentation is entirely optional, but a good idea. plus, it's always cool to watch the windows 98 defrag utility doing its thing!
how does compression work? why does a compressed JPEG look different, but a JPEG in a compressed zip file looks the same? (long, serious)
there are many different methods of file compression. one of the simplest methods is run length encoding (RLE). the idea is simple: say you have a file like this:
you could store it as:
to represent that there are 3 a's, 4 b's, etc. you would then simply need a program to reverse the process - decompression. this is seldom used, however, as it has a fairly major flaw:
which is twice as large! RLE compression is best used for data with lots of repetition, and will generally make more random files *larger* rather than smaller.
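here's a minimal sketch of run length encoding in python. the sample strings are made up for this example, and the decoder assumes the original text contains no digits, which a real format would need to handle properly.

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with its count and the character."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))

def rle_decode(encoded: str) -> str:
    """Reverse rle_encode. Assumes the original text contained no digits."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # counts can be more than one digit long
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

print(rle_encode("aaabbbbcc"))               # 3a4b2c
print(rle_decode("3a4b2c"))                  # aaabbbbcc
print(rle_encode("abcdef"))                  # 1a1b1c1d1e1f
print(len("abcdef"), len(rle_encode("abcdef")))  # 6 12
```

the last two lines show the flaw: with no repetition to exploit, every character picks up a "1" in front of it and the output doubles in size.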
a more complex method is to make a "dictionary" of things contained in the file and use that to make things smaller. for example, you could replace all occurrences of "the united states of america" with "🇺🇲", and then state that "🇺🇲" refers to "the united states of america" in the dictionary. this would allow you to save a (relatively) huge amount of space if the full phrase appears dozens of times.
what i've been talking about so far are lossless compression methods. the file is exactly the same after being compressed and decompressed. lossy compression, on the other hand, allows for some data loss. this would be unacceptable in, for example, a computer program or a text file, because you can't just remove a chunk of text or code and approximate what used to be there. you can, however, do this with an image. this is how JPEGs work. the compression is lossy, which means that some data is removed. this is relatively imperceptible at higher quality settings, but becomes more obvious the more you sacrifice quality for size. PNG files are (almost always) lossless, however. your phone camera takes photos in JPEG instead of PNG, though, because even though some quality is lost, a photo stored as a PNG would be much, much larger.
some examples of file formats that typically use lossy compression are JPEG, MP4, MP3, OGG, and AVI. some examples of lossless compression formats are FLAC, PNG, ZIP, RAR, and ALAC. some examples of lossless, uncompressed files are WAV, TXT, JS, BMP, and TAR. in terms of file size, you'll always find that lossy files are smaller than the lossless files they were created from (unless it's a horrendously inefficient compression format), and that losslessly compressed files are smaller than uncompressed ones.
you'll find that putting a long text file in a zip makes it much smaller, but putting an MP3 in a zip has a much less major effect. this is because MP3 files are already compressed quite efficiently, and there's not really much that a lossless algorithm can do.
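you can try this yourself with python's `zlib` module, which uses the same family of lossless compression as zip files. this is only a rough sketch: the repeated sentence stands in for a long text file, and random bytes stand in for data that's already been compressed (like an MP3), which is an assumption made for the sake of the example.

```python
import os
import zlib

# a long, repetitive "text file"
text = b"the quick brown fox jumps over the lazy dog. " * 200

# random bytes: no patterns for the algorithm to find
noise = os.urandom(len(text))

packed_text = zlib.compress(text)
packed_noise = zlib.compress(noise)

print(len(text), "->", len(packed_text))
print(len(noise), "->", len(packed_noise))

# lossless: decompressing gives back exactly what went in
print(zlib.decompress(packed_text) == text)  # True
```

the text shrinks dramatically, while the random data stays about the same size or even grows slightly.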
there are benefits to all three types of formats. lossily compressed files are much smaller, losslessly compressed files are perfectly true to the original sound/image/etc while being much smaller, and uncompressed data is very easy for computers to work with, as they don't have to apply any decompression or compression algorithms to it. this is (partly) why BMP and WAV files still exist, despite PNG and FLAC being much more efficient.
as an example of how dramatic these differences often are, i looked at the file sizes for the downloadable version of master boot record's album "internet protocol" in three formats: WAV, FLAC, and MP3. you can see that the file size (shown in megabytes) is nearly 90 megabytes smaller with the FLAC version, and the MP3 version is only ~13% of the size of the WAV version. note that these downloads are in ZIP format - the WAV files would be even larger than shown here. this is not representative of all compression algorithms, nor is it representative of all music - this is just an illustrative example. TV static in particular compresses very poorly, because it's so random, which makes it hard for algorithms to find patterns. watch a youtube video of TV static to see this in effect - you'll notice obvious "block" shapes and blurriness that shouldn't be there as the algorithm struggles to do its job. the compression on youtube is particularly strong to ensure that the servers can keep up with the enormous demand, but not so much so that videos become blurry, unwatchable messes.
why don't windows programs work on a mac, and vice versa? (long, serious)
an operating system (OS), such as windows or macOS, handles a lot of low-level stuff. this means that developers don't have to worry about details like "how to scroll a page" or "how to read text from a file", because the OS handles it for you. a computer can't do anything without an OS - it's needed for even the most basic tasks.
the OS provides you with a huge number of functions you can call to get stuff done. rather than worrying about the fundamentals of reading a file from a hard drive, the OS will provide you with a function that does the job for you. however, every OS does this differently. this means that you can't just run a windows program on macOS because the functions it needs aren't there.
wine is a program that translates windows functions into ones that work with macOS or linux. this allows you to actually run windows programs on macOS. when the program asks for a windows function, wine performs the macOS equivalent and pretends it's running on windows. the macOS version of the sims 3 actually runs in a modified version of wine - it's the exact same code!
it's possible to make programs that work on windows, macOS, and linux. for example, games made with the engine "unity" can run on all three of these. however, the actual game file is still different for all three - unity just translates the code into versions that work *individually* with windows, macOS, or linux. this means that you can't run a windows version of a unity game on macOS.
a java program can run fine on all three of the above operating systems, but you actually have to install java first - and the installer is specific to your OS. it's not possible to make a program that truly works with all three of these operating systems from a single file with no installers or engines or other behind the scenes work, as the differences are simply too great.
one more interesting note: reactOS is an operating system based on wine technology. its goal is to completely simulate windows without actually using any windows code (as this is illegal) by using the same methods wine does. it's still in alpha, but it's really cool!
(i know there are other operating systems, i just didn't mention them for brevity's sake. sorry *BSD people.)