Intelligence Quarterly: Final 2020 Election Update

Key Issues

·         Stimulus Passage/Delay Political Calculations – see above

·         Corn/RFS Waivers

·         Polling vs Reg Voter Statistics

·         Battleground States (Latino & Black male vote collapse)

·         Obama MIA (PA late, now FL over Michigan?)

·         Covid vs Economy

·         Electoral College prediction – 306 electoral votes for Trump, 232 for Biden.

Our updated electoral college prediction comes at the end of this piece, but we wanted to maintain a data audit trail in order to explain our prediction, which many consider an outlier. So, here goes:

Advantage Trump/McConnell – Stimulus Passage/Delay

Stimulus continues to be the focal point for both parties in the final stretch of the election. With early voting having started a month ago, at this point the political damage has been done. Pelosi’s strategy of holding the stimulus talks hostage appears to have failed, with voters blaming her for the political games by a 43% to 40% margin.

Pelosi now faces a mutiny within her own party, as members confront tougher races; we forecast the Democrats to lose more than 8 seats, mainly in districts that leaned Trump in 2016.

Had Pelosi played ball weeks ago, she could have shored up her members and helped Senate Democrats move closer to a majority. McConnell has pounced on her miscalculations and is now looking to offset Democratic victories in Arizona and Colorado with a win in Michigan and/or Minnesota. A deal is likely to materialize over the weekend, but that is far too late to show results at the polls.

Here is Pelosi’s discussion on the stimulus on Sunday’s CNN State of the Union with Jake Tapper.


Again, I am not sure her explanation for the delay works well optically – not good politics. Tapper appears not to fully accept her stance, and rightly so – it really is a milder version of the Wolf Blitzer train wreck interview.

Advantage: Trump/McConnell – Corn/RFS Waivers

Corn politics remains the biggest under-the-radar topic in the media. The US agriculture lobby is the most powerful in the Senate and continues to see bipartisan support in funding and legislation. The RFS waiver issue was settled, as Albert had written in September, which boosted Joni Ernst (Republican Senator, Iowa) and solidified support from nearly 100,000 corn-producing farms across the Minnesota, Wisconsin, Michigan and Pennsylvania region. The corn price was slightly below $300 when Albert sounded the alarm that this issue had to be resolved before early voting began. It was, and those rural communities now enjoy a price close to $400.

Furthermore, given Biden’s debate comments about the oil industry and fossil fuels, it is worth remembering that corn is closely correlated with oil via the ethanol mandate. This was a bad electoral move for Biden and the Democrats, over and above the negative impact on voters in the oil states. Just consider ethanol’s footprint in the US below:


Advantage: Trump – Battleground States

The big debate is whether the opinion polls have corrected the error of their 2016 ways and thus accurately reflect voter intentions and the likely outcome. Firstly, national polls offer little value, although most of the media seem to focus thereon in order to push the ‘Biden will win’ narrative. It is all about the top battleground states:

·         Florida

·         Pennsylvania

·         Michigan

·         Wisconsin

·         North Carolina

·         Arizona

At the last election, RealClearPolitics’ polling averages in the above states showed an advantage for Clinton (left column). Compare them with the Saturday October 24th update showing an advantage for Biden:

October 29, 2016 (Clinton lead) → October 24, 2020 (Biden lead)

·         Florida: Tied → +1.2

·         Pennsylvania: +5.6 → +5.1

·         Michigan: +7 → +7.8

·         Wisconsin: +6.2 → +4.6

·         North Carolina: +3.2 → +1.5

·         Arizona: +1.5 → +2.4

Trump went on to win all these states in 2016, with the following margins:

·         Florida +1.2

·         Pennsylvania +0.7

·         Michigan +0.3

·         Wisconsin +0.7

·         North Carolina +3.6

·         Arizona +3.5

We have been very vocal about ignoring polling conducted during the Covid lockdowns, when only phone or online polling was possible. This is the most error-prone method of determining voter intentions, with only 6% of the public willing to be polled. In our opinion, turnout percentages combined with voter registration, top issues, and the candidate’s approval within his own party are superior indicators of the most likely outcome of races.

With regard to voter registration, the GOP has cut the Democrats’ advantage in Florida, Pennsylvania and North Carolina:


As JP Morgan’s famed quant Marko Kolanovic stated recently, adapting our data to a statistical analysis (see table below), this change in voter registration would immediately invalidate polls such as this one from RealClearPolitics showing Biden sweeping the battleground states. In fact, while he does not say it, the implication of the Kolanovic analysis is that Trump may well end up winning the critical trio of Pennsylvania (20 electoral votes), Florida (29) and North Carolina (15).


As for top issues, it is the economy:


What do American voters’ election predictions look like?


President Trump’s approval rating amongst Republicans is at its highest since he was elected:


And the level of Trump’s strong support remains higher:


And finally, according to this poll by Rasmussen, Trump is running at a marginally higher rate of total approval than did Obama at the equivalent date ahead of the 2012 election against Romney.


Advantage: TBD

Polls have been publishing wild results showing Texas, Georgia, Iowa and other states in play. We completely dismiss these unrealistic outcomes and focus solely on the battleground states of Arizona, North Carolina, Pennsylvania, Michigan, Wisconsin and Minnesota. The statistics from early voting show a drop in African American turnout in key Democratic stronghold counties in Michigan and Pennsylvania, with further evidence of Black males supporting Trump more than in 2016. I cannot stress enough how this, along with a collapse of the non-Mexican Latino vote, makes Biden’s chances of winning this election minimal at best.


Then we have this poll from Rasmussen:


The strategy of running solely on the Covid response against Trump’s economy was an absolute disaster and an obvious loser, as people vote with their paycheck and rarely with ideology. The fear-inducing lockdowns have run their course, as the public is desperate for normalcy. Let’s have a look at some of the latest state data for mail-in and in-person early voting, starting with Florida. This looks to us like game over in Florida. Bear in mind that the big Democratic-voting counties are Palm Beach, Broward and Miami-Dade. Yet early voting in Miami-Dade is leaning Republican, with the GOP tallying 218,624 early votes to the Democrats’ 125,302. Without running up the totals in these three counties, there is no hope that Biden can win the state, despite polls averaging a Biden lead of 1.5 points!


Then we have this from the New York Times/Siena poll out today with regard to Philadelphia:

Biden’s underperforming Hillary Clinton by 24% in Philadelphia is seriously bad news for the Biden campaign in Pennsylvania, and this comes from the New York Times!

Furthermore, Trump continues to outperform his 2016 polling in the battleground states, and the TargetSmart data models to which we have previously referred continue to show a GOP surge that has surprised many within the Biden campaign and the poll-modeling outfits that had Biden winning at a 90%+ rate. The Democrats urged Barack Obama to finally show up on the campaign trail; instead, he made one digital appearance in Philadelphia two days after registrations closed, and rather than moving to Michigan, where Biden is likely now losing, he went to Florida, where the race is all but over. This only further solidifies Albert’s speculation that Obama’s selection of Biden and Harris was an intentional loss to set up Michelle Obama for 2024, with 2020 serving as a massive fundraiser for a 2022 Senate-majority strategy. Trump has been an absolute financial windfall for the Democrats, who have raised record amounts of money this cycle.

However, just to emphasize the problems with some of the polling, over and above the claim that Biden is winning Florida, let’s take a look at Texas, where the latest poll is from the University of Texas at Tyler:


Firstly, if Texas were truly in play, Joe Biden would be there on the stump. He is not, having spent Saturday in Pennsylvania. Nor is Obama, who has been in Miami!

TargetSmart data for Texas suggest that early voting is up for Republicans relative to 2016, whereas it is down for the Democrats!


Let’s take a look at North Carolina, which has also benefited from a surge in GOP voter registrations relative to the Democrats (see above). Consider the following data:

2008: Democratic voter-registration edge over the GOP of 864,253; Obama won NC by 0.32%.

2012: Democratic edge over the GOP of 818,443; Romney won NC by 2.04% (a flip of +2.36% to the GOP from the 2008 result).

2016: Democratic edge over the GOP of 646,246; Trump won NC by 3.66% (a flip of +1.62% to the GOP from the 2012 result).

2020: Democratic edge over the GOP of 398,953; the GOP has closed the gap by a further 234,105 votes.

Odds for a Trump win?

We continue to believe that the preponderance of polling is not showing much accuracy. Have a look at this interview on the RealClearPolitics website:


Advantage: Trump for 2020, Dems for 2022/2024


With the Democrats’ many missteps and failure to capitalize on Trump’s public mistakes, the most logical outcome considering all the data is a Trump re-election, with Senate control remaining in GOP hands despite a loss of 2 seats. The House will remain under the Democrats’ control, but Nancy Pelosi’s miscalculations will cost her party seats and cost her the leadership. This will be a welcome result for centrist Democrats, who will use the losses as an excuse to purge some of the progressive elements that have challenged the establishment for control of the party. The GOP will have to contend with how to amicably separate itself from the Trump brand for 2022 and 2024; given Trump’s penchant for chaos and his ego, this will not be easy, and it will likely cost the GOP control of the Senate in 2022.


What We Talk about When We Talk about Holes

A demonstration that a tube of toothpaste has a two-dimensional hole. Image: Slipp D. Thomson, via Flickr.

For Halloween, I wrote about a very scary topic: higher homotopy groups. Homotopy is an idea in topology, the field of math concerned with properties of shapes that stay the same no matter how you squish or stretch them, as long as you don’t tear them or glue things together. Both homotopy groups and the somewhat related homology groups are different ways to describe the topology of shapes using algebra. In my post, I said that homology detects “holes” of different dimensions. But, as one commenter asked, what do I mean by holes of different dimensions?

Good question! I deliberately used “hole” as a wiggle word because there isn’t a real mathematical definition of hole. But here’s my short answer that is also the reason I’m not an algebraic topologist. If you can put it on a necklace, it has a one-dimensional hole. If you can fill it with toothpaste, it has a two-dimensional hole. For holes of higher dimensions, you’re on your own.

That answer isn’t very satisfying. Is there a better way to describe holes? I talked with some of my topologist friends and discovered two things: topologists don’t all agree on what a hole is, and it’s fun and interesting to think about different interpretations of a word whose mathematical definition isn’t completely settled. I think my larger conclusion, in the spirit of the season, is that holes are like Santa Claus: the true meaning is in your heart. So let’s look into our hearts and think about what holes are.

The Stanford Encyclopedia of Philosophy has an amusing entry about holes by Robert Casati and Achille Varzi. It starts:

Holes are an interesting case study for ontologists and epistemologists. Naive, untutored descriptions of the world treat holes as objects of reference, on a par with ordinary material objects. (‘There are as many holes in the cheese as there are cookies in the tin.’) And we often appeal to holes to account for causal interactions, or to explain the occurrence of certain events. (‘The water ran out because of the hole in the bucket.’) Hence there is prima facie evidence for the existence of such entities. Yet it might be argued that reference to holes is just a façon de parler, that holes are mere entia representationis, as-if entities, fictions.

Luckily we are mathematicians, not philosophers, so we don’t need to concern ourselves too much with the trivial detail of whether or not holes exist. (Some also take this approach with Santa Claus.)

I have to warn you that this post will end up being a little circular. In some sense, the mathematical definition of an n-dimensional hole “should be” something that causes the n-dimensional homology or homotopy group to have something interesting in it, or to be nontrivial.

A basketball has a hole in it. Image: Public domain, via Wikimedia Commons.

The Mathworld entry on holes has a definition by Eric Weisstein that I like a lot: “A hole in a mathematical object is a topological structure which prevents the object from being continuously shrunk to a point.”

Let’s think about a basketball. Using Weisstein’s definition, it definitely has a hole in it because you can’t squish it all the way down to a point without changing its basketballiness.

I like this definition because it’s intuitive, but I think it’s a bit dangerous because there are a few different notions of being continuously shrunk to a point that are used in topology, and it’s easy to get them confused. (Trust me. I have lived it.) A circle in the plane can be continuously shrunk to a point,* but intuitively, and in the sense of homotopy and homology, a circle has a hole in it. That notion of shrinking, however, relies on the assumption that the circle is sitting in a 2-dimensional plane, so it’s really telling us something about the topology of the plane, not the topology of the circle. We need our definition not to rely on how something is sitting in space.

The notion of being shrunk down to a point that Weisstein’s definition uses requires us to retain topological equivalence the whole time. We can’t shrink a circle down to a point because we’d end up tearing or squishing something together at the end.

What about defining the dimension of a hole? That’s trickier. A tempting definition, and the definition that one of my topologist friends prefers, is that an n-dimensional hole in a manifold is a place where the manifold is “like” the n-sphere. (For our purposes, a one-dimensional sphere is a circle, a two-dimensional sphere is basketball-shaped, and so on. This is because up close, a circle looks like a line, and a sphere looks like a plane.) More rigorously, an n-dimensional hole in an object is something that prevents some map of the n-sphere into the object from being shrunk down into a point without leaving the object. This definition of a hole would mean that we were equating hole-ishness to homotopy. Let’s work out some examples.

First, a plane. You can’t put it on a necklace or fill it with toothpaste, so it probably doesn’t have a hole. Let’s check. There are lots of different ways to map a circle into a plane, but all of them can be shrunk down into points while staying on the plane. In other words, there’s no obstruction to scooting a rubber band around the plane and shrinking it down as much as we want. So by our working definition, a plane has no one-dimensional holes. That’s good because if the plane had a hole, our definition of a hole would be wrong.

We can tell the punctured plane has a hole because we can’t pull the orange loop past the missing point, outlined in blue. Image: Evelyn Lamb.

What about the plane with one point removed? We still can’t fill it with toothpaste, but given a really thin chain, we could put it on a necklace, so it should have a one-dimensional hole. How can we see that? If we map a circle into the plane, and the removed point is inside the circle, we have a problem. (Pedants might point out that I haven’t proved that there’s such a thing as an inside and an outside of a map of a circle into a plane, hole or no. You’re right, and you can go write your own blog post about it. The rest of us will just assume that we can find a circle map polite enough to have a clearly defined inside and outside.) We can’t pull or shrink the circle past that point, so we know that the plane minus a point has a one-dimensional hole.

Now back to the basketball we talked about earlier. We know it has a hole. What dimension is its hole? You can’t put it on a necklace, but you can fill it with toothpaste, so it’s probably two-dimensional. Now to check it. It has no one-dimensional holes because any way you put a rubber band (or circle) on the basketball, you can shrink it down until it’s a single point without leaving the surface of the basketball. But it does have a two-dimensional hole because you can’t continuously shrink every map of a two-sphere into the space down to one point without leaving the basketball. (To pick the low-hanging fruit, if your map from a basketball to a basketball is the identity map, where everything stays in the same place, you can’t shrink it down to one point.)

So far, the definition of hole we’re using seems promising. But in the end, I don’t think it’s the best one.

The two highlighted loops on the torus show us the two different one-dimensional holes. Image: YassineMrabet, via Wikimedia Commons.

Let’s look at the torus, one of the simplest topological spaces. The torus can be thought of as the glaze of a donut or the surface of an inner tube. We can put it on a necklace or fill it with toothpaste, so it should have one- and two-dimensional holes. Everything is fine for one-dimensional holes: there are basically two main ways a map of a circle can fail to shrink down to a point on a torus. Either it can go around the hole of the donut (the blue circle in the image to the left), or it can be like the circle your fingers would make if you stuck your thumb through the hole of the donut and grasped it with your first finger (the red circle in the image to the left). So the torus has two one-dimensional holes. (You don’t find them both with the necklace definition unless you stand inside the torus to wear one of the necklaces.)

Our working definition breaks down when we get to two-dimensional holes. A torus “should” have a two-dimensional hole, but we can’t find it using maps of two-spheres. (This isn’t obvious, at least to me. You can think about trying to wrap a balloon around an inner tube to get an idea of what’s going on.)

Our definition of hole in terms of maps of spheres doesn’t work for the two-dimensional hole in the torus, but I’d really like to say the hole is there. I think the right answer, though it doesn’t seem particularly insightful, is to define hole the same way but allow maps of any two-dimensional things instead of just spheres. There is a two-dimensional thing we can map into the torus that can’t be shrunk down to a point while staying on the torus, and it’s the torus itself. So if we know that the torus isn’t topologically equivalent to a point, we know that it has a two-dimensional hole. This kind of seems like an “I know it when I see it” definition, and it isn’t very helpful in practice. If we don’t know much about an object, how will we know which one of the infinitely many two-dimensional surfaces to map into it to test its holiness? But a version of this notion, defined more precisely, is homology. (For the ambitious, you can read more about it in Allen Hatcher’s free Algebraic Topology textbook. It’s worth noting that Hatcher always uses scare quotes around the word hole because he never defines it.)

There are several ways to define homology, but to me the most intuitive is by taking some fundamental building blocks—vertices, edges, faces, and so on—and looking at how they get stuck together to make the surface. Although it’s more subtle than this, homology basically tells you which building blocks of a certain dimension don’t bound higher-dimensional building blocks in your space. This works with the ideas of holes we’ve already seen: the two distinct holes in the torus come from (one-dimensional) circles that don’t bound a (two-dimensional) solid disk in the space. The two-dimensional hole comes from the fact that the torus is only made up of two-dimensional and smaller components, so its two-dimensional components don’t bound any three-dimensional parts of the surface. On the other hand, a solid torus (the whole donut) doesn’t have any two-dimensional parts that aren’t the boundary of three-dimensional parts, so it doesn’t have a two-dimensional hole. (The two-dimensional hole of the donut glaze is now filled with three-dimensional bread. Which is much better than toothpaste.)
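As a small illustration of this building-block picture, the Betti numbers of a circle, triangulated as the boundary of a triangle, can be read off from the ranks of its boundary matrices. This is a sketch (the matrix and variable names are my own, not from any particular textbook), here worked out with NumPy:

```python
import numpy as np

# Minimal triangulation of the circle: 3 vertices (v0, v1, v2) joined by
# 3 edges (e0: v0->v1, e1: v1->v2, e2: v0->v2). There are no 2-cells.
# The boundary matrix d1 sends each edge to (end vertex) - (start vertex).
d1 = np.array([
    [-1,  0, -1],   # v0
    [ 1, -1,  0],   # v1
    [ 0,  1,  1],   # v2
])

rank_d1 = np.linalg.matrix_rank(d1)
b0 = 3 - rank_d1          # Betti 0: connected components = #vertices - rank d1
b1 = (3 - rank_d1) - 0    # Betti 1: dim ker d1 minus rank d2 (d2 = 0, no faces)
print(b0, b1)  # 1 1 -> one connected component, one one-dimensional hole
```

Filling in the triangle as a 2-cell would add a nonzero d2, kill b1, and leave b0 = 1: the disk has no hole, exactly as the bounding picture above predicts.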

A visualization of the Hopf fibration, which demonstrates the surprising fact that a basketball has a three-dimensional hole. Image: Niles Johnson, via Wikimedia Commons.

Mathematicians often refer to homology alone as detecting holes, leaving homotopy—and our earlier working definition of n-dimensional holes—high and dry. One advantage of this definition is that we’ll never have a higher-dimensional hole in a lower-dimensional space, a disturbing prospect that is the reason I find higher homotopy groups spooky. If we allow the homotopy-based definition of hole, a basketball has a three-dimensional hole. (So I guess it can be filled with whatever four-dimensional beings use to brush their teeth.) The Hopf fibration, which I also mentioned in my earlier post, is a map from the three-sphere to the two-sphere that can’t shrink down to a point.

So with holes, you get a choice of what definition you like the best. I think I prefer to use the homology definition, but there’s something beautiful about the idea that different hole-detectors can detect different holes, so I might try to open my heart—which for simplicity I’m assuming is topologically equivalent to a two-sphere—and let the three-dimensional hole in.

If you made it this far, you deserve a treat. How about a demonstration that a two-sphere filled with watermelon flesh doesn’t have a one-dimensional hole?

[embedded content]

Thanks to Arunima Ray and two Christopher Davises (Christophers Davis?) for their helpful comments about this post. Anything you didn’t like is my fault.

*Recipe for shrinking a circle to a point in the plane: start with a circle of radius 1, and for convenience, set it down at the point (0,0). We’ll define a two-variable shrinking map. The first variable will represent a point on the circle, which we’ll identify by angle (measured counterclockwise from the x-axis). The second variable represents time. I can shrink any circle down to a point over the time interval from 0 to 1 with the map F(a,t)=(1-t)a. At any time w strictly between 0 and 1, the image of this map is a circle of radius 1-w. At time 1, we have a “circle” of radius 0, which is also known as a point. Maps like this are used all the time as examples of explicit homotopies between paths.
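In coordinates, the footnote’s map can be written out explicitly (a sketch, identifying each circle point by its angle θ):

```latex
F \colon S^1 \times [0,1] \to \mathbb{R}^2, \qquad
F(\theta, t) = \bigl((1-t)\cos\theta,\; (1-t)\sin\theta\bigr)
```

At t = 0 this is the unit circle; at any intermediate time t it is the circle of radius 1 − t; and at t = 1 it is the constant map at the origin.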

Ask HN: View Source Like in the Matrix?

2 points by coddle-hark | 1 comment
I was rewatching The Matrix the other day and it got me thinking about information density in source code. In the movie, they view the “source code” of the matrix as vertical columns of scrolling characters. The characters vary in brightness and the scrolling varies in speed, and the whole screen is filled with these characters. Compared to how source code is usually presented, there’s a lot more information on the screen.

Has anybody looked into information-dense representations of code like that? I’m thinking it might be useful for getting an overview of a code base or something. Honestly I don’t even know the proper terminology to describe the effect, so if anyone can point me towards relevant literature I’d be grateful.


Converting Large Movies to Smallish MP4s with Ffmpeg


Created: Mon Oct 26 2020 16:23:03 GMT-0400 (Eastern Daylight Time)


Not much to this. I’ve set up a Plex server on a 2011ish MacBook air. This server is also responsible for hosting some homemade services on my local network.

Eventually I’d like to get it working with some sort of attached storage, but at the moment I’m using the MacBook’s HD, which is relatively meager once you start filling it up with movies. I don’t need much though, since the main purpose is providing my kids with better educational content than is offered on Netflix / Prime / Hulu, as well as allowing us to eliminate our Disney+ subscription. The kids watch the same movies over and over again (he said as he cued up Olaf’s Frozen Adventure for the 10,000th time).

Warning: It isn’t a fast process, at least not on an old-ass MacBook. Converting a full season of a TV show takes forever, since ffmpeg re-encodes every frame.

Warning: This appends .mp4 to the filename, meaning some-movie.mkv becomes some-movie.mkv.mp4. I’m working on a bulk file renaming tool to run afterward to format the titles into the Plex naming scheme ("MASH s01e09").



shrinkmov() {
    # Re-encode everything in the current directory to H.264 MP4,
    # deleting the original only if ffmpeg succeeds.
    for movie in *; do
        ffmpeg -i "$movie" -c:v libx264 -crf 18 "$movie.mp4" && rm "$movie";
    done
}

Early use of nitazoxanide in mild Covid-19: randomized, placebo-controlled trial


The antiparasitic drug nitazoxanide is widely available and exerts broad-spectrum antiviral activity in vitro. However, there is no evidence of its impact on SARS-CoV-2 infection. In a multicenter, randomized, double-blind, placebo-controlled trial, adult patients who presented up to 3 days after onset of Covid-19 symptoms (dry cough, fever, and/or fatigue) were enrolled. After confirmation of SARS-CoV-2 infection by RT-PCR on nasopharyngeal swab, patients were randomized 1:1 to receive either nitazoxanide (500 mg) or placebo, TID, for 5 days. The primary outcome was complete resolution of symptoms. Secondary outcomes were viral load, general laboratory tests, serum biomarkers of inflammation, and hospitalization rate. Adverse events were also assessed. From June 8 to August 20, 2020, 1,575 patients were screened. Of these, 392 (198 placebo, 194 nitazoxanide) were analyzed. Median time from symptom onset to first dose of study drug was 5 (4-5) days. At the 5-day study visit, symptom resolution did not differ between the nitazoxanide and placebo arms. However, at the 1-week follow-up, 78% in the nitazoxanide arm and 57% in the placebo arm reported complete resolution of symptoms (p=0.048). Swabs collected were negative for SARS-CoV-2 in 29.9% of patients in the nitazoxanide arm versus 18.2% in the placebo arm (p=0.009). Viral load was also reduced after nitazoxanide compared to placebo (p=0.006). No serious adverse events were observed. In patients with mild Covid-19, symptom resolution did not differ between the nitazoxanide and placebo groups after 5 days of therapy. However, early nitazoxanide therapy was safe and reduced viral load significantly.

Competing Interest Statement

Dr. Rocco reports personal fees from SANOFI as a DSMB member. The other authors declare no competing interests.

Clinical Trial


Funding Statement

Supported by the Brazilian Council for Scientific and Technological Development (CNPq), Brazilian Ministry of Science, Technology, and Innovation for Virus Network; Brasilia, Brazil, number: 403485/2020-7 and Funding Authority for Studies and Projects, Brasilia, Brazil, number: 01.20.0003.00.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.


The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The IRB/oversight body was the Brazilian National Committee of Ethics in Research (CONEP), and the approval number is CAAE: 32258920.0.1001.5257. The study was further approved by local committees of ethics in research for the seven health units. The investigators have followed all appropriate research reporting guidelines. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.



I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).


I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.


Data Availability

The investigators plan to make all individual participant data (IPD) available after publication of the manuscript, taking care not to share confidential data.

Where do trees get their mass from?

Digging up the dirt on how trees grow.

Have you ever wondered where trees get their mass from? One of the more common answers, as seen in a video published in 2012, is that the mass (increasingly bigger size) of a tree comes from the soil. Which makes sense, right? After all, we are taught that plants need soil (enhanced “dirt”) to grow. According to Michigan State University Extension, problems typically arise when people are asked to explain why there isn’t a big hole around a tree. If the tree is using soil, then there must be less soil around it. But studies show virtually no difference between the amount of soil in a pot when a seed is planted and the amount in the same pot when the plant grown from that seed is harvested. So where does the mass come from?

The mass of a tree is primarily carbon. The carbon comes from carbon dioxide used during photosynthesis. During photosynthesis, plants convert the sun’s energy into chemical energy which is captured within the bonds of carbon molecules built from atmospheric carbon dioxide and water. Yes, the carbon from carbon dioxide in the air we breathe out ends up in “food” molecules (called glucose) each of which contains 6 carbon atoms (and 12 hydrogen atoms and 6 oxygen atoms).
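In equation form, the balanced photosynthesis reaction makes the carbon bookkeeping explicit: all six carbon atoms in each glucose molecule arrive as carbon dioxide.

```latex
6\,\mathrm{CO_2} + 6\,\mathrm{H_2O}
  \;\xrightarrow{\text{light}}\;
  \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}
```

Six molecules of carbon dioxide and six of water, plus light energy, yield one glucose molecule and six molecules of oxygen.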

However, there is a negative side as well. Plants use the energy in some of the carbon molecules they make for the activities to keep themselves alive and to reproduce. This process is called cellular respiration, which all living things do. But there are still carbon molecules (glucose) left over. These left-over glucose molecules are used to form the complex structures of plants, such as leaves, stems, branches and roots as well as fruits, seeds, nuts or vegetables. Each year trees use the left-over carbon molecules to add to themselves, making themselves bigger in mass (size).

Voila! Most of the mass of trees is carbon. The processes involved are all pretty complicated and we can thank several Nobel Laureates for working out the details.

It is also important to note that the soil acts as an anchor for the plant through its roots, and supplies the plant with water and the small amounts of nutrients plants need, but the soil itself is not used for the tree’s mass.

To learn more about the ways 4-H youth can explore more about their environment, visit the science and technology page.

Text layout is a loose hierarchy of segmentation

I love text layout, and have been working with it in one form or another for over 35 years. Yet, knowledge about it is quite arcane. I don’t believe there is a single place where it’s all properly written down. I have some explanation for that: while basic text layout is very important for UI, games, and other contexts, a lot of the “professional” needs around text layout are embedded in much more complicated systems such as Microsoft Word or a modern Web browser.

A complete account of text layout would be at least a small book. Since there’s no way I can write that now, this blog post is a small step towards that – in particular, an attempt to describe the “big picture,” using the conceptual framework of a “loose hierarchy.” Essentially, a text layout engine breaks the input into finer and finer grains, then reassembles the results into a text layout object suitable for drawing, measurement, and hit testing.

The main hierarchy is concerned with laying out the entire paragraph as a single line of text. Line breaking is also important, but has a separate, parallel hierarchy.

The main text layout hierarchy

The hierarchy is: paragraph segmentation as the coarsest granularity, followed by rich text style and BiDi analysis, then itemization (coverage by font), then Unicode script, and shaping clusters as the finest.

diagram of layout hierarchy

Paragraph segmentation

The coarsest, and also simplest, segmentation task is paragraph segmentation. Most of the time, paragraphs are simply separated by newline (U+000A) characters, though Unicode in its infinite wisdom specifies a number of code point sequences that function as paragraph separators in plain text:

  • U+000A LINE FEED
  • U+000C FORM FEED
  • U+000D U+000A (CR + LF)
  • U+0085 NEXT LINE
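
As a minimal sketch of this stage, splitting plain text on the separator sequences listed above can be done with a single regular expression. Note that CR+LF must be matched before a lone LF so the pair counts as one break, not two.

```python
import re

# Order matters: "\r\n" first, then the single-character separators
# (LINE FEED, FORM FEED, NEXT LINE).
PARAGRAPH_BREAK = re.compile("\r\n|[\n\x0c\x85]")

def split_paragraphs(text):
    return PARAGRAPH_BREAK.split(text)

print(split_paragraphs("one\r\ntwo\nthree"))  # ['one', 'two', 'three']
```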

In rich text, paragraphs are usually indicated through markup rather than special characters, for example <p> or <br> in HTML. But in this post, as in most text layout APIs, we’ll treat rich text as plain text + attribute spans.

Rich text style

A paragraph of rich text may contain spans that can affect formatting. In particular, choice of font, font weight, italic or not, and a number of other attributes can affect text layout. Thus, each paragraph is typically broken into some number of style runs, so that within a run the style is consistent.
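
One way to picture this: treat the rich text as plain text plus attribute spans, collect every span edge as a run boundary, and emit runs of constant style. The `(start, end, attr)` span format below is an assumption of this sketch, not any particular API.

```python
# Compute style runs from plain text + attribute spans.
def style_runs(text, spans):
    # Every span edge (plus the string's ends) is a potential boundary.
    edges = {0, len(text)}
    for start, end, _ in spans:
        edges.update((start, end))
    bounds = sorted(edges)
    runs = []
    for a, b in zip(bounds, bounds[1:]):
        # Attributes whose span fully covers this slice apply to it.
        attrs = [attr for s, e, attr in spans if s <= a and b <= e]
        runs.append((text[a:b], attrs))
    return runs

print(style_runs("hello world", [(0, 5, "bold"), (3, 11, "italic")]))
# [('hel', ['bold']), ('lo', ['bold', 'italic']), (' world', ['italic'])]
```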

Note that some style changes don’t necessarily affect text layout. A classic example is color. Firefox, rather famously, does not define segmentation boundaries for color changes. If a color boundary cuts a ligature, it uses fancy graphics techniques to render parts of the ligature in different colors. But this is a subtle refinement and I think not required for basic text rendering. For more details, see Text Rendering Hates You.

Bidirectional analysis

Completely separate from the style spans, a paragraph may in general contain both left-to-right and right-to-left text. The need for bidirectional (BiDi) text is certainly one of the things that makes text layout more complicated.

Fortunately, this part of the stack is defined by a standard (UAX #9), and there are a number of good implementations. The interested reader is referred to Unicode Bidirectional Algorithm basics. The key takeaway here is that BiDi analysis is done on the plain text of the entire paragraph, and the result is a sequence of level runs, where the level of each run defines whether it is LTR or RTL.

The level runs and the style runs are then merged, so that in subsequent stages each run is of a consistent style and directionality. As such, for the purpose of defining the hierarchy, the result of BiDi analysis could alternatively be considered an implicit or derived rich text span.
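
Merging the two segmentations amounts to taking the union of both boundary sets, so that each resulting run has a single style and a single direction. A minimal sketch, with runs expressed as `(start, end)` offsets into the paragraph:

```python
# Merge style runs and BiDi level runs by unioning their boundaries.
def merge_runs(style_runs, level_runs):
    edges = sorted({e for run in style_runs + level_runs for e in run})
    return list(zip(edges, edges[1:]))

# e.g. a style boundary at 6 and a BiDi boundary at 4, paragraph length 10:
print(merge_runs([(0, 6), (6, 10)], [(0, 4), (4, 10)]))
# [(0, 4), (4, 6), (6, 10)]
```

In a real engine each merged run would also carry its style attributes and BiDi level, but the boundary logic is the heart of it.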

In addition to BiDi, which I consider a basic requirement, a more sophisticated text layout engine will also be able to handle vertical writing modes, including mixed cases where short strings are horizontal within the vertical primary direction. Extremely sophisticated layout engines will also be able to handle ruby text and other ways of annotating the main text flow with intercalated strings. See Requirements for Japanese Text Layout for many examples of sophisticated layout requirements; the scope of this blog post really is basic text layout of the kind needed in user interfaces.

Itemization (font coverage)

Itemization is the trickiest and least well specified part of the hierarchy. There is no standard for it, and no common implementation. Rather, each text layout engine deals with it in its own special way.

Essentially, the result of itemization is to choose a single concrete font for a run, from a font collection. Generally a font collection consists of a main font (selected by font name from system fonts, or loaded as a custom asset), backed by a fallback stack, which usually consists of system fonts; but thanks to Noto it is possible to bundle a fallback font stack with an application, if you don’t mind spending a few hundred megabytes on the assets.

Why is it so tricky? A few reasons, which I’ll touch on.

First, it’s not so easy to determine whether a font can render a particular string of text. One reason is Unicode normalization. For example, the string “é” can be encoded as U+00E9 (in NFC encoding) or as U+0065 U+0301 (in NFD encoding). Due to the principle of Unicode equivalence, these should be rendered identically, but a font may have coverage for only one or the other in its Character to Glyph Index Mapping (cmap) table. The shaping engine has all the Unicode logic to handle these cases.
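
The two encodings of “é” can be demonstrated directly with Python’s `unicodedata` module. A cmap lookup on raw code points would treat them differently, even though they are canonically equivalent and must render identically:

```python
import unicodedata

nfc = "\u00e9"        # é as a single code point (NFC)
nfd = "\u0065\u0301"  # e + combining acute accent (NFD)

assert nfc != nfd                                # different code point sequences
assert unicodedata.normalize("NFC", nfd) == nfc  # but canonically equivalent

print([hex(ord(c)) for c in nfd])  # ['0x65', '0x301']
```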

Of course, realistic fonts with Latin coverage will have both of these particular sequences covered in the cmap table, but edge cases certainly do happen, both in extended Latin ranges, and other scripts such as Hangul, which has complex normalization rules (thanks in part to a Korean standard for normalization which is somewhat at odds with Unicode). It’s worth noting that DirectWrite gets Hangul normalization quite wrong.

I believe a similar situation exists with the Arabic presentation forms; see Developing Arabic fonts for more detail on that.

Because of these tricky normalization and presentation issues, the most robust way to determine whether a font can render a string is to try it. This is how LibreOffice has worked for a while, and in 2015 Chromium followed. See also Eliminating Simple Text for more background on the Chromium text layout changes.

Another whole class of complexity is emoji. A lot of emoji can be rendered with either text or emoji presentation, and there are no hard and fast rules to pick one or the other. Generally the text presentation is in a symbol font, and the emoji presentation is in a separate color font. A particularly tough example is the smiling emoji, which began its encoding life as 0x01 in Code page 437, the standard 8-bit character encoding of the original IBM PC, and is now U+263A in Unicode. However, the suggested default presentation is text, which won’t do in a world which expects color. Apple on iOS unilaterally chose an emoji presentation, so many text stacks follow Apple’s lead. (Incidentally, the most robust way to encode such emoji is to append a variation selector to pin down the presentation.)
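
Pinning down the presentation with a variation selector looks like this: U+FE0E requests text presentation and U+FE0F requests emoji presentation, appended directly after the base character.

```python
SMILING = "\u263a"                # ☺ WHITE SMILING FACE

text_form = SMILING + "\ufe0e"    # variation selector-15: text presentation
emoji_form = SMILING + "\ufe0f"   # variation selector-16: emoji presentation

print(len(emoji_form))  # 2 code points: base character + selector
```

A renderer that honors variation selectors will pick the symbol font for the first form and the color emoji font for the second.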

Another source of complexity when trying to write a cross-platform text layout engine is querying the system fonts. See Font fallback deep dive for more information about that.

I should note one thing, which might help people doing archaeology of legacy text stacks: it used to be pretty common for text layout to resolve “compatibility” forms such as NFKC and NFKD, and this can lead to various problems. But today it is more common to solve that particular problem by providing a font stack with massive Unicode coverage, including all the code points in the relevant compatibility ranges.


Script

The shaping of text, or the transformation of a sequence of code points into a sequence of positioned glyphs, depends on the script. Some scripts, such as Arabic and Devanagari, have extremely elaborate shaping rules, while others, such as Chinese, are a fairly straightforward mapping from code point into glyph. Latin is somewhere in the middle, starting with a straightforward mapping, but ligatures and kerning are also required for high quality text layout.

Determining script runs is reasonably straightforward – many characters have a Unicode script property which uniquely identifies which script they belong to. However, some characters, such as space, are “common,” so the assigned script just continues the previous run.

A simple example is “hello мир”. This string is broken into two script runs: “hello “ is Latn, and “мир” is Cyrl.
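
A toy version of this segmentation is sketched below. Real engines use the Unicode Script property (UAX #24); the classifier here only knows enough for the example, and “common” characters such as the space simply continue the previous run.

```python
def script_of(ch):
    # Toy classifier: ASCII letters are Latn, the basic Cyrillic
    # block is Cyrl, everything else is treated as "common".
    if "a" <= ch.lower() <= "z":
        return "Latn"
    if "\u0400" <= ch <= "\u04ff":
        return "Cyrl"
    return None  # common: inherits the current run's script

def script_runs(text):
    runs, cur, start = [], None, 0
    for i, ch in enumerate(text):
        s = script_of(ch)
        if s is None:
            continue  # common character: stay in the current run
        if cur is None:
            cur = s
        elif s != cur:
            runs.append((text[start:i], cur))
            start, cur = i, s
    runs.append((text[start:], cur))
    return runs

print(script_runs("hello мир"))  # [('hello ', 'Latn'), ('мир', 'Cyrl')]
```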

Shaping (cluster)

At this point, we have a run of constant style, font, direction, and script. It is ready for shaping. Shaping is a complicated process that converts a string (sequence of Unicode code points) into positioned glyphs. For the purpose of this blog post, we can generally treat it as a black box. Fortunately, a very high quality open source implementation exists, in the form of HarfBuzz.

We’re not quite done with segmentation, though, as shaping assigns substrings in the input to clusters of glyphs. The correspondence depends a lot on the font. In Latin, the string “fi” is often shaped to a single glyph (a ligature). For complex scripts such as Devanagari, a cluster is most often a syllable in the source text, and complex reordering can happen within the cluster.

Clusters are important for hit testing, or determining the correspondence between a physical cursor position in the text layout and the offset within the text. Generally, they can be ignored if the text will only be rendered, not edited (or selected).

Note that these shaping clusters are distinct from grapheme clusters. The “fi” example has two grapheme clusters but a single shaping cluster, so a grapheme cluster boundary can cut a shaping cluster. Since it’s possible to move the cursor between the “f” and “i”, one tricky problem is to determine the cursor location in that case. Fonts do have a caret table, but implementation is spotty. A more robust solution is to apportion the width of the cluster equally to each grapheme cluster within the cluster. See also Let’s Stop Ascribing Meaning to Code Points for a detailed dive into grapheme clusters.
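
The equal-apportioning fallback is simple enough to sketch: given a shaping cluster’s total advance and its grapheme cluster count, place caret stops at equal fractions of the width.

```python
# Caret positions within a shaping cluster, apportioned equally
# to each grapheme cluster (the fallback described above).
def caret_offsets(cluster_width, grapheme_count):
    step = cluster_width / grapheme_count
    return [round(i * step, 2) for i in range(grapheme_count + 1)]

# An "fi" ligature 10 units wide containing two grapheme clusters:
print(caret_offsets(10, 2))  # [0.0, 5.0, 10.0]
```

The caret between “f” and “i” lands at the midpoint of the ligature, which is usually close enough for editing purposes.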

Line breaking

While short strings can be considered a single strip, longer strings require breaking into lines. Doing this properly is quite a tricky problem. In this post, we treat it as a separate (small) hierarchy, parallel to the main text layout hierarchy above.

The problem can be factored into identifying line break candidates, then choosing a subset of those candidates as line breaks that satisfy the layout constraints. The main constraint is that lines should fit within the specified maximum width. It’s common to use a greedy algorithm, but high end typography tends to use an algorithm that minimizes a raggedness score for the paragraph. Knuth and Plass have a famous paper, Breaking Paragraphs into Lines, that describes the algorithm used in TeX in detail. But we’ll focus on the problems of determining candidates and measuring the widths, as these are tricky enough.
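
A minimal greedy breaker, assuming the simple case described below: every break candidate is a space, and the width of a line is just the sum of its word widths plus inter-word spaces. The `word_width` callable is a stand-in for real measurement.

```python
# Greedy line breaking: put each word on the current line if it fits,
# otherwise start a new line.
def greedy_lines(words, word_width, space_width, max_width):
    lines, line, width = [], [], 0.0
    for w in words:
        ww = word_width(w)
        needed = ww if not line else width + space_width + ww
        if line and needed > max_width:
            lines.append(" ".join(line))
            line, width = [w], ww
        else:
            line.append(w)
            width = needed
    if line:
        lines.append(" ".join(line))
    return lines

# Using character count as a stand-in width metric:
print(greedy_lines("the quick brown fox".split(), len, 1, 10))
# ['the quick', 'brown fox']
```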

In theory, the Unicode Line Breaking Algorithm (UAX #14) identifies positions in a string that are candidate line breaks. In practice, there are some additional subtleties. For one, some languages (Thai is the most common) don’t use spaces to divide words, so need some kind of natural language processing (based on a dictionary) to identify word boundaries. For two, automatic hyphenation is often desirable, as it fills lines more efficiently and makes the right edge less ragged. Liang’s algorithm is most common for automatically inferring “soft hyphens,” and there are many good implementations of it.

Android’s line breaking implementation (in the Minikin library) applies an additional refinement: since email addresses and URLs are common in strings displayed on mobile devices, and since the UAX #14 rules give poor choices for those, it has an additional parser to detect those cases and apply different rules.

Finally, if words are very long or the maximum width is very narrow, it’s possible for a word to exceed that width. In some cases, the line can be “overfull,” but it’s more common to break the word at the last grapheme cluster boundary that still fits inside the line. In Android, these are known as “desperate breaks.”

So, to recap, after the paragraph segmentation (also known as “hard breaks”), there is a loose hierarchy of 3 line break candidates: word breaks as determined by UAX #14 (with possible “tailoring”), soft hyphens, and finally grapheme cluster boundaries. The first is preferred, but the other two may be used in order to satisfy the layout constraints.

This leaves another problem, which is surprisingly tricky to get fully right: how to measure the width of a line between two candidate breaks, in order to validate that it fits within the maximum width (or, in the more general case, to help compute a global raggedness score). For Latin text in a normal font, this seems almost ridiculously easy: just measure the width of each word, and add them up. But in the general case, things are nowhere nearly so simple.

First, while in Latin, most line break candidates are at space characters, in the fully general case they can cut anywhere in the text layout hierarchy, even in the middle of a cluster. An additional complication is that hyphenation can add a hyphen character.

Even without hyphenation, because shaping is Turing Complete, the width of a line (a substring between two line break candidates) can be any function. Of course, such extreme cases are rare; it’s most common for the widths to be exactly equal to the sum of the widths of the words, and even in the other cases this tends to be a good approximation.

So getting this exactly right in the general case is conceptually not difficult, but is horribly inefficient: for each candidate for the end of the line, perform text layout (mostly shaping) on the substring from the beginning of the line (possibly inserting a hyphen), and measure the width of that layout.

Very few text layout engines even try to handle this general case, using various heuristics and approximations which work well most of the time, but break down when presented with a font with shaping rules that change widths aggressively. DirectWrite does, however, using very clever techniques that took several years of iteration. The full story is in harfbuzz/harfbuzz#1463 (comment). Further analysis, towards a goal of getting this implemented in an open source text layout engine, is in yeslogic/allsorts#29. If and when either HarfBuzz or Allsorts implements the lower-level logic, I’ll probably want to write another blog post explaining in more detail how a higher level text layout engine can take advantage of it.

A great example of how line breaking can go wrong is Firefox bug 479829, in which an “f + soft hyphen + f” sequence in the text is shaped as the “ff” ligature, then the line is broken at the soft hyphen. Because Firefox reuses the existing shaping rather than reshaping the line, it actually renders with the ligature glyph split across lines.

Implementations to study

While I still feel a need for a solid, high-level, cross-platform text layout engine, there are good implementations to study. In open source, one of my favorites (though I am biased) is the Android text stack, based on Minikin for its lower levels. It is fairly capable and efficient, and also makes a concerted effort to get “all of Unicode” right, including emoji. It is also reasonably simple, and the code is accessible.

While not open source, DirectWrite is also well worth study, as it is without question one of the most capable engines, supporting Word and the previous iteration of Edge before it was abandoned in favor of Chromium. Note that there is a proposal for a cross-platform implementation and also potentially to take it open-source. If that were to happen, it would be something of a game changer.

Chromium and Firefox are a rich source as well, especially as they’ve driven a lot of the improvements in HarfBuzz. However, their text layout stacks are quite complex and do not have a clean, documented API boundary with the rest of the application, so they are not as suitable for study as the others I’ve chosen here.


Android

Paragraph and style segmentation (with BiDi) is done at higher levels of the stack. At that point, runs are handed to Minikin for lower-level processing. Most of the rest of the hierarchy is in Layout.cpp, and ultimately shaping is done by HarfBuzz.

Minikin also contains a sophisticated line breaking implementation, including Knuth-Plass style optimized breaking.

Android deals with shaping boundaries by using heuristics to further segment the text at implied word boundaries (which are also used as the grain for the layout cache). If a font does shaping across these boundaries, the shaping context is simply lost. This is a reasonable compromise, especially on mobile, as results are always consistent, i.e. the width for measurement never mismatches the width for layout. And none of the fonts in the system stack have exotic behavior such as shaping across spaces.

Android does base its itemization on cmap coverage, and builds sophisticated bitmap structures for fast queries. As such, it can get normalization issues wrong, but overall this seems like a reasonable compromise. In particular, most normalization issues arise with Latin text and the combining diacritical marks, both of which are supplied by Roboto, which has massive Unicode coverage (and thus less need to rely on normalization logic). But with custom fonts, handling may be less than ideal, resulting in more fallback to Roboto than is actually needed.

Note that Minikin was also the starting point for libTxt, the text layout library used in Flutter.


DirectWrite

Some notes on things I’ve found while studying the API; these observations are quite a bit in the weeds, but might be useful to people wanting to deeply understand or engage with the API.

Hit testing in DirectWrite is based on leading/trailing positions, while in Android it’s based on primary and secondary. The latter is more useful for text editing, but leading/trailing is a more well-defined concept (for one, it doesn’t rely on paragraph direction). For more information on this topic, see linebender/piet#323. My take is that proper hit testing requires iterating through the text layout to access lower level structures.

While Core Text (see below) exposes a hierarchy of objects, DirectWrite uses the TextLayout as the primary interface, and exposes internal structure (even including lines) by iterating over a callback per run in the confusingly named Draw method. The granularity of this callback is a glyph run, which corresponds to “script” in the hierarchy above. Cluster information is provided in an associated glyph run description structure.

There are other ways to access lower level text layout capabilities, including TextAnalyzer, which computes BiDi and line break opportunities, script runs, and shaping. In fact, the various methods on that interface represent much of the internal structure of the text layout engine. Itemization, however, is done in the FontFallback interface, which was added later.

Core Text

Another high quality implementation is Core Text. I don’t personally find it as well designed as DirectWrite, but it does get the job done. In general, though, Core Text is considered a lower level interface, and applications are recommended to use a higher level mechanism (Cocoa text on macOS, Text Kit on iOS).

When doing text layout on macOS, it’s probably better to use the platform-provided itemization method (CTFontCreateForString), rather than getting the font list and doing itemization in the client. See linebender/skribo#14 for more information on this tradeoff.


Druid/Piet

At this point, the Druid GUI toolkit does not have its own native text layout engine; rather, it provides a cross-platform API which delegates to platform text layout engines, DirectWrite and Core Text in particular.

The situation on Linux is currently unsatisfactory, as it’s based on the Cairo toy text API. There is work ongoing to improve this, but no promises when.

While the Piet text API is currently fairly basic, I do think it’s a good starting point for text layout, especially in the Rust community. While the complexity of Web text basically forces browsers to do all their text layout from scratch, for UI text there are serious advantages to using the platform text layout capabilities, including more consistency with native UI, and less code to compile and ship.


Pango

I should at least mention Pango, which provides text layout capabilities for Gtk and other software. It is open source and has a long history, but it is more focused on the needs of Linux and in my opinion is less suitable as a cross-platform engine, though there is porting work for both Windows and macOS. As evidence that it hasn’t quite been keeping up to date, the Windows integration is all based on GDI+ rather than the more recent Direct2D and DirectWrite, so capabilities are quite limited by modern standards.

The question of level

A consistent theme in the design of text layout APIs is: what level? Ideally the text layout engine provides a high level API, meaning that rich text (in some concrete representation) comes in, along with the fonts, and a text layout object comes out. However, this is not always adequate.

In particular, word processors and web browsers have vastly more complex layout requirements than can be expressed in a reasonable “attributed string” representation of rich text. For these applications, it makes sense to break apart the task of text layout, and provide unbundled access to those lower levels. Often, they correspond to lower levels in the hierarchy I’ve presented. A good choice of boundary is style runs (including BiDi), as it simplifies the question of rich text representation; expressing the style of a single run is simpler than a data structure which can represent all formatting requirements for the rich text.

Until fairly recently, web browsers tended to use platform text capabilities for the lower levels, but ultimately they needed more control, so for the most part they now do all the layout themselves, deferring to the platform only when absolutely necessary, for example to enumerate the system fonts for fallback.

The desire to accommodate both UI and browser needs motivated the design of the skribo API, and explains why it only handles single style runs. Unfortunately, the lack of a complementary high level driver proved to be quite a mistake, as there was no easy way for applications to use the library. We will be rethinking some of these decisions in coming months.

Other resources

A book in progress on text layout is Fonts and Layout for Global Scripts by Simon Cozens. There is more emphasis on complex script shaping and fonts, but touches on some of the same concepts as here.

Another useful resource is Modern text rendering with Linux: Overview, which has a Linux focus and explains Pango in more detail. It also links to the SIGGRAPH 2018 – Digital typography slide deck, which is quite informative.

Thanks to Chris Morgan for review and examples.