From Tomatoes to Tumors:
How artificial intelligence is disrupting industries from the simple to the sophisticated
Artificial intelligence has changed both how I do my job and the trajectory of my career. I am not alone; artificial intelligence is disrupting many different industries and will only continue to grow more relevant as the technology matures. Currently, I provide legal support to AI researchers, and I can tell you they are pioneering technology that will dramatically change our lives. Whether you are a young professional just starting in your career, a parent preparing your children for the future, or a person simply watching our society grapple with the consequences of this revolution, it is important you understand how artificial intelligence works and how we got to this moment.
My journey into the world of artificial intelligence happened somewhat by accident as a hungry law student. I, like most law students at the University of Chicago, survived on a “balanced” diet of Giordano’s deep-dish pizza, Potbelly subs, Thai noodles, and vegan salads (i.e., the four restaurants that cater within a mile radius of the law school). Various student groups, law firms, and other organizations sponsor lunches, luring you with a free meal. The alternative is eating a Clif Bar and spending an hour in the law library nose-deep in a dusty tome reading about contract disputes over what constitutes a “chicken” (an actual case) or whether marooned British sailors deserved clemency after killing and eating a shipmate (also an actual case). Of course, fear of failure is a powerful motivator, so type-A law students looking for an edge forgo food in exchange for more book time. One day my hunger overpowered my dread of failing my first-year contracts class, and I wandered into a lunch talk entitled “The Law of Robots”.
The presenter was Ed Walters, an adjunct professor at Georgetown and CEO of Fastcase, a legal research company. Walters began his presentation discussing initiatives by Tesla, Waymo, and Google to develop self-driving cars and the ethical quandaries in the design and regulation of these vehicles. “Should the vehicles prioritize the safety of the passenger or a pedestrian when the car must decide whether to hit the pedestrian or swerve into a brick wall?” he asked. He then went on to discuss how we would need a new liability framework, because now it wasn’t just humans who were responsible for the outcomes: “When two autonomous vehicles get in an accident, should the manufacturers be responsible for errors in the vehicle’s decision making, or is it the ultimate responsibility of the owner/driver to be in control and monitor those decisions?” Walters said that in order to answer these and other questions, legal academics were looking back at nineteenth-century statutes related to horses as a framework for regulating autonomous vehicles.
After that discussion, Walters moved on to the effects of automation more generally. Walters warned us that this artificial intelligence revolution wasn’t just going to be relevant sometime in the next decade or more — once Google or Tesla have worked out all the kinks — but that AI is changing the world right now. He also warned that it wasn’t just taxi drivers and the teamsters who should be worried about automation, but that many industries, including the legal industry, were prime territory to experience the upheaval of automation.
I had assumed that automation would start with the least-sophisticated jobs and then slowly creep up as technology advanced, but that assumption was mistaken. Technology wasn’t just automating away cashiers’ jobs at McDonald’s, and robots weren’t just replacing quality controllers on America’s farms; artificial intelligence was also transforming the opposite end of the job spectrum — outperforming doctors reading MRI scans to diagnose brain tumors.¹ In the legal industry, Dentons, the world’s largest law firm, was investing heavily in tools to automate legal research.²
Although my stomach was full, it was still a bit queasy when I left Walters’s presentation, because the software Walters was describing sounded a lot like what I, a first-year law student, was hoping to do in my soon-to-be job post-graduation. Law school is highly theoretical, focusing on principles and legal philosophy instead of the actual details of legal statutes and how to serve clients. If you had asked me the one thing I felt prepared to do after graduating, I would have replied, “Well, I know how to do legal research and write a research paper.” If Walters was correct, it was quite possible that the tools essential to a lawyer might very quickly shift beneath my feet. Thankfully, I was able to secure a solid job offer at a reputable firm in Chicago, so I continued pursuing my plan to become a lawyer, but “The Law of Robots” stuck with me, and in the back of my head I resolved to keep my eyes open for signs of change.
For many reasons, you might think that the legal services industry is the last place you would see technological disruption. Courts and law firms are some of the stodgiest, most insulated institutions in society, and they are extremely slow to adopt any change. If you visit the Supreme Court building in Washington, DC, you will notice that turtle and tortoise motifs abound in the architecture and decorative features. This is intentional. Some Supreme Court justices still don’t use email,³ and some of the partners I worked with still scribbled out their daily billing timesheets in pencil and then gave them to their secretaries to enter into the billing system. Tracking time was a significant undertaking and was accounted for in six-minute increments. One might expect that things would carry on much the same as they always had, but that was not my experience.
After taking the bar exam and joining my firm in 2017, I learned during one of my first big assignments that the firm had just purchased a subscription to a software tool called KIRA, which was designed to assist in reviewing contracts for legal diligence. Let me digress to describe briefly what goes on in legal diligence.
I worked as a transactional lawyer, which means I never saw the inside of a courtroom (if I did my job properly); rather, I assisted with the buying and selling of companies and assets. In the ever-shifting plate tectonics of capitalism, companies form, grow, multiply, merge, or dissolve into bankruptcy, and at each stage lawyers help their clients understand what is being bought and sold and the risks associated with those decisions. In the course of its life, a company enters into hundreds of contracts with customers, suppliers, landlords, employees, partners, governments, etc. Each one of these contracts includes promises and obligations binding the company. In order to understand a company, you must understand its contracts.
When a company is looking to be bought by another company, a huge amount of time is spent by lawyers reading, analyzing, and summarizing these contracts to ensure that the buyer understands what they are buying. This is as sexy as I can make legal diligence sound; in practice it often means getting a call at 4:00 PM on a Friday saying that three thousand PDFs have been uploaded to a data room and that the client wants a high-level summary of the customer agreements by Monday morning — i.e. it means that your weekend is ruined sorting and reading through several hundred documents.⁴ Okay, forgive that detour, let me return to KIRA.
KIRA automated the worst parts of diligence. You could dump PDFs into KIRA and it would scan them, generate machine-searchable versions of the docs, and detect duplicates. KIRA also had several project management tools, such as custom tags for documents and the ability to assign documents to particular attorneys for review. If that were all it did, I still would have worshipped it, but KIRA was also equipped with a form of artificial intelligence called “natural language processing,” which allowed it to read through the contracts and identify key parts of the documents: party names and dates, as well as legal sections like indemnities, licenses, and warranties (things which lawyers care about and need to summarize for their clients). It would flag the language and highlight the text in specific colors, which allowed for faster skimming and made it easier to copy relevant sections into our diligence reports. KIRA dramatically sped up the “first pass” through documents that associates normally had to do. In short, KIRA was just the type of software that Ed Walters had warned might significantly disrupt the legal industry and automate me out of a job.
Thankfully, I was lucky that KIRA was in its infancy and wasn’t robust enough to fully replace me. While the language processing was good, it made mistakes: it would often misidentify sections and miss relevant language. Also, KIRA couldn’t provide qualitative assessments of any language in a contract, only identify the type of clause it was, so I still had to give my analysis of particular sections. Despite these limitations, once I found out about KIRA, I immediately used it at every opportunity and tried to become as well-versed in its capabilities as I could to make myself stand out from the crowd.
After digging deeper into the software, I learned that KIRA wasn’t limited to identifying only the clause types that came preloaded. KIRA utilized something called “machine learning” and could be trained to search for additional types of clauses if I provided it with examples of what I wanted it to look for. On one assignment, I was tasked with reviewing employment agreements to ensure that they had proper language granting the employer rights to intellectual property developed by employees during their employment. I realized that the company used a form agreement for these employment contracts and that the form had gone through several iterations over time. One of the versions of the company form had problematic language. I trained KIRA to search for the problematic clauses, and in less than an hour I produced a report showing which of the several hundred agreements had issues. Without the software this task would have taken me around two days to complete. My client was thrilled, not only because I had gotten them an answer much faster than they anticipated, but also because I billed them for one hour of time rather than the twelve hours it would have taken otherwise.
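For readers curious what “training on examples” looks like in practice, here is a minimal sketch using generic open-source tools. It is not how KIRA works under the hood, and the clause snippets and labels are invented for illustration; the point is simply that you supply labeled examples and the model learns to flag similar language.

```python
# A minimal sketch (not KIRA's actual method): train a tiny text classifier to
# flag intellectual-property assignment clauses from labeled examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled training snippets.
snippets = [
    "Employee hereby assigns to the Company all right, title and interest in any inventions.",
    "All work product developed during employment shall be the sole property of the Employer.",
    "Employee retains ownership of inventions developed prior to the commencement of employment.",
    "This Agreement shall be governed by the laws of the State of Delaware.",
    "Employee shall be entitled to fifteen days of paid vacation per calendar year.",
    "Either party may terminate this Agreement upon thirty days written notice.",
]
labels = ["ip_assignment", "ip_assignment", "ip_assignment", "other", "other", "other"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(snippets, labels)

# Ask the trained model about a clause it has never seen.
new_clause = "Employee assigns to the Company all right, title and interest in any invention made during employment."
print(model.predict([new_clause])[0])   # ip_assignment
```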
This all sounds like a happy story until you understand the economic model undergirding the modern law firm: it is a pyramid, with partners at the top managing and billing out the time of the associates below them. The more associates each partner can keep busy, the greater the profits of the firm. The larger the pool of associates, the more chances the firm has at selecting the brightest and best to become the next generation of partners. This relationship doesn’t just benefit the partners. Associates emerge from law school without any experience that would cause a client to trust them to make decisions or negotiate on their behalf. Graduates from the top schools command hefty salaries and much of their time is spent learning, so the firm has to “write off” a lot of young associates’ time. If I’m a client, I’m not going to tolerate paying for five hours of work done by a newly minted attorney who has never drafted a non-disclosure agreement before when a third-year associate could crank it out in 30 minutes. For the first couple of years after graduation, most firms lose money on associates. Historically, the way associates make up for their inexperience and keep the lights on is by doing grunt work like contract diligence.
I benefited from KIRA precisely because it was a first version: it could augment my abilities, but it couldn’t completely replace me. I was able to harness those efficiencies to propel my own career, but in the aggregate, KIRA dried up the pool of tasks that could be done by junior attorneys. If I had stayed at the law firm, I probably could have ridden out the wave of automation, staying just ahead of the break, but I worry a lot about those who are in law school right now. The tools are improving rapidly, and even if they aren’t as good as a human, they just have to be good enough when the alternative is running the software for $5 or paying an associate $500. Law firms have already started to integrate modules on artificial intelligence and robotic process automation into their training for new associates.⁵
I’m convinced Ed Walters was correct, and my experience reflects what I believe is happening across a myriad of industries, not just legal services. In the past few years, machine learning applications have become ubiquitous:
· Agriculture- Harvesting equipment can weed out sub-par produce right in the fields without any human input. Using high-speed cameras, harvesters take pictures of freshly harvested crops, analyze the photos to identify defects such as green tomatoes, calculate the trajectory of the problematic fruit on a conveyor belt, and knock it out mid-air with fast-flicking fingers as the crop is transferred to the loading truck.⁶
· Conservation- When my wife worked at National Geographic almost ten years ago, she told me about one intern who drew the short straw and had to review and log footage from cameras attached to humpback whales that were triggered to record every time there was noise detected. The triggers on the camera were very sensitive, so this poor kid spent his summer in the basement watching dozens of hours of footage of the open ocean and the back of a whale punctuated with the occasional guttural whale call in order to find the interesting five minutes of footage to be used in an upcoming project. Now conservation researchers utilize machine learning to process hundreds of hours of recordings from an array of 50 microphone stations in the dense forests of Africa in order to monitor an elusive population of elephants.⁷
· Education- Microsoft recently launched a project with 10,000 schools in India to reduce the number of students dropping out. Microsoft’s model utilizes details about enrollment, student performance, gender and socio-economic demographics, school infrastructure and teacher skills to find predictive patterns in order to identify at-risk students. Armed with the data, schools can intervene early with counseling and other services.⁸
· Epidemiology- The Centers for Disease Control and Prevention has partnered with Google to identify flu outbreaks. When certain keywords start appearing in Google searches in particular communities, the CDC can identify hotspots before patients start showing up in hospitals.⁹
· Human Resources- Companies such as Unilever, IBM, and Dunkin (Donuts) use video interviews that are reviewed by machine learning to analyze the applicants’ facial emotions and (supposedly) personality. In law school, I advised a company that used content from job applicants’ social media profiles to measure their potential as car salespeople.¹⁰
· Linguistics/Translation- Researchers have developed a program that creates “voice skins” to mimic people’s voices. I had the chance to participate in a demo: I was asked to read a sample training script in English, and then the program generated replies in different languages with computer-generated speech that mimicked my voice. It was not production ready, but the technology was passable, and it was surreal to hear myself speaking Urdu.¹¹
· Medical Diagnostics- Machine learning models outperformed human doctors reviewing mammograms for breast cancer, showing both a lower rate of false-positives and false-negatives, even when two human doctors reviewed each mammogram.¹²
· Pharmaceuticals- In the pharmaceutical industry, in order to develop a new drug, scientists must first select a new lab-created molecule and then proceed through a time-consuming and expensive process to test all the possible interactions the molecule has when introduced into the body. If we were better at identifying the right candidates for trials, we could significantly decrease the time and cost of drug development. Drug prediction models utilizing machine learning have been used to identify novel candidate biomolecules for disease targets such as multiple sclerosis and Ebola.¹³
· Professional Sports- The Houston Astros, winners of the 2017 World Series, were embroiled in scandal when it was revealed they had been stealing rival teams’ signs during that season. Employees analyzed videos of catchers’ hand signals — which are used to communicate between the catcher and pitcher about what type of pitch should come next — and then input the signs and resulting pitches into software which would find the true signals among the decoys and decode their meanings.¹⁴
· Sales- Salespeople are turning to machine learning to help analyze and improve their sales pitches. Commercial software is available that listens to a person’s sales calls, monitors a wide range of characteristics (word usage, the number of questions asked by the salesperson, the ratio of time the salesperson speaks compared to the potential customer), and provides a personalized report. Companies assemble metrics from their entire sales teams to identify trends and successful strategies.¹⁵
· Videography- iPhones have a feature that automatically curates your videos and photos into vacation highlight compilations and slideshows, complete with relevant music, titles, and themes based on the image content and metadata. This technology was developed utilizing machine learning to identify good photos and how to order them in a manner that humans find pleasing.¹⁶
Despite these shifts, most people do not understand how machine learning works, why it is so important, and what it means for our future. This primer is my best attempt to explain, hopefully in a somewhat engaging manner, the artificial intelligence revolution.
In order to understand artificial intelligence, you first must understand three big trends which have all converged in the last five years to make artificial intelligence truly feasible. These trends are Moore’s Law, Big Data, and Machine Learning.
Moore’s Law
The brain of any computer is its central processing unit, or CPU. Modern CPUs conduct billions of simple operations every second, breaking down tasks into a series of tiny yes/no decisions which, when scaled, produce incredible results. The speed or power of a computer is constrained by the speed at which the CPU can perform those operations. The components on the CPU responsible for handling operations are transistors. Roughly speaking, the more transistors you can cram into a smaller space, the faster your computer can run. (If you want a fuller and more accurate explanation, ask Grandpa Gary, who worked designing microchips for large corporate mainframe systems.)
Since 1960, chipmakers have been engaged in an arms race to shrink transistors down to smaller and smaller sizes, cramming more and more of them onto a single silicon microchip. In 1965, Gordon Moore, who would go on to cofound Intel, made a bold prediction: the number of transistors in a microchip would double roughly every two years for at least the next decade. This meant that computers would get faster and cheaper at an exponential rate.¹⁷ His prediction, which came to be known as “Moore’s Law”, has proven remarkably prescient, as it has continued to hold for the past 55 years.¹⁸
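To get a feel for what that compounding means, here is a quick back-of-the-envelope calculation using the two-year doubling figure. The real cadence has varied over the decades, so treat the result as illustrative rather than exact.

```python
# Illustrative only: the compound growth implied by doubling every two years.
years = 2020 - 1965                 # 55 years since Moore's prediction
doublings = years / 2               # one doubling every two years
growth = 2 ** doublings
print(f"{doublings:.1f} doublings -> roughly {growth:,.0f} times more transistors")
# 27.5 doublings -> roughly a 190-million-fold increase
```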
This shouldn’t be too surprising to anyone who has watched their smartphone improve dramatically with each upgrade every couple of years. Think about how far things have come in the short thirteen years since the iPhone was introduced in 2007. That said, Moore’s Law finally appears to be slowing down: transistors are reaching such tiny sizes that they are approaching the scale at which the traditional laws of physics cease to operate. Barring some breakthrough, CPU capacity may start to level off.
The exponential increase in the power of computing does not simply mean that we can have higher-resolution videos on Netflix or that our emails load faster; it has made it possible for computers to outperform human experts at certain tasks. In the 1990s, a team of scientists at IBM developed a chess-playing supercomputer nicknamed “Deep Blue.” Using logic developed with chess experts, Deep Blue analyzed a given position on the chess board and then, with a brute-force tree search (minimax with alpha-beta pruning), played out possible move combinations, assigned a score to each resulting position, and repeated the process for every possible reply. The machine could evaluate between 100 million and 200 million positions a second and would sometimes compute up to 50 levels deep.¹⁹ In 1996, Deep Blue faced off against Garry Kasparov in a six-game match. The Deep Blue team’s challenge was an audacious one; it is difficult to overstate the brilliance of Garry Kasparov. Many consider Kasparov the best chess player ever to play the game. He was world champion for an unparalleled 15 years, from 1985 until 2000, and no other player before or since has come close to his dominance of the world of chess. After six games, Kasparov emerged victorious, having won 3, drawn 2, and lost only 1. Not to be deterred, and knowing that Moore’s Law was on their side, the Deep Blue team went back to the drawing board and challenged Kasparov to a rematch the very next year.
Kasparov felt confident after winning Game 1 of the rematch. In Game 2, Kasparov tried to hoodwink the computer with what is known as a “poisoned pawn,” luring the computer into a poor position. Similar tactics had worked previously against other chess engines. When Deep Blue failed to take the bait, Kasparov eventually resigned, and he accused the IBM team of cheating by having a grandmaster audit Deep Blue’s decisions behind the curtain. The next three games ended in draws. Deep Blue won Game 6, clinching the match and defeating the world champion.
The Deep Blue team retired from chess and moved on to other projects, but Moore’s law marched on. Today’s top chess engines — albeit aided by further algorithmic advancements — running on an iPad could easily beat Deep Blue.²⁰
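Deep Blue’s actual system combined specialized hardware, handcrafted evaluation functions, and many refinements, but the kernel of this style of search, minimax with alpha-beta pruning, fits in a few lines. The toy game tree below is invented for illustration; its leaves stand in for scored board positions.

```python
# A toy sketch of minimax search with alpha-beta pruning, the core idea behind
# engines like Deep Blue (the real thing adds chess knowledge, custom hardware,
# and far deeper, smarter search). Internal nodes are lists of child subtrees;
# leaves are hypothetical scores for finished lines of play.

def minimax(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Return the best score the side to move can guarantee from this node."""
    if isinstance(node, (int, float)):          # leaf: a scored position
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, minimax(child, False, alpha, beta))
            alpha = max(alpha, best)
            if beta <= alpha:                   # opponent already has a better option elsewhere: prune
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, minimax(child, True, alpha, beta))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

# A tiny hand-built game tree, two moves deep.
tree = [
    [3, 5, 2],      # opponent's replies after our first candidate move
    [-4, 1],        # ... after our second candidate move
    [0, 7, -2],     # ... after our third candidate move
]
print(minimax(tree, maximizing=True))   # 2: the best we can do if the opponent replies optimally
```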
Big Data
Just as the cost of computing has fallen precipitously over the course of the last half century, the cost of data storage has undergone a similar revolution. In 1981, IBM created the first hard drive capable of storing one gigabyte. The hard drive weighed 550 pounds and cost over $100,000, after adjusting for inflation.²¹ In 2007, I got a 1 GB flash drive in my Christmas stocking before heading out to my freshman year of college. Today, a one terabyte hard drive with a thousand times more storage can be purchased for $50.
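Using the rough figures above, the change in cost per gigabyte is easy to work out; both inputs are approximate, so the ratio is a ballpark.

```python
# Back-of-the-envelope comparison using the (approximate) figures above.
cost_per_gb_1981 = 100_000 / 1        # ~$100,000 for a 1 GB drive, inflation adjusted
cost_per_gb_today = 50 / 1_000        # ~$50 for a 1 TB (1,000 GB) drive
print(f"${cost_per_gb_today:.2f} per GB today")                     # $0.05 per GB
print(f"{cost_per_gb_1981 / cost_per_gb_today:,.0f}x cheaper")      # 2,000,000x cheaper
```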
As broadband and 4G wireless internet became more prevalent, carrying around large amounts of data storage in your home or on your person became unnecessary. It’s much safer and more cost-effective to rent storage in “the Cloud.” Data storage is so cheap that many companies offer it for free as a means of luring users to their service (e.g., Dropbox, Google Drive).
Because data can move freely and be stored cheaply, companies have been generating and capturing incredibly large amounts of data. For example, the Twitter community generates over 4 petabytes of data annually,²² which is the equivalent of 500 billion pages of standard typed text.²³ By one estimate, Google has over 15 exabytes of information, or the equivalent of all the words ever spoken by mankind… times three.²⁴
In the old days, data had to be carefully curated and trimmed to make sure you weren’t capturing clutter or storing things you wouldn’t need. Data was difficult to process and required special tagging or formatting to be accessible to either humans or first-generation computers. Now, because costs are so low, companies just suck up and track everything they can (that is, until governments started imposing limits on what you could capture and giant data breaches started costing companies huge amounts of money). Companies used to be like old libraries, relying on a carefully curated card index that captured only the most essential details: title, author, year published, decimal number, topic. Now companies operate like the Google Books project and create page-by-page scans of every book they can get their hands on (Google’s last count was 40 million distinct titles).²⁵
Data Analytics
The more data you have, the better decisions you can make. Humans are good at recognizing patterns in data and teasing out correlations, but in the era of Big Data, the firehose of information has become so large that we had to develop computer-aided methods of making sense of it all; this field has become known as “data analytics” or “data science.” Using these sophisticated methods, machines can discover so many correlations in such seemingly random and disparate pieces of information that we start to think the machines are listening to our thoughts. One study found that 55% of Americans believe their iPhone, Alexa, or Google Home is secretly recording all their conversations, because they start seeing ads on Facebook or YouTube for a new brand of mattress right after complaining to their spouse about an aching back and a poor night’s sleep.²⁶ Multiple watchdog groups and researchers have failed to find any evidence this surveillance is happening.²⁷ I don’t believe employees at Google or Amazon are secretly activating your phone’s microphone and recording your conversations, and the reason they aren’t is that they don’t need to.
Every major retail and grocery chain has a rewards program tied to a card or your telephone number. These retailers track which consumers are buying which products, in what amounts, in what combinations, and at what times, and they use all of that data to make better decisions about what to sell, how much to sell, where to place it, and how to advertise it. This data is powerful and can reveal even intimate details about our lives. To illustrate the power of tracking and analyzing consumer behavior, we should look at Target.
After several bloody years doing battle with Wal-Mart, Target realized that it needed to refocus its strategy. Target couldn’t compete with Wal-Mart’s margins as the everything store, and for specialty items it had difficulty matching the selection and expertise of dedicated stores like Best Buy for electronics or Toys-R-Us for toys. Getting people to change their behavior and consider Target for their shopping was proving difficult. However, Target researchers realized there are certain times when people’s habits get thrown out the window and they are ripe for influencing. It just so happens that the holy grail of malleable consumers is sleep-deprived new parents going to the store for an emergency pack of diapers. New babies disrupt parents’ routines so completely that if you can insert yourself into their new habits and routines, you can win a customer for life. If Target could attract a tired dad for his emergency diaper run, it could likely count on him to grab a few groceries and whatever household goods he needed along the way, and maybe even glance at the electronics section. The trouble is that Target isn’t the only one who has identified new parents as ripe targets of opportunity. Once a baby is born (a matter of public record), parents are inundated with coupons and advertisements from a swarm of vendors all competing for their marginal dollar. Many companies employ baby registries to capture customers and sales a few months before the baby is born. Trying to capture the customer even earlier, Target started analyzing the purchasing histories of all the women who had signed up for Target’s baby registries to see how soon it could identify when women were pregnant.²⁸
The result of Target’s study was a list of 25 products that were highly correlated with first-trimester pregnancy and, if purchased, could accurately predict a woman’s due date to within a few weeks. The list wasn’t things like car seats, onesies, and bottles, since women weren’t buying actual baby items yet, but rather things like vitamins and unscented lotion. Target used its pregnancy predictor to target women with advertisements and coupons at specific times throughout their pregnancies, hoping to hook them into being Target shoppers.
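To show the shape of this kind of model without pretending to know anything about Target’s actual system, here is a sketch trained on entirely made-up purchase data. The product names, purchase patterns, and labels are invented; the point is only that a simple classifier can turn purchase counts into a likelihood score.

```python
# A toy sketch with invented data, not Target's real model: purchase counts in,
# a pregnancy-likelihood prediction out.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
products = ["unscented_lotion", "calcium_supplement", "large_tote", "potato_chips"]

n = 1_000
pregnant = rng.integers(0, 2, size=n)                # hypothetical ground-truth labels
purchases = rng.poisson(1.0, size=(n, len(products)))
# Synthetic assumption: pregnant shoppers buy more of the first two products.
purchases[:, :2] += rng.poisson(2.0, size=(n, 2)) * pregnant[:, None]

model = LogisticRegression().fit(purchases, pregnant)
# A positive coefficient means buying more of that product pushes the model
# toward predicting "likely pregnant".
for name, coef in zip(products, model.coef_[0]):
    print(f"{name:>20}: {coef:+.2f}")
```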
During the first gold rush of the internet in the latter half of the 90s, everyone and their dog tried to set up a website and claim their fortune. The convenience of ordering a pizza or buying a new leash for your pet from your desktop seemed such an alluring prospect that it led to rampant speculation and runaway investment in internet companies, in what came to be known as the “dot-com boom.” The textbook example was the online pet store Pets.com, founded in 1998, which at its peak in 1999 was valued at $300 million. The company funneled millions of dollars into high-profile advertising, featuring its mascot as a float in the Macy’s Thanksgiving Day Parade and in a commercial at the Super Bowl. Pets.com went belly up just a year later when the dot-com bubble burst and investors suddenly woke to the realization that dog food is heavy and that the convenience of the internet alone wasn’t enough to make a viable business model.
The survivors of the bust understood that the internet wasn’t just about convenience but about consumer data. The true power of the internet wasn’t just reaching people in their homes; it was being able, for the first time, to keep up-to-the-second records of every transaction in the buying experience, providing vast insight into consumer behavior. The most successful internet companies were not just storefronts but “platforms” where people came to interact with each other or find information: eBay, Facebook, Google, PayPal, etc. Platform companies could use the information generated on their platforms to design better user experiences and better target users with advertising and products.
A perfect example of the power of consumer data to transform an industry is Netflix. Netflix took the world by storm when its streaming service allowed for on-demand access to a huge catalogue of popular television and movies. Viewing on demand, commercial-free, on any device forever changed our way of consuming media. But if Netflix had relied on its convenience factor alone, it would be dead today. Netflix was able to secure the streaming rights to popular shows at very favorable rates because none of the traditional networks put much value on streaming. The sentiment was “There isn’t that large a market of people with internet connections capable of streaming high-quality video” and “Why would someone want to watch old TV reruns when they could be watching new programming?” However, once Netflix proved the market and streaming exploded, the television networks started champing at the bit, waiting for those contracts to expire so they could get their beloved titles back and renegotiate. It became clear that if Netflix was going to renew the rights to its most popular titles, like “The Office” or “Friends,” it would need to pay through the nose — if it even had the chance to renegotiate at all. Many rival networks planned to launch their own streaming services and wanted their own flagship titles to recover the audiences that Netflix had stolen.
Netflix executives could see the writing on the wall: in order to survive, Netflix would need to pivot into creating its own original programming. However, this was an incredibly risky endeavor. Netflix was not a production company and had no history of bringing projects from conception to market. Thankfully, Netflix did have one trick up its sleeve — it knew what people watched.
For decades, television networks had followed the same strategy to fill their new-season lineups. The studio would pick a handful of scripts to be made into a pilot episode. The pilot would be shown to test audiences. Those that scored well would then be green-lit to complete an initial season of 10–12 episodes. If a show was considered risky, the studio would approve just a handful of episodes (for example, the American version of “The Office” only filmed 6 episodes its first season).²⁹ If the show performed well, it would be renewed for an additional season of 20+ episodes. Only rarely did studios deviate from this formula; the fact that “Friends” had a first season of 24 episodes was a huge sign that NBC believed in the success of the project.
In 2011, Netflix blew the system out of the water when it announced that it had inked a two-season, 24-television-hour deal for a new political thriller, “House of Cards,” with David Fincher, the director of Fight Club, and Kevin Spacey.³⁰ Because Netflix had vast amounts of analytics about its users’ viewing habits and tastes, it was confident there was an appetite for the show and could dispense with test audiences and the segmented rollout. Because it was willing to make such big bets, Netflix was able to attract top talent, lured by the promise of a steady job and the freedom to experiment with longer, unbounded storytelling.
The Internet of Things
The first ten years of this century were filled with new ways for us to move our lives online, with the rise of internet platform giants like eBay, Amazon, Facebook, and Google. The amount of data we generated about ourselves exploded around 2010, when smartphones put the power of a desktop in your pocket and suddenly everyone carried a personal computer everywhere. You would think that would be the high point in terms of how much data we create, but you would be wrong.
Since computing is now so cheap, small, and energy efficient, internet connectivity is no longer confined to high-end desktops, tablets, and smartphones; it is being added to toothbrushes, refrigerators, cars, barbecue grills, lightbulbs, and thermostats. Everything is becoming “smart,” exponentially increasing the number of devices gathering data about us and our environment. This phenomenon is called “the Internet of Things,” or the “IoT.”
In 2016, Amazon’s “Alexa,” a smart home assistant capable of managing the smart devices in your house, offered roughly 1,000 skills.³¹ In 2019, Amazon reported that Alexa’s repertoire had grown to encompass over 100,000 skills and that Amazon had sold over 100 million Alexa-enabled devices.³² Intel estimates that this year the number of IoT devices will cross the 200 billion threshold — up from just 2 billion devices in 2006.³³ About 59 percent of the world’s population has internet access in some form, which means that for each person with an internet connection, there are roughly 46 IoT devices.³⁴
So, to recap: first, we have fast, cheap, and efficient computers at our disposal; second, internet platforms and smart devices are capturing vast amounts of data about our relationships, personal activities, and physical environment; and third, we have access to virtually unlimited, inexpensive data storage that we can reach wirelessly from almost anywhere. Any one of these revolutions in isolation would be tremendously beneficial, but their compounding effects taken together have been transformative. It really is staggering to consider how profound the effects have been on our lives and our society.³⁵ However, these three revolutions are just the groundwork for the fourth: Machine Learning.
Machine Learning
Computers can execute astronomical numbers of instructions each second and can do so with very low error rates, but historically they have suffered from three major drawbacks.
First, in order to perform calculations on data, the inputs need to be in a format that computers can interpret, and computers don’t use the same formats as humans. Using traditional methods, it is extremely difficult to convert human-readable data into machine-readable data or to design machines that can interpret human-readable data directly. Take written documents. Unless text was in a digital format, it couldn’t be manipulated or utilized by a computer. We had hundreds of years’ worth of human knowledge captured in books that couldn’t be read by computers, and until just a few years ago, converting digital scans of books into machine-searchable text was slow and imprecise. The alternative has been to employ humans to re-enter data in formats computers can use.
Second, computers are easily derailed by tiny errors, and they take a long time to program. Every beginner programmer has pulled their hair out over a misplaced semicolon or a missing parenthesis. Thankfully, tools exist that act as a sort of spell-check and help identify common errors in code, but they are separate tools that still have to be run against the program. The result is that writing computer programs requires lots of testing and iterating by people with advanced training. A huge community ethos of sharing has developed and thrived in the programming world precisely because it is so tedious to reinvent the wheel.
Third, while computers can store values over time, the approach a computer takes through any given task or problem is always the same. There is no creativity, adaptation, or improvement. Computers are faithful soldiers and follow rules with exactitude, but they rely on human programmers to supply the logic. The result is that if a program is poorly written, it repeats those inefficiencies over and over again, and if a program has to be tweaked, a human needs to look at the code, diagnose the issue, and design a solution.
Because of these weaknesses, the world has been divided starkly into tasks fit for humans and tasks fit for computers. Say we wanted to design a machine that sorts cats from dogs. We place a camera in the machine to take pictures of the incoming specimens and then ask the computer to decide which label to attach to them. There are many breeds of dogs and cats, with a wide variance of traits among them. Even within a breed, individual animals appear distinct, and no traits appear in exactly the same configuration twice. It would take enormous effort using traditional methods to build out a taxonomy and create a decision tree that would enable a computer to identify a cat or dog with a high degree of accuracy. Now imagine how much more difficult it would be if we wanted people to send in pictures of their cats or dogs and have our machine sort them. If we couldn’t control the lighting, the background, the angle of the photos, and so on, it would approach an impossible task using traditional computing methods. It would be too expensive; we’d simply hand the problem to a human to sort through the photos.
In the 1990s, the US Postal Service spent hundreds of millions of dollars paying companies like Siemens and Lockheed Martin to design machines that could decipher your grandmother’s idiosyncratic handwriting on the 15.5 billion pieces of mail that flow through the system annually.³⁶ Despite that, the USPS still has a department of 1,700 human workers who review images of the addresses that have proven too tricky for the computers.³⁷
Machine learning turns computer programming on its head. Instead of designing logic that a computer can execute, humans simply provide example data and define a set of desired outcomes, and the computer develops a model with its own logic to accomplish the task. To do this, the computer uses a process resembling random mutation and natural selection in biology. The computer makes a series of guesses about what model might distinguish a dog from a cat, runs that model on sample data, and then evaluates how the model performed. After running thousands of iterations, the computer takes the models that had the best results and refines them, combining and reordering their logic to produce a new generation of even better models. This process is called “training.”
If we tasked a machine learning algorithm with the “dogs or cats” problem, after evaluating a random assortment of characteristics, the model might land on sorting the animals by their size. Since dogs are typically larger than cats, this might not be a terrible model; it would correctly identify Labradors, German Shepherds, and Golden Retrievers as dogs. However, it would prove inadequate when confronted with a Pomeranian or a Pug. Another model might sort based on ear shape; since cats have more triangular ears, that model might perform better. If the two models were combined (“if the animal is larger than X, it is a dog; if the animal is smaller than X and has triangular ears, it is a cat”), the hybrid model would likely perform significantly better. There may still be edge cases where the model fails, such as mischaracterizing a Chihuahua as a cat. (To be fair, reasonable minds could differ as to whether a Chihuahua is in fact a dog.) If we kept it running, the algorithm would keep iterating and training until it developed a model with a set of rules that could sort cats from dogs with a high degree of accuracy.
It is important to understand that this example is oversimplified. Humans don’t need to feed the computer concepts or traits such as eyes, mouths, tails, color, size, or fur; the computer breaks the photos down into RGB pixel values and develops its own concepts. If we cracked the model open and asked it to explain itself, we might find logic that doesn’t resemble any characteristics we as humans would recognize. All the humans need to provide is pre-labeled training data identifying pictures as dogs or cats and enough computer processing power for the algorithm to train the models.
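Here is a minimal sketch of that idea: the model is given nothing but raw pixel values and labels. Since a code snippet can’t bundle a folder of cat and dog photos, it uses scikit-learn’s built-in handwritten-digit images as a stand-in, but the mechanics of labeled pixels in, learned model out are the same.

```python
# Minimal sketch: a model learns to classify images from raw pixel values alone.
# scikit-learn's bundled 8x8 handwritten digits stand in for cat/dog photos.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                  # 1,797 images, each just 64 pixel intensities
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# No hand-written rules about strokes, loops, ears, or tails: the model sees
# only pixel values and the correct labels.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

print(f"accuracy on images it has never seen: {model.score(X_test, y_test):.0%}")
```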
The process of training a machine learning model requires intense amounts of computing power that simply wasn’t feasible at scale until just a few years ago. Similarly, the training datasets need to be very large in order for the trained model to be robust. Without Moore’s Law and Big Data, machine learning would not be possible. Once trained, however, the models can run on average hardware, making their applications ubiquitous. Machine learning allows computers to bridge the divide between our human world and machines. Models can now be trained to recognize objects, environments, text, sounds, emotions, and musical tones, all without any pre-processing, labelling, or conversion by humans. Machine learning also opens up a whole new range of problems that previously could not be solved by either humans or machines.
DeepMind Technologies, a small company of computer science researchers founded in 2010, set out to see if machine learning techniques could teach a computer how to play the popular Asian board game of Go. Go is an ancient game, with origins dating back to the 4th century BC (making it about a thousand years older than chess). The rules are a bit abstract, but two players (black and white) take turns placing stones on the intersections of a 19 x 19 grid in an effort to secure more territory than their opponent. If a player’s stones are completely encircled by opponent stones, they are “captured” and removed from the board. The game continues until both players agree there are no worthwhile moves remaining, at which point the territory is counted along with the number of captured stones to determine the victor.³⁸
Go is incredibly popular across Asia and has a long, storied history. Mastering the game was considered one of the four essential arts for Chinese scholars, along with music, calligraphy, and painting. Despite its relatively simple rules, mastering the game is incredibly difficult. Because of its open play style — at any time you can play on any available spot among the 361 intersections — it is difficult to prioritize where to move and to evaluate your strategy against that of your opponent. Even top players find it difficult to assess which player is in a better position to win until close to the end of the game. Strategies have, of course, been developed over time, but the game does not lend itself to the traditional “openings” or repeated move configurations that are so common in chess, because games do not tend to look the same. Even the language players use to describe Go bespeaks its intuitive and mystical nature. Players have a vague sense that something is right, but they steer away from terms such as “good” or “bad” to describe their strategy or any particular move. Players often cannot articulate why they have made a move, only that certain stones are “hard” or “soft.”
Because of its open-ended complexity, designing computer models that could play Go competitively was a moonshot project among computer researchers. The tree-branching, brute-force calculation employed so effectively by Deep Blue against Kasparov simply doesn’t work for Go, because the number of possible variations is too large to search deeply enough to make a meaningful judgment about which move is best. There are more board configurations on the standard 19x19 board than there are atoms in the known universe.
The DeepMind team developed an engine nicknamed “AlphaGo” by first training the machine to mimic the moves of human players, evaluating 30 million moves from 160,000 games. Once it could identify likely moves, AlphaGo combined that knowledge with a tree search and played against itself over and over again to learn which lines of search were the most profitable and which moves contributed to winning games.
In October 2015, the program faced off against the European Go champion Fan Hui in a five-game match and swept it 5–0. It was the first time a computer program had beaten a professional Go player under regulation conditions. Six months later, AlphaGo faced Lee Sedol, one of the strongest players in the history of Go, who had won 18 international titles.³⁹ In his home country of South Korea, Sedol is a national hero and a household name.
Sedol and the Go community had analyzed AlphaGo’s games against Fan Hui and, although impressed, had identified several key weaknesses in its style and performance. In the lead-up to the match, analysts heavily favored Lee Sedol to win. The games were televised live across the world. Sedol lost games 1, 2, and 3. He was able to scratch out a victory in game 4 after playing a highly unexpected stone on move 78, creating a “wedge” that turned the tide of the game. Analysts were stunned by the innovative, unconventional move and dubbed it “God’s Touch.” AlphaGo itself offered high praise: one of its modules estimated the likelihood that a human would make any particular move, and it judged move 78 to be a one-in-10,000 play.⁴⁰ AlphaGo went on to prevail in game 5, closing the match at 4–1. The story of AlphaGo was later captured in a wonderful 2017 documentary named for the program, which is available on YouTube. AlphaGo’s success demonstrates the highly sophisticated strategy and problem-solving that can be achieved using machine learning tools.
Drawbacks and Unintended Consequences
Although this article outlines some of the transformative benefits of machine learning, I want to be careful that you don’t get the mistaken impression that machine learning models are perfect, or that using machine learning is always a good idea. This is still emerging technology and frequently things break when there are deficiencies either in the framework design or in the training data. I expect that the most problematic political and social issues of the next generation will arise from grappling with the drawbacks and unintended consequences of this technology.
One glaring example of AI weakness is that publicly available datasets of human faces are disproportionately White or Asian, so facial recognition software is prone to higher error rates when evaluating people of color.⁴¹ This problem is merely annoying when a smartphone camera fails to focus or adjust lighting properly; it has potentially deadly consequences when law enforcement agencies rely on faulty facial recognition technology and misidentify an individual as a suspect.⁴²
You have to be very careful to correctly specify the desired outcomes and the boundaries in which the AI is allowed to operate.
Machine learning systems can replicate or exacerbate inequities in a system, and they can be difficult to evaluate. In law school, I did a study on mass incarceration and its causes. As part of that investigation, we looked at the increasing use of software that evaluates prisoners to identify good candidates for parole. The technology shows promise, since many parole boards are incredibly inconsistent in their parole decisions, but if we aren’t careful, algorithms can replicate or exacerbate racism in the system. An algorithm shouldn’t consider a parole candidate’s race when deciding whether that person deserves parole, but if it takes into account other characteristics that are highly correlated with race (such as a person’s address upon release), the outcome will be the same as if it had used race. Even more troublesome, because the model is driven by outcomes and decides for itself what factors to consider and how to weigh them, the resulting logic may be so complex that when humans are asked to open the black box and justify its decisions, they can’t understand it well enough to know whether there are problems.
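To make the proxy problem concrete, here is a toy demonstration with entirely synthetic data, not modeled on any real parole tool. The model never sees the protected attribute, but because the historical outcomes it learns from were biased and a proxy variable tracks the group closely, its predictions reproduce the disparity anyway.

```python
# A toy illustration with entirely synthetic data of how a model that never
# sees a protected attribute can still reproduce disparities through a proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000
group = rng.integers(0, 2, size=n)             # protected attribute (never shown to the model)
proxy = (rng.random(n) < 0.2 + 0.6 * group).astype(int)   # e.g., neighborhood, correlated with group
merit = rng.normal(size=n)                     # the attribute we actually want to reward

# Historical outcomes that were themselves biased against group 1.
outcome = (merit - 0.8 * group + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.column_stack([merit, proxy])            # note: 'group' is deliberately excluded
model = LogisticRegression().fit(X, outcome)
pred = model.predict(X)

for g in (0, 1):
    print(f"group {g}: predicted favorable-outcome rate = {pred[group == g].mean():.2f}")
```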
Beyond the unintended consequences and weaknesses in the technology itself, a whole other class of problems opens up with how the technology is implemented. Machine learning and Big Data make possible a surveillance state, allowing autocratic governments to exert unprecedented levels of control over vulnerable populations. Since 2014, China has been developing a social credit score system which punishes citizens for minor infractions like “frivolous spending,” playing too many video games, or jaywalking. Citizens with low scores might find themselves blocked from purchasing airline tickets or have their children barred from attending their first pick of high schools. Using street cameras, drones, DNA tests, and mandatory government registration, China first rolled out the initiative in the northwestern Xinjiang region to subdue millions of Uighurs, a Muslim ethnic group.⁴³ China plans on rolling out its social credit system to all 1.4 billion of its citizens by the end of 2020.⁴⁴ It used to be impossible for humans to police everyone’s behavior, but now we can train the machines to watch people for us.
Conclusion
I didn’t realize that walking into a lunch presentation would set me on a course that has thus far defined my career. I didn’t realize that in my first year of legal practice, I would come into contact with artificial intelligence software that dramatically altered how I do my job. I didn’t realize that mastering KIRA would open doors to join one of the world’s leading technology companies. I didn’t realize that machine learning underpins critical infrastructure at every major technology company and that without it the internet couldn’t function. I didn’t realize that with just a little bit of background in the subject, I would be positioned to provide legal support to several hundred engineers pushing the limits of the field. I have benefitted immensely from being curious about and engaging with the emerging field of artificial intelligence, and I think many other people deserve to understand what I’ve learned.
No matter what fields or industries you or your children find yourselves in, it is likely that machine learning tools will be used to automate away or augment human decision making. Those unprepared for and unfamiliar with these tools may find themselves shut out of certain opportunities, because some tasks have been automated out of existence or because others now require certain technical skills. I don’t know what interns at National Geographic are doing these days, but it isn’t watching whale videos. In my own profession, I believe there will be fewer jobs for new associates, since software will both augment the efficiency of existing associates and replace some of the foundational tasks traditionally reserved for the newest attorneys. At the same time, machine learning has the potential to make a real impact on some of the world’s most difficult problems, and we will need these tools to manage our complex and fast-moving global society. The growing pains and unintended consequences of machine learning will also create many new problems that will need to be solved. I hope that something I’ve learned and shared will prove helpful to you as you, along with me, try to make sense of our present and future.
Citations
[4] It sounds like I’m complaining here, which I am, but still I count myself blessed. If I had been born twenty years earlier doing the same work before the internet it would have meant spending two weeks in a warehouse in Des Moines going through boxes and cataloging contents by hand with several other junior associates. At least I had my evenings and other moments with my family rather than drinking Sprite at some crappy karaoke bar with coworkers.
[6] Tomato Color Sorting Machine, youtube.com (published Nov. 25, 2017)
[8] Umamaheswara Rao, Govt ties up with Microsoft to check dropouts, Times of India (Apr 22, 2018);
Janakiram MSV, How Microsoft is Making a Big Impact with Machine Learning, Forbes (July 30, 2016)
[9] Kaya Yurieff, How Google searches are used to track infectious diseases, CNN Business (July 20, 2017); J. Ginsberg et al., Detecting influenza epidemics using search engine query data, research.google.com (retrieved October 1, 2020)
[11] Adam Polyak, Lior Wolf, Yaniv Taigman, TTS Skins: Speaker Conversion via ASR (Jul 26, 2020)
[14] Mark Rober, the popular YouTuber, created a video showing how this is done in a format that is easily digestible for children and provides a great introduction to machine learning.
[15] Michael Lewis, author of Moneyball, did an excellent episode on his podcast “Against the Rules” which showcases this software and how it is affecting the industry.
[16] Anick Jesdanun, Robots organize your photos, so you can procrastinate, AP News (Oct 12, 2016)
[18] A great visualization of Moore’s Law can be found here
[20] Due to slight differences in design, it is a bit difficult to make an apples-to-apples comparison between Deep Blue and current chess engines, but Garry Kasparov had a peak Elo rating of 2851 (a number derived from winning and losing against other players with similar ratings), while the current human world chess champion has a rating of 2838. Deep Blue didn’t play enough games to have an official rating, but the current leading software has a rating of ~3390.
[21] The Cost of Data Storage Throughout the Years, Toshiba (Retrieved June 1, 2020)
[23] Brady Gavin, How Big Are Gigabytes, Terabytes, and Petabytes?, How-to Geek.com (May 25, 2018)
[24] Id.
[25] Haimin Lee, 15 Years of Google Books, blog.google.com (October 27, 2019)
[27] Joe Tidy, Why phones that secretly listen to us are a myth, BBC News (September 5, 2019); Daniel Howley, Why people think their phones are listening to them, Yahoo!finance (March 11, 2019)
[28] Charles Duhigg, How Companies Learn Your Secrets, NY Times (February 16, 2012)
[29] List of The Office (American TV series) episodes, wikipedia (retrieved June 1, 2020)
[30] House of Cards (American TV Series), wikipedia (retrieved June 1, 2020)
[31] Brian Barrett, Amazon Alexa Hits 10,000 Skills. Here Comes the Hard Part, Wired (February 23, 2017)
[33] A Guide to the Internet of Things, Intel (retrieved June 1, 2020)
[34] J. Clement, Worldwide Digital Population as of July 2020, Statista (October 29, 2020)
[35] Of course it isn’t all sunshine and roses. We have run headlong into the fire adopting all these technologies without stopping to seriously consider what unintended consequences we might create, and our other societal institutions have lagged to regulate and police this behavior.
[37] Id.
[38] Go (game), Wikipedia (retrieved June 1, 2020)
[39] Lee Sedol, Wikipedia (retrieved June 1, 2020)
[40] Cade Metz, In Two Moves, AlphaGo and Lee Sedol Redefined the Future, Wired.com (March 16, 2016)
[41] US government tests find even top-performing facial recognition systems misidentify people of color at rates five to ten times higher than they do whites. See Tom Simonite, The Best Algorithms Struggle to Recognize Black Faces Equally, Wired.com (July 22, 2019)
[42] Kashmir Hill, Wrongfully Accused by Algorithm, NY times (June 24, 2020)