Supply Chain Management: Averting a Covid-19 Catastrophe

Yossi Sheffi is a supply chain management expert who moves freely between business and academia. He has founded several companies, and he sits on the boards of large corporations. He teaches engineering at MIT and has authored a half-dozen books that are read by engineers, economists, and business people. When I heard about his latest book, The New (Ab)Normal, in which he tackles the covid-19 disruption of the global supply chain, I got a copy as soon as I could, stayed up late reading, and got up early to finish it.

The gosling supply chain management system

The New (Ab)Normal

The New (Ab)Normal was worth the lost sleep. Sheffi’s insider views are compelling. He has talked with the executives and engineers who struggled to put food and toilet paper on supermarket shelves, produce and distribute medical protective gear, and keep manufacturing plants from foundering when their supplies were disrupted.

Supply chains and the media

Sheffi has harsh words for some media. For example, he says empty supermarket shelves were misunderstood. Food and supplies were never lacking, but they were often in the wrong place. Until the lockdowns in March, a big share of U.S. meals was eaten in restaurants, schools, and company cafeterias. These businesses purchase the same food as families, but they get it through a different supply network, packaged in different ways.

Cafeterias buy tons of shelled fresh eggs in gallon containers, but consumers buy cartons of eggs in supermarkets for cooking at home. When the eateries shut down or curtailed operations and people began eating at home, plenty of eggs were available, but someone had to redirect them to consumers in a form that was practical for a home kitchen. Sheffi says food shortages appeared in dramatic media video footage and sound bites, but not in supply chains.

Bursty services

Changing buying patterns worsened the appearance of shortages. Supermarket supply chains are tuned to dispense supplies at a certain rate and level of burstiness. These are terms I know from network and IT service management. A bursty service has bursts of increased activity followed by relatively quiet periods. At a bursty IT trouble ticket desk, thirty percent of a week’s tickets might be entered in one hour on Monday morning, when employees return to work ready to tackle problems that they had put off solving during the previous week. A less bursty business culture might generate the same number of tickets per week, but at a more uniform rate of tickets per hour.
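To make the contrast concrete, here is a minimal sketch in Python with invented numbers: two desks receive the same 200 tickets a week, but one takes thirty percent of them in a single Monday-morning hour. The arrival pattern and the assumed service rate are illustrations, not data from any real desk.

```python
# Two hypothetical desks, each receiving 200 tickets per 40-hour week.
# The "steady" desk sees a uniform 5 tickets per hour; the "bursty" desk
# takes 30% of the week's tickets in the first hour of Monday morning.
WEEKLY_TICKETS = 200
HOURS_PER_WEEK = 40

steady = [WEEKLY_TICKETS / HOURS_PER_WEEK] * HOURS_PER_WEEK

bursty = [WEEKLY_TICKETS * 0.70 / (HOURS_PER_WEEK - 1)] * HOURS_PER_WEEK
bursty[0] = WEEKLY_TICKETS * 0.30          # Monday, 8-9 a.m.

TICKETS_PER_AGENT_PER_HOUR = 4             # assumed service rate

def peak_staff(arrivals):
    """Agents needed to keep up with the single worst hour."""
    return max(arrivals) / TICKETS_PER_AGENT_PER_HOUR

print(f"steady desk: {sum(steady):.0f} tickets/week, {peak_staff(steady):.1f} agents at peak")
print(f"bursty desk: {sum(bursty):.0f} tickets/week, {peak_staff(bursty):.1f} agents at peak")
# Same weekly volume; the bursty desk needs roughly twelve times the peak staff.
```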

Bursty desks must be managed differently than steady desks. The manager of a bursty service desk must devise a way to deploy extra equipment and hire more staff to deal with peak activity on those hectic Monday mornings. Experienced managers also know that an unpredicted burst in tickets on a desk, say in the aftermath of a hurricane, will cause havoc and shortened tempers as irate customers wait for temporarily scarce resources. The best of them have contingency plans to deal with unanticipated bursts.

Cloud computing to the rescue

The rise of cloud computing architectures in the last decade has yielded increased flexibility for responding to bursts in digital activity. Pre-cloud, managers who had to provide service through activity bursts deployed purchased or leased servers with enough capacity to handle the peaks. Adding a physical server is a substantial financial investment that requires planning, sometimes changes to the physical plant, often added training, and occasionally new hires.

Worse, the new capacity may remain idle during long non-peak periods, which is hard to explain to cost-conscious business leaders. Some businesses are able to harvest off-peak capacity for other purposes, but many are not. Cloud computing offers on-demand computing with little upfront investment, greatly reducing the need to pay for unused capacity in order to maintain service during peak periods.
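A rough sketch of the economics, again with made-up numbers, compares buying enough servers to cover the peak all year against renting equivalent capacity by the hour only while the burst lasts. The prices and the load profile below are assumptions chosen only to show the shape of the trade-off, not real quotes from any provider.

```python
# Hypothetical load: 2 servers' worth of work most of the year,
# 20 servers' worth during peaks that add up to 5% of the year.
HOURS_PER_YEAR = 8760
PEAK_HOURS = int(HOURS_PER_YEAR * 0.05)
BASE_SERVERS, PEAK_SERVERS = 2, 20

OWNED_COST_PER_SERVER_YEAR = 3000      # assumed all-in cost of an owned server
CLOUD_COST_PER_INSTANCE_HOUR = 0.50    # assumed hourly rate for a similar instance

# Pre-cloud: buy enough hardware to survive the peak, idle or not.
owned = PEAK_SERVERS * OWNED_COST_PER_SERVER_YEAR

# Cloud: pay by the hour for what is actually running.
cloud = CLOUD_COST_PER_INSTANCE_HOUR * (
    BASE_SERVERS * (HOURS_PER_YEAR - PEAK_HOURS) + PEAK_SERVERS * PEAK_HOURS
)

print(f"provision for the peak: ${owned:,.0f} per year")
print(f"pay on demand:          ${cloud:,.0f} per year")
```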

The food supply

Covid-19 caused an unanticipated increase in the burstiness of supermarket sales. Under the threat of the virus, consumers began to shop once a week or less, buying larger quantities. Folks accustomed to picking up a few vegetables and a fresh protein on their way home from work began arriving at the store early in the morning to buy twenty-five-pound sacks of flour and dry beans, cases of canned vegetables, and bulk produce.

On the supply end, with the farmers and packers, the quantities sold per month stayed fairly constant because the number of mouths being fed did not change. In the stores, though, because consumers were buying in bursts instead of following their “a few items a day” pattern, shelves were bare by afternoon, waiting for shipments that arrived in the night. This made for exciting media coverage of customers squabbling over the products remaining on the shelves. The media seldom pointed out that the shelves were full again each morning once the night’s shipments had been unloaded.

Toilet paper

The infamous toilet paper shortage was also either illusory or much more nuanced than the media portrayed it. Like restaurants and cafeterias, public restrooms took a big hit from the lockdowns. Like food consumption, toilet paper consumption is inherently steady, but the burstiness of toilet paper purchases, and where the product is purchased, vary.

Commercial toilet paper consumption plummeted as shoppers began to purchase consumer toilet paper in the same bursts in which they were purchasing food. There may have been some hoarding, but many shoppers simply wanted to shrink their dangerous trips to the market by buying in bulk. Consumer toilet paper is not like the commercial product used in public restrooms, which is coarser and often dispensed in larger rolls from specialized holders. This presented supply problems similar to the food supply issues.

Supply disruption

Supply chains had to respond quickly. Unlike digital services, responding to increased burstiness in supermarket sales required changes in physical inventory patterns. Increasing the supply of eggs by the dozen at supermarkets and decreasing gallon containers of eggs on commercial kitchen loading docks could not be addressed by dialing up a new batch of virtual cloud computers. New buying patterns had to be analyzed, revised orders had to be placed with packers, and trucks had to roll on highways.

Advances in supply chain management

Fortunately, supply chain reporting and analysis have jumped ahead in the last decade. Consumers see some of these advances on online sales sites like Amazon when they click on “Track package.” Not long ago, all they were offered was Amazon’s best estimate of a delivery date; now they see the progress of their shipment from the warehouse through shipping transfer points to final delivery. Guesswork is eliminated: arrivals and departures are recorded as the package passes barcode scanners.

The movement data is centralized in cloud data centers and dispensed to the consumer on demand. Many people have noted that Amazon shipments are not as reliable as they were pre-covid. However, the impression of unreliability would be much stronger without those “Track package” buttons.

Supply chain managers have access to the same kind of data on their shipments. In untroubled times, a shipping clerk’s educated estimate of when a shipment of fresh eggs would arrive may have been adequate, but not in the covid-disrupted 2020, with its shifting demands and unpredictable delays. Those guesses can’t cope with an egg packing plant shut down for a week when the virus flares up or a shipment delayed by a quarantined truck driver.
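The check that replaces the shipping clerk’s guess is simple once the scan data sits in one place. Here is a minimal sketch with a made-up event format that flags shipments whose last barcode scan is older than a threshold; real tracking systems carry far more detail, but the idea is the same.

```python
from datetime import datetime, timedelta

# Hypothetical scan events: (shipment_id, location, scan_time).
scans = [
    ("EGGS-1041", "packing plant, Ames IA",     datetime(2020, 11, 2, 6, 15)),
    ("EGGS-1041", "distribution center, Omaha", datetime(2020, 11, 2, 19, 40)),
    ("EGGS-1077", "packing plant, Ames IA",     datetime(2020, 11, 1, 5, 50)),
    # EGGS-1077 has not been scanned since it left the plant...
]

def stalled_shipments(scans, now, max_gap=timedelta(hours=24)):
    """Return shipment ids whose most recent scan is older than max_gap."""
    latest = {}
    for shipment_id, _location, scan_time in scans:
        if shipment_id not in latest or scan_time > latest[shipment_id]:
            latest[shipment_id] = scan_time
    return [sid for sid, last in latest.items() if now - last > max_gap]

now = datetime(2020, 11, 3, 8, 0)
for sid in stalled_shipments(scans, now):
    print(f"{sid}: no scan in over 24 hours -- reroute or dispatch a relief driver")
```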

Good news

Fortunately, with the data available today, these issues are visible in supply chain tracking systems. Orders can be redirected immediately to a different packing plant that has returned from a shutdown, or a fresh relief driver can be dispatched instead of leaving a shipment to wait in a truck stop parking lot. Issues can be resolved today that would not have been visible as issues a decade ago. Consequently, supply chains have been strikingly resilient to the covid-19 disruption.

Supply chains were much different in my pioneer grandparents’ days. They grew their own meat, poultry, and vegetables, and lived without toilet paper for most of their lives. Although supply was less complicated, the effects of supply disruption, like a punishing thunderstorm destroying a wheat crop, were as significant as today’s disruptions.

In November of 2020, with infection counts rising steeply, predictions of new supply disruptions occasionally surface. The response of supply chains so far this year leaves me optimistic that we have seen the worst.

Election Tampering: Y2K Fears Redux?

For the last few days, I’ve been reading reports on the Trickbot takedown. U.S. Cyber Command and Microsoft have been hitting Trickbot, a large botnet that is controlled from eastern Europe, most likely Russia, and that appears to have been maneuvering to interfere with the November 3rd U.S. election. The takedown steps apparently were planned strategically to give the botmasters little time to rebuild before the election. I sincerely hope the strategy succeeds. And I hope, and believe, that the Trickbot takedown is only the tip of an iceberg in a battle that is freezing out cyberattacks on our election.

We survived Y2K fears.

Trickbot

Trickbot, a botnet, is a multi-purpose covert criminal supercomputer cobbled together from thousands of hacked Windows computers. The botnet’s design offers few clues to hack victims that their devices are secret participants in criminal cyber attacks. The Trickbot crimes are mostly ransomware exploits for illegal profit. For some background on botnets see Home Network Setup: Smart Kitchen Crisis.

Y2K fears

The reports reminded me of Y2K fears, the year 2000 computer scare of 20 years ago. I hope today’s efforts are as successful as the Y2K remediation, which was so successful that Y2K was later called a hoax by those who did not understand the computer industry.

Y2K and Bolivian basket weavers

I remember the Y2K affair well. It was no hoax. Everyone in the computing industry knew trouble was coming. By the early 1980s the issue was already a hot topic among engineers. The problem went back to the early days of computing, when data storage was expensive. It’s hard to believe today, but in the early days, the fastest and most reliable computer memory was hand-crafted by weavers from copper wire and tiny donut-shaped ferrite magnets called cores. My meatware memory is not entirely reliable, but I remember hearing that Bolivian basket weavers were recruited to manufacture woven core memory. The computers on the Apollo moon missions were based on hard-wired ferrite cores.

Today, we talk about terabytes (trillions of bytes), but in those days, even a single K (1,024 bytes) of memory cost thousands of dollars. I guess everyone today knows that a byte is eight bits, a sequence of eight 0s and 1s. Each donut magnet in core memory represented a single 0 or 1. The costing rule of thumb for handcrafted core memory was $1 a bit. At that price, a terabyte of memory would cost 8 trillion dollars, roughly the market price of several Amazons.

In the 1960s and 70s, programmers did not waste storage. They saved a few bytes by storing the year as two digits. 1913 was “13” and 1967 was “67.”

Most business programs used “binary coded decimal,” which packs two decimal digits into each byte. Storing the year as 2 digits instead of 4 saved 8 bits, one byte: at $1 a bit in 1970, nearly eight bucks a bit in 2020 dollars, that byte was worth close to $70 today. Put another way, the price of that saved 1970 byte will now buy 2 terabytes of flash memory, delivered to your door by Amazon. That flash memory would have cost 16 trillion dollars in 1970, filled several warehouses, and probably generated enough heat to warm New England for the winter.
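To show both the saving and the trap, here is a small sketch of packed BCD storage: two decimal digits per byte, so a two-digit year fits in one byte while a four-digit year needs two. The encoding is a simplified illustration, not any particular mainframe’s format, and the last lines show the arithmetic that goes wrong when the year field rolls from 99 to 00.

```python
def to_packed_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits two per byte (simplified BCD)."""
    if len(digits) % 2:
        digits = "0" + digits
    return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

print(len(to_packed_bcd("67")))    # 1 byte for a two-digit year (1967)
print(len(to_packed_bcd("1967")))  # 2 bytes for the full year

# The trap: with two-digit years, "00" is smaller than "99", so code that
# computes an age or an elapsed time gets a nonsense negative answer.
birth_year, current_year = 67, 0           # born 1967, "now" 2000
print("age:", current_year - birth_year)   # -67, not 33
```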

Dates and the year 2000

Dates are used heavily in business computing, somewhat less in scientific computing. Accounting, scheduling, supply chain management all depend on date calculations. Today most of these calculations are handled in a few standard code libraries that are used over and over, but those libraries did not exist when the programs that ran the world’s business in 1999 were written. Each time a program needed a date calculation, a programmer wrote code.

Man, did they write code. Programmers delight in rolling their own, writing their own code. Coming up with an entirely original way to perform a mundane date calculation will make a skilled coder’s heart sing. And there is joy in writing code that looks like it does one thing and does something quite different. These tastes guarantee that, given the opportunity, coders will come up with many obscure and abstruse ways to calculate and analyze dates.

When I was hiring coders, I challenged candidates to describe an algorithm to determine the late fee on an invoice payment if 1% was to be added for each calendar month that had passed after the date the invoice was cut. If you think that description’s a little vague, you’re right. A hidden part of the challenge was for the coder to determine exactly what I had asked for. I don’t believe I ever got two identical solutions, and some would have behaved wildly if the calculations had crossed the Y2K boundary.
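For the curious, here is one of many defensible readings of that challenge, sketched in Python: charge 1% for each calendar-month boundary crossed between the invoice date and the payment date. Other readings, say counting whole 30-day periods or only complete months, give different answers, which was the point of the exercise. The two-digit-year version at the bottom shows how a 1999 invoice paid in 2000 goes haywire.

```python
from datetime import date

def late_fee_pct(invoice: date, paid: date) -> int:
    """1% for each calendar-month boundary crossed after the invoice date."""
    months = (paid.year - invoice.year) * 12 + (paid.month - invoice.month)
    return max(months, 0)

print(late_fee_pct(date(1999, 11, 15), date(2000, 2, 3)))   # 3 (Dec, Jan, Feb)

# The same calculation written against two-digit years, as much 1970s-era
# code was, crosses the Y2K boundary and produces a wildly negative fee.
def late_fee_pct_2digit(inv_yy, inv_mm, pay_yy, pay_mm):
    return (pay_yy - inv_yy) * 12 + (pay_mm - inv_mm)

print(late_fee_pct_2digit(99, 11, 0, 2))                    # -1197
```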

The industry took Y2K seriously

For good reason, in the late nineties, the industry got serious about the approaching crisis.

Y2K was a big deal. By 1995, I, along with many of my colleagues, had intentionally forgotten how to code in COBOL, the mainframe programming language of most 20th century business programs that were at the heart of Y2K issues. I won’t say that COBOL is a bad language, but few programmers cared for its wordy style. The scarcity of COBOL programmers elevated the language to a money skill in the last days of the century.

Y2K in my products

At that time, I was managing the development of a service desk product that I had designed several years before. I was coding almost exclusively in C++ on Unix and Windows, platforms whose internal representation of dates and times would not, in theory, have Y2K problems.
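The theoretical immunity came from storing time as a count of seconds since a fixed epoch rather than as digit strings, so nothing special happens when the calendar reads 2000. A rough illustration, in Python rather than the C++ time_t the product actually used:

```python
from datetime import datetime, timezone

# Seconds since the Unix epoch (January 1, 1970) just before and just after
# midnight UTC on New Year's Eve 1999. The counter simply keeps climbing;
# there is no "99" anywhere to roll over.
before = datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc).timestamp()
after = datetime(2000, 1, 1, 0, 0, 1, tzinfo=timezone.utc).timestamp()

print(int(before), int(after), int(after - before))  # 946684799 946684801 2
```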

Nevertheless, management, who didn’t know that our product was theoretically immune to Y2K, declared my Fiscal Year 2000 bonus would depend on the absence of Y2K errors in our code. I smiled when I read the letter that described my bonus terms. Most years, fickle market conditions that I could not influence decided my bonus. This time, my bonus was in the bag due to a wise previous decision to build on a platform that sidestepped the central Y2K issue.

Still, I don’t mess around with my bonus. I started feeding Y2K use cases into our test plans, just to be sure.

I’m glad I did. Management, bless their bottom-line focused hearts, were right to worry about Y2K. It’s been a long time, but I estimate my team spotted and fixed a dozen significant Y2K defects that could have brought the product down or caused crippling errors.

Our defects were not COBOL-style issues, but they stemmed from the two-digit-year mindset that then pervaded business coding. For example, serial numbers often embed a two-digit year somewhere. A clever developer might rely on that fact and create a Y2K defect from it.
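Here is a sketch of the kind of defect I mean, with an invented serial number format: the last two digits of the year are embedded in the number, and code that recovers the manufacturing year from that fragment breaks the moment the field reads 00.

```python
# Hypothetical serial format: PRODUCT-YY-SEQUENCE (two-digit year embedded).
serials = ["SD45-98-0112", "SD45-99-0487", "SD45-00-0023"]

def year_from_serial(serial: str) -> int:
    """The 'clever' 1990s version: prepend 19 to the embedded year digits."""
    yy = int(serial.split("-")[1])
    return 1900 + yy            # 00 becomes 1900, not 2000

for s in serials:
    print(s, "->", year_from_serial(s))
# The unit built in 2000 is reported as built in 1900, and any warranty,
# expiration, or age calculation downstream treats it as a century old.
```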

If those dozen defects had kicked in simultaneously on New Year’s Eve, service desks would have begun failing mid-Pacific in a pattern that would repeat itself as New Year’s Day followed the sun around the globe to Hawaii the next day. That was in a product running on platforms that were supposedly immune to Y2K errors. The devil in it was that a service desk would be used to manage the response to incidents that arose from other Y2K system crashes, compounding the chaos.

A real threat

People who are not responsible for building and maintaining computer systems probably don’t realize what happens when a raft of defects appear at the same time. Troubleshooting the origin of a malfunction caused by a single mistake can be difficult. With two mistakes, the problem becomes more complex and confusing. Each added source compounds the difficulty. At a certain point, the system seems to behave randomly, and you want to delete the whole thing and start over.

In the bad old days, we often saw large software projects break in dozens of places at once when we fired them up for the first time. We used to call it “big bang” testing. Since writing code is more fun than testing code, many developers embraced the methodology and put off testing. But untested code is buggy code. Those big bangs were intractable messes of defects that could take weeks or months to untangle. We soon learned to test small units of code early and often, before mild-mannered projects became monsters.

As testing proceeded in development labs, engineers began to recognize that Y2K threatened to be a mass conflagration like those big bang debacles. Worse: Y2K bugs were everywhere. Their corruption extended to hundreds of systems. Unremediated Y2K threatened software mayhem like I have never seen and hope never to see.

The reaction

Some engineers seized the limelight by overreacting. Doomsday prophets got wind of what was happening in development labs all over the globe. They went to the media with predictions of the imminent collapse of financial systems, communications, and power grids, which threatened to halt the economy and provoke the mother of all economic disasters. ATMs and traffic lights were about to go out of control. The preppers stockpiled guns, freeze-dried chili, and enough toilet paper to isolate for months.

Y2K at CA Technologies

I checked into the CA Technologies (then Computer Associates) development lab in Kirkland, on the east side of Seattle, early on the morning of December 31, 1999. As the senior technology manager in Seattle, I had orders to gather a team of developers and support people to man an emergency response hotline. The idea was that any Computer Associates customer with any Y2K problem could call the hotline and get expert help. Most engineers thought this was a transparent marketing ploy to take advantage of fears that had been whipped to a frenzy by the doomsday crowd.

Highly publicized orders were issued that development teams were on the hook until the last Y2K customer issue was resolved. The company supplied special Y2K tee-shirts. A buffet of sandwiches and periodic deliveries of pizza and other snacks were set up in a large conference room we called the board room. A closed-circuit television feed from headquarters in New York was beamed onto a then-rare wall-sized flat video screen. Rumor said champagne was scheduled to arrive at midnight.

The big letdown

I honestly can’t remember if the bubbly ever appeared. Late in the afternoon, I told all my crew they could leave if they wanted. I had to stay until midnight, but there was no reason to spoil their New Year’s celebration.

Why? On the Pacific Coast, we were among the last time zones to flip to 2000. In the morning, a customer had called headquarters in New York with a minor problem in Australia. It was fixed in minutes; mostly a user misunderstanding, as I recall. The Kirkland team was not called on once. Typical developers, the crew loaded up on free sandwiches and pizza, took the loot to their cubicles, and silently worked on code. A few salespeople wandered in to check on the action; there was none.

After all the buildup, why the big meh? Because humans aren’t stupid. The industry responded to the danger with testing and fixing. I’ve seen and believe estimates that upwards of $200 billion were spent on Y2K remediation in the 1990s. That was money well-spent. Consequently, Y2K came and went with barely a ripple.

The Y2K hoax

I take it back about humans not being stupid. We immediately began to hear about the Y2K hoax, a conspiracy and scare tactic for whatever purpose the speaker or writer found convenient. I’m sure the loudest criers of hoax were the same loudmouths who screamed computer Armageddon. I’d like to roll back the calendar and give the world a taste of what would have happened if Y2K had been ignored.

Actually, I wish Y2K had been ignored outside the industry and the people who understood the problem were allowed to quietly fix it without all the noise.

But that wouldn’t be right either. We were not heroes. The cost of the Y2K remediation was the price of poor judgement. Acting as if it did not occur would only encourage future bad choices. The remediation was nothing to be proud of. The industry should be called to account.

Nonetheless, in my dark moments, I have no patience for people who broadcast opinions but don’t carry water, put out fires, and make things work. Not everyone was fortunate enough to be in on the action of Y2K, and not everyone has the training and experience to know what it takes to keep our computer-dependent society viable.

Y2K and the general election

Which brings us back to the Trickbot takedown. November 3rd 2020 has begun to smell to me like the approach of January 1st 2000. I see real danger and a dead serious response. I’m not an active member of the cybersecurity community, but I keep up. I have no doubt that criminals, extremists from every corner of the political spectrum, and foreign nation-states are planning cyber attacks to extort payments from election agencies, stop people from voting, slow vote tallies, and question results.

Election tampering hoax

But I also see the seriousness and competence of the efforts to prevent and neutralize these bad actors. Some signs point to success. Already, two weeks before the election, 29 million voters have voted, almost five times the number that had voted at this point in 2016. I’m sure election day will be tough, but I will not be the least surprised to hear another big meh on November 4, followed by cries of “election interference hoax!” from every direction. From my vantage point now, it is clearly no hoax.

But I hope it looks like a hoax.

BYOD: The Agreement

BYOD, Bring Your Own Device, is important, but it has its growing pains.

BYOD is, in a sense, a symmetric reflection of enterprise cloud computing. In cloud computing, the enterprise delegates the provision and maintenance of backend infrastructure to a cloud provider. In BYOD, the enterprise delegates the provision and maintenance of frontend infrastructure to its own employees. In both cloud and BYOD, the enterprise and its IT team lose some control.

BYOD has issues similar to the basic cloud computing and out-sourcing problem: how does an enterprise protect itself when it grants a third party substantial control of its business? For cloud, the third party is the cloud provider; for out-sourcing, it is the out-sourcer; for BYOD, it is the enterprise’s own employees.

Nevertheless, enterprises have responded to BYOD and cloud differently. When an enterprise decides to embark on a cloud implementation, it is both a technical and a business decision. On the technical side, engineers ask questions about supported architectures and interfaces, adequate capacities, availability, and the like. On the business side, managers examine billing rates and contracts, service level agreements, security issues and governance. Audits are performed, and future audits planned. Only after these rounds of due diligence are cloud contracts signed. Sometimes the commitments are made more casually, but best practice has become to treat cloud implementations with businesslike due diligence.

On the BYOD side, similar due diligence should occur, but the form of that due diligence has yet to shake out completely. A casual attitude is common. BYOD is a win on the balance sheet and cash flow statement and a boost to employee satisfaction. This enthusiasm has meant that BYOD policy agreements, the equivalent of cloud contracts and service level agreements, are not as common as might be expected.

This is understandable. The issues are complex. BYOD becomes safer for the enterprise as the stringency of the BYOD policy increases. However, a stringent policy is not so attractive to employees. It can force them to purchase from a short list of acceptable devices with an equally short list of acceptable apps, accept arbitrary scans of their devices, and even agree to an arbitrary total reset of the device by the enterprise. With this kind of control, employees may not be so enthusiastic about BYOD. At the same time, privacy issues may arise, and there is some speculation that current hacking laws might prevent employers from intruding on employee devices.

There are also complex support issues. Must the employer replace or repair the employee’s device when the device is damaged on the employer’s premises while performing work for the employer? This situation is very similar to a cloud outage in which the consumer and the provider contend over whether the consumer’s virtual load balancer or the provider’s infrastructure caused the outage. In the cloud case, best practice is to have contracts and service level agreements that lay down the rules for resolving the conflict. BYOD needs the same. The challenge is to formulate agreements that benefit both the enterprise and the employee.

In my current book and in this blog, I talk about some of the complexity of BYOD and how it complicates and challenges IT management. BYOD is a challenge, but it does not have to be a tsunami.

Some key questions are:

  • How much control does the enterprise retain over its data and processes?
  • What rights does the enterprise have to deal with breaches in integrity?
  • What responsibility does the enterprise have for the physical device owned by the employee?

There are reasonable answers for all these questions although they will vary from enterprise to enterprise. When the answers take the form of signed agreements between the enterprise and the employees, IT can begin to support BYOD realistically. Security can be checked and maintained, incidents can be dealt with, and break/fix decisions are not yelling matches or worse.

With reasonable agreements in place, BYOD support can get real. There is more to say about real, efficient BYOD support that I hope to discuss in the future.

BYOD and TCO

One reason enterprises are readily accepting BYOD is that they see an opportunity to reduce the TCO (Total Cost of Ownership) of computing equipment. The thinking goes that if employees pay for their laptops, tablets, and smartphones themselves instead of using company equipment, the company saves a bundle in TCO. Of course, it is not as straightforward as it may appear, because the initial purchase cost of a piece of equipment is often only a small fraction of the TCO.
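A back-of-the-envelope sketch of that point, with entirely invented numbers: once software, support, and downtime are counted over a laptop’s life, the purchase price is a minority share of the total.

```python
# Hypothetical three-year cost of one corporate laptop (all figures invented).
purchase_price = 1200
annual_costs = {
    "software licenses": 300,
    "help desk support": 450,
    "imaging, patching, asset tracking": 250,
    "lost productivity during repairs": 200,
}
years = 3

tco = purchase_price + years * sum(annual_costs.values())
print(f"TCO over {years} years: ${tco:,}")
print(f"purchase price share: {purchase_price / tco:.0%}")
# With these assumptions the sticker price is about a quarter of the TCO, so
# shifting the purchase to the employee is a real but bounded saving.
```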

Nevertheless, BYOD does eliminate a large item from capital expenditures. Cloud also promises to reduce capital expenditure by shifting capital equipment purchases to operational cloud service fees. That is a true benefit, but it is still a paper transaction. Unlike cloud, BYOD’s capital savings are real money that never has to be spent, not a shift from a capital column to an operational column. Operational expenses are generally easier to manage than capital expenses, but no expense at all is easier still.

And it gets better. If an employee dumps a can of Coke on his company laptop at lunch, the company usually ends up paying for a replacement, but when the laptop belongs to the employee, the employee buys the replacement. The service desk and the IT department will not burn hours trying to revive the dead soldier, and IT probably will not be responsible for reimaging the hard drive and restoring backups.

Put another way, traditional break/fix service is not the same in a BYOD environment, and service desks may someday completely drop that aspect of support. But hold it! A fellow employee in the office once inadvertently knocked a cup of coffee onto my laptop. As I remember it, my productivity zeroed out for a few hours while IT services delivered a loaner to me and acquisitions expedited a replacement. If I had owned the laptop and had to replace it myself, I would have been out of commission for at least a day while I shopped around for a good buy on a new one and restored the system as best I could. Not only that, I would have been a pretty grumpy employee, who might even think the company owed me for placing me in a laptop-destroying environment.

This leads to a question: what kind of enterprise support is needed in the BYOD age? What are the legitimate limits? A major clue comes from the way these devices are supported outside the enterprise now. We all know that iPhone and Android apps are supported differently than traditional software. Do service desks need to take a lesson from the app stores? I think so. I’ll talk about this more in a future post.