Why Enterprises Should Build Infrastructure for Artificial Intelligence (AI) First

Artificial intelligence (AI) is bringing new levels of automation to everything from cars and kiosks to utility grids, healthcare, life sciences, and financial networks. But it’s easy to forget that before the enterprise can automate the world, it has to automate itself first.

As with most complicated systems, IT infrastructure management is ripe for intelligent automation. As data loads become larger and more complex, and the infrastructure itself extends beyond the datacenter into the cloud and the edge, the speed at which new environments are provisioned, optimized, and decommissioned will soon exceed the capabilities of even an army of human operators. That means AI will be needed at the ground level to handle the demands of AI initiatives higher up the IT stack.

AI Begins with Infrastructure

In a classic catch-22, however, most enterprises are running into trouble deploying AI on their infrastructure, in large part because they lack the tools to leverage the technology in a meaningful way. A recent survey by Run:AI shows that few AI algorithms and models are getting into production (less than 10% at some organizations), with many data scientists still resorting to manual access to GPUs and other elements of data infrastructure to get projects to the finish line.

Another study, by Global Surveys, showed that just 17% of AI and IT practitioners report high utilization of hardware resources, with 28% reporting that much of their infrastructure sits idle for long periods. And this is after their organizations have poured millions of dollars into new hardware, software, and cloud resources, in large part to leverage AI, machine learning (ML), and deep learning.

If the enterprise is to successfully carry out the transformation from traditional modes of operation to fully digitized ones, AI will have to play a prominent role. IT consultancy Aarav Solutions points out that AI is invaluable for automating infrastructure support, security, resource provisioning, and a host of other activities. Its secret sauce is the ability to analyze massive data sets at high speed and with far greater accuracy than manual processes, giving decision-makers granular insight into the otherwise hidden forces affecting their operations.

A deeper look into all the interrelated functions that go into infrastructure management on a daily basis sparks wonder at how the enterprise has gotten this far without AI. XenonStack COO and CDS Jagreet Kaur Gill recently highlighted the myriad functions that can be kicked into hyper-speed with AI, everything from capacity planning and resource utilization to anomaly detection and real-time root-cause analysis. With the ability to track and manage literally millions of events at a time, AI will provide the foundation that allows the enterprise to maintain the scale, reliability, and dynamism of the digital economy.
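As a toy illustration of the anomaly-detection function mentioned above (a hedged sketch, not any platform's actual algorithm), flagging metric readings that deviate sharply from their baseline can be as simple as a z-score test; the metric values and threshold here are invented:

```python
import statistics

# Illustrative only: a minimal anomaly detector of the kind AIOps tooling
# applies to infrastructure telemetry. The metric values and the z-score
# threshold are assumptions, not any vendor's implementation.
def detect_anomalies(samples, threshold=2.5):
    """Return indices of samples more than `threshold` standard
    deviations from the mean of the series."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    if stdev == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mean) / stdev > threshold]

cpu_percent = [42, 40, 45, 43, 41, 44, 42, 98, 43, 41]  # one sudden spike
print(detect_anomalies(cpu_percent))  # → [7]
```

Production systems replace the static threshold with models that learn seasonality and correlate anomalies across many metrics, but the core idea is the same.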

Artificial Intelligence and Edge Computing

With this kind of management stack in place, says Sandeep Singh, vice president of storage marketing at HPE, it’s not too early to start talking about AI for IT operations (AIOps) driven frameworks and fully autonomous IT operations, particularly in greenfield deployments between the edge and the cloud. The edge, after all, is where much of the storage and processing of Internet of Things (IoT), Industrial IoT (IIoT), and Internet of Medical Things (IoMT) data will take place. But it is also characterized by a highly dispersed physical footprint, with small, interconnected nodes pushed as close to user devices as possible. By its very nature, then, the edge must be autonomous. Using AIOps, organizations will be able to build self-sufficient, real-time analytics and decision-making capabilities, while at the same time ensuring maximum uptime and failover should anything disrupt operations at a given endpoint.

Looking forward, it’s clear that AI-empowered infrastructure will be more than just a competitive advantage; it will be an operational necessity. Given the amount of data generated by an increasingly connected world, plus the quickly changing nature of all the digital processes and services this entails, there is simply no other way to manage these environments.

Intelligence will be the driving force in enterprise operations as the decade unfolds, but just like any other technology initiative, it must be implemented from the ground up – and that process starts with infrastructure.


SOURCE: VentureBeat | Author: Arthur Cole

URL: https://venturebeat.com/2021/11/22/why-enterprises-should-build-ai-infrastructure-first/

Microsoft Ignite 2021 | Book of News

For the latest technology and business innovations announced by Microsoft, we recommend visiting the Microsoft Ignite Book of News online at: https://news.microsoft.com/ignite-november-2021-book-of-news/

Introduction:

Welcome everyone to Microsoft Ignite, and once again we have a book’s worth of news about Microsoft 365, Azure, Dynamics 365, Security, Power Platform, AI and much more.

Our goal with the Book of News is to provide you with a guide to all the announcements we are making, with all the detail you need. Our standing goal remains as it has always been – to make it as easy as possible for you to navigate all the latest information and provide key details on the topics you are most interested in.

Microsoft Ignite is a seminal moment for our company. We will welcome more than 100,000 global attendees across a variety of industries to experience our latest and greatest technologies while also getting a sneak peek at new products and services that will be coming in the future.

The backdrop for our news at Ignite is the Microsoft Cloud. The Microsoft Cloud powers an organization’s digital capability, while providing the safeguards necessary to keep data confidential and secure. There is no question that the past year and a half has been a catalyst for structural change in every industry, from the adoption of telehealth in healthcare, to digital wallets in financial services, to curbside pick-up and contact-less shopping in retail.

  • Digital technology will be more necessary than ever, for every organization, in every sector. The implications for IT are profound.
  • Fundamentally, we are moving into an era in which people expect their digital data to be available anywhere, at any time and on any device.
  • We have a great lineup of news and some really exciting moments planned for this year’s Ignite. I hope that you can join us.

As always, send us your feedback! We want to know how we can do better. Are you getting the information and context you need? What can we do to make the experience even better next time?

Foreword by Frank X. Shaw


What is the Book of News?

The Microsoft Ignite Book of News is your guide to key news items that we are announcing at Microsoft Ignite. The interactive Table of Contents gives you the option to select the items you are interested in, and the translation capabilities make the Book of News more accessible globally. (Just click the Translate button above the Table of Contents to enable translations.)

We also pulled together a folder of imagery related to a few of the news items. Please take a look at the imagery here.

We hope the Book of News provides all the information, executive insight and context you need. If you have any questions or feedback regarding content in the Book of News, please email eventcom@microsoft.com.


Using Machine Learning – ML to Predict High-Impact Research

DELPHI, an artificial intelligence framework, can give an “early-alert” signal for future key technologies by learning from patterns gleaned from previous scientific publications.
MIT Media Lab

An artificial intelligence framework built by MIT researchers can give an “early-alert” signal for future high-impact technologies, by learning from patterns gleaned from previous scientific publications.

In a retrospective test of its capabilities, DELPHI, short for Dynamic Early-warning by Learning to Predict High Impact, was able to identify all pioneering papers on an experts’ list of key foundational biotechnologies, sometimes as early as the first year after their publication.

James W. Weis, a research affiliate of the MIT Media Lab, and Joseph Jacobson, a professor of media arts and sciences and head of the Media Lab’s Molecular Machines research group, also used DELPHI to highlight 50 recent scientific papers that they predict will be high impact by 2023. Topics covered by the papers include DNA nanorobots used for cancer treatment, high-energy density lithium-oxygen batteries, and chemical synthesis using deep neural networks, among others.

The researchers see DELPHI as a tool that can help humans better leverage funding for scientific research, identifying “diamond in the rough” technologies that might otherwise languish and offering a way for governments, philanthropies, and venture capital firms to more efficiently and productively support science.

“In essence, our algorithm functions by learning patterns from the history of science, and then pattern-matching on new publications to find early signals of high impact,” says Weis. “By tracking the early spread of ideas, we can predict how likely they are to go viral or spread to the broader academic community in a meaningful way.”

The paper has been published in Nature Biotechnology.

Searching for the “diamond in the rough”

The machine learning algorithm developed by Weis and Jacobson takes advantage of the vast amount of digital information that is now available with the exponential growth in scientific publication since the 1980s. But instead of using one-dimensional measures, such as the number of citations, to judge a publication’s impact, DELPHI was trained on a full time-series network of journal article metadata to reveal higher-dimensional patterns in their spread across the scientific ecosystem.

The result is a knowledge graph that contains the connections between nodes representing papers, authors, institutions, and other types of data. The strength and type of the complex connections between these nodes determine their properties, which are used in the framework. “These nodes and edges define a time-based graph that DELPHI uses to learn patterns that are predictive of high future impact,” explains Weis.

Together, these network features are used to predict scientific impact, with papers that fall in the top 5 percent of time-scaled node centrality five years after publication considered the “highly impactful” target set that DELPHI aims to identify. These top 5 percent of papers constitute 35 percent of the total impact in the graph. DELPHI can also use cutoffs of the top 1, 10, and 15 percent of time-scaled node centrality, the authors say.
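The "top 5 percent" cutoff idea can be sketched in a few lines of Python. To be clear, this is not DELPHI itself: plain citation counts (in-degree) stand in for the paper's time-scaled node centrality, and the edges are invented for illustration:

```python
from collections import Counter

# Toy example of selecting a "highly impactful" target set by ranking
# papers on a graph score. DELPHI scores papers by time-scaled node
# centrality on a rich metadata graph; citation counts are a deliberately
# simplified stand-in here.
edges = [  # (citing paper, cited paper)
    ("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p6", "p1"),
    ("p4", "p2"), ("p5", "p3"),
]

papers = {p for edge in edges for p in edge}
scores = Counter(cited for _, cited in edges)

# Keep the top 5 percent of papers (at least one) as the target set.
k = max(1, len(papers) * 5 // 100)
high_impact = set(sorted(papers, key=lambda p: scores[p], reverse=True)[:k])
print(high_impact)  # → {'p1'}
```

Swapping the cutoff to the top 1, 10, or 15 percent, as the authors describe, only changes `k`.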

DELPHI suggests that highly impactful papers spread almost virally outside their disciplines and smaller scientific communities. Two papers can have the same number of citations, but highly impactful papers reach a broader and deeper audience. Low-impact papers, on the other hand, “aren’t really being utilized and leveraged by an expanding group of people,” says Weis.

The framework might be useful in “incentivizing teams of people to work together, even if they don’t already know each other — perhaps by directing funding toward them to come together to work on important multidisciplinary problems,” he adds.

Compared to citation number alone, DELPHI identifies over twice the number of highly impactful papers, including 60 percent of “hidden gems,” or papers that would be missed by a citation threshold.

“Advancing fundamental research is about taking lots of shots on goal and then being able to quickly double down on the best of those ideas,” says Jacobson. “This study was about seeing whether we could do that process in a more scaled way, by using the scientific community as a whole, as embedded in the academic graph, as well as being more inclusive in identifying high-impact research directions.”

The researchers were surprised at how early in some cases the “alert signal” of a highly impactful paper shows up using DELPHI. “Within one year of publication we are already identifying hidden gems that will have significant impact later on,” says Weis.

He cautions, however, that DELPHI isn’t exactly predicting the future. “We’re using machine learning to extract and quantify signals that are hidden in the dimensionality and dynamics of the data that already exist.”

Fair, efficient, and effective funding

The hope, the researchers say, is that DELPHI will offer a less-biased way to evaluate a paper’s impact, as other measures such as citations and journal impact factor can be manipulated, as past studies have shown.

“We hope we can use this to find the most deserving research and researchers, regardless of what institutions they’re affiliated with or how connected they are,” Weis says.

As with all machine learning frameworks, however, designers and users should be alert to bias, he adds. “We need to constantly be aware of potential biases in our data and models. We want DELPHI to help find the best research in a less-biased way — so we need to be careful our models are not learning to predict future impact solely on the basis of sub-optimal metrics like h-Index, author citation count, or institutional affiliation.”

DELPHI could be a powerful tool to help scientific funding become more efficient and effective, and perhaps be used to create new classes of financial products related to science investment.

“The emerging metascience of science funding is pointing toward the need for a portfolio approach to scientific investment,” notes David Lang, executive director of the Experiment Foundation. “Weis and Jacobson have made a significant contribution to that understanding and, more importantly, its implementation with DELPHI.”

It’s something Weis has thought about a lot after his own experiences in launching venture capital funds and laboratory incubation facilities for biotechnology startups.

“I became increasingly cognizant that investors, including myself, were consistently looking for new companies in the same spots and with the same preconceptions,” he says. “There’s a giant wealth of highly-talented people and amazing technology that I started to glimpse, but that is often overlooked. I thought there must be a way to work in this space — and that machine learning could help us find and more effectively realize all this unmined potential.”


Source: Massachusetts Institute of Technology

Source URL: https://news.mit.edu/2021/using-machine-learning-predict-high-impact-research-0517?utm_campaign=Learning%20Posts&utm_content=167488607&utm_medium=social&utm_source=twitter&hss_channel=tw-3018841323

Understanding Design Docs Principles

Run Your Data Projects Effectively with The Right Design Docs

A good design doc is inseparable from a good data scientist and engineer — Vincent Tatan, Google ML Engineer

Before writing any code, engineers spent 18 months contemplating and writing documents on how best to serve the customer. — Eugene Yan, Amazon Data Scientist

Last month, I presented the undeniable importance of design docs, which I described as conceptual lighthouses for building and running machine learning systems.

Since design docs are hugely important, I would love to share my principles for creating design docs that help you execute your data projects effectively.

Why?

Design docs provide conceptual lighthouses to guide your data projects.

  • Design docs conceptually guide you at every step to understand your goals, impacts, and execution for the benefit of stakeholders. Design docs ensure your projects land with impact.
  • Design docs save design time by highlighting implementations and alternative solutions before you execute them.
  • Design docs host discussions among teams to brainstorm the best solutions and implement data projects.
  • Design docs serve as permanent artifacts to solidify your ideas for future collaborations.

Who?

Your audience is the key reason why you write design docs. In every design doc writeup, you must understand your audiences, such as:

  • Yourself: To identify learning journeys, brainstorm ideas, and plan future impactful projects.
  • Team members: To identify collaboration points, escalations, and system-specific impacts. In the design doc, you need to align your assumptions with team members’ prior knowledge.
  • Cross departments: To identify cross-departmental collaborations. Your design docs need to communicate prior knowledge and success metrics.
  • Executives: To make decisions. You need to provide solid recommendations to move the needle on high-level metrics (e.g., user adoption, revenue, and goodwill).
  • External: To foster professional reputation and network. You need to deliver solid takeaways and avoid jargon.

Finding the Right Design Docs Types for The Right Space (Context)

I would like to highlight three different contexts, each requiring a different type of design doc. For terminology, I will refer to these contexts as solution spaces: Architecture Space, Implementation Space, and Idea Space.

Architecture Space (Stable)

Design Docs Characteristics

  • Objectives: Document high-level architecture for systems that solve complex problems and lead to direct user impact.
  • Main Audience: Executives, Tech Leads
  • Time: Slow and stable. Ideally, it rarely changes except due to disruptions.

Types of Design Docs

  • Architecture Doc: Document high-level system architectures with clear objectives. For example, Project Google Loon aims to solve the scarcity of reliable internet infrastructure.
  • North Star Metrics: Identify critical metrics to measure success/failure for executive communications. For example, in customer-facing apps the metric will be user adoption, while in abuse-fighting apps it will be protecting users.

Implementation Space (Launch)

Design Docs Characteristics

  • Objectives: To facilitate implementation of system designs, for example data storage, MLOps, data privacy access, etc. This document ensures data products are launched, scaled, and evaluated properly.
  • Main Audience: Tech Leads, Cross departments (especially up/downstream applications)
  • Time: Moderate changes

Types of Design Docs

  • System Design: Highlight system implementation flowcharts, up/downstream interactions, data storage, appeals, etc.
  • Timeline Launch Documents: Highlight progress and the timeline for a system launch. At Google, we have Standard Operating Procedures (SOPs) to ensure each launch is properly maintained and scaled.
  • Privacy Documents: Manage confidential data regarding users or other sensitive agencies.

Idea Space (Experimental)

Design Docs Characteristics

  • Objectives: To experiment with minor tweaks, brainstorm ideas, and gather quick feedback. Idea spaces allow data professionals to find ideas that deliver big impact quickly.
  • Main Audience: Everyone, including cross departments
  • Time: Highly dynamic. One pagers are drafted, analyzed, and discarded on a daily basis. Your goal is to fail quickly and move on.

Types of Design Docs

  • One pager: Fast-moving design docs to facilitate early idea reviews. As ideas grow into proven concepts, the one pager is promoted into two pagers and system design docs.
  • Learning Journeys: Identify learning journeys in terms of past presentations, design documents, and models launched. In big companies, learning journeys are necessary to keep track of changes that happen very quickly in cross-department and regional collaborations.
  • Pre-Execution Evaluations: What are the expected impacts if we launch a product (e.g., models/tweaks)?
  • Post-Execution Evaluations: What were the impacts after a product (e.g., models/tweaks) was launched?
  • Pre-Mortem: What could go wrong when the product is launched?
  • Post-Mortem: What went wrong after past products were launched?

“If you don’t know what you want to achieve in your presentation, your audience never will.” — Harvey Diamond

Five Principles to Manage Your Design Docs

These are simple guides on how you can manage design docs.

  1. Start from Ideas: Always start your experiment as a one pager (idea space). Create a thought experiment and brainstorm quickly before investing further time in the idea.
  2. Invest More Time in Higher Spaces: The higher the space (Idea → Implementation → Architecture), the more time you should invest. Spend at most one week on a one pager (idea space), one month in the implementation/analysis space, and one quarter/semester in the architecture space. Of course this depends on the scope, but you get the gist.
  3. Prioritize Promising One Pagers: Promote and scale your one pagers based on impact. If the idea is intended for cross-collaboration, spend more time deliberating on how your goals align with the higher-level spaces (North Star Metrics, System Design, etc.).
  4. Land on the North Star: For general success metrics, focus on ideas with the lowest time investment (e.g., small tweaks to machine learning) and high impact on North Star Metrics. This helps you build a solid foundation to land in higher spaces.
  5. Point Directly to the Golden Nugget: Your design docs need to point the audience to the golden nugget as directly as possible. The higher the space, the more direct this golden nugget should be.

Conclusion

In general, knowing these principles will help you create design docs that let you:

  1. Navigate dynamic, ambiguous, or poorly understood projects through all conceptual spaces.
  2. Generate high-impact projects that are promotable for executive communications.
  3. Optimize time investments for the best impact, as measured against the North Star Metrics.

I hope this article helps you create design docs and run your data projects effectively.

Soli Deo Gloria.


About the Author

Vincent fights internet abuse with ML @ Google. Vincent uses advanced data analytics, machine learning, and software engineering to protect Chrome and Gmail users.

Apart from his stint at Google, Vincent is also a featured writer for Towards Data Science on Medium, guiding aspiring ML and data practitioners, with 1M+ readers globally.

During his free time, Vincent studies for an ML master’s degree at Georgia Tech and trains for triathlons and cycling trips.

Lastly, please reach out to Vincent via LinkedIn, Medium, or his YouTube channel.

Source: Towards Data Science

Source Twitter: @TDataScience

Source URL: https://towardsdatascience.com/understanding-design-docs-principles-for-achieving-data-scientists-53e6d5ad6f7e

How to Write Better with the Why, What, How Framework

Here’s a story from the early days of Amazon Web Services: Before writing any code, engineers spent 18 months contemplating and writing documents on how best to serve the customer. Amazon believes this is the fastest way to work—thinking deeply about what the customer needs before executing on that rigorously refined vision.

Similarly, as a data scientist, though I solve problems via code, a lot of the work happens before writing any code. Such work takes the form of thinking and/via writing documents. This is especially so at Amazon, which is famous for its writing culture.

This post (and the next) answers the most voted-for question on the topic poll:

How to write design documents for data science/machine learning projects?

I’ll start by sharing three documents I’ve written: one-pagers, design documents, and after action reviews. Then, I’ll reveal the framework I use to structure most of my writing, including this post. In the next post, we’ll discuss design docs.

One-pagers, design docs, after-action reviews

I usually write three types of documents when building/operating a system. The first two help to get alignment and feedback; the last is used to reflect—all three assist with thinking deeply and improving outcomes.

One-pagers: I use these to achieve alignment with business/product stakeholders. Also used as background memos for quarterly/yearly prioritization. In a single page, they should allow readers to quickly understand the problem, expected outcomes, proposed solution, and high-level approach. Extremely useful to reference when you’re deep in the weeds of a project, or encounter scope creep.

Design docs: I use these to get feedback from fellow scientists and engineers. They help identify design issues early in the process. Furthermore, you can iterate on design docs more rapidly than on systems, especially if said systems are already in production. It usually covers methodology and system design, and includes experiment results and technical benchmarks (if available).

Design docs are more commonly seen in engineering projects; not so much for data science/machine learning. Nonetheless, I’ve found it invaluable for building better ML systems and products.

After-action reviews: I use these to reflect after shipping a project, or after a major error. If it’s a project review, we cover what went well (and not so well), follow-up actions, and how to do better next time. It’s like a scrum retrospective, except with more time to think and written as a document. The knowledge can then be shared with other teams.

If it’s an error review (e.g., the system goes down), we diagnose the root cause and identify follow-up actions to prevent recurrence. Nowhere do we blame individuals. The intent is to discuss what we can do better and share the (sometimes painful) lessons with the greater organization. Amazon calls these Correction of Errors; here’s what it looks like.

Writing framework: Why, What, How, (Who)

The Why-What-How framework is so simple that it sounds like a reading/writing lesson for first graders. Nonetheless, it guides most, if not all, of my work documents. My writing on this site also follows it (the other format being lists like this and this).

Why: Start by explaining Why the document is important. This is often framed around the problem or opportunity we want to address, and the expected benefits. We might also answer the question of Why now?

Think of this as the hook for your document. After reading the Why, readers should feel compelled to blaze through the rest of your doc (and hopefully commit to your proposal). In resource-strapped environments (e.g., start-ups), this section convinces decision-makers to invest resources into your idea.

Thus, it’s critical that—after reading this section—your audience understands the problem and context. Describe it simply in their terms: customer benefits, business gains, productivity improvements. Contrast the two Whys below; which is better suited for a business audience?

“We need to procure GPU clusters for distributed training of SOTA deep learning models that will improve nDCG@10 by 20%.”

“We need to invest in infrastructure to improve customer recommendations, with an expected conversion and revenue uplift of 5%.”

The first one might be a tad exaggerated, but I’ve seen Whys that start like that. 🤦‍♂️ It’s a great way to lose the audience from the get-go.

What: After the audience is convinced we should solve the problem, share what a good solution looks like. What are the expected outcomes and ways to measure them?

One way to frame What is via measures of success and constraints. Measures of success define what a good (or bad) solution looks like; constraints define what solutions can (and cannot) do. Together, they enable readers to evaluate and decide on proposals, make trade-offs, and provide feedback.

Another way of framing What is via requirements. Business requirements specify the expected customer experience, uplift to business metrics (success measures), and budget (constraints). They might also be framed as product or functional requirements. Technical requirements specify throughput, latency, security, privacy, etc., usually as constraints.

How: Finally, explain How you’ll achieve the Why and What. This includes methodology, high-level design, tech decisions, etc. It’s also useful to add what you’re not implementing (i.e., out of scope).

The depth of this section depends on the document. For one-pagers, it could be a paragraph or two on deliverables, with details in the appendix. For design docs, you may want to include a system context diagram, tech decisions (e.g., centralized vs. distributed, EC2 vs. EMR vs. SageMaker), offline experiment results (e.g., hit rate, nDCG), and benchmarks (e.g., throughput, latency, instance count).

Having a solid Why and What provides context and makes this section easier to write. It also makes it easier for readers to evaluate and give feedback on your idea. Conversely, poorly articulated intent and requirements make it difficult to spot a good solution even when it’s in front of us.

(Who): While writing docs, we should keep our audience in mind. Although Who may not show up as a section in the doc, it’ll influence how it turns out (topics, depth, language).

A document for business leaders will (and should!) look very different from a document for engineers. Different audiences will focus on different aspects: customer pain points, business outcomes, and ROI versus technical requirements, design choices, and API specifications.

Writing with your Who in mind makes for more productive discussions and feedback. We don’t ask business leaders for feedback on infra choices, and we don’t ask devops engineers for guidance on business strategy.

How to use the framework to structure your docs

Here are some examples of using Why-What-How to structure a one-pager, design doc, after-action review, and my writing on this site.

One-Pager
  • Why: Problem or opportunity; hypothesized benefits
  • What: Success metrics; constraints
  • How: Deliverables; define out-of-scope

Design Doc
  • Why: Why the problem is important; expected ROI
  • What: Business/product requirements; technical requirements & constraints
  • How: Methodology & system design; diagrams, experiment results, tech choices, integration

After-action Review
  • Why: Context of incident; root cause analysis (5 Whys)
  • What: Tangible & intangible impact; estimates (e.g., downtime, $)
  • How: Follow-up actions & owners

Writing on this site
  • Why: Why reading the post is important (e.g., anecdotes)
  • What: The topic being discussed (e.g., documents we write at work)
  • How: The insight being shared (e.g., Why-What-How, examples)

One-pager example

Why: Our data science team (in an e-commerce company) is challenged to help customers discover products easier. Senior leaders hypothesize that better product discovery will improve customer engagement and business outcomes.

What: First-order metrics are engagement (e.g., CTR) and revenue (e.g., conversion, revenue per session). Second-order metrics include app usage (e.g., daily active users) and retention (e.g., monthly active users). Constraints are set via a budget and timeline.
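For illustration, the first-order metrics above can be computed from simple session counts. All numbers here are hypothetical, not taken from the example:

```python
# Hypothetical session counts illustrating the one-pager's first-order
# metrics; the figures are invented for illustration.
sessions = 10_000
widget_clicks = 1_200        # sessions with a click on the recommender
purchases_after_click = 180  # of those, sessions ending in a purchase
total_revenue = 54_000       # e.g., $54,000 attributed revenue

ctr = widget_clicks / sessions                 # click-through rate
conversion = purchases_after_click / widget_clicks
revenue_per_session = total_revenue / sessions

print(f"CTR: {ctr:.1%}, conversion: {conversion:.1%}, "
      f"revenue/session: ${revenue_per_session:.2f}")
# → CTR: 12.0%, conversion: 15.0%, revenue/session: $5.40
```

Defining these formulas explicitly in the one-pager avoids disputes later about what "engagement" or "conversion" actually measured.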

How: The team considered several online (e.g., search, recommendations) and offline (e.g., targeted emails, push notifications) approaches. Their analysis showed the majority of customer activity occurs on product pages. Thus, an item-to-item (i2i) recommender—on product pages—is hypothesized to yield the greatest ROI.

Appendix: Breakdown of inbound channels and site activity, overview of the various approaches, detailed explanation on recommendation systems.


Design document example

Why: Currently, our product pages lack a way for users to discover similar products. To address this, we are building an i2i recommender to improve product discoverability and customer engagement.

What: Business requirements are similar to those specified in the one-pager, albeit with greater detail. We collaborated with the web and mobile app teams to define technical requirements such as throughput (> 1,000 requests per second), latency (<150ms at p99), and availability (99% uptime). Our constraints include cost (<10% of revenue generated, with an absolute threshold) and integration points.
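Requirements like these are easiest to review (and later to monitor) when collected in one machine-readable place. A minimal sketch of that idea, using the hypothetical thresholds above — the names here are illustrative, not from a real codebase:

```python
# Hypothetical SLO definition for the i2i recommender, mirroring the
# requirements in the design doc above. Names are illustrative.
SLOS = {
    "throughput_rps_min": 1000,   # > 1,000 requests per second
    "latency_p99_ms_max": 150,    # < 150 ms at p99
    "availability_min": 0.99,     # 99% uptime
}

def meets_slos(observed: dict) -> bool:
    """Return True if an observed measurement satisfies every SLO."""
    return (
        observed["throughput_rps"] >= SLOS["throughput_rps_min"]
        and observed["latency_p99_ms"] <= SLOS["latency_p99_ms_max"]
        and observed["availability"] >= SLOS["availability_min"]
    )
```

Keeping one definition that both the design doc review and the production alarms reference helps the two stay in sync.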

How: This will be the meatiest section of the design doc. We’ll share the methodology and high-level design, including system-context-diagrams, tech choices, initial offline evaluation metrics (for ML), and address aspects of throughput, latency, cost, security, data privacy, integration, etc.

Appendix: Trade-offs, what was considered but excluded, API specs, UI, etc.


After-action review example

Context: During a peak sales day (11/11), the i2i recommender was not visible on product pages for a period of time. This was discovered by category managers inspecting their products’ discounts.

Why (5 Whys): The spike in traffic led to increased latency (>150ms) when serving recommendations. The increased latency led to the recommender widget timing out—and not being shown—on product pages. While autoscaling was enabled, it hit the instance quotas and could not scale beyond that. Though we conducted load tests at 3x normal traffic, these were insufficient as peak traffic was 30x normal traffic. In addition, it was not discovered earlier because our alarms didn’t account for results not being displayed.
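The last point — alarms watched latency but not whether recommendations were actually shown — can be sketched as a simple second alert condition. This is a hypothetical illustration of the gap, not the team's actual monitoring code:

```python
def should_alert(latency_p99_ms: float, pages_served: int, widgets_shown: int,
                 latency_threshold_ms: float = 150.0,
                 min_display_ratio: float = 0.95) -> bool:
    """Alert on high latency OR on the widget silently failing to render.

    The original alarm covered only the latency condition; the incident
    went undetected because the display-ratio condition was missing.
    All thresholds here are illustrative assumptions.
    """
    display_ratio = widgets_shown / pages_served if pages_served else 1.0
    return (latency_p99_ms > latency_threshold_ms
            or display_ratio < min_display_ratio)
```

An alarm like the second condition would have fired even though page latency stayed within bounds.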

What: Customer experience was unaffected as product pages continued to load within expected latency. Nonetheless, not serving recommendations led to loss of expected revenue. Based on revenue attributed to the recommender during the rest of the day, the estimated loss is $x.

How: We will take these follow-up actions to prevent a repeat incident and to detect similar issues earlier, with an owner assigned to each action.

Appendix: Timeline of incident, overall learnings and recommendations.


Personal writing example

Why: Why is writing documents important? Share anecdote. Mention it’s highly voted-for.

What: What documents do I write? Share some examples.

How: Explain the Why-What-How approach and share examples of how I use it.


Writing docs is expensive, but cheap

Writing documents costs money. Documents take time to write, review, and iterate on—time that could have been spent on implementation.

Nonetheless, writing is a cheap way to ensure we solve the right problems in the right way. Documents save money by helping teams avoid rabbit holes or building systems that aren’t used. They also help align stakeholders, improve initial ideas, and scale knowledge.

If the problem is ambiguous, the proposed solution contentious, the effort required high (> 3-6 months), and/or consensus is required across multiple teams, starting with a document will save effort in the medium to long term.

So before you start your next project, write a document using Why-What-How. Here’s more detail about one-pagers (and other things I do before starting a project).


Source: Eugene Yan

Source URL: https://eugeneyan.com/writing/writing-docs-why-what-how/

Eugene Yan designs, builds, and operates machine learning systems that serve customers at scale. He's currently an Applied Scientist at Amazon. Previously, he led the data science teams at Lazada and uCare.ai. He also writes & speaks about effective data science, data/ML systems, and career growth.