Name: GitHub Roadmap Webinar, Q1 2026
Uploaded: 2026-03-25T16:16:13.508Z
Duration: 1 h 36 s
Description: GitHub Roadmap Webinar, Q1 2026

Transcript for "GitHub Roadmap Webinar, Q1 2026": Hello, everyone. We will get started in a second. OK. We're one past the hour. Thank you, everyone, and welcome to our GitHub product roadmap updates in March. I'm here with Evan Boyle, and we have a lot of fun stuff planned for you today. So the way that I wanted to get us started today was by talking a little bit more about the heart of GitHub. As you know, GitHub was created by developers for developers, and we've seen and made a lot of changes in the past several years. But what has never changed is that commitment to collaboration, to open source, and to developers. And that's you. All 180,000,000-plus of you are the center, the heart of it. And one of the key things about developers is that what they love most is that craft and that creation process and the impact that the work can have. I always say that humanity already created the fire and the wheel, and everything else is going to be powered by software. And human progress will be tied to the software that we create. And at the same time, how we develop software is changing. Right? We're seeing this in the fact that the problem was never the lack of ideas, but really was the distance between knowing what should happen and being able to make that happen. And that distance now is collapsing. And it's collapsing really, really fast because of AI. And to a degree, creating a line of code nowadays is a commodity. Now we're seeing this ourselves too in github/github, and we stylized this slightly. But on github/github, if you go into that repository, if you go into the insights and you look at our committers and our trends, you will see that our cloud coding agent is number one, and by a wide margin. And then you have other agents. Our release agents are constantly working on it too. And then we have code review. Code review not only reviews your code, but it's also able to make suggestions that get checked in.
And that's why it shows up as a contributor as well. So we have these agents constantly working on the repo. And we're seeing this also even in how we have our two-pizza teams. So if you think about it, before, I had a team of 10 or 20 people. Nowadays, I'm probably dividing that into teams of three or four and then augmenting that with agents as well. So how we're developing software and how we're structuring our teams is changing. And AI is democratizing being able to create a line of code. Now there's good news to that. Right? Software creation is no longer limited to developers. Ideas become working products in minutes. Small teams gain a lot of leverage. So creation is super easy, but at the same time, that also comes with some responsibilities. What we're seeing is that code is generated faster, but now you need to verify it. You need to validate it, and then the pull request becomes a bottleneck. We're also seeing that the cost can spiral very quickly. And then you have to deal with security vulnerabilities. You have to be really careful about all of this new code that you're creating as well. Now teams can get fragmented because of that, and then trust in the system can start to erode. So we have to treat this new wave with both the good and the bad and be able to balance that. So for me and for us, we have to continue to seek that truth. Right? It's not just about faster code, then. It's about better teamwork and stronger product outcomes. And that's the main thing. That teamwork and those stronger product outcomes are what is going to allow us to be successful. So our vision for the future is really centered on people and their focus on craft, on how teams build with velocity, on how products gain users, and how companies can scale with intelligence and trust. Those are the evergreen elements that go into better teamwork and stronger product outcomes.
Now if I think about that and the evolution of GitHub, then I think about really collapsing the SDLC, where the software development life cycle is today, together with the product development life cycle and creating a combined engineering system, one that is agent native or agent first. If you think about Snap, at that moment, they said, you know, we have this camera. Why can't we be camera first instead of feed first, as an example? Right? Like that, the evolution of GitHub needs to be agent first and agent native. And then those agents will expand across local and cloud, and they're going to create a network that is always on, looking after your code, looking after your team, looking after your products. And they're going to do that on a continuous improvement loop. They're going to be made better and better and better. Now as we're doing that, the core platform still needs to be great for people, right, and great for developers. And that's why we're continuing to invest in how we actually load issues, how we load pull requests, how we load the website. Here's an example from a change that we shipped on January 22, where 35% of issue loads are now less than 200 milliseconds. And that number has continued to grow as well. It's also about how we are thinking about open source in the future. So we listen a lot to our community. And on February 13, we shared the ability to limit pull requests to only contributors and then be able to funnel things more to the issue workflows. And then we followed that up with advancements that we're making on stacked diffs as well. And that is to say, we want to break into smaller pieces all of these commits that are happening by these agents and by these people in that collaboration, and not have, you know, 5,000 or 10,000 lines of code changing in every single PR. Now our community has really loved that. And, again, we are all about developer love.
And we wanna continue to go in and invest in the core platform itself. The core platform needs to be super, super performant, super, super stable. And we just want you to know that we're working really hard to make that happen. Now once that platform is stable and performant, then we can start adding all of these other workflows and making the system agent native. Now one of the key things that we want to show you today is how we are even evolving ourselves in how we develop GitHub. And one of the prime examples of this is our CLI product. We just recently GA'd Copilot CLI. So if you haven't given it a try, please do. And this is a companion to our IDE extension overall. Now the core of the CLI is the coding agent, and that coding agent is critical for our success. So not only is it multimodal, but it also supports memory, it supports skills, it supports MCPs and APIs. It has access to tools and the file system. And then we will be extending that with, you know, a computer and sandbox environment. And then, of course, being able to have a fleet of agents with sub agents take any work that you give it. So as we think going forward, the central system and the kernel of GitHub will be this coding agent and this orchestration platform. So what we wanted to do today is to bring Evan to share with you how 10 engineers ship 500 PRs every single week, and what it is that we did to transform to make that happen. And I'll stop sharing my screen and then hand it over to him.
My name is Evan Boyle. I am the engineering manager for the Copilot CLI. I'm really excited to talk to you about this crazy journey that our team has gone on. So, yes, a very small team of 10, shipping 500 PRs a week, and it continues to accelerate every week.
So I want to walk you through a little bit of setup and context about our team, talk through some of the trends that you may be seeing online, what agentic coding actually means, how teams like ours have to adapt, and the cost of going fast. So much like Clint Eastwood, I myself am a straight shooter. I'm going to give you the good, the bad, and the ugly of using coding agents. They are an extreme accelerant, but you have to know how to wield them and you have to know how to adapt your SDLC to make the best use of them. And so this is just one of about four repos that our team owns. It's about 700,000 lines of code. The most interesting thing here is that 54% of the code in this repo is tests. This is going to be really important later when we talk about feedback loops and guardrails for agents. And going back a little bit, our team owns the Copilot CLI, the core coding agent runtime that powers CCA and the Copilot CLI and CCR. We also own the Copilot SDK that's now powering other products across GitHub and Microsoft and more things that you'll see coming in the future. So this is a super small, scrappy team, 10 people owning a massive surface area, and we're consistently shipping 500-plus PRs a week, and that number is just growing over time. And so I want to speak to both audiences in the room. I'm sure there are lots of people here who are extremely excited about AI coding agents and some who are skeptical. The reality probably lies somewhere in between, but I want to make sure I bring everyone along for the journey. So I love this quote. It's from this guy Jeffrey Immanuel.
So he talks about it as: imagine you're graduating in 2005 with your math degree, and you have 20 of your closest math buddy friends. They're all geniuses, they're sitting around on the couch for the summer with nothing to do, and you have this brilliant startup idea. All you have to do to get them to work on your startup idea and build something meaningful is to buy them pizza and soda. Keep them supplied with pizza and soda. They'll work as hard as possible on your startup idea. And I think this is a great analogy for coding agents. We have near infinite coding ability at our fingertips, the ability to automate issue triage and all sorts of parts of the SDLC, and we just have to figure out how to activate that, right? Well, that sounds great, but he goes on to say, well, that's why you need to spend $7,000 a month on 30 different $200-a-month coding agent subscriptions. This is crazy. I don't think you need to do that. I don't think anyone needs to do that. So should you really believe everything that you see online? There's a lot of crazy discourse going on out there. There are these agent frameworks that talk about anthropomorphization. Maybe you have a mayor and you have a crew and you have agents with different roles and you treat them like individual humans. You go completely hands off and delegate your entire backlog to this crew of agents and wait for software to come out the other side. There's also full self driving, like Ralph loops: hey, just turn on my coding agent and coerce it to run overnight, or maybe for a week, and I'm gonna come back and check in on the results later. Generate 100,000 lines of code, and I'm not gonna be able to read and review all of that. So the real question is, does any of this work at team or org scale?
I think the real answer is that all of these techniques have their place, but for teams that I've seen go off the deep end and exclusively work in this way, every single one of them has come back at some point and said we're in our stabilization phase. And this is a euphemism for the architecture has degraded, bad patterns have been copied all over the code base, don't understand how it works anymore. It's regressing all of the time. And so the truth is that a lot of what you see online is play. It's people in their free time. It's solo indie developers working on side projects. It's not teamwork on a mature code base that requires stability and scale. When you have millions of customers, when people depend on you for mission critical systems, it requires a much different level of care, sophistication, attention to detail. And the truth is that software development has not really changed all that much. Well, will all of this work someday? Will we be able to go completely hands off, lights out? Maybe. You know. So what does agentic coding actually mean? Well, if you think about how a coding agent works, it's really two things. It's one, using tools and two, iterating on feedback loops. And the simplest way to think about a coding agent is a shell tool, either a bash tool, a PowerShell tool. You can build a coding agent that has nothing but that because it can use the shell to read files, cat content into files, it can write code that replaces content within files, it can use grep, it can manipulate the file system, it has complete access to your operating system. And then we layer on other tools on top, tools that are for human UX so that humans can work better with coding agents, things like planning mode. And then we have other tools that are optimized for performance and getting the most juice out of a coding agent. But at its core, a coding agent could be built with just a bash tool. And then the feedback loop is how an agent validates its changes. 
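As an illustration of the "one shell tool plus a feedback loop" description above (this sketch is not something shown in the webinar), the loop can be written in a few lines of Python. Everything here is an assumption made for illustration: `run_shell`, `agent_loop`, and the `propose` callback are hypothetical names, and `propose` stands in for a model call that returns the next shell command given the history so far. It is not how the Copilot CLI is implemented.

```python
import subprocess
from typing import Callable, List, Tuple


def run_shell(command: str, timeout: int = 30) -> Tuple[int, str]:
    """The single 'tool': run a shell command, return (exit code, combined output)."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout + proc.stderr


def agent_loop(
    propose: Callable[[List[str]], str],  # hypothetical stand-in for the model
    verify_command: str,                  # e.g. the project's test command
    max_steps: int = 5,
) -> bool:
    """Alternate between acting (tool use) and verifying (feedback loop).

    Returns True as soon as the verification command exits with 0,
    or False if max_steps is reached without a passing verification.
    """
    history: List[str] = []
    for _ in range(max_steps):
        # 1. Hypothesis -> concrete change, expressed as a shell command.
        command = propose(history)
        code, output = run_shell(command)
        history.append(f"$ {command}\n[exit {code}]\n{output}")

        # 2. Feedback loop: run the tests and feed the result back in.
        code, output = run_shell(verify_command)
        history.append(f"$ {verify_command}\n[exit {code}]\n{output}")
        if code == 0:
            return True
    return False
```

In this sketch the only signal the agent gets is the exit code and output of `verify_command`, which is why a fast, trustworthy test suite matters so much: without it, the loop has nothing to correct against.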
At its core, you give the agent a problem, it will form a hypothesis about that problem, make a code change, and then attempt to verify it. And that's why tests are important in your code base. If your agent doesn't have a way to execute and run its changes and see if it can confirm its hypothesis, it's very likely that it'll make mistakes and it won't be able to self correct and go through this loop of using tools, verifying, using tools, verifying. Right? And so we have this traditional kind of software development life cycle that is now accelerated by agents. Right? The SDLC doesn't look all that different. We still run through the same steps, but we need to use agents at every step of that to help us achieve the most. Right? So I'd like to go ahead and switch over to a demo now. And I have a PR here that I'd like to go ahead and review. And this is, you know, quite a complex area of our system. I'm not on a Windows device right now. I'm not actually a Windows expert, but Tim is and our coding agents are. And I've found that certain models are actually better at finding Windows bugs than others. So what I want to go ahead and do here is use the slash review command. So this allows us to spin up sub agents that are optimized for code review, for high signal code review. But the thing that we can actually do with this is spin up multiple code review agents to run in parallel with different models. So I can say: review, I want you to review this PR, check out the branch, then run review with Opus, GPT 5.4, and Gemini. I'm gonna go ahead and hit enter. And so when we run code review with multiple different models, we find the highest signal bugs possible. We allow the agents to kind of argue with each other, propose different hypotheses, find where there's overlap and where there's not. And so this is going to run. It'll take ten minutes. We're going to come back to this, and I'm going to show you another demo in the meantime.
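To make the "find where there's overlap and where there's not" step concrete, here is a minimal Python sketch of merging findings from several reviewers. It is an illustration only, not the Copilot CLI's implementation: the `Finding` type, the `group_findings` function, and the choice to match findings by file and line are all assumptions. Real review comments would need fuzzier matching than an exact location.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Location = Tuple[str, int]  # (file path, line number)


@dataclass(frozen=True)
class Finding:
    """One review comment from one model (hypothetical shape)."""
    path: str
    line: int
    summary: str


def group_findings(
    reviews: Dict[str, List[Finding]],
) -> List[Tuple[Location, Set[str]]]:
    """Group findings by location and record which models flagged each one.

    Results are sorted so that locations flagged by the most models come
    first; ties are broken by location so the order is deterministic.
    """
    flagged_by: Dict[Location, Set[str]] = defaultdict(set)
    for model, findings in reviews.items():
        for finding in findings:
            flagged_by[(finding.path, finding.line)].add(model)
    return sorted(flagged_by.items(), key=lambda item: (-len(item[1]), item[0]))
```

Agreement is only a ranking signal in this sketch. A location flagged by a single model is kept rather than dropped, since one reviewer can catch a real bug that the others miss.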
In this one, I want to use a similar kind of multi agent technique. I've heard so many anecdotes where someone says, I was working on this really hard bug with one model and then I switched over to a different model, and after spending four hours trying to figure this out, the new model got it immediately. And so this is the power of the Copilot CLI and Copilot being able to use multiple models: open source models and state of the art models from every model provider available. This is something that we can use to fix bugs too. And so the Copilot CLI has great integration with GitHub issues. So I can just type hash here, pick a bug that I wanna investigate, and I can say: investigate this using sub agents. Let's use the general purpose sub agent with GPT 5.4, Opus, and Gemini. Try to repro this bug and come up with a hypothesis for a fix. Alright. And so this is gonna go ahead and pull down that context from the GitHub issue. I didn't even have to leave my terminal to find it or to bring it in. And then it's gonna spin up three of these general purpose sub agents, which are effectively copies of the main agent loop, and it's gonna start investigating this bug and allowing these agents to kind of collaborate with each other to find the highest statistical likelihood of what is causing this bug. And if we go back over here, you can see on my code review that we actually have these agents running right now. So we have the code review agent with Opus 4.6, with GPT 5.4, and with Gemini 3 Pro. This is going to run in the background. When these coding agents are done, they're actually going to alert the Copilot CLI to resume execution. And I can actually add a follow-up comment on here too, because I only asked it to do the code review, but, actually, I'm gonna add a steering message in here: when you're done, please post a message back to the PR for my good friend tpope as a comment. There we go.
And so it's just gonna go ahead and do the code review for me. While these are running, let's go ahead and go back to our slides, and we can continue and wrap up our demo at the end. So how did our team have to change? Right? I'm sure many of you have gone to your manager at one point in time and said, hey, you know, our developer experience is really unproductive. I'm waiting two hours on CI. I can't get anything checked in. And your manager, or maybe the wrong manager, might say something like, well, you know, iterating on CI doesn't really sound like delivering shareholder value. How about we ship some more features? But the truth is it's not DevX anymore. It's turned into agent experience. It's how productive your agent can be making code changes, iterating on that feedback loop, and verifying what it's doing in reality. So this means faster tests, stricter linters, better CI feedback loops, and smaller, more verifiable units of work. Five or ten thousand line PRs are something that we draw a hard line on, on our team. Those almost never get reviewed and merged. We really try hard to break our changes up into small sets. So, practically, some of the things that we've had to do: we enabled E2E replay and response caching in our E2E test suite, and this took our E2E tests down from two minutes at that point in time to 25 seconds. This was back in December, when our E2E test suite was much smaller. We find some areas occasionally that the coding agent actually has a hard time making changes in. It might repeatedly introduce bugs, and so when we find those areas, we backfill test coverage to a pretty extreme degree to make sure that those tricky areas are as safe as possible for the agent to work inside of. We actually hit a point where we were shipping so many PRs while sharing a monorepo with about 100 other engineers across several other teams.
And our team's PR velocity, and the combined velocity of all these teams using agents, became a big bottleneck. And so we had to move the runtime and the Copilot CLI into a separate repository so that we could keep that merge queue ticking along. And then even things like linting, which might seem academic: when you're using agents, every single static analysis signal that you can feed into it is important. So we upgraded to type checked linting. And initially, we just added ESLint ignores to all of the files that already had linting errors. So we just said, okay, we'll ignore all the existing linting errors. We'll just make sure that new code uses the type checked linting. But what we found was that the coding agents are statistical pattern matching machines, and when they grep your code base and see ESLint ignore comments, they will replicate those. We call this compounding technical debt: the propensity for coding agents to spread bad patterns that they read when those patterns exist within your code base. We had to go back in and remove all of the, you know, 1,500 ESLint ignore exceptions over several PRs. This was a lot of work, but, you know, we went and we did it. We had the help of coding agents: Copilot coding agent in the cloud, the CLI locally. And then as we've gone on, we wanna keep CI fast. So when runtime gets up to 12 minutes for our E2E test suite, we shard it so that it can run in parallel and finish in, you know, three minutes instead. This enables feedback loops where we allow the agent to open a PR. And for tricky features like the one that I showed you that Tim was working on, where we have behavioral differences across macOS, Linux, and Windows, it's really helpful to allow the agent to run E2E tests, iterate on changes, and discover bugs cross platform without the developer locally having to have all of those environments available to them at once.
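The sharding step can be illustrated with a short Python sketch. Going from roughly 12 minutes to roughly three suggests something like a four-way split, but the talk does not say how many shards the team uses or how tests are assigned, so the hash-based assignment below is an assumption, and `shard_for` and `select_tests` are hypothetical names. Many real test runners ship their own sharding, often balanced by recorded test duration rather than by hash.

```python
import hashlib
from typing import List


def shard_for(test_id: str, total_shards: int) -> int:
    """Map a test ID to a shard index in a stable, machine-independent way.

    A cryptographic hash is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards


def select_tests(test_ids: List[str], shard_index: int, total_shards: int) -> List[str]:
    """Return the subset of tests that this CI job (shard) should run."""
    if total_shards < 1 or not 0 <= shard_index < total_shards:
        raise ValueError("shard_index must satisfy 0 <= shard_index < total_shards")
    return [t for t in test_ids if shard_for(t, total_shards) == shard_index]
```

Each parallel CI job would call `select_tests` with its own `shard_index`. Because the assignment depends only on the test ID, every test runs in exactly one shard, and adding a test does not reshuffle the others. The trade-off is that hashing ignores how long each test takes, so shards can finish at different times.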
What are the costs of going fast? It's not all sunshine, gumdrops, and rainbows. Coding agents are a hell of a drug, and it really requires discipline to stay close to the actual code layers in your architecture. When you allow architectural layering violations to slip into your code base, those tend to propagate. What is a very clean code base to start if you allow agents unsupervised without discipline of continuing to do code review, without building the agentic systems that define what good architecture looks like, then you encounter this compounding technical debt that can bring your system to a screeching halt. We find a massive kind of review backlog. We've built a lot of tools, lot of automations, and a lot of testing that allow us to have more confidence when we ship code so that code review is less onerous, but our team still spends about 50% of our hours in the day reviewing code. We spend a ton of time reviewing code. We're really disciplined about doing that, and we're building some tools that make that code review process faster that you might see in the coming months. I hope to be able to share more about that in the future. So the shift in how we think. I think, the biggest thing is that coding agents make the code itself cheap. It's easy to build a prototype. Right? It doesn't necessarily make the right code or the right architecture cheap, but it makes it easy to explore the right ideas. So if you're going to go and design a new system and you have three competing approaches, you might spend a week on a design document or you can prototype all of those changes, all three approaches in an hour or a couple of hours by delegating it to different coding agent instances. And then you can run scale tests, run performance tests. You can read that code yourself and understand what the trade offs are, which one looks cleaner. 
Now when you go to have that design review with your team, not only did you spend less time, but that design is actually grounded in reality and rubber meeting the road. So we spend a lot of time learning by doing. We're not afraid to throw away code or rewrite it. And really, the thing that I want to underline here for everyone is that good tests are your safety net. Spending time on your testing infrastructure, on various testing strategies, making sure that agents have ways to verify things E2E, verify things in a production-like environment that's safe with well scoped credentials: all of these things are super, super important for you to have a productive and safe experience on a large mature code base. How we're thinking about the future. Parallel sub agents are starting to work for certain use cases. Copilot CLI has fleets and the ability for agents to coordinate work. Dependency-aware work queues are becoming a thing, so that agents can parallelize work without stepping on each other's toes. But there are still some tasks where parallel sub agents are not appropriate. So we're actively researching this. We have science teams inside of GitHub working on all of these things. More of these composite model workflows, too. So I showed you some of these demos where we're using multiple models to go and review code, multiple models to go and solve bugs, but we're actually trying to detect this in real time and decide: hey, do we need to escalate to another model? Do we need to escalate to a different model family that might have a slightly different training distribution? We're seeing more ambient agents as well. Many of you may have seen agentic workflows. This is an excellent thing to set up so that you can run continuous performance regression detection and continuous improvements to your code base.
There are even portions of code bases that can now be delegated to agents running the majority of the maintenance. And so with that, let's go ahead and wrap up our demo and see what happened here. Let's see. I'm going to ask it to check again, because two of the agents finished, but Gemini is still running here. And it looks like, hey, the summary from our code review agent: Gemini came back clean, Opus came back clean, and GPT 5.4 actually found a potential symlink escape issue in the fallback. And this just goes to show you, like, hey, GPT 5.4 actually found what sounds like a legit bug, but the other two agents didn't. And so if we go back to this PR, let's see. There we go. On my behalf, Copilot CLI went ahead and added some great context for Tim to go and digest here. You know, I hope this was helpful, folks. This is the story of how me and nine of my best engineering friends are shipping 500 PRs a week, and how we're doing it safely at a high level of quality. And I hope that you enjoyed this talk.
Thank you, Evan. You could stop sharing, and then I'll take it from here. Just give me a second, and we'll share it again. So, hopefully, you enjoyed that. We just wanted to start making these roadmap calls a little bit more about the product too, and a little bit heavier in demos and in how we ourselves at GitHub are using the tools. As we're building this agent native engineering system, one of the key things that we wanna do, as I said, is make sure that we have a very solid foundation, speed, performance, availability, and then build these agentic capabilities on top of it. So if you think about it from a roadmap perspective and across each one of the phases, plan, code, review, test and secure, and then operation, we wanna start bringing in this network of agents.
So things like a triage agent, things like signals, and being able to very quickly make sure that you understand what needs your attention right away. We were talking about the coding agent, the CLI, the agent mode in the IDE, and there was actually a question in the Q&A about sandboxes. Yes. We are investing in sandboxes, and you'll see an announcement from us soon there. And then in review, we recently announced that our code review has reviewed over 60,000,000 PRs already, but we're trying to extend that even further with a validation platform and then more and more capabilities on the testing side. And then after you build, you also wanna have these automations running constantly. So our GitHub Next team shipped agentic workflows, and we're gonna continue to expand that and make it a very integral part of the platform. We already have code security, and that will continue to shift left as well. And then we need intelligence plus trust. Right? So there's metrics on the ROI, cost and budgets, very good integration with the SRE agents that are in the industry, and then giving you the ability to have MCP and skills. So that's, at a very high level, how we're thinking about the roadmap of GitHub going forward. Now if I go into the core platform in a little bit more detail, here are the things that we have shipped, but you can also see what is next. We talked about pull request stacks, but we're also gonna have a repos dashboard. We're gonna have a pulls dashboard in preview that is gonna be redesigned. We're thinking about threaded comments for PRs, importing projects by query, parallel action steps, and saved views for issues. So there's a lot more coming into the core of the product as well. We recently GA'd Copilot CLI, and we have a preview of the SDK. You should expect the SDK to also make it into GA in the coming quarter.
And then that technical preview of agentic workflows will start transforming itself and getting into the product with our Copilot cloud coding agent. And then we have agent sessions. And right now, with those sessions, you know, you could take one and delegate it into the cloud. We're gonna make it so you could very quickly go from local to cloud to mobile and not lose a step at any point in time. Now from a quality perspective, you need confidence in what you ship. And we have two main things for that: our code quality product and our security product. And then those will be integrated into the code review flows and also into this new validation platform that we will be announcing in the next quarter. Now we also think that specs and planning mode, together with architecture diagrams or what we sometimes call ADRs or architecture design documents, are going to be important. So those will become first class citizens in the many things that you end up doing in the platform as well. And then, as I said, we cannot do this at scale unless we have governance and unless you have the ability to have policies and permissions, unless you could observe what all of these agents in this network are doing and you could audit and then again set policies on those. And unless you could have metrics and ROI for you to understand what it is that you're spending and how you're getting the return on investment on that. So from an intelligence plus trust perspective, these are all of the items that we have shipped in the platform. And what's next is a stronger agent identity, stronger agent permissions, traces and logs being stored and saved, and giving you the ability to transfer them into your own data lake. More integration, and a native MCP and skills registry within GitHub.
Sandbox management of the machines and the workflows that use it, and then integration with things like cloud PCs, as an example from a Microsoft perspective, or Agent 365 as well. Now we also know that you need business continuity. So model long-term support is a new feature that we have added. We just recently announced that GPT 5.3 Codex is going to be available for twelve months. That's what we call our LTS on the models. But you need the same across providers, right? Like, you need the ability to not only work with one model provider but across them too. So there will be more announcements in our roadmap on that. Then on cost control and budgets, we already have the concept of a cost center in GitHub. And we're going to be making that more and more granular, for you to set budgets at a team level, at a user level, and at a full organization level too. And then from a metrics and ROI perspective, what you should be expecting is a transition on our end from just saying here's activity and engagement into being able to understand: are my PRs getting faster? Is the quality of my product improving? Am I attaining the right business results out of the investments that we're making? So there will be a transition from just here's the usage of Copilot, here's the usage of agents, into what is the ROI of them and what are the metrics, and we'll have a point of view on them. What are the metrics that I can be looking at that tell me that I'm achieving that velocity, that tell me that I'm achieving that quality at the end, and hence, the business outcomes for my team and my organization. Now, with that, we wanted to leave time for Q&A as well. So that is the last slide. And with that, I will switch over. And Eric, if you want to moderate the Q&A.
Yes. Great. So what we're doing for the Q&A here is I'm gonna just go in order of the most upvoted, and then I know comments and questions will continue to roll in.
There's a lot of you here, so we'll try to get to as many questions as we can. We have twenty minutes. The first one, Mario, is for you. Is there a public roadmap of GitHub Copilot in document form that I can access? Yes. So, we do keep a public roadmap project, and many of the things make it there. We will make sure that we transfer many of the things we're showing here into it. So if you see things that are not in there, just let us know. We usually update it at the beginning of the quarter, so you will see this update land in April. But that public roadmap project is what you should follow. Great. Okay. How do you prevent automation drift in situations where teams overtrust AI agents? How do you balance Copilot with meaningful human reviews? Evan, do you want to take that one, and then I can add? Yes. Absolutely. I think this really depends on the stage of maturity of your product. We have had some early prototypes where we've moved extremely quickly, for instance, written 100,000 lines of Rust code, and then gotten to a point where the demos we were building were compelling enough that we decided, let's go ahead and make this real. As a part of that, we brought in some experts from the Blackbird team, who are very deep into Rust, and had them work together with the folks who had built that prototype to build skills that can run inside of CI, to build best practices into the core agent loop itself, and to build sub-agents and custom agents that are designed for these sorts of things, to provide better guardrails once you get the code to the level of quality that you want. For more mature code bases where we're applying these techniques, I think the way you review code shifts.
For me, personally, I care much less about stylistic things and nits. If my coworker goes and merges a PR, the activation energy for me to follow up and push my own changes or put up my own PR is almost zero. And so we try to optimize for velocity, but in a safe way. That means focusing on: Is the change correct? Is the correct testing in place? Do we trust those tests? Are they testing the right thing? And then, is the architecture correct? Does this align in a way where our system is going to be able to continue to evolve quickly without debt in the future? And we even build automation into Actions to scan and look for both of these things. You can run deterministic code coverage tools, but those can't tell you if your tests are actually testing the right thing. So we try to use a combination of shifting where we need to apply human judgment and augmenting that human judgment with agents running in CI and customizations to the code review process as well. Great. Anything else to add, Mario, or next one? Evan did a great job covering it himself. I agree. Nice job, Evan. Okay. So now here, there's a few alignment questions. How do we see the future of the CLI against the VS Code extension? Are we planning feature parity, or will the CLI be needed for more advanced features? Yeah. I'll take that one. If you have played with the latest in VS Code, you probably noticed that there is now a setting to just use the Copilot CLI as the agent loop. You can also choose to do that in the cloud, and that will use our cloud coding agent. The cloud coding agent and the CLI share the same, think about it as a kernel or heart, overall. Or you can use the existing VS Code one. In the future, all three of those things kind of collide, or merge, into one.
But right now in VS Code, you're going to be able to select the default one. You can select Copilot CLI. You can even select Claude Code as the main agent loop, or even Codex if you install the Codex extension. So think about it from a VS Code perspective: we want all of these agents, this agent network, to be present in there and to give you that freedom and that choice at any point in time. And you can use your Copilot subscription across all of them. So, again, you can use Copilot CLI, Claude Code, Codex, or the default one that the VS Code team ships. Great. Thank you, Mario. Eric asks, and there are a lot of questions along these lines, so any depth you can give on this: what about Azure DevOps? Azure DevOps. With that one, it's a little bit complicated. I think last year we announced that we are recommending that our Azure DevOps customers move their repos into GitHub, and we're continuing that recommendation. We're adding a significant amount of value within the GitHub ecosystem on top of those repos with this cloud network of agents. So if I zoom out a little bit, we at GitHub and our coding tools work with any version control system, but the majority of the platform and the value it provides necessitates the repo being in GitHub. We talk to the Azure DevOps teams constantly. We still have a set of features that we support there. But for you to get the full value of the platform, we need those repos to be within the GitHub platform itself. Great. Evan, let's give Mario a rest for a moment. Back to you. There are a lot of questions about the scalability of the work, and kind of back on the idea of the review process. This is a different take on it. Is human understanding of the code base still important?
At the velocity of code generation and reviews, it feels like people are shepherding agents and quickly losing context of overall code bases. Is this system-level understanding a thing of the past? No. Absolutely not. It's actually more important now than ever. I think the importance of your ability to type out code, or to know the appropriate syntax for a type guard or a generic in a particular programming language, is lower. What is much more important is your ability to think globally about how the pieces fit together. For the Copilot CLI codebase in particular, I feel like I've never come up to speed on a codebase so fast. And the reason is that I had this coding agent that I could sit there and rubber duck with all day long. Right? I can ask it questions. I don't have to go and bug my teammates or the engineers who built a feature. I can crack open the CLI or another coding agent tool, get answers to my questions, and dive into the architecture very, very quickly. This not only allowed me to come up to speed fast, it allows me to stay up to speed. And we encourage every member of our team to work across the entire code base. We do have people who are dedicated to each area as owners of the architecture. They're the ones who are responsible for keeping that part of the system in their head, having a point of view on how we're going to align it and evolve it in the future, and whether or not particular changes fit in with those plans. So I would say, yeah, humans are absolutely more important than ever. But when you're shipping that many PRs per week, 500 a week, thousands per month, every time I go into an area of the code base that I know, I assume, hey, I have a rough understanding of what the architecture looks like.
I know what it was like two weeks ago when I was in here, but I still have to reconstruct my view of reality from scratch, effectively. I know that code has changed a lot, so I need to go and dive into the details. And that's much the way that a coding agent works. It has memory, but it starts from a fresh context window. It's going to grep. It's going to explore that code base. It's going to read in key files and build up that understanding. And to some degree, I have to do a little bit of that every time I go into a new area of the code base. The difference versus five years ago is that I can do that in a matter of a minute or two versus having to spend hours poring over something. Awesome. I do want to add a couple of things in there, which I think are interesting. So as an example, I studied electrical engineering as my major, and one of the key classes in electrical engineering is physics. Because if you really understand silicon and physics, then you do an amazing job at, as an example, circuit design or understanding radio waves and RF frequencies and all of those types of things. Same thing: if you have a developer who has a really good understanding of databases, then they're able to design a back end that can scale. And the same thing is happening right now. I think software development, the skill set and the craft of it, will continue to exist. Yes, we're going to have multiple layers of abstraction. And like Evan said, an entire code base does not fit in my brain. So now I have this rubber duck agent that I can use to page in the right information. But a deep understanding of software development principles and systems is still required to create great software, and in my opinion, that will continue to be the case. So, just like to be a great electrical engineer, you need to understand physics.
And to be great at back ends, you might need to understand networking and hardware. And to be great at creating UX, you need to understand front-end frameworks and how DOM rendering actually happens. Right? So that will continue to be a true statement. Now, the thing is, I don't have to be an expert in everything. I can rely on some of these agents to educate me really quickly in an area, and I can dedicate myself to the part where I really have that craft. And I can add that creativity. And that's the biggest unlock right now. We strongly believe that the 180,000,000 developers that we have on the platform right now will go to 1,000,000,000, because every single IW, information worker, every single person out there, is going to be able to create software, but you're still going to have the people who are experts in it. Awesome. Thanks, Mario. That reminds me of a question we had in one of these from a student that was, essentially: should I keep studying? Do I need to learn these things? Is that even relevant in the future? And I think that answer captures a lot of that same thinking in a different light. Evan, we're going to stick with you, and then we have a few more for Mario after that. This one is about multi-agent workflows. If you're using several models and each comes up with a different hypothesis, and I don't know if this is happening in your workflows, how do they, or how do you, decide which hypothesis is the most optimal solution? Do you have strategies for having different models communicate with each other during your processes? Yeah. Absolutely. So I think there are two modes of operation for the agents here. The sub-agents can pass messages to each other and communicate.
But in those demos that I showed you, there's an outer agent loop that's running the primary context window, and it delegates each of those review tasks or bug investigation tasks to a different sub-agent with its own context window. Each of those sub-agents goes and does its investigation, then produces a report and sends it back to the main coding agent, the main outer loop. And that main coding agent, in that example, is responsible for taking all of the feedback from all of those sub-agents and then investigating those hypotheses itself. Ultimately, all of this bottoms out on human judgment, exactly what Mario was mentioning before. It bottoms out on your understanding of the system and your understanding of whatever bug you're investigating, whether it's network issues or symlinks and differences in file system behavior across platforms. That's really where the human judgment ultimately comes in. And when you have those skills, when you have that ability, that just helps you go even faster. Yeah. I was going to answer "rock, paper, scissors" as a joke. But the other day, I was in a conversation with Adrian and Evan. Think about this from a human perspective. What you usually do is talk to your peers: hey, I'm thinking about x, y, and z, let me get feedback on it. Then you take all of those inputs and apply your own judgment: okay, we should be going x, or we should be going y. And I think the same thing applies here. If you get these data inputs or new ways of looking at a problem, then you will probably make better decisions, as long as you're doing that from first principles.
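The delegation pattern described in that answer (an outer loop hands the same investigation to several sub-agents, each with its own context, and collects their reports) can be sketched roughly as follows. This is a minimal illustration, not how Copilot implements it: `Report`, `run_sub_agent`, `outer_loop`, and the agent names are all hypothetical, and the sub-agent is a stub that only echoes its task.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Report:
    """What a sub-agent sends back to the outer loop (hypothetical shape)."""
    agent: str
    hypothesis: str


def run_sub_agent(name: str, task: str) -> Report:
    """Stand-in for a sub-agent working in its own fresh context.

    A real sub-agent would explore the code base; this stub only echoes the task.
    """
    return Report(agent=name, hypothesis=f"{name}: possible cause of {task!r}")


def outer_loop(task: str, agent_names: list[str]) -> list[Report]:
    """Delegate the same task to every sub-agent and gather the reports in order."""
    with ThreadPoolExecutor() as pool:
        reports = list(pool.map(lambda name: run_sub_agent(name, task), agent_names))
    # The main agent would investigate each hypothesis next;
    # the final decision still rests with a human reviewer.
    return reports


reports = outer_loop("flaky symlink test", ["model-a", "model-b", "model-c"])
for report in reports:
    print(report.agent, "->", report.hypothesis)
```

The sketch deliberately returns every report rather than picking a winner, since the answer places the final call on the main agent's own investigation and on human judgment.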
So I actually value, and use all the time, the ability to ask not only my peers for information and feedback, but also several of these LLMs, to get a complete picture in my head of how I should be looking at the problem. So that's my modus operandi, and that will make my judgment better and better as time passes. Great. So, Mario, we are back with you. Do you plan to provide a safe way for agents to know up-to-date docs for libraries? There's a little bit more here: haven't seen any MCPs with security thinking, which enterprises need. Yes. We plan to do a lot more in the security space, specifically from an identity capability. In fact, our cloud coding agent does call CodeQL to make sure that it runs a scan prior to committing that code, as an example. And there's a lot more that we're doing in that space. I don't want to spill all the beans for the team, but we plan to shift a set of these capabilities further left in the process, instead of running a scan through an action after the commit. Great. Here's another one: with so many PRs now being opened due to the increase in developer speed, are you going to be working on improving the speed at which Actions run? Yes. And I talk about this with Ben, who is the product leader for CI. And look, we have seen, since the beginning of the year, incredible growth in Actions. Think about 20% or more week over week in the load that people are generating, because there are so many PRs being generated today. We're going as fast as possible, getting more and more machines to keep up with that load, not only within our data centers but across cloud providers as well. So I'm excited about what we're going to do there. Now, we do have to continue to invest in persistent storage, as an example, to be able to speed up these builds.
We do have custom images, which help a lot, but there's a lot more that we could be doing in the persistent storage layer. We need to provision faster and faster machines as well, and let our customers pay for that compute. But, yes, this is top of mind for us: now that the bottleneck is starting to be PRs and that CI and validation platform, we probably need to 10x or 100x the compute available to our users. Thank you. This is for Evan or Mario, so either one of you jump in. Can I use the Copilot CLI to do a task in several repos at the same time? Yeah. I do this all the time. When I'm working across code bases that have dependencies, for instance the Copilot CLI and the Copilot SDK, if I want to add a new feature to the core coding agent loop and then expose it in the SDK, I'll check both of those out into the same folder and then start the coding agent, the CLI, in that root. It's a super productive workflow that allows me to get PRs ready for both the core runtime and the Copilot SDK at the same time. I've also done this with, for instance, the GitHub monolith, the coding agent runtime, and a codebase called SWE Agent D, which is the brain behind CCA. So I do this all the time on very large, multimillion-line code bases, and the agent is extremely productive at working across them. Great. Okay. Mario, when we think about premium requests and optimization, I'm going to reframe this one a little bit. How would you optimize for premium requests and their use when you're thinking about multi-agent reviews? How to optimize? I think the question directly is how many premium requests are used when you're using multi-agent workflows, which I think use multiple of them. Alright. The way that we think about that is, if the ROI is there, then that token spend, or those PRUs, is going to be there.
Meaning, if the application itself can sustain a spend, then as long as you are below that spend curve, you're going to be in positive margin. And that's the way that we think about it. So let's say a bug might cost you $20, but you're spending, let's say, 16¢ to find it; then the ROI is there for you to spend those tokens. So we don't try to optimize them. Clearly, you can optimize things from a context window perspective, and we do a lot of optimization at the cache level as well. But the way that I want us to think about it is mainly that there is an ROI for that product, and your spend should be below that ROI, meaning the ability of that product to earn revenue or generate margin. Great. Cool. More or less, if I bubble up, I would say there's going to be this theme of efficiency that we are going to help you with a lot more. We don't think about it as PRUs; think about it more as: how can I control my spend and make sure that at any point in time I'm below that ROI curve? We sometimes call that FinOps in this space. So there will be features to make sure that, let's say, you don't have to send to Opus or to 5.4 to write this markdown file; we will route to the right model at the right time to control it. We are spending time on that, and you will see more and more announcements from us. As an example, with the auto feature, when you select auto model routing, we give you a 10% discount, and we try to route across a set of these model families to control spend. So think about a slider. You will have a slider that says, you know what?
I actually need to control spend a significant amount on this project and this code base, or I can just use the best thing possible at any point in time because the ROI is there for this project or code base. We'll give you that slider, and that would involve multi-model routing plus more efficiencies that we can deliver at a fast pace. Great. Okay. Evan, there have been a few on this. So that 500 PRs per week, people are trying to wrap their heads around it conceptually. How do you concretely get to that number? Does that mean that you're trusting code review blindly, at least on a good number of those PRs? And then there's a secondary question, which is: how has your QA team adapted to dev teams that open that many PRs per week? Yeah. That's a great question. So on the first piece, if you think about 500 PRs a week, that's about 100 per business day. A fair number of these actually get merged on weekends as well. So that's about 10 PRs per engineer per day, right? And that means the team is realistically spending about half, or maybe a little bit more, of their time, so four to five hours a day, on code review. We're likely spending more time reviewing code than we are writing code, especially when you consider that when you're using agents to generate the code, there's an author self-review before you go into PR, right? So with that, you can think: four or five hours a day, that's about thirty minutes per PR. Some PRs may only be 10 to 100 lines of code. They might be mostly tests. It's much faster to review those, so that might only take five or ten minutes. And there might be some PRs that you pour more time into. So one part of it is, yes, we do have the discipline to spend more time on code review; that's an increasingly large portion of our job.
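The review arithmetic in that answer can be checked with a few lines. This is a back-of-the-envelope sketch under two assumptions that are not stated in the talk: a five-day business week with weekend merges ignored, and "10 PRs per engineer per day" read as the number each engineer reviews. The implied team size is derived from those figures, not quoted by the speaker.

```python
# Figures quoted in the answer.
PRS_PER_WEEK = 500
PRS_PER_ENGINEER_PER_DAY = 10
REVIEW_HOURS_PER_DAY = (4, 5)

# Assumption: five business days, weekend merges ignored.
BUSINESS_DAYS = 5

prs_per_day = PRS_PER_WEEK / BUSINESS_DAYS
# Derived, not stated in the talk: how many engineers the two figures imply.
implied_engineers = prs_per_day / PRS_PER_ENGINEER_PER_DAY
# Average review time per PR at the low and high end of the daily review budget.
minutes_per_pr = tuple(
    hours * 60 / PRS_PER_ENGINEER_PER_DAY for hours in REVIEW_HOURS_PER_DAY
)

print(f"PRs per business day: {prs_per_day:.0f}")
print(f"Implied engineers: {implied_engineers:.0f}")
print(f"Average review minutes per PR: {minutes_per_pr[0]:.0f} to {minutes_per_pr[1]:.0f}")
```

Under these assumptions the average comes out at 24 to 30 minutes per PR, which matches the "about thirty minutes" quoted; small test-only PRs reviewed in five or ten minutes leave room for longer reviews elsewhere.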
The second piece is shifting what we're looking for when reviewing code: not reviewing style, but focusing more on reviewing tests and reviewing the architecture, and also building agent skills, automations, and agentic workflows that can run when the PR runs. So there are many different techniques that you can use, but we do really try to stay very close to that code. And so we never merge a PR without looking at the code. Fantastic. Thank you for that. There were a lot of questions about that, so I think that's tactical and helpful. With that, we are at the last five seconds of our call. So thank you to Mario, thank you to Evan for presenting, and thank you to all of you for all the great questions that you asked in the chat. A lot of engagement. We really appreciate you, your time, and your investment in what we're doing here. So thank you very much for joining us today, and we'll see you next time.