Several concepts are involved here, and they touch on broad topics that lack a clear or commonly agreed set of definitions. Still, let me try to lay out what strikes me as the real problem.
First, running a test tool, telnet, a network or monitoring tool, ping, or any other service without authorization from the responsible team, and taking a service offline as a result, is an attack, regardless of how long the outage lasts or what the motivation was. You were not aware of the SLA of the contracted services, nor of the routines the environment was running at the time, and the contractual availability level (including penalties) is measured by the availability of the service. A test can only be considered a test if there is a defined behavior to be tested and knowledge of who operates the environment, since by definition a test is a controlled action. Even security researchers only carry out actions or disclose data under authorization.
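To make the point about SLAs concrete, here is a minimal sketch of how a single unplanned outage eats into a contractual availability target. The 99.9% target, the 30-day measurement window, and the 60-minute outage are illustrative assumptions, not figures from any real contract.

```python
# Illustrative only: the SLA target, window, and outage length are assumed values.

def allowed_downtime_minutes(sla_target: float, period_minutes: float) -> float:
    """Downtime budget implied by an availability target over a period."""
    return period_minutes * (1.0 - sla_target)


def availability(period_minutes: float, downtime_minutes: float) -> float:
    """Fraction of the period during which the service was available."""
    return (period_minutes - downtime_minutes) / period_minutes


PERIOD_MINUTES = 30 * 24 * 60   # assumed 30-day measurement window
SLA_TARGET = 0.999              # assumed contractual target (99.9%)
OUTAGE_MINUTES = 60.0           # assumed outage caused by the unauthorized "test"

budget = allowed_downtime_minutes(SLA_TARGET, PERIOD_MINUTES)
achieved = availability(PERIOD_MINUTES, OUTAGE_MINUTES)
breached = achieved < SLA_TARGET

print(f"downtime budget: {budget:.1f} min")
print(f"achieved availability: {achieved:.4%}")
print(f"SLA breached: {breached}")
```

Under these assumptions the whole month's downtime budget is smaller than the one outage, which is why the duration of the incident does not change how it is classified.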
Second, stress, overload, and security tests are, again by definition, controlled and scoped actions. Technical responsibility for a failure should lie with the team charged with implementing the specification that motivated the test. For example, a project may have a nonfunctional requirement to sustain a rate of x TCP connections per second. This kind of requirement is derived from the predicted number of users of the operation or product. The test, again with an objective and a prior definition generated by that demand (which was also the basis for equipment purchases, clustering, redundancy, disk speed, etc.), is run for one reason: to find out whether the system meets the specification. So if the specification said that DDoS load would be handled at the firewall, the team responsible for the firewall is to blame; if it was to be handled in code, it is the developers' fault.
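The idea of checking a measurement against a previously defined requirement can be sketched as follows. The `LoadRequirement` type, the requirement name, and the numbers are hypothetical; the connection count is assumed to come from an authorized load-test tool, which this sketch does not implement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadRequirement:
    """A nonfunctional requirement taken from the project specification (hypothetical)."""
    name: str
    min_connections_per_second: float


def meets_requirement(req: LoadRequirement,
                      connections_completed: int,
                      duration_seconds: float) -> tuple[bool, float]:
    """Compare a measured connection rate with the specified minimum rate."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    measured_rate = connections_completed / duration_seconds
    return measured_rate >= req.min_connections_per_second, measured_rate


# Assumed figures: the specification asks for 500 connections per second, and the
# authorized test run completed 30,000 connections in 60 seconds.
req = LoadRequirement(name="tcp-accept-rate", min_connections_per_second=500.0)
passed, rate = meets_requirement(req, connections_completed=30_000, duration_seconds=60.0)
print(f"{req.name}: measured {rate:.1f} conn/s, requirement met: {passed}")
```

The requirement exists before the test does; the tool only supplies the measurement that is compared against it.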
Third, if you work in operations with incremental product or software processes that have no requirements in this respect (should there be?), the definitions of team roles and the most trivial solutions are usually the answer. Even so, I would consider both teams at fault, because someone should have checked. In serious environments, redundancy is standard for services; whether it is implemented in the software or the hardware layer is just a cost-benefit question.
The environment's recovery procedures are themselves tested regularly, for internal auditing and to check stability and recovery status. The type of recovery depends on what was specified in the operating procedures. So the blame is already determined by the defined roles and action plans.
If no one considered that an attack could happen, the fault is everybody's. Arguing about blame for something that was never defined and hunting for a culprit (a team or a person) is not only incoherent; it does nothing to solve the problem and creates discord.
This is much more transparent in projects with a formal specification, with real-time requirements, or with formal proofs, where you need to prove that the software meets its requirements. In environments where this is not so deeply rooted in the culture, it sometimes goes unnoticed.
You can find more information in references on formal methods and software engineering in general, in addition to the security books themselves, but "there is no silver bullet."
A good discussion of software flaws can be found in the book How to Break Codes ( link )
You can check for standardized security material and vulnerability and attack patterns here ( link )
And an introduction and vision of the need to use formal methods can be found at the link: link