The end of one long README

Apparently we've reached the limit of GitHub rendering (512K).
Time to change the structure accordingly - exercises and questions
will move to sub-directories in the exercises directory.

This is the first patch in performing this transition.
abregman 2021-11-10 00:55:03 +02:00
parent 41b0f06dc3
commit cf69197a44
22 changed files with 6305 additions and 6318 deletions

6314
README.md

File diff suppressed because it is too large

526
exercises/ansible/README.md Normal file

@ -0,0 +1,526 @@
## Ansible
### Ansible Exercises
|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| My First Task | Tasks | [Exercise](my_first_task.md) | [Solution](solutions/my_first_task.md) | |
| Upgrade and Update Task | Tasks | [Exercise](update_upgrade_task.md) | [Solution](solutions/update_upgrade_task.md) | |
| My First Playbook | Playbooks | [Exercise](my_first_playbook.md) | [Solution](solutions/my_first_playbook.md) | |
### Ansible Self Assessment
<details>
<summary>Describe each of the following components in Ansible, including the relationship between them:
* Task
* Module
* Play
* Playbook
* Role
</summary><br><b>
* Task - a call to a specific Ansible module
* Module - the actual unit of code executed by Ansible on your own host or a remote host. Modules are indexed by category (database, file, network, …) and are also referred to as task plugins.
* Play - one or more tasks executed on a given host or hosts
* Playbook - one or more plays. Each play can be executed on the same or different hosts
* Role - Ansible roles allow you to group resources based on certain functionality/service so that they can be easily reused. In a role, you have directories for variables, defaults, files, templates, handlers, tasks, and metadata. You can then use the role by simply specifying it in your playbook.
</b></details>
<details>
<summary>How is Ansible different from other automation tools? (e.g. Chef, Puppet, etc.)</summary><br><b>
Ansible is:
* Agentless
* Minimal run requirements (Python & SSH) and simple to use
* Default mode is "push" (it also supports pull)
* Focus on simplicity and ease-of-use
</b></details>
<details>
<summary>True or False? Ansible follows the mutable infrastructure paradigm</summary><br><b>
True. In the immutable infrastructure approach, you replace infrastructure instead of modifying it.<br>
Ansible instead follows the mutable infrastructure paradigm, where it allows you to change the configuration of different components. This approach is not perfect and has its own disadvantages, like "configuration drift", where different components may reach different states for different reasons.
</b></details>
<details>
<summary>True or False? Ansible uses declarative style to describe the expected end state</summary><br><b>
False. It uses a procedural style.
</b></details>
<details>
<summary>What kind of automation wouldn't you do with Ansible and why?</summary><br><b>
While it's possible to provision resources with Ansible, some prefer to use tools that follow the immutable infrastructure paradigm.
Ansible doesn't save state by default. So a task that creates 5 instances, for example, will create 5 additional instances when executed again (unless
an additional check is implemented or explicit names are provided), while other tools might check whether 5 instances exist. If only 4 exist (by checking the state file, for example), one additional instance will be created to reach the end goal of 5 instances.
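
The difference described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Ansible or any other tool is implemented; the function names and the `existing` count are hypothetical:

```python
def instances_to_create_stateless(requested: int) -> int:
    """Without state or an extra check, every run creates what was requested."""
    return requested


def instances_to_create_stateful(desired: int, existing: int) -> int:
    """A state-aware tool only creates what is missing to reach the desired count."""
    return max(0, desired - existing)


# Second run of "create 5 instances" when 4 already exist (hypothetical numbers)
print(instances_to_create_stateless(5))
print(instances_to_create_stateful(5, 4))
```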
</b></details>
<details>
<summary>How do you list all modules and how can you see details on a specific module?</summary><br><b>
1. Ansible online docs
2. `ansible-doc -l` for list of modules and `ansible-doc [module_name]` for detailed information on a specific module
</b></details>
#### Ansible - Inventory
<details>
<summary>What is an inventory file and how do you define one?</summary><br><b>
An inventory file defines the hosts and/or groups of hosts on which Ansible tasks are executed.
An example of an inventory file:
```
192.168.1.2
192.168.1.3
192.168.1.4
[web_servers]
190.40.2.20
190.40.2.21
190.40.2.22
```
</b></details>
<details>
<summary>What is a dynamic inventory file? When would you use one?</summary><br><b>
A dynamic inventory file tracks hosts from one or more sources like cloud providers and CMDB systems.
You should use one when relying on external sources, especially when the hosts in your environment are automatically<br>
spun up and shut down and you don't track every change in these sources yourself.
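
A dynamic inventory can be as simple as an executable script that prints JSON. The following is a minimal sketch: the host addresses, the group name and the `fetch_hosts` helper are hypothetical stand-ins for a real call to a cloud provider or CMDB.

```python
#!/usr/bin/env python3
"""Minimal dynamic inventory sketch (hypothetical hosts and group names)."""
import json
import sys


def fetch_hosts():
    """Stand-in for querying a cloud provider or CMDB API (hypothetical data)."""
    return {"web_servers": ["190.40.2.20", "190.40.2.21"]}


def build_inventory():
    """Build the JSON structure an inventory script returns for --list."""
    inventory = {"_meta": {"hostvars": {}}}
    for group, hosts in fetch_hosts().items():
        inventory[group] = {"hosts": hosts}
    return inventory


def main(argv):
    if "--host" in argv:
        # Host variables are already provided through _meta, so nothing extra here.
        print(json.dumps({}))
    else:
        print(json.dumps(build_inventory(), indent=2))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Assuming the script is saved as `inventory.py` (a hypothetical name) and made executable, it could be passed to Ansible with `-i inventory.py`.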
</b></details>
#### Ansible - Variables
<details>
<summary>Modify the following task to use a variable instead of the value "zlib" and have "zlib" as the default in case the variable is not defined
```
- name: Install a package
  package:
    name: "zlib"
    state: present
```
</summary><br><b>
```
- name: Install a package
  package:
    name: "{{ package_name|default('zlib') }}"
    state: present
```
</b></details>
<details>
<summary>How to make the variable "use_var" optional?
```
- name: Install a package
  package:
    name: "zlib"
    state: present
    use: "{{ use_var }}"
```
</summary><br><b>
With "default(omit)"
```
- name: Install a package
  package:
    name: "zlib"
    state: present
    use: "{{ use_var|default(omit) }}"
```
</b></details>
<details>
<summary>What would be the result of the following play?</summary><br><b>
```
---
- name: Print information about my host
  hosts: localhost
  gather_facts: 'no'
  tasks:
    - name: Print hostname
      debug:
        msg: "It's me, {{ ansible_hostname }}"
```
When given written code, always inspect it thoroughly. If your answer is "this will fail" then you are right. The play uses a fact (ansible_hostname), which is a piece of information gathered from the host we are running on. But in this case fact gathering is disabled (gather_facts: 'no'), so the variable is undefined and the task fails.
</b></details>
<details>
<summary>When will the value '2017' be used in this case: `{{ lookup('env', 'BEST_YEAR') | default('2017', true) }}`?</summary><br><b>
When the environment variable 'BEST_YEAR' is empty or false.
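
The behaviour can be approximated in plain Python. This is a rough stand-in for the Jinja2 `default` filter, not its actual implementation, and it assumes the `env` lookup yields an empty string when the variable is not set; `best_year` is a hypothetical helper name.

```python
import os


def default(value, default_value="", boolean=False):
    """Rough stand-in for Jinja2's `default` filter for already-defined values."""
    if boolean and not value:
        # With the second argument set to true, any falsy value is replaced.
        return default_value
    return value


def best_year(environ=os.environ):
    # Assumption: an unset variable is seen as an empty string by the lookup.
    return default(environ.get("BEST_YEAR", ""), "2017", True)
```

Without the `true` argument, an empty string would be kept as-is, because it is defined even though it is empty.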
</b></details>
<details>
<summary>If the value of certain variable is 1, you would like to use the value "one", otherwise, use "two". How would you do it?</summary><br><b>
`{{ (certain_variable == 1) | ternary("one", "two") }}`
</b></details>
<details>
<summary>The value of a certain variable you use is the string "True". You would like the value to be a boolean. How would you cast it?</summary><br><b>
`{{ some_string_var | bool }}`
</b></details>
<details>
<summary>You want to run an Ansible playbook only on a specific minor version of your OS. How would you achieve that?</summary><br><b>
</b></details>
<details>
<summary>What is the "become" directive used for in Ansible?</summary><br><b>
</b></details>
<details>
<summary>What are facts? How to see all the facts of a certain host?</summary><br><b>
</b></details>
<details>
<summary>What would be the result of running the following task? How to fix it?
```
- hosts: localhost
  tasks:
    - name: Install zlib
      package:
        name: zlib
        state: present
```
</summary><br><b>
</b></details>
<details>
<summary>Which Ansible best practices are you familiar with? Name at least three</summary><br><b>
</b></details>
<details>
<summary>Explain the directory layout of an Ansible role</summary><br><b>
</b></details>
<details>
<summary>What are 'blocks' used for in Ansible?</summary><br><b>
</b></details>
<details>
<summary>How do you handle errors in Ansible?</summary><br><b>
</b></details>
<details>
<summary>You would like to run a certain command if a task fails. How would you achieve that?</summary><br><b>
</b></details>
<details>
<summary>Write a playbook to install zlib and vim on all hosts if the file /tmp/mario exists on the system.</summary><br><b>
```
---
- hosts: all
  vars:
    mario_file: /tmp/mario
    package_list:
      - 'zlib'
      - 'vim'
  tasks:
    - name: Check for mario file
      stat:
        path: "{{ mario_file }}"
      register: mario_f
    - name: Install zlib and vim if mario file exists
      become: "yes"
      package:
        name: "{{ item }}"
        state: present
      with_items: "{{ package_list }}"
      when: mario_f.stat.exists
```
</b></details>
<details>
<summary>Write a single task that verifies all the files in files_list variable exist on the host</summary><br><b>
```
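- name: Ensure all files exist
  stat:
    path: "{{ item }}"
  loop: "{{ files_list }}"
  register: file_stat
  # Fail on any path in files_list that does not exist on the host
  failed_when: not file_stat.stat.exists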
```
</b></details>
<details>
<summary>Write a playbook to deploy the file /tmp/system_info on all hosts except for controllers group, with the following content</summary><br><b>
```
I'm <HOSTNAME> and my operating system is <OS>
```
Replace <HOSTNAME> and <OS> with the actual data for the specific host you are running on.
The playbook to deploy the system_info file:
```
---
- name: Deploy /tmp/system_info file
  hosts: all:!controllers
  tasks:
    - name: Deploy /tmp/system_info
      template:
        src: system_info.j2
        dest: /tmp/system_info
```
The content of the system_info.j2 template:
```
# {{ ansible_managed }}
I'm {{ ansible_hostname }} and my operating system is {{ ansible_distribution }}
```
</b></details>
<details>
<summary>The variable 'whoami' is defined in the following places:
* role defaults -> whoami: mario
* extra vars (variables you pass to Ansible CLI with -e) -> whoami: toad
* host facts -> whoami: luigi
* inventory variables (doesn't matter which type) -> whoami: browser
According to variable precedence, which one will be used?</summary><br><b>
The right answer is toad.
Variable precedence is about how variables override each other when they are set in different locations. If you haven't run into it so far, you will at some point, which makes it a useful topic to be aware of.
In the context of our question, the order will be extra vars (always override any other variable) -> host facts -> inventory variables -> role defaults (the weakest).
Here is the order of precedence from least to greatest (the last listed variables winning prioritization):
1. command line values (eg “-u user”)
2. role defaults [[1\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id15)
3. inventory file or script group vars [[2\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id16)
4. inventory group_vars/all [[3\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
5. playbook group_vars/all [[3\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
6. inventory group_vars/* [[3\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
7. playbook group_vars/* [[3\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
8. inventory file or script host vars [[2\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id16)
9. inventory host_vars/* [[3\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
10. playbook host_vars/* [[3\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
11. host facts / cached set_facts [[4\]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id18)
12. play vars
13. play vars_prompt
14. play vars_files
15. role vars (defined in role/vars/main.yml)
16. block vars (only for tasks in block)
17. task vars (only for the task)
18. include_vars
19. set_facts / registered vars
20. role (and include_role) params
21. include params
22. extra vars (always win precedence)
A full list can be found at [PlayBook Variables](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#ansible-variable-precedence). Also, note there is a significant difference between Ansible 1.x and 2.x.
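
The answer to the question can be reproduced with a small Python sketch. It models only the four sources from the question, ordered weakest first as in the list above; the `resolve` helper is hypothetical and is not part of Ansible.

```python
# Subset of the precedence list, weakest first (only the sources in the question).
PRECEDENCE = ["role defaults", "inventory variables", "host facts", "extra vars"]


def resolve(definitions):
    """Return the value from the strongest source that defines the variable."""
    value = None
    for source in PRECEDENCE:
        if source in definitions:
            # A later (stronger) source overrides whatever was found so far.
            value = definitions[source]
    return value


whoami = resolve({
    "role defaults": "mario",
    "extra vars": "toad",
    "host facts": "luigi",
    "inventory variables": "browser",
})
print(whoami)
```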
</b></details>
<details>
<summary>For each of the following statements determine if it's true or false:
* A module is a collection of tasks
* It's better to use shell or command instead of a specific module
* Host facts override play variables
* A role might include the following: vars, meta, and handlers
* Dynamic inventory is generated by extracting information from external sources
* It's a best practice to use indentation of 2 spaces instead of 4
* notify is used to trigger handlers
* This “hosts: all:!controllers” means run only on controllers group hosts</summary><br><b>
</b></details>
<details>
<summary>Explain the difference between Forks, Serial and Throttle.</summary><br><b>
`serial` runs the whole play on one batch of hosts at a time (with `serial: 1`, one host at a time), waiting for the play to complete on that batch before moving on to the next one. `forks` controls how many hosts Ansible works on in parallel for a given task. With `forks = 1`, a task runs on one host at a time, and only after it has run on every host does the next task start. The default number of forks in Ansible is 5.
```
[defaults]
forks = 30
```
```
- hosts: webservers
  serial: 1
  tasks:
    - name: ...
```
Ansible also supports `throttle`. This keyword limits the number of workers for a particular task, up to the maximum set via the forks setting or serial. This can be useful for restricting tasks that may be CPU-intensive or that interact with a rate-limiting API.
```
tasks:
  - command: /path/to/cpu_intensive_command
    throttle: 1
```
</b></details>
<details>
<summary>What is ansible-pull? How is it different from how ansible-playbook works?</summary><br><b>
</b></details>
<details>
<summary>What is Ansible Vault?</summary><br><b>
</b></details>
<details>
<summary>Demonstrate each of the following with Ansible:
* Conditionals
* Loops
</summary><br><b>
</b></details>
<details>
<summary>What are filters? Do you have experience with writing filters?</summary><br><b>
</b></details>
<details>
<summary>Write a filter to capitalize a string</summary><br><b>
```
def cap(self, string):
    return string.capitalize()
```
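
For context, a filter is usually exposed to Ansible through a `FilterModule` class. The following is a minimal sketch of a complete plugin file; here `cap` is written as a module-level function, so it takes no `self`, and the filter name `cap` is simply the one used in this answer.

```python
def cap(string):
    """Capitalize the first character of the string and lowercase the rest."""
    return string.capitalize()


class FilterModule(object):
    """Ansible looks for a class with this name when loading filter plugins."""

    def filters(self):
        # Map the filter name used in templates to the Python callable.
        return {"cap": cap}
```

Assuming the file is placed in a `filter_plugins` directory next to the playbook, the filter could then be used as `{{ 'mario' | cap }}`. Jinja2 also ships a built-in `capitalize` filter that covers this case.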
</b></details>
<details>
<summary>You would like to run a task only if previous task changed anything. How would you achieve that?</summary><br><b>
</b></details>
<details>
<summary>What are callback plugins? What can you achieve by using callback plugins?</summary><br><b>
</b></details>
<details>
<summary>What are Ansible Collections?</summary><br><b>
</b></details>
<details>
<summary>What is the difference between `include_tasks` and `import_tasks`?</summary><br><b>
</b></details>
<details>
<summary>File '/tmp/exercise' includes the following content
```
Goku = 9001
Vegeta = 5200
Trunks = 6000
Gotenks = 32
```
With one task, switch the content to:
```
Goku = 9001
Vegeta = 250
Trunks = 40
Gotenks = 32
```
</summary><br><b>
```
- name: Change saiyans levels
  lineinfile:
    dest: /tmp/exercise
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
  with_items:
    - { regexp: '^Vegeta', line: 'Vegeta = 250' }
    - { regexp: '^Trunks', line: 'Trunks = 40' }
...
```
</b></details>
#### Ansible - Execution and Strategy
<details>
<summary>True or False? By default, Ansible will execute all the tasks in a play on a single host before proceeding to the next host</summary><br><b>
False. Ansible will execute a single task on all hosts before moving to the next task in a play. As of today, it uses 5 forks by default.<br>
This behaviour is described as "strategy" in Ansible and it's configurable.
</b></details>
<details>
<summary>What is a "strategy" in Ansible? What is the default strategy?</summary><br><b>
A strategy in Ansible describes how Ansible will execute the different tasks on the hosts. By default, Ansible uses the "linear" strategy, in which each task runs on all hosts before Ansible proceeds to the next task.
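
The ordering can be illustrated with a short Python sketch. It only models the order in which (task, host) pairs are started and ignores parallelism through forks; the host and task names are hypothetical.

```python
def linear_order(hosts, tasks):
    """Linear strategy: a task runs on every host before the next task starts."""
    return [(task, host) for task in tasks for host in hosts]


def host_by_host_order(hosts, tasks):
    """The alternative ordering: one host runs all tasks, then the next host."""
    return [(task, host) for host in hosts for task in tasks]


hosts = ["web1", "web2"]
tasks = ["install", "configure"]
print(linear_order(hosts, tasks))
print(host_by_host_order(hosts, tasks))
```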
</b></details>
<details>
<summary>What strategies are you familiar with in Ansible?</summary><br><b>
- Linear: the default strategy in Ansible. Run each task on all hosts before proceeding.
- Free: For each host, run all the tasks until the end of the play as soon as possible
- Debug: Run tasks in an interactive way
</b></details>
<details>
<summary>What is the <code>serial</code> keyword used for?</summary><br><b>
It's used to specify the number (or percentage) of hosts to run the full play on, before moving to next number of hosts in the group.
For example:
```
- name: Some play
  hosts: databases
  serial: 4
```
If your group has 8 hosts, it will run the whole play on 4 hosts and then the same play on the other 4 hosts.
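
The batching can be sketched in Python as follows. This only models an integer `serial` value (percentages are not handled), and the host names are hypothetical.

```python
def serial_batches(hosts, serial):
    """Split hosts into consecutive batches of at most `serial` hosts."""
    return [hosts[i:i + serial] for i in range(0, len(hosts), serial)]


databases = [f"db{n}" for n in range(1, 9)]  # 8 hypothetical hosts
for batch in serial_batches(databases, 4):
    # The whole play runs on one batch before the next batch starts.
    print(batch)
```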
</b></details>
#### Ansible Testing
<details>
<summary>How do you test your Ansible based projects?</summary><br><b>
</b></details>
<details>
<summary>What is Molecule? How does it work?</summary><br><b>
</b></details>
<details>
<summary>You run Ansible tests and you get "idempotence test failed". What does it mean? Why is idempotence important?</summary><br><b>
</b></details>
#### Ansible - Debugging
<details>
<summary>How to find out the data type of a certain variable in one of the playbooks?</summary><br><b>
"{{ some_var | type_debug }}"
</b></details>
#### Ansible - Collections
<details>
<summary>What are collections in Ansible?</summary><br><b>
</b></details>


@ -1,6 +0,0 @@
## Ansible, Minikube and Docker
* Write a simple program in any language you want that outputs "I'm on %HOSTNAME%" (HOSTNAME should be the actual host name on which the app is running)
* Write a Dockerfile which will run your app
* Create the YAML files required for deploying the pods
* Write and run an Ansible playbook which will install Docker, Minikube and kubectl and then create a deployment in minikube with your app running.

1446
exercises/aws/README.md Normal file

File diff suppressed because it is too large

305
exercises/cicd/README.md Normal file

@ -0,0 +1,305 @@
## CI/CD
### CI/CD Exercises
|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Set up a CI pipeline | CI | [Exercise](ci_for_open_source_project.md) | | |
| Deploy to Kubernetes | Deployment | [Exercise](deploy_to_kubernetes.md) | [Solution](solutions/deploy_to_kubernetes/README.md) | |
| Jenkins - Remove Jobs | Jenkins Scripts | [Exercise](remove_jobs.md) | [Solution](solutions/remove_jobs_solution.groovy) | |
| Jenkins - Remove Builds | Jenkins Scripts | [Exercise](remove_builds.md) | [Solution](solutions/remove_builds_solution.groovy) | |
### CI/CD Self Assessment
<details>
<summary>What is Continuous Integration?</summary><br><b>
A development practice where developers integrate code into a shared repository frequently. It can range from a couple of changes every day or week to a couple of changes in one hour at larger scales.
Each piece of code (change/patch) is verified to make sure the change is safe to merge. Today, it's common practice to test the change using an automated build that makes sure the code can be integrated. It can be one build which runs several tests at different levels (unit, functional, etc.) or several separate builds, all or some of which have to pass in order for the change to be merged into the repository.
</b></details>
<details>
<summary>What is Continuous Deployment?</summary><br><b>
A development strategy used by developers to release software automatically into production, where any code commit must pass through an automated testing phase. Only when this is successful is the release considered production worthy. This eliminates any human interaction and should be implemented only after production-ready pipelines have been set up with real-time monitoring and reporting of deployed assets. If any issues are detected in production, it should be easy to roll back to the previous working state.
For more info please read [here](https://www.atlassian.com/continuous-delivery/continuous-deployment)
</b></details>
<details>
<summary>Can you describe an example of a CI (and/or CD) process starting the moment a developer submitted a change/PR to a repository?</summary><br><b>
There are many answers for such a question, as CI processes vary, depending on the technologies used and the type of the project to where the change was submitted.
Such processes can include one or more of the following stages:
* Compile
* Build
* Install
* Configure
* Update
* Test
An example of one possible answer:
A developer submitted a pull request to a project. The PR (pull request) triggered two jobs (or one combined job): one job for running lint tests on the change and a second job for building a package which includes the submitted change and running multiple API/scenario tests using that package. Once all tests passed and the change was approved by a maintainer/core, it's merged/pushed to the repository. If some of the tests failed, the change will not be allowed to be merged/pushed to the repository.
A completely different answer or CI process can describe how a developer pushes code to a repository and a workflow is then triggered to build a container image and push it to the registry. Once the image is in the registry, the new changes are applied to the k8s cluster.
</b></details>
<details>
<summary>What is Continuous Delivery?</summary><br><b>
A development strategy used to frequently deliver code to QA and Ops for testing. This entails having a staging area with production-like features, where changes can only be accepted for production after a manual review. Because of this human involvement there is usually a time lag between release and review, making it slower and more error prone compared to continuous deployment.
For more info please read [here](https://www.atlassian.com/continuous-delivery/continuous-deployment)
</b></details>
<details>
<summary>What is the difference between Continuous Delivery and Continuous Deployment?</summary><br><b>
Both encapsulate the same process of deploying the changes which were compiled and/or tested in the CI pipelines.<br>
The difference between the two is that Continuous Delivery isn't a fully automated process, as opposed to Continuous Deployment where every change that is tested in the process is eventually deployed to production. In Continuous Delivery, either someone approves the deployment or the deployment is based on constraints and conditions (like a time constraint of deploying every week/month/...)
</b></details>
<details>
<summary>What CI/CD best practices are you familiar with? Or what do you consider as CI/CD best practice?</summary><br><b>
* Commit and test often.
* Testing/Staging environment should be a clone of production environment.
* Clean up your environments (e.g. your CI/CD pipelines may create a lot of resources. They should also take care of cleaning up everything they create)
* The CI/CD pipelines should provide the same results when executed locally or remotely
* Treat CI/CD as another application in your organization, not as glue code.
* On demand environments instead of pre-allocated resources for CI/CD purposes
* Stages/Steps/Tasks of pipelines should be shared between applications or microservices (don't re-invent common tasks like "cloning a project")
</b></details>
<details>
<summary>You are given a pipeline and a pool with 3 workers: virtual machine, baremetal and a container. How will you decide on which one of them to run the pipeline?</summary><br><b>
</b></details>
<details>
<summary>Where do you store CI/CD pipelines? Why?</summary><br><b>
There are multiple approaches as to where to store the CI/CD pipeline definitions:
1. App Repository - store them in the same repository of the application they are building or testing (perhaps the most popular one)
2. Central Repository - store all organization's/project's CI/CD pipelines in one separate repository (perhaps the best approach when multiple teams test the same set of projects and they end up having many pipelines)
3. CI repo for every app repo - you separate CI related code from app code but you don't put everything in one place (perhaps the worst option due to the maintenance)
4. The platform where the CI/CD pipelines are being executed (e.g. Kubernetes Cluster in case of Tekton/OpenShift Pipelines).
</b></details>
<details>
<summary>How do you perform capacity planning for your CI/CD resources? (e.g. servers, storage, etc.)</summary><br><b>
</b></details>
<details>
<summary>How would you structure/implement CD for an application which depends on several other applications?</summary><br><b>
</b></details>
<details>
<summary>How do you measure your CI/CD quality? Are there any metrics or KPIs you are using for measuring the quality?</summary><br><b>
</b></details>
#### CI/CD - Jenkins
<details>
<summary>What is Jenkins? What have you used it for?</summary><br><b>
Jenkins is an open source automation tool written in Java, with plugins built for Continuous Integration purposes. Jenkins is used to build and test your software projects continuously, making it easier for developers to integrate changes to the project and for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.
Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.
</b></details>
<details>
<summary>What are the advantages of Jenkins over its competitors? Can you compare it to one of the following systems?
* Travis
* Bamboo
* Teamcity
* CircleCI</summary><br><b>
</b></details>
<details>
<summary>What are the limitations or disadvantages of Jenkins?</summary><br><b>
This might be considered to be an opinionated answer:
* Old fashioned dashboards with not many options to customize it
* Containers readiness (this has improved with Jenkins X)
* By itself, it doesn't have many features. On the other hand, there are many plugins created by the community to expand its abilities
* Managing Jenkins and its pipelines as code can be one hell of a nightmare
</b></details>
<details>
<summary>Explain the following:
- Job
- Build
- Plugin
- Node or Worker
- Executor</summary><br><b>
- Job is an automation definition = what and where to execute once the user clicks on "build"
- Build is a running instance of a job. You can have one or more builds at any given point of time (unless limited by configuration)
- A worker is the machine/instance on which the build is running. When a build starts, it "acquires" a worker out of a pool to run on it.
- An executor is a setting of the worker, defining how many builds can run on that worker in parallel. An executor value of 3 means that 3 builds can run at any point on that worker (not necessarily of the same job, any builds)
</b></details>
<details>
<summary>What plugins have you used in Jenkins?</summary><br><b>
</b></details>
<details>
<summary>Have you used Jenkins for CI or CD processes? Can you describe them?</summary><br><b>
</b></details>
<details>
<summary>What type of jobs are there? Which types have you used?</summary><br><b>
</b></details>
<details>
<summary>How did you report build results to users? What ways are there to report the results?</summary><br><b>
You can report via:
* Emails
* Messaging apps
* Dashboards
Each has its own disadvantages and advantages. Emails for example, if sent too often, can be eventually disregarded or ignored.
</b></details>
<details>
<summary>You need to run unit tests every time a change is submitted to a given project. Describe in detail what your pipeline would look like and what will be executed in each stage</summary><br><b>
The pipeline will have multiple stages:
* Clone the project
* Install test dependencies (for example, if I need tox package to run the tests, I will install it in this stage)
* Run unit tests
* (Optional) report results (For example an email to the users)
* Archive the relevant logs/files
</b></details>
<details>
<summary>How to secure Jenkins?</summary><br><b>
[Jenkins documentation](https://www.jenkins.io/doc/book/security/securing-jenkins/) provides some basic intro for securing your Jenkins server.
</b></details>
<details>
<summary>Describe how you add new nodes (agents) to Jenkins</summary><br><b>
You can describe the UI way to add new nodes, but it's better to explain how to do it in a way that scales, such as a script or a dynamic source for nodes like one of the existing clouds.
</b></details>
<details>
<summary>How to acquire multiple nodes for one specific build?</summary><br><b>
</b></details>
<details>
<summary>Whenever a build fails, you would like to notify the team owning the job regarding the failure and provide failure reason. How would you do that?</summary><br><b>
</b></details>
<details>
<summary>There are four teams in your organization. How to prioritize the builds of each team? So the jobs of team x will always run before team y for example</summary><br><b>
</b></details>
<details>
<summary>If you are managing a dozen jobs, you can probably use the Jenkins UI. But how do you manage the creation and deletion of hundreds of jobs every week/month?</summary><br><b>
</b></details>
<details>
<summary>What are some of Jenkins limitations?</summary><br><b>
* Testing cross-dependencies (changes from multiple projects together)
* Starting builds from any stage (although Cloudbees implemented something called checkpoints)
</b></details>
<details>
<summary>What is the difference between a scripted pipeline and a declarative pipeline? Which type are you using?</summary><br><b>
</b></details>
<details>
<summary>How would you implement an option of starting a build from a certain stage and not from the beginning?</summary><br><b>
</b></details>
<details>
<summary>Do you have experience with developing a Jenkins plugin? Can you describe this experience?</summary><br><b>
</b></details>
<details>
<summary>Have you written Jenkins scripts? If yes, what for and how do they work?</summary><br><b>
</b></details>
#### CI/CD - GitHub Actions
<details>
<summary>What is a Workflow in GitHub Actions?</summary><br><b>
A YAML file that defines the automation actions and instructions to execute upon a specific event.<br>
The file is placed in the repository itself.
A Workflow can be anything - running tests, compiling code, building packages, ...
</b></details>
<details>
<summary>What is a Runner in GitHub Actions?</summary><br><b>
A workflow has to be executed somewhere. The environment where the workflow is executed is called a Runner.<br>
A Runner can be an on-premise host or GitHub hosted
</b></details>
<details>
<summary>What is a Job in GitHub Actions?</summary><br><b>
A job is a series of steps which are executed on the same runner/environment.<br>
A workflow must include at least one job.
</b></details>
<details>
<summary>What is an Action in GitHub Actions?</summary><br><b>
An action is the smallest unit in a workflow. It includes the commands to execute as part of the job.
</b></details>
<details>
<summary>In a GitHub Actions workflow, what is the 'on' attribute/directive used for?</summary><br><b>
Specify upon which events the workflow will be triggered.<br>
For example, you might configure the workflow to trigger every time a change is pushed to the repository.
</b></details>
<details>
<summary>True or False? In GitHub Actions, jobs are executed in parallel by default</summary><br><b>
True
</b></details>
<details>
<summary>How to create dependencies between jobs so one job runs after another?</summary><br><b>
Using the "needs" attribute/directive.
```
jobs:
  job1:
  job2:
    needs: job1
```
In the above example, job1 must complete successfully before job2 runs
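
The ordering implied by "needs" is a dependency graph, which can be illustrated with Python's standard library. This is only a sketch of dependency ordering, not how GitHub schedules jobs; the `needs` mapping simply mirrors the workflow above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each job maps to the set of jobs it "needs" (mirrors the workflow above).
needs = {"job1": set(), "job2": {"job1"}}

# static_order() yields every job after all the jobs it depends on.
order = list(TopologicalSorter(needs).static_order())
print(order)
```

Jobs without a dependency path between them have no required relative order, which is why they can run in parallel.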
</b></details>
<details>
<summary>How to add a Workflow to a repository?</summary><br><b>
CLI:
1. Create the directory `.github/workflows` in the repository
2. Add a YAML file
UI:
1. In the repository page, click on "Actions"
2. Choose workflow and click on "Set up this workflow"
</b></details>


108
exercises/cloud/README.md Normal file

@ -0,0 +1,108 @@
## Cloud
<details>
<summary>What is Cloud Computing? What is a Cloud Provider?</summary><br><b>
Cloud computing refers to the delivery of on-demand computing services
over the internet on a pay-as-you-go basis.
In simple words, Cloud computing is a service that lets you use any computing
service such as a server, storage, networking, databases, and intelligence,
right through your browser without owning anything. You can do anything you
can think of unless it requires you to stay close to your hardware.
Cloud service providers are companies that establish public clouds, manage private clouds, or offer on-demand cloud computing components (also known as cloud computing services) like Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). Cloud services can reduce business process costs when compared to on-premise IT.
</b></details>
<details>
<summary>What are the advantages of cloud computing? Mention at least 3 advantages</summary><br><b>
* Pay as you go: you are paying only for what you are using. No upfront payments and payment stops when resources are no longer used.
* Scalable: resources are scaled down or up based on demand
* High availability: resources and applications provide a seamless experience, even when some services are down
* Disaster recovery
</b></details>
<details>
<summary>True or False? Cloud computing is a consumption-based model (users only pay for resources they use)</summary><br><b>
True
</b></details>
<details>
<summary>What types of Cloud Computing services are there?</summary><br><b>
* IAAS - Infrastructure as a Service
* PAAS - Platform as a Service
* SAAS - Software as a Service
</b></details>
<details>
<summary>Explain each of the following and give an example:
* IAAS
* PAAS
* SAAS</summary><br><b>
* IAAS - Users have control over the complete operating system and don't need to worry about the physical resources, which are managed by the cloud service provider. Example: Amazon EC2.
* PAAS - The cloud service provider takes care of the operating system and middleware, and users only need to focus on their data and applications. Example: Heroku.
* SAAS - A cloud-based method of providing software to users. The software logic runs in the cloud and is typically managed by the cloud service provider. Example: Gmail.
</b></details>
<details>
<summary>What types of clouds (or cloud deployments) are there?</summary><br><b>
* Public - Cloud services sharing computing resources among multiple customers
* Private - Cloud services having computing resources limited to a specific customer or organization, managed by a third party or by the organization itself
* Hybrid - Combination of public and private clouds
</b></details>
<details>
<summary>What are the differences between Cloud Providers and On-Premise solution?</summary><br><b>
With cloud providers, someone else owns and manages the hardware, hires the relevant infrastructure teams and pays for real estate (for both hardware and people). You can focus on your business.
With an on-premise solution, it's quite the opposite. You need to take care of the hardware and the infrastructure teams and pay for everything, which can be quite expensive. On the other hand, it's tailored to your needs.
</b></details>
<details>
<summary>What is Serverless Computing?</summary><br><b>
The main idea behind serverless computing is that you don't need to manage the creation and configuration of servers. All you need to focus on is splitting your app into multiple functions which will be triggered by some actions.
It's important to note that:
* Serverless computing still uses servers. So saying there are no servers in serverless computing is completely wrong
* Serverless computing allows you to have a different payment model. You basically pay only when your functions are running and not for as long as a VM or a container is running, as in other payment models
</b></details>
<details>
<summary>Can we replace any type of computing on servers with serverless?</summary><br><b>
</b></details>
<details>
<summary>Is there a difference between a managed service and SaaS, or is it the same thing?</summary><br><b>
</b></details>
<details>
<summary>What is auto scaling?</summary><br><b>
AWS definition: "AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost"
Read more about auto scaling [here](https://aws.amazon.com/autoscaling)
</b></details>
<details>
<summary>True or False? Auto Scaling is about adding resources (such as instances) and not about removing resources</summary><br><b>
False. Auto scaling adjusts capacity and this can mean removing some resources based on usage and performance.
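
To make the "adjusts capacity in both directions" point concrete, here is a minimal sketch of a proportional scaling rule. It is not AWS's actual algorithm: the `desired_capacity` function, the metric values and the min/max bounds are all assumptions chosen for illustration.

```python
import math


def desired_capacity(current, metric, target, minimum=1, maximum=10):
    """Scale capacity proportionally so the observed metric moves toward the target."""
    wanted = math.ceil(current * metric / target)
    # Clamp to the configured bounds so scaling never runs away in either direction.
    return max(minimum, min(maximum, wanted))


# Average CPU at 90% with a 60% target: capacity grows.
print(desired_capacity(current=4, metric=90, target=60))
# Average CPU at 15% with a 60% target: capacity shrinks.
print(desired_capacity(current=4, metric=15, target=60))
```

The same rule adds instances under load and removes them when usage drops, which is why the statement above is false.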
</b></details>
#### Cloud - Security
<details>
<summary>How to secure instances in the cloud?</summary><br><b>
* Instances should have the minimal permissions needed. You don't want an instance-level incident to become an account-level incident
* Instances should be accessed through load balancers or bastion hosts. In other words, they should be off the internet (in a private subnet behind a NAT).
* Use the latest OS images with your instances (or at least apply the latest patches)
</b></details>

View File

@ -0,0 +1,887 @@
## Containers
### Containers Exercises
|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
|Running Containers|Intro|[Exercise](running_containers.md)|[Solution](solutions/running_containers.md)
|Working with Images|Image|[Exercise](working_with_images.md)|[Solution](solutions/working_with_images.md)
|My First Dockerfile|Dockerfile|[Exercise](write_dockerfile_run_container.md)|
|Run, Forest, Run!|Restart Policies|[Exercise](run_forest_run.md)|[Solution](solutions/run_forest_run.md)
|Layer by Layer|Image Layers|[Exercise](image_layers.md)|[Solution](solutions/image_layers.md)
|Containerize an application | Containerization |[Exercise](containerize_app.md)|[Solution](solutions/containerize_app.md)
|Multi-Stage Builds|Multi-Stage Builds|[Exercise](multi_stage_builds.md)|[Solution](solutions/multi_stage_builds.md)
### Containers Self Assessment
<details>
<summary>What is a Container?</summary><br><b>
This can be tricky to answer since there are many ways to create containers:
- Docker
- systemd-nspawn
- LXC
Focusing on OCI (Open Container Initiative) based containers, the OCI offers the following [definition](https://github.com/opencontainers/runtime-spec/blob/master/glossary.md#container): "An environment for executing processes with configurable isolation and resource limitations. For example, namespaces, resource limits, and mounts are all part of the container environment."
</b></details>
<details>
<summary>Why containers are needed? What is their goal?</summary><br><b>
OCI provides a good [explanation](https://github.com/opencontainers/runtime-spec/blob/master/principles.md#the-5-principles-of-standard-containers): "Define a unit of software delivery called a Standard Container. The goal of a Standard Container is to encapsulate a software component and all its dependencies in a format that is self-describing and portable, so that any compliant runtime can run it without extra dependencies, regardless of the underlying machine and the contents of the container."
</b></details>
<details>
<summary>How are containers different from virtual machines (VMs)?</summary><br><b>
The primary difference between containers and VMs is that containers allow you to virtualize
multiple workloads on a single operating system while in the case of VMs, the hardware is being virtualized to run multiple machines each with its own guest OS.
You can also think about it as containers are for OS-level virtualization while VMs are for hardware virtualization.
* Containers don't require an entire guest operating system as VMs do. Containers share the host system's kernel, as opposed to VMs. They are isolated through kernel features such as namespaces and cgroups
* It usually takes a few seconds to set up a container, whereas a VM can take minutes (or at least more time than a container) since there is an entire OS to boot and initialize, while a container shares the underlying OS
* Virtual machines are considered to be more secure than containers
* The portability of VMs is considered to be limited when compared to containers
</b></details>
<details>
<summary>Do we need virtual machines in the age of containers? Are they still relevant?</summary><br><b>
</b></details>
<details>
<summary>In which scenarios would you use containers and in which you would prefer to use VMs?</summary><br><b>
You should choose VMs when:
* You need to run an application which requires all the resources and functionalities of an OS
* You need full isolation and security
You should choose containers when:
* You need a lightweight solution
* Running multiple versions or instances of a single application
</b></details>
<details>
<summary>Describe the process of containerizing an application</summary><br><b>
1. Write a Dockerfile that includes your app (including the commands to run it) and its dependencies
2. Build the image using the Dockerfile you wrote
3. You might want to push the image to a registry
4. Run the container using the image you've built
</b></details>
#### Containers - OCI
<details>
<summary>What is the OCI?</summary><br><b>
OCI (Open Container Initiative) is an open governance structure established in 2015 to standardize container creation - mostly image format and runtime. At that time there were a number of parties involved and the most prominent one was Docker.
Specifications published by OCI:
- [image-spec](https://github.com/opencontainers/image-spec)
- [runtime-spec](https://github.com/opencontainers/runtime-spec)
</b></details>
<details>
<summary>Which operations OCI based containers must support?</summary><br><b>
Create, Kill, Delete, Start and Query State.
</b></details>
#### Containers - Basic Commands
<details>
<summary>How to list all the containers on a given host?</summary><br><b>
In the case of Docker, use: `docker container ls`<br>
In the case of Podman, it's not very different: `podman container ls`
</b></details>
<details>
<summary>How to run a container?</summary><br><b>
Docker: `docker container run ubuntu`<br>
Podman: `podman container run ubuntu`
</b></details>
<details>
<summary>Why is the output of <code>podman container ls</code> empty after running <code>podman container run ubuntu</code>?</summary><br><b>
Because the container immediately exits after running the ubuntu image. This is completely normal and expected, as containers are designed to run a service or an app and exit when they are done running it.<br>
If you want the container to keep running, you can run a command like `sleep 100`, which will run for 100 seconds, or you can attach to the terminal of the container with a command similar to: `podman container run -it ubuntu /bin/bash`
</b></details>
<details>
<summary>How to attach your shell to a terminal of a running container?</summary><br><b>
`podman container exec -it [container id/name] bash`
This can be done in advance while running the container: `podman container run -it [image:tag] /bin/bash`
</b></details>
<details>
<summary>True or False? You can remove a running container if it isn't running anything</summary><br><b>
False. You have to stop the container before removing it (or force the removal with the `-f` flag).
</b></details>
<details>
<summary>How to stop and remove a container?</summary><br><b>
`podman container stop <container id/name> && podman container rm <container id/name>`
</b></details>
<details>
<summary>What happens when you run <code>docker container run ubuntu</code>?</summary><br><b>
1. The Docker client posts the command to the API server running as part of the Docker daemon
2. The Docker daemon checks if a local image exists
    1. If it exists, it will use it
    2. If it doesn't exist, it will go to the remote registry (Docker Hub by default) and pull the image locally
3. containerd and runc are instructed (by the daemon) to create and start the container
</b></details>
<details>
<summary>How to run a container in the background?</summary><br><b>
With the -d flag. It will run in the background and will not attach it to the terminal.
`docker container run -d httpd` or `podman container run -d httpd`
</b></details>
#### Containers - Images
<details>
<summary>What is a container image?</summary><br><b>
* An image of a container contains the application, its dependencies and the operating system where the application is executed.<br>
* It's a collection of read-only layers. These layers are loosely coupled
* Each layer is assembled out of one or more files
</b></details>
<details>
<summary>Why are container images relatively small?</summary><br><b>
* Most images don't contain a kernel. They share and access the one used by the host on which they are running
* Containers are intended to run a specific application in most cases. This means they hold only what the application needs in order to run
</b></details>
<details>
<summary>How to list the container images on a certain host?</summary><br><b>
`podman image ls`<br>
`docker image ls`<br>
It depends on which container engine you use.
</b></details>
<details>
<summary>What is the centralized location where images are stored called?</summary><br><b>
Registry
</b></details>
<details>
<summary>A registry contains one or more <code>____</code> which in turn contain one or more <code>____</code></summary><br><b>
A registry contains one or more repositories which in turn contain one or more images.
</b></details>
<details>
<summary>How to find out which registry you use by default in your environment?</summary><br><b>
It depends on the container technology you are using. For example, in the case of Docker, it can be done with `docker info`
```
> docker info
Registry: https://index.docker.io/v1
```
</b></details>
<details>
<summary>How to retrieve the latest ubuntu image?</summary><br><b>
`docker image pull ubuntu:latest`
</b></details>
<details>
<summary>True or False? It's not possible to remove an image if a certain container is using it</summary><br><b>
True. You should stop and remove the container before trying to remove the image it uses.
</b></details>
<details>
<summary>True or False? If a tag isn't specified when pulling an image, the 'latest' tag is being used</summary><br><b>
True
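
The defaulting can be sketched as a small parser. This is a simplified approximation of how Docker expands short image names, not its real implementation: the `parse_image_reference` helper is hypothetical, and other engines (Podman, for example) resolve unqualified names according to their own configuration.

```python
def parse_image_reference(ref):
    """Split an image reference into (registry, repository, tag) using Docker-style defaults."""
    name, _, digest = ref.partition("@")
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        # A colon after the last slash separates the tag from the name.
        name, tag = name[:colon], name[colon + 1:]
    else:
        # No tag given: fall back to 'latest' unless the image is pinned by digest.
        tag = None if digest else "latest"
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry = "docker.io"
        repository = name if "/" in name else f"library/{name}"
    return registry, repository, tag


print(parse_image_reference("ubuntu"))
print(parse_image_reference("quay.io/podman/hello:v1"))
```

Note that the registry hostname may itself contain a colon (a port), which is why the tag is only looked for after the last slash.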
</b></details>
<details>
<summary>True or False? Using the 'latest' tag when pulling an image means you are pulling the most recently published image</summary><br><b>
False. While this might be true in some cases, it's not guaranteed that you'll pull the latest published image when using the 'latest' tag.<br>
For example, in some images, 'edge' tag is used for the most recently published images.
</b></details>
<details>
<summary>Where pulled images are stored?</summary><br><b>
Depends on the container technology being used. For example, in case of Docker, images are stored in `/var/lib/docker/`
</b></details>
<details>
<summary>Explain container image layers</summary><br><b>
- The layers of an image are where all the content is stored - code, files, etc.
- Each layer is independent
- Each layer has an ID that is a hash based on its content
- The layers (like the image) are immutable, which means a change to one of the layers can be easily identified
</b></details>
<details>
<summary>True or False? Changing the content of any of the image layers will cause the hash content of the image to change</summary><br><b>
True. These hashes are content based and since images (and their layers) are immutable, any change will cause the hashes to change.
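
A minimal sketch of why this holds, assuming SHA-256 as the hash and a deliberately simplified image ID (a hash over the ordered layer IDs, standing in for the real image config). The layer contents and helper names are made up for illustration.

```python
import hashlib


def layer_id(content: bytes) -> str:
    """Content-based ID of a single layer."""
    return hashlib.sha256(content).hexdigest()


def image_id(layers) -> str:
    """Simplified image ID: a hash over the ordered list of layer IDs."""
    joined = "\n".join(layer_id(layer) for layer in layers)
    return hashlib.sha256(joined.encode()).hexdigest()


original = [b"base os files", b"app dependencies", b"app code v1"]
modified = [b"base os files", b"app dependencies", b"app code v2"]

print(layer_id(original[0]) == layer_id(modified[0]))  # True: untouched layer keeps its ID
print(image_id(original) == image_id(modified))        # False: one changed layer changes the image ID
```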
</b></details>
<details>
<summary>How to list the layers of an image?</summary><br><b>
In case of Docker, you can use `docker image inspect <name>`
</b></details>
<details>
<summary>True or False? In most cases, container images contain their own kernel</summary><br><b>
False. They share and access the one used by the host on which they are running.
</b></details>
<details>
<summary>True or False? A single container image can have multiple tags</summary><br><b>
True. When listing images, you might be able to see two images with the same ID but different tags.
</b></details>
<details>
<summary>What is a dangling image?</summary><br><b>
It's an image without tags attached to it.
One way to reach this situation is by building an image with the exact same name and tag as another already existing image. The older image can still be referenced by using its full SHA.
</b></details>
<details>
<summary>How to see changes done to a given image over time?</summary><br><b>
In the case of Docker, you could use `docker history <name>`
</b></details>
<details>
<summary>True or False? Multiple images can share layers</summary><br><b>
True.<br>
Evidence for that can be found when pulling images. Sometimes when you pull an image, you'll see a line similar to the following:<br>
`fa20momervif17: already exists`<br>
This is because the engine recognizes that such a layer already exists on the host, so there is no need to pull the same layer twice.
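
The behaviour can be sketched with a local layer store modelled as a set. The layer names, the `pull` helper and the status strings are hypothetical; a real engine tracks layers by their hashes.

```python
def pull(image_layers, local_store):
    """Return one status line per layer, fetching only layers missing from local_store."""
    statuses = []
    for layer in image_layers:
        if layer in local_store:
            statuses.append(f"{layer}: already exists")
        else:
            local_store.add(layer)  # stand-in for downloading and unpacking the layer
            statuses.append(f"{layer}: pull complete")
    return statuses


store = set()
pull(["layer-a", "layer-b"], store)         # first image
print(pull(["layer-a", "layer-c"], store))  # second image shares layer-a with the first
```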
</b></details>
<details>
<summary>What is the digest of an image? What problem does it solve?</summary><br><b>
Tags are mutable. This means that we can have two different images with the same name and the same tag. It can be very confusing to see two images with the same name and the same tag in your environment. How would you know whether they are truly the same or different?<br>
This is where "digests" come in handy. A digest is a content-addressable identifier. Unlike tags, it is immutable. Its value is predictable, and this is how you can tell whether two images are the same content-wise rather than merely by looking at the name and the tag of the images.
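
The difference can be sketched by treating tags as a mutable mapping and digests as hashes of content. The image name, the byte strings and the `digest` helper are illustrative assumptions; real digests are computed over the image manifest.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content-addressable identifier: the same bytes always give the same digest."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


tags = {}
tags["some_app:latest"] = digest(b"build from Monday")
monday = tags["some_app:latest"]
tags["some_app:latest"] = digest(b"build from Friday")  # same tag, now pointing elsewhere
friday = tags["some_app:latest"]

print(monday == friday)                        # False: the tag moved to different content
print(digest(b"build from Monday") == monday)  # True: the digest still identifies Monday's build
```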
</b></details>
<details>
<summary>True or False? A single image can support multiple architectures (Linux x64, Windows x64, ...)</summary><br><b>
True.
</b></details>
<details>
<summary>What is a distribution hash in regards to layers?</summary><br><b>
- Layers are compressed when pushed or pulled
- The distribution hash is the hash of the compressed layer
- The distribution hash is used when pulling or pushing images for verification (making sure no one tampered with the image or its layers)
- It's also used for avoiding ID collisions (a case where two images have exactly the same generated ID)
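
A small sketch of the two hashes, assuming gzip compression and SHA-256. The helper names and the layer bytes are made up, and `mtime=0` is passed only so the compressed bytes are reproducible in this example.

```python
import gzip
import hashlib


def content_hash(layer: bytes) -> str:
    """Hash of the uncompressed layer content."""
    return hashlib.sha256(layer).hexdigest()


def distribution_hash(layer: bytes) -> str:
    """Hash of the compressed layer, i.e. of the bytes that travel over the network."""
    compressed = gzip.compress(layer, mtime=0)
    return hashlib.sha256(compressed).hexdigest()


layer = b"files that make up one image layer" * 100
print(content_hash(layer) != distribution_hash(layer))  # the two hashes differ
```

A receiver can recompute the hash of the compressed bytes it downloaded and compare it with the expected distribution hash before unpacking anything.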
</b></details>
<details>
<summary>How do multi-architecture images work? Explain by describing what happens when an image is pulled</summary><br><b>
1. A client makes a call to the registry to use a specific image (using an image name and optionally a tag)
2. A manifest list is parsed (assuming it exists) to check if the architecture of the client is supported and available as a manifest
3. If it is supported (a manifest for the architecture is available) the relevant manifest is parsed to obtain the IDs of the layers
4. Each layer is then pulled using the obtained IDs from the previous step
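
Steps 2 to 4 can be sketched as a lookup in a manifest list. The data below is a simplified, hypothetical structure: a real manifest list references per-platform manifests by digest instead of embedding the layers directly.

```python
# Hypothetical, simplified manifest list for one image.
manifest_list = {
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}, "layers": ["layer-1", "layer-2"]},
        {"platform": {"os": "linux", "architecture": "arm64"}, "layers": ["layer-3", "layer-4"]},
    ]
}


def layers_for(manifest_list, os_name, architecture):
    """Return the layer IDs of the manifest matching the client's platform."""
    wanted = {"os": os_name, "architecture": architecture}
    for manifest in manifest_list["manifests"]:
        if manifest["platform"] == wanted:
            return manifest["layers"]
    raise LookupError(f"no manifest for {os_name}/{architecture}")


print(layers_for(manifest_list, "linux", "arm64"))
```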
</b></details>
<details>
<summary>How to check which architectures a certain container image supports?</summary><br><b>
`docker manifest inspect <name>`
</b></details>
<details>
<summary>How to check what a certain container image will execute once we'll run a container based on that image?</summary><br><b>
Look for the "Cmd" or "Entrypoint" fields in the output of `docker image inspect <image name>`
</b></details>
<details>
<summary>How to view the instructions that were used to build an image?</summary><br><b>
`docker image history <image name>:<tag>`
</b></details>
<details>
<summary>How does <code>docker image build</code> work?</summary><br><b>
1. Docker spins up a temporary container
2. Runs a single instruction in the temporary container
3. Stores the result as a new image layer
4. Removes the temporary container
5. Repeats for every instruction
</b></details>
<details>
<summary>What is the role of cache in image builds?</summary><br><b>
When you build an image for the first time, the different layers are cached. So, while the first build of the image might take time, any later build of the same image (given that neither the Dockerfile nor the content used by its instructions changed) will be nearly instant thanks to the caching mechanism.
In a bit more detail, it works this way:
1. The first instruction (FROM) will check if the base image already exists on the host before pulling it
2. For the next instruction, it will check in the build cache if an existing layer was built from the same base image and with the same instruction
    1. If it finds such a layer, it skips the instruction, links the existing layer and keeps using the cache.
    2. If it doesn't find a matching layer, it builds the layer and the cache is invalidated for the instructions that follow.
Note: in some cases (like COPY and ADD instructions) the instruction might stay the same, but if the content being copied has changed then the cache is invalidated. This check is done by comparing the checksum of each file that is being copied.
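
The lookup described above can be sketched with a cache keyed on the parent layer, the instruction text and a checksum of any copied content. This is an illustrative model rather than Docker's implementation: the `cache_key` and `build` helpers and the example steps are hypothetical.

```python
import hashlib


def cache_key(parent: str, instruction: str, content: bytes = b"") -> str:
    """Key a layer by its parent layer, the instruction, and the content it copies."""
    data = parent.encode() + b"\0" + instruction.encode() + b"\0" + content
    return hashlib.sha256(data).hexdigest()


def build(steps, cache):
    """steps is a list of (instruction, copied_content); returns 'cached' or 'built' per step."""
    parent = "base"
    log = []
    for instruction, content in steps:
        key = cache_key(parent, instruction, content)
        if key in cache:
            log.append("cached")
        else:
            cache.add(key)
            log.append("built")
        # Chaining on the parent makes a miss cascade to every later step.
        parent = key
    return log


first = [("COPY app.py /app/", b"print('v1')"), ("RUN pip install flask", b"")]
changed = [("COPY app.py /app/", b"print('v2')"), ("RUN pip install flask", b"")]

cache = set()
print(build(first, cache))    # nothing cached yet
print(build(first, cache))    # identical build, every step is a cache hit
print(build(changed, cache))  # copied content changed, so this step and the next are rebuilt
```

In the last build the RUN instruction is textually identical, yet it is rebuilt because its parent layer changed.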
</b></details>
<details>
<summary>What ways are there to reduce container images size?</summary><br><b>
* Reduce the number of instructions - in some cases you may be able to join layers, for example by installing multiple packages with one instruction or by using `&&` to chain commands in a single RUN instruction
* Use smaller images - in some cases you might be using images that contain more than what is needed for your application to run. It is good to get an overview of some images and see whether you can use smaller images than the ones you usually use.
* Clean up after running commands - some commands, like package installation, create metadata or cache that you might not need for running the application. It's important to clean up after such commands to reduce the image size
* For Docker images, you can use multi-stage builds
</b></details>
<details>
<summary>What are the pros and cons of squashing images?</summary><br><b>
Pros:
* Smaller image
* Reducing number of layers (especially if the image has lot of layers)
Cons:
* No sharing of the image layers
* Push and pull can take more time (because no matching layers found on target)
</b></details>
#### Containers - Volume
<details>
<summary>How to create a new volume?</summary><br><b>
`docker volume create some_volume`
</b></details>
#### Containers - Dockerfile
<details>
<summary>What is a Dockerfile?</summary><br><b>
Different container engines (e.g. Docker, Podman) can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text file that contains all the instructions for building an image which containers can use.
</b></details>
<details>
<summary>What is the first instruction in all Dockerfiles and what does it mean?</summary><br><b>
The first instruction is `FROM <image name>`<br>
It specifies the base layer of the image to be used. Every other instruction is a layer on top of that base image.
</b></details>
<details>
<summary>List five different instructions that are available for use in a Dockerfile</summary><br><b>
* WORKDIR: sets the working directory inside the image filesystem for all the instructions following it
* EXPOSE: exposes the specified port (it doesn't add a new layer; rather, it is documented as image metadata)
* ENTRYPOINT: specifies the startup commands to run when a container is started from the image
* ENV: sets an environment variable to the given value
* USER: sets the user (and optionally the user group) to use while running the image
</b></details>
<details>
<summary>What are some of the best practices regarding container images and Dockerfiles that you are following?</summary><br><b>
* Include only the packages you are going to use. Nothing else.
* Specify a tag in the FROM instruction. Not using a tag means you'll always pull the latest, which changes over time and might lead to unexpected results.
* Do not use environment variables to share secrets
* Use images from official repositories
* Keep images small! - you want them only to include what is required for the application to run successfully. Nothing else.
* If you are using the apt package manager, you might use `--no-install-recommends` with `apt-get install` to install only the main dependencies (instead of also installing suggested and recommended packages)
</b></details>
<details>
<summary>What is the "build context"?</summary><br><b>
[Docker docs](https://docs.docker.com/engine/reference/commandline/build): "A build's context is the set of files located in the specified PATH or URL"
</b></details>
<details>
<summary>What is the difference between ADD and COPY in Dockerfile?</summary><br><b>
COPY takes in a source and destination. It lets you copy in a file or directory from the build context into the Docker image itself.<br>
ADD lets you do the same, but it also supports two other sources. You can use a URL instead of a file or directory from the build context. In addition, you can extract a tar file from the source directly into the destination.
Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. That's because it's more transparent than ADD. COPY only supports the basic copying of files from build context into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious.
</b></details>
<details>
<summary>What is the difference between CMD and RUN in Dockerfile?</summary><br><b>
RUN lets you execute commands inside of your Docker image. These commands get executed once at build time and get written into your Docker image as a new layer.
CMD is the command the container executes by default when you launch the built image. Only one CMD takes effect in a Dockerfile: if several are listed, the last one wins.
You could say that CMD is a Docker run-time operation, meaning it's not something that gets executed at build time. It happens when you run an image. A running image is called a container.
</b></details>
<details>
<summary>How to create a new image using a Dockerfile?</summary><br><b>
The following commands are executed from within the directory where the Dockerfile resides:<br>
`docker image build -t some_app:latest .`<br>
`podman image build -t some_app:latest .`
</b></details>
<details>
<summary>Do you perform any checks or testing on your Dockerfiles?</summary><br><b>
One option is to use the [hadolint](https://github.com/hadolint/hadolint) project, which is a linter based on Dockerfile best practices.
</b></details>
<details>
<summary>Which instructions in Dockerfile create new layers?</summary><br><b>
Instructions such as FROM, COPY and RUN, create new image layers instead of just adding metadata.
</b></details>
<details>
<summary>Which instructions in Dockerfile create image metadata and don't create new layers?</summary><br><b>
Instructions such as ENTRYPOINT, ENV, EXPOSE, create image metadata and they don't create new layers.
</b></details>
<details>
<summary>Is it possible to identify which instructions create a new layer from the output of <code>docker image history</code>?</summary><br><b>
</b></details>
#### Containers - Architecture
<details>
<summary>How do containers achieve isolation from the rest of the system?</summary><br><b>
Through the use of namespaces and cgroups. The Linux kernel has several types of namespaces:
- Process ID namespaces: these namespaces include an independent set of process IDs
- Mount namespaces: Isolation and control of mountpoints
- Network namespaces: Isolate system networking resources such as routing table, interfaces, ARP table, etc.
- UTS namespaces: Isolate the hostname and domain name
- IPC namespaces: Isolate interprocess communications
- User namespaces: Isolate user and group IDs
- Time namespaces: Isolate the system clocks (monotonic and boot time)
</b></details>
<details>
<summary>Describe in detail what happens when you run <code>podman/docker run hello-world</code>?</summary><br><b>
With Docker:
1. The Docker CLI passes your request to the Docker daemon
2. The Docker daemon downloads the image from Docker Hub (unless it already exists locally)
3. The Docker daemon creates a new container by using the image it downloaded
4. The Docker daemon redirects output from the container to the Docker CLI, which redirects it to the standard output
Podman has no daemon, so the Podman CLI performs the equivalent steps itself.
</b></details>
<details>
<summary>Describe the difference between cgroups and namespaces</summary><br><b>
cgroup: Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour.<br>
namespace: wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.<br>
In short:
* cgroups limit how much you can use
* namespaces limit what you can see (and therefore use)
Cgroups involve resource metering and limiting:
* memory
* CPU
* block I/O
* network
Namespaces provide processes with their own view of the system. There are multiple namespaces: pid, net, mnt, uts, ipc, user
</b></details>
#### Containers - Docker Architecture
<details>
<summary>Which components/layers compose the Docker technology?</summary><br><b>
1. Runtime - responsible for starting and stopping containers
2. Daemon - implements the Docker API and takes care of managing images (including builds), authentication, security, networking, etc.
3. Orchestrator
</b></details>
<details>
<summary>What components are part of the Docker engine?</summary><br><b>
- Docker daemon
- containerd
- runc
</b></details>
<details>
<summary>What is the low-level runtime?</summary><br><b>
- The low level runtime is called runc
- It manages every container running on Docker host
- Its purpose is to interact with the underlying OS to start and stop containers
- It's the reference implementation of the OCI (Open Container Initiative) runtime-spec
- It's a small CLI wrapper for libcontainer
</b></details>
<details>
<summary>What is the high-level runtime?</summary><br><b>
- The high level runtime is called containerd
- It was developed by Docker Inc and at some point donated to CNCF
- It manages the whole lifecycle of a container - start, stop, remove and pause
- It takes care of setting up network interfaces, volumes, pushing and pulling images, ...
- It manages the lower level runtime (runc) instances
- It's used both by Docker and Kubernetes as a container runtime
- It sits between Docker daemon and runc at the OCI layer
Note: running `ps -ef | grep -i containerd` on a system with Docker installed and running, you should see a process of containerd
</b></details>
<details>
<summary>True or False? The docker daemon (dockerd) performs lower-level tasks compared to containerd</summary><br><b>
False. The Docker daemon performs higher-level tasks compared to containerd.<br>
It's responsible for managing networks, volumes, images, ...
</b></details>
<details>
<summary>Describe in detail what happens when you run <code>docker pull image:tag</code>?</summary><br><b>
The Docker CLI passes your request to the Docker daemon. The dockerd logs show the process:<br>
docker.io/library/busybox:latest resolved to a manifestList object with 9 entries; looking for a unknown/amd64 match<br>
found match for linux/amd64 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:400ee2ed939df769d4681023810d2e4fb9479b8401d97003c710d0e20f7c49c6<br>
pulling blob sha256:61c5ed1cbdf8e801f3b73d906c61261ad916b2532d6756e7c4fbcacb975299fb<br>
Downloaded 61c5ed1cbdf8 to tempfile /var/lib/docker/tmp/GetImageBlob909736690<br>
Applying tar in /var/lib/docker/overlay2/507df36fe373108f19df4b22a07d10de7800f33c9613acb139827ba2645444f7/diff storage-driver=overlay2<br>
Applied tar sha256:514c3a3e64d4ebf15f482c9e8909d130bcd53bcc452f0225b0a04744de7b8c43 to 507df36fe373108f19df4b22a07d10de7800f33c9613acb139827ba2645444f7, size: 1223534
</b></details>
<details>
<summary>Describe in detail what happens when you run a container</summary><br><b>
1. The Docker client converts the run command into an API payload
2. It then POSTs the payload to the API endpoint exposed by the Docker daemon
3. When the daemon receives the command to create a new container, it makes a call to containerd via gRPC
4. containerd converts the required image into an OCI bundle and tells runc to use that bundle for creating the container
5. runc interfaces with the OS kernel to pull together the different constructs (namespace, cgroups, etc.) used for creating the container
6. The container process is started as a child process of runc
7. Once it starts, runc exits
</b></details>
<details>
<summary>True or False? Killing the Docker daemon will kill all the running containers</summary><br><b>
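False. While this was true at some point, today the container runtime isn't part of the daemon (it's part of containerd and runc), so killing the daemon will not kill running containers. Note that a graceful daemon shutdown does stop running containers by default, unless the `live-restore` option is enabled.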
</b></details>
<details>
<summary>True or False? containerd forks a new instance of runc for every container it creates</summary><br><b>
True
</b></details>
<details>
<summary>True or False? Running a dozen containers will result in having a dozen runc processes</summary><br><b>
False. Once a container is created, the parent runc process exits.
</b></details>
<details>
<summary>What is shim in regards to Docker?</summary><br><b>
shim is the process that becomes the container's parent when the runc process exits. It's responsible for:
- Reporting exit code back to the Docker daemon
- Making sure the container doesn't terminate if the daemon is being restarted. It does so by keeping the stdout and stdin open
</b></details>
<details>
<summary>What does `podman commit` do? When would you use it?</summary><br><b>
It creates a new image from a container's changes
</b></details>
<details>
<summary>How would you transfer data from one container into another?</summary><br><b>
</b></details>
<details>
<summary>What happens to the data of a container when the container exits?</summary><br><b>
</b></details>
<details>
<summary>Explain what each of the following commands do:
* docker run
* docker rm
* docker ps
* docker pull
* docker build
* docker commit
</summary><br><b>
</b></details>
<details>
<summary>How do you remove old, non running, containers?</summary><br><b>
1. To remove one or more containers, use the `docker container rm` command followed by the IDs of the containers you want to remove.
2. The `docker system prune` command will remove all stopped containers, all dangling images, and all unused networks
3. `docker rm $(docker ps -a -q)` - This command will delete all stopped containers. `docker ps -a -q` returns all existing container IDs and passes them to `docker rm`, which deletes them. Running containers are not deleted, because `docker rm` refuses to remove a running container unless `-f` is given.
</b></details>
<details>
<summary>How does the Docker client communicate with the daemon?</summary><br><b>
Via the local socket at `/var/run/docker.sock`
</b></details>
<details>
<summary>Explain Docker interlock</summary><br><b>
</b></details>
<details>
<summary>What is Docker Repository?</summary><br><b>
</b></details>
<details>
<summary>Explain image layers</summary><br><b>
A Docker image is built up from a series of layers. Each layer represents an instruction in the images Dockerfile. Each layer except the very last one is read-only.
Each layer is only a set of differences from the layer before it. The layers are stacked on top of each other. When you create a new container, you add a new writable layer on top of the underlying layers. This layer is often called the “container layer”. All changes made to the running container, such as writing new files, modifying existing files, and deleting files, are written to this thin writable container layer.
The major difference between a container and an image is the top writable layer. All writes to the container that add new or modify existing data are stored in this writable layer. When the container is deleted, the writable layer is also deleted. The underlying image remains unchanged.
Because each container has its own writable container layer, and all changes are stored in this container layer, multiple containers can share access to the same underlying image and yet have their own data state.
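
The layering described above can be modeled in a few lines of Python. This is only a sketch of the idea, not how a storage driver such as overlay2 is implemented: layers are plain dictionaries, `collections.ChainMap` plays the role of the union filesystem, and all file names and contents are made up.

```python
from collections import ChainMap

# Read-only image layers, lowest first. File names and contents are made up.
base_layer = {"/bin/sh": "shell binary", "/etc/os-release": "v1"}
app_layer = {"/app/main.py": "print('hi')", "/etc/os-release": "v2"}


def new_container(*image_layers):
    """Stack a fresh writable layer on top of the shared image layers."""
    writable = {}
    # ChainMap searches its maps left to right, so upper layers shadow lower
    # ones, and every write lands in the first map (the container layer).
    return ChainMap(writable, *reversed(image_layers))


c1 = new_container(base_layer, app_layer)
c2 = new_container(base_layer, app_layer)

c1["/tmp/data"] = "only in c1"      # new file: stored in c1's writable layer
c1["/etc/os-release"] = "patched"   # modification: image layers stay untouched

print(c1["/etc/os-release"])         # patched
print(c2["/etc/os-release"])         # v2
print(app_layer["/etc/os-release"])  # v2
```

Both containers share the same image dictionaries, yet each one sees only its own changes. Dropping `c1` discards its writable layer and leaves the image as it was.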
</b></details>
<details>
<summary>What best practices are you familiar with related to working with containers?</summary><br><b>
</b></details>
<details>
<summary>How do you manage persistent storage in Docker?</summary><br><b>
</b></details>
<details>
<summary>How can you connect from the inside of your container to the localhost of your host, where the container runs?</summary><br><b>
</b></details>
<details>
<summary>How do you copy files from Docker container to the host and vice versa?</summary><br><b>
</b></details>
#### Containers - Docker Compose
<details>
<summary>Explain what Docker Compose is and what it is used for</summary><br><b>
Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your applications services. Then, with a single command, you create and start all the services from your configuration.
For example, you can use it to set up ELK stack where the services are: elasticsearch, logstash and kibana. Each running in its own container.<br>
In general, it's useful for running applications that are composed of several different services. It lets you manage them as one deployed app instead of multiple separate services.
</b></details>
<details>
<summary>Describe the process of using Docker Compose</summary><br><b>
* Define the services you would like to run together in a docker-compose.yml file
* Run `docker-compose up` to run the services
</b></details>
#### Containers - Docker Images
<details>
<summary>What is Docker Hub?</summary><br><b>
One of the most common registries for retrieving images.
</b></details>
<details>
<summary>How to push an image to Docker Hub?</summary><br><b>
`docker image push [username]/[image name]:[tag]`
For example:
`docker image push mario/web_app:latest`
</b></details>
<details>
<summary>What is the difference between Docker Hub and Docker Cloud?</summary><br><b>
Docker Hub is a native Docker registry service which allows you to run pull
and push commands to install and deploy Docker images from the Docker Hub.
Docker Cloud is built on top of the Docker Hub so Docker Cloud provides
you with more options/features compared to Docker Hub. One example is
Swarm management which means you can create new swarms in Docker Cloud.
</b></details>
<details>
<summary>Explain Multi-stage builds</summary><br><b>
Multi-stage builds allow you to produce smaller container images by splitting the build process into multiple stages.
As an example, imagine you have one Dockerfile where you first build the application and then run it. The whole build process of the application might be using packages and libraries you don't really need for running the application later. Moreover, the build process might produce different artifacts, not all of which are needed for running the application.
How do you deal with that? Sure, one option is to add more instructions to remove all the unnecessary stuff but, there are a couple of issues with this approach:
1. You need to know what to remove exactly and that might be not as straightforward as you think
2. You add new layers which are not really needed
A better solution might be to use multi-stage builds where one stage (the build process) is passing the relevant artifacts/outputs to the stage that runs the application.
</b></details>
<details>
<summary>True or False? In multi-stage builds, artifacts can be copied between stages</summary><br><b>
True. This allows us to eventually produce smaller images.
</b></details>
<details>
<summary>What <code>.dockerignore</code> is used for?</summary><br><b>
By default, Docker uses everything (all the files and directories) in the directory you use as build context.<br>
`.dockerignore` is used for excluding files and directories from the build context
</b></details>
#### Containers - Networking
<details>
<summary>What container network standards or architectures are you familiar with?</summary><br><b>
CNM (Container Network Model):
* Requires a distributed key-value store (like etcd, for example) for storing the network configuration
* Used by Docker
CNI (Container Network Interface):
* Network configuration should be in JSON format
</b></details>
#### Containers - Docker Networking
<details>
<summary>What network specification does Docker use, and what is its implementation called?</summary><br><b>
Docker is using the CNM (Container Network Model) design specification.<br>
The implementation of CNM specification by Docker is called "libnetwork". It's written in Go.
</b></details>
<details>
<summary>Explain the following blocks in regards to CNM:
* Networks
* Endpoints
* Sandboxes</summary><br><b>
* Networks: software implementation of a switch. They are used for grouping and isolating a collection of endpoints.
* Endpoints: Virtual network interfaces. Used for making connections.
* Sandboxes: Isolated network stack (interfaces, routing tables, ports, ...)
</b></details>
<details>
<summary>True or False? If you would like to connect a container to multiple networks, you need multiple endpoints</summary><br><b>
True. An endpoint can connect only to a single network.
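
The relationship between the three CNM building blocks can be sketched with a few Python dataclasses. This is a toy model for illustration only: the class and method names are hypothetical and do not mirror libnetwork's Go API.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    name: str


@dataclass
class Endpoint:
    network: Network  # an endpoint belongs to exactly one network


@dataclass
class Sandbox:
    """A container's isolated network stack, holding its endpoints."""
    container: str
    endpoints: list = field(default_factory=list)

    def connect(self, network):
        # Joining a network always creates a new endpoint for that network.
        endpoint = Endpoint(network)
        self.endpoints.append(endpoint)
        return endpoint


frontend, backend = Network("frontend"), Network("backend")
web = Sandbox("web")
web.connect(frontend)
web.connect(backend)
print(len(web.endpoints))  # 2
```

Connecting the `web` sandbox to two networks leaves it with two endpoints, one per network.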
</b></details>
<details>
<summary>What are some features of libnetwork?</summary><br><b>
* Native service discovery
* ingress-based load balancer
* network control plane and management plane
</b></details>
#### Containers - Security
<details>
<summary>What security best practices are there regarding containers?</summary><br><b>
* Install only the necessary packages in the container
* Don't run containers as root when possible
* Don't mount the Docker daemon unix socket into any of the containers
* Set volumes and container's filesystem to read only
* DO NOT run containers with the `--privileged` flag
</b></details>
<details>
<summary>A container can cause a kernel panic and bring down the whole host. What preventive actions can you apply to avoid this specific situation?</summary><br><b>
* Install only the necessary packages in the container
* Set volumes and container's filesystem to read only
* DO NOT run containers with the `--privileged` flag
</b></details>
#### Containers - Docker in Production
<details>
<summary>What are some best practices you follow in regards to using containers in production?</summary><br><b>
Images:
* Use images from official repositories
* Include only the packages you are going to use. Nothing else.
* Specify a tag in the FROM instruction. Not using a tag means you'll always pull `latest`, which changes over time and might lead to unexpected results.
* Do not use environment variables to share secrets
* Keep images small! - you want them only to include what is required for the application to run successfully. Nothing else.
Components:
* Secured connection between components (e.g. client and server)
</b></details>
<details>
<summary>True or False? It's recommended for production environments that Docker client and server will communicate over network using HTTP socket</summary><br><b>
False. Communication between client and server shouldn't be done over plain HTTP since it's insecure. It's better to configure the daemon to accept only network connections that are secured with TLS.<br>
Basically, the Docker daemon will only accept secured connections with certificates from trusted CA.
</b></details>
<details>
<summary>What forms of self-healing options are available for Docker containers?</summary><br><b>
Restart policies. They allow you to automatically restart containers after certain events.
</b></details>
<details>
<summary>What restart policies are you familiar with?</summary><br><b>
* always: restart the container when it's stopped (not with `docker container stop`)
* unless-stopped: restart the container unless it was in stopped status
* no: don't restart the container at any point (default policy)
* on-failure: restart the container when it exits due to an error (= exit code other than zero)
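
The four policies can be compared with a small decision function. This is a simplified model based on the descriptions above, not Docker's implementation: the function name `should_restart` and its parameters are hypothetical, retry limits for `on-failure` are ignored, and the daemon-restart flag only affects `always` and `unless-stopped`.

```python
def should_restart(policy, exit_code, manually_stopped=False, daemon_restarted=False):
    """Decide whether a stopped container is started again (simplified model)."""
    if policy == "no":
        return False
    if policy == "on-failure":
        # Only a non-zero exit code counts as a failure.
        return exit_code != 0 and not manually_stopped
    if policy == "always":
        # A manual stop is respected until the daemon itself restarts.
        return daemon_restarted or not manually_stopped
    if policy == "unless-stopped":
        # A manual stop is respected even across daemon restarts.
        return not manually_stopped
    raise ValueError(f"unknown restart policy: {policy}")


print(should_restart("on-failure", exit_code=1))  # True
print(should_restart("on-failure", exit_code=0))  # False
print(should_restart("always", 0, manually_stopped=True, daemon_restarted=True))          # True
print(should_restart("unless-stopped", 0, manually_stopped=True, daemon_restarted=True))  # False
```

The last two calls show the practical difference between `always` and `unless-stopped`: only the former brings a manually stopped container back after the daemon restarts.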
</b></details>
#### Containers - Docker Misc
<details>
<summary>Explain what Docker Bench is</summary><br><b>
</b></details>

435
exercises/devops/README.md Normal file

@ -0,0 +1,435 @@
## DevOps
<details>
<summary>What is DevOps?</summary><br><b>
You can answer it by describing what DevOps means to you and/or rely on how companies define it. I've put here a couple of examples.
Amazon:
"DevOps is the combination of cultural philosophies, practices, and tools that increases an organizations ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market."
Microsoft:
"DevOps is the union of people, process, and products to enable continuous delivery of value to our end users. The contraction of “Dev” and “Ops” refers to replacing siloed Development and Operations to create multidisciplinary teams that now work together with shared and efficient practices and tools. Essential DevOps practices include agile planning, continuous integration, continuous delivery, and monitoring of applications."
Red Hat:
"DevOps describes approaches to speeding up the processes by which an idea (like a new software feature, a request for enhancement, or a bug fix) goes from development to deployment in a production environment where it can provide value to the user. These approaches require that development teams and operations teams communicate frequently and approach their work with empathy for their teammates. Scalability and flexible provisioning are also necessary. With DevOps, those that need power the most, get it—through self service and automation. Developers, usually coding in a standard development environment, work closely with IT operations to speed software builds, tests, and releases—without sacrificing reliability."
Google:
"...The organizational and cultural movement that aims to increase software delivery velocity, improve service reliability, and build shared ownership among software stakeholders"
</b></details>
<details>
<summary>What are the benefits of DevOps? What can it help us to achieve?</summary><br><b>
* Collaboration
* Improved delivery
* Security
* Speed
* Scale
* Reliability
</b></details>
<details>
<summary>What are the anti-patterns of DevOps?</summary><br><b>
A couple of examples:
* One person is in charge of specific tasks. For example there is only one person who is allowed to merge the code of everyone else into the repository.
* Treating production differently from development environment. For example, not implementing security in development environment
* Not allowing someone to push to production on Friday ;)
</b></details>
<details>
<summary>How would you describe a successful DevOps engineer or a team?</summary><br><b>
The answer can focus on:
* Collaboration
* Communication
* Set up and improve workflows and processes (related to testing, delivery, ...)
* Dealing with issues
Things to think about:
* What DevOps teams or engineers should NOT focus on or do?
* Do DevOps teams or engineers have to be innovative or practice innovation as part of their role?
</b></details>
#### Tooling
<details>
<summary>What are you taking into consideration when choosing a tool/technology?</summary><br><b>
A few ideas to think about:
* mature/stable vs. cutting edge
* community size
* architecture aspects - agent vs. agentless, master vs. masterless, etc.
* learning curve
</b></details>
<details>
<summary>Can you describe which tool or platform you chose to use in some of the following areas and how?
* CI/CD
* Provisioning infrastructure
* Configuration Management
* Monitoring & alerting
* Logging
* Code review
* Code coverage
* Issue Tracking
* Containers and Containers Orchestration
* Tests</summary><br><b>
This is a more practical version of the previous question where you might be asked additional specific questions on the technology you chose
* CI/CD - Jenkins, Circle CI, Travis, Drone, Argo CD, Zuul
* Provisioning infrastructure - Terraform, CloudFormation
* Configuration Management - Ansible, Puppet, Chef
* Monitoring & alerting - Prometheus, Nagios
* Logging - Logstash, Graylog, Fluentd
* Code review - Gerrit, Review Board
* Code coverage - Cobertura, Clover, JaCoCo
* Issue tracking - Jira, Bugzilla
* Containers and Containers Orchestration - Docker, Podman, Kubernetes, Nomad
* Tests - Robot, Serenity, Gauge
</b></details>
<details>
<summary>A team member of yours suggests replacing the current CI/CD platform used by the organization with a new one. How would you reply?</summary><br><b>
Things to think about:
* What do we gain from doing so? Are there new features in the new platform? Does the new platform deal with some of the limitations of the current platform?
* What is this suggestion based on? In other words, did they try out the new platform? Was there extensive technical research?
* What will the switch from one platform to another require from the organization? For example, training the users of the platform? How much time does the team have to invest in such a move?
</b></details>
#### Version Control
<details>
<summary>What is Version Control?</summary><br><b>
* Version control is the system of tracking and managing changes to software code.
* It helps software teams to manage changes to source code over time.
* Version control also helps developers move faster and allows software teams to preserve efficiency and agility as the team scales to include more developers.
</b></details>
<details>
<summary>What is a commit?</summary><br><b>
* In Git, a commit is a snapshot of your repo at a specific point in time.
* The git commit command will save all staged changes, along with a brief description from the user, in a “commit” to the local repository.
</b></details>
<details>
<summary>What is a merge?</summary><br><b>
* Merging is Git's way of putting a forked history back together again. The git merge command lets you take the independent lines of development created by git branch and integrate them into a single branch.
</b></details>
<details>
<summary>What is a merge conflict?</summary><br><b>
* A merge conflict is an event that occurs when Git is unable to automatically resolve differences in code between two commits. When all the changes in the code occur on different lines or in different files, Git will successfully merge commits without your help.
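
The rule above (changes on different lines merge cleanly, competing changes to the same line conflict) can be illustrated with a toy three-way merge. This is not Git's actual merge algorithm: it is a sketch that assumes all three versions have the same number of lines, so insertions and deletions are not handled, and the function name `merge_lines` is hypothetical.

```python
def merge_lines(base, ours, theirs):
    """Merge two edits of `base` line by line.

    Simplification: all three versions must have the same number of lines.
    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (same change, or no change)
            merged.append(o)
        elif o == b:      # only theirs changed this line
            merged.append(t)
        elif t == b:      # only ours changed this line
            merged.append(o)
        else:             # both changed the same line differently
            merged.append(None)
            conflicts.append(number)
    return merged, conflicts


base = ["a", "b", "c"]
print(merge_lines(base, ["A", "b", "c"], ["a", "b", "C"]))  # (['A', 'b', 'C'], [])
print(merge_lines(base, ["A", "b", "c"], ["X", "b", "c"]))  # ([None, 'b', 'c'], [1])
```

The first call merges without help because the two sides touched different lines. The second reports a conflict on line 1, which a person has to resolve.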
</b></details>
<details>
<summary>What best practices are you familiar with regarding version control?</summary><br><b>
* Use a descriptive commit message
* Make each commit a logical unit
* Incorporate others' changes frequently
* Share your changes frequently
* Coordinate with your co-workers
* Don't commit generated files
</b></details>
<details>
<summary>Would you prefer a "configuration->deployment" model or "deployment->configuration"? Why?</summary><br><b>
Both have advantages and disadvantages.
With "configuration->deployment" model for example, where you build one image to be used by multiple deployments, there is less chance of deployments being different from one another, so it has a clear advantage of a consistent environment.
</b></details>
<details>
<summary>Explain mutable vs. immutable infrastructure</summary><br><b>
In mutable infrastructure paradigm, changes are applied on top of the existing infrastructure and over time
the infrastructure builds up a history of changes. Ansible, Puppet and Chef are examples of tools which
follow mutable infrastructure paradigm.
In immutable infrastructure paradigm, every change is actually a new infrastructure. So a change
to a server will result in a new server instead of updating it. Terraform is an example of technology
which follows the immutable infrastructure paradigm.
</b></details>
#### Software Distribution
<details>
<summary>Explain "Software Distribution"</summary><br><b>
Read [this](https://venam.nixers.net/blog/unix/2020/03/29/distro-pkgs.html) fantastic article on the topic.
From the article: "Thus, software distribution is about the mechanism and the community that takes the burden and decisions to build an assemblage of coherent software that can be shipped."
</b></details>
<details>
<summary>Why are there multiple software distributions? What differences can they have?</summary><br><b>
Different distributions can focus on different things like: focus on different environments (server vs. mobile vs. desktop), support specific hardware, specialize in different domains (security, multimedia, ...), etc. Basically, different aspects of the software and what it supports, get different priority in each distribution.
</b></details>
<details>
<summary>What is a Software Repository?</summary><br><b>
Wikipedia: "A software repository, or “repo” for short, is a storage location for software packages. Often a table of contents is stored, as well as metadata."
Read more [here](https://en.wikipedia.org/wiki/Software_repository)
</b></details>
<details>
<summary>What ways are there to distribute software? What are the advantages and disadvantages of each method?</summary><br><b>
* Source - Maintain build script within version control system so that user can build your app after cloning repository. Advantage: User can quickly checkout different versions of application. Disadvantage: requires build tools installed on users machine.
* Archive - collect all your app files into one archive (e.g. tar) and deliver it to the user. Advantage: The user gets everything they need in one file. Disadvantage: Requires repeating the same procedure when updating, not good if there are a lot of dependencies.
* Package - depending on the OS, you can use your OS package format (e.g. in RHEL/Fedora it's RPM) to deliver your software with a way to install, uninstall and update it using the standard packager commands. Advantage: The package manager takes care of installation, uninstallation, updating and dependency management. Disadvantage: Requires managing a package repository.
* Images - Either VM or container images where your package is included with everything it needs in order to run successfully. Advantage: everything is preinstalled, it has high degree of environment isolation. Disadvantage: Requires knowledge of building and optimizing images.
</b></details>
<details>
<summary>Are you familiar with "The Cathedral and the Bazaar models"? Explain each of the models</summary><br><b>
* Cathedral - source code released when software is released
* Bazaar - source code is always available publicly (e.g. Linux Kernel)
</b></details>
<details>
<summary>What is caching? How does it work? Why is it important?</summary><br><b>
Caching provides fast access to frequently used resources that are computationally expensive or I/O intensive to produce and do not change often. There can be several layers of cache, from CPU caches to distributed cache systems. Common ones are in-memory caching and distributed caching. <br/> Caches are typically data structures that hold some data, such as a hash table or dictionary. However, other data structures can provide caching capabilities as well, such as a set, sorted set, or sorted dictionary. While caching is used in many applications, caches can create subtle bugs if they are not implemented or used correctly. For example, cache invalidation, expiration, and updating are usually quite challenging.
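
As a minimal sketch of an in-memory cache with expiration and explicit invalidation, consider the following Python class. The class name `TTLCache`, its methods, and the injectable `clock` parameter are all hypothetical choices made for illustration, and the sketch is not thread-safe.

```python
import time


class TTLCache:
    """Dictionary-backed cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}      # key -> (value, expiry time)

    def get(self, key, compute):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit
        value = compute(key)                     # miss or expired: recompute
        self._entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. after the underlying data changed."""
        self._entries.pop(key, None)


calls = []


def slow_square(n):
    calls.append(n)  # record every real computation
    return n * n


fake_now = [0.0]
cache = TTLCache(ttl_seconds=10, clock=lambda: fake_now[0])
cache.get(4, slow_square)   # miss: computed
cache.get(4, slow_square)   # hit: served from the cache
fake_now[0] = 11.0
cache.get(4, slow_square)   # expired: computed again
print(len(calls))           # 2
```

Choosing the TTL and deciding when to call `invalidate` is exactly the hard part mentioned above: too long and callers see stale data, too short and the cache stops saving work.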
</b></details>
<details>
<summary>Explain stateless vs. stateful</summary><br><b>
Stateless applications don't store any data on the host, which makes them ideal for horizontal scaling and microservices.
Stateful applications depend on the storage to save state and data, typically databases are stateful applications.
</b></details>
<details>
<summary>What is Reliability? How does it fit DevOps?</summary><br><b>
Reliability, when used in DevOps context, is the ability of a system to recover from infrastructure failure or disruption. Part of it is also being able to scale based on your organization or team demands.
</b></details>
<details>
<summary>What does "Availability" mean? What means are there to track the availability of a service?</summary><br><b>
</b></details>
<details>
<summary>Why isn't 100% availability a target? Why do most companies or teams set it to 99.X%?</summary><br><b>
</b></details>
<details>
<summary>Describe the workflow of setting up some type of web server (Apache, IIS, Tomcat, ...)</summary><br><b>
</b></details>
<details>
<summary>How does a web server work?</summary><br><b>
</b></details>
<details>
<summary>Explain "Open Source"</summary><br><b>
</b></details>
<details>
<summary>Describe the architecture of a service/app/project/... you designed and/or implemented</summary><br><b>
</b></details>
<details>
<summary>What types of tests are you familiar with?</summary><br><b>
Styling, unit, functional, API, integration, smoke, scenario, ...
You should be able to explain those that you mention.
</b></details>
<details>
<summary>You need to periodically install a package (unless it already exists) on different operating systems (Ubuntu, RHEL, ...). How would you do it?</summary><br><b>
There are multiple ways to answer this question (there is no right and wrong here):
* Simple cron job
* Pipeline with configuration management technology (such as Puppet, Ansible, Chef, etc.)
...
</b></details>
<details>
<summary>What is Chaos Engineering?</summary><br><b>
Wikipedia: "Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions"
Read about Chaos Engineering [here](https://en.wikipedia.org/wiki/Chaos_engineering)
</b></details>
<details>
<summary>What is "infrastructure as code"? What implementation of IAC are you familiar with?</summary><br><b>
IAC (infrastructure as code) is a declarative approach to defining the infrastructure or architecture of a system. Some implementations are ARM templates for Azure and Terraform, which can work across multiple cloud providers.
</b></details>
<details>
<summary>What benefits does infrastructure as code have?</summary><br><b>
- fully automated process of provisioning, modifying and deleting your infrastructure
- version control for your infrastructure which allows you to quickly rollback to previous versions
- validate infrastructure quality and stability with automated tests and code reviews
- makes infrastructure tasks less repetitive
</b></details>
<details>
<summary>How do you manage build artifacts?</summary><br><b>
Build artifacts are usually stored in a repository. They can be used in release pipelines for deployment purposes. Usually there is a retention period for build artifacts.
</b></details>
<details>
<summary>What Continuous Integration solution are you using/prefer and why?</summary><br><b>
</b></details>
<details>
<summary>What deployment strategies are you familiar with or have used?</summary><br><b>
There are several deployment strategies:
* Rolling
* Blue green deployment
* Canary releases
* Recreate strategy
</b></details>
<details>
<summary>You joined a team where everyone is developing one project, and the practice is to run tests locally on their workstations and push to the repository if the tests pass. What is the problem with the process as it is now and how can it be improved?</summary><br><b>
</b></details>
<details>
<summary>Explain test-driven development (TDD)</summary><br><b>
</b></details>
<details>
<summary>Explain agile software development</summary><br><b>
</b></details>
<details>
<summary>What do you think about the following sentence?: "implementing or practicing DevOps leads to more secure software"</summary><br><b>
</b></details>
<details>
<summary>Do you know what is a "post-mortem meeting"? What is your opinion on that?</summary><br><b>
</b></details>
<details>
<summary>What is configuration drift? What problems does it cause?</summary><br><b>
Configuration drift happens when in an environment of servers with the exact same configuration and software, a certain server
or servers are being applied with updates or configuration which other servers don't get and over time these servers become
slightly different than all others.
This situation might lead to bugs which are hard to identify and reproduce.
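
Detecting drift boils down to comparing what each server actually has against what all of them are supposed to have. The sketch below shows that comparison in Python. The server names, package versions, and the `find_drift` helper are invented for illustration, and the function only checks settings listed in the desired state (extra settings on a server go unnoticed).

```python
desired = {"nginx": "1.20", "openssl": "1.1.1"}

# Actual state per server (made-up data).
servers = {
    "web-1": {"nginx": "1.20", "openssl": "1.1.1"},
    "web-2": {"nginx": "1.21", "openssl": "1.1.1"},  # updated by hand
    "web-3": {"nginx": "1.20"},                      # package missing
}


def find_drift(desired, servers):
    """Return {server: {setting: (expected, actual)}} for every mismatch."""
    drift = {}
    for server, actual in servers.items():
        diffs = {
            key: (expected, actual.get(key))
            for key, expected in desired.items()
            if actual.get(key) != expected
        }
        if diffs:
            drift[server] = diffs
    return drift


print(find_drift(desired, servers))
```

Here `web-2` and `web-3` are reported as drifted while `web-1` is not.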
</b></details>
<details>
<summary>How to deal with a configuration drift?</summary><br><b>
Configuration drift can be avoided by implementing desired state configuration (DSC). The desired state can be a declarative file that defines how a system should be configured. There are tools that enforce the desired state, such as Terraform or Azure DSC. There are incremental or complete strategies.
</b></details>
<details>
<summary>Explain declarative and procedural styles. Do the technologies you are familiar with (or use) follow a procedural or declarative style?</summary><br><b>
Declarative - You write code that specifies the desired end state
Procedural - You describe the steps to get to the desired end state
Declarative Tools - Terraform, Puppet, CloudFormation
Procedural Tools - Ansible, Chef
To better emphasize the difference, consider creating two virtual instances/servers.
In declarative style, you would specify two servers and the tool will figure out how to reach that state.
In procedural style, you need to specify the steps to reach the end state of two instances/servers - for example, create a loop and in each iteration of the loop create one instance (running the loop twice of course).
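
The two-server example above can be sketched in Python, with a plain list standing in for the cloud account. Both function names are hypothetical, and the procedural function is deliberately naive: real procedural tools often add their own checks to avoid creating duplicates.

```python
def provision_procedural(cloud, count):
    """Procedural: spell out the steps. Each call creates `count` more servers."""
    for _ in range(count):
        cloud.append(f"server-{len(cloud) + 1}")


def provision_declarative(cloud, desired_count):
    """Declarative: state the end result and let the tool work out the steps."""
    while len(cloud) < desired_count:
        cloud.append(f"server-{len(cloud) + 1}")
    while len(cloud) > desired_count:
        cloud.pop()


procedural_cloud, declarative_cloud = [], []
for _ in range(2):  # apply each "configuration" twice
    provision_procedural(procedural_cloud, 2)
    provision_declarative(declarative_cloud, 2)

print(len(procedural_cloud))   # 4
print(len(declarative_cloud))  # 2
```

Applying the declarative version a second time changes nothing, because the current state already matches the desired one. The naive procedural version repeats its steps and ends up with four servers.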
</b></details>
<details>
<summary>Do you have experience with testing cross-projects changes? (aka cross-dependency)</summary><br><b>
Note: cross-dependency is when you have two or more changes to separate projects and you would like to test them in mutual build instead of testing each change separately.
</b></details>
<details>
<summary>Have you contributed to an open source project? Tell me about this experience</summary><br><b>
</b></details>
<details>
<summary>What is Distributed Tracing?</summary><br><b>
</b></details>
<details>
<summary>What is GitOps?</summary><br><b>
GitLab: "GitOps is an operational framework that takes DevOps best practices used for application development such as version control, collaboration, compliance, and CI/CD tooling, and applies them to infrastructure automation".
Read more [here](https://about.gitlab.com/topics/gitops)
</b></details>
#### SRE
<details>
<summary>What are the differences between SRE and DevOps?</summary><br><b>
Google: "One could view DevOps as a generalization of several core SRE principles to a wider range of organizations, management structures, and personnel."
Read more about it [here](https://sre.google/sre-book/introduction)
</b></details>
<details>
<summary>What is an SRE team responsible for?</summary><br><b>
Google: "the SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services"
Read more about it [here](https://sre.google/sre-book/introduction)
</b></details>
<details>
<summary>What is an error budget?</summary><br><b>
Atlassian: "An error budget is the maximum amount of time that a technical system can fail without contractual consequences."
Read more about it [here](https://www.atlassian.com/incident-management/kpis/error-budget)
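
To make the definition concrete, the following Python sketch turns an availability target into minutes of allowed downtime. The 99.9% target, the 30-day window, and both function names are assumptions chosen for illustration, not values taken from the quoted article.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Minutes of downtime allowed in the window for a given availability target."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def budget_remaining(slo_percent, downtime_minutes, window_days=30):
    """Budget left after the downtime already spent (negative means overspent)."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


print(round(error_budget_minutes(99.9), 1))    # 43.2
print(round(budget_remaining(99.9, 30.0), 1))  # 13.2
```

With a 99.9% target over 30 days the budget is about 43.2 minutes, so a single 30-minute outage leaves roughly 13 minutes for the rest of the window.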
</b></details>
<details>
<summary>What do you think about the following statement: "100% is the only right availability target for a system"</summary><br><b>
Wrong. No system can guarantee 100% availability, as no system is safe from experiencing downtime.
Many systems and services will fall somewhere between 99% and 100% uptime (or at least this is how most systems and services should be).
</b></details>
<details>
<summary>What are MTTF (mean time to failure) and MTTR (mean time to repair)? What do these metrics help us to evaluate?</summary><br><b>
* MTTF (mean time to failure), otherwise known as uptime, can be defined as how long the system runs before it fails.
* MTTR (mean time to repair), on the other hand, is the amount of time it takes to repair a broken system.
* MTBF (mean time between failures) is the amount of time between failures of the system.
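
The three metrics can be computed from an incident history, as in the Python sketch below. It assumes one common convention for repairable systems (MTBF = MTTF + MTTR, availability = MTTF / MTBF); other sources define these terms slightly differently. The function name and the sample durations are made up.

```python
def reliability_metrics(uptime_hours, repair_hours):
    """Compute MTTF, MTTR, MTBF and availability from per-incident durations.

    `uptime_hours[i]` is how long the system ran before failure i and
    `repair_hours[i]` is how long that failure took to fix.
    """
    if not uptime_hours or len(uptime_hours) != len(repair_hours):
        raise ValueError("need one uptime and one repair duration per failure")
    failures = len(uptime_hours)
    mttf = sum(uptime_hours) / failures
    mttr = sum(repair_hours) / failures
    mtbf = mttf + mttr  # assumed convention: one full run-and-repair cycle
    return {"mttf": mttf, "mttr": mttr, "mtbf": mtbf, "availability": mttf / mtbf}


metrics = reliability_metrics([100, 200, 300], [1, 2, 3])
print(metrics["mttf"], metrics["mttr"], metrics["mtbf"])  # 200.0 2.0 202.0
print(round(metrics["availability"], 4))                  # 0.9901
```

Together the metrics help evaluate how often a system fails and how quickly it recovers, which is what determines its availability.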
</b></details>
<details>
<summary>What is the role of monitoring in SRE?</summary><br><b>
Google: "Monitoring is one of the primary means by which service owners keep track of a systems health and availability"
Read more about it [here](https://sre.google/sre-book/introduction)
</b></details>

200
exercises/git/README.md Normal file

@ -0,0 +1,200 @@
## Git
|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| My first Commit | Commit | [Exercise](exercises/git/commit_01.md) | [Solution](exercises/git/solutions/commit_01_solution.md) | |
| Time to Branch | Branch | [Exercise](exercises/git/branch_01.md) | [Solution](exercises/git/solutions/branch_01_solution.md) | |
| Squashing Commits | Commit | [Exercise](exercises/git/squashing_commits.md) | [Solution](exercises/git/solutions/squashing_commits.md) | |
<details>
<summary>How do you know if a certain directory is a git repository?</summary><br><b>
You can check if there is a ".git" directory.
</b></details>
<details>
<summary>Explain the following: <code>git directory</code>, <code>working directory</code> and <code>staging area</code></summary><br><b>
This answer is taken from [git-scm.com](https://git-scm.com/book/en/v1/Getting-Started-Git-Basics#_the_three_states)
"The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
The staging area is a simple file, generally contained in your Git directory, that stores information about what will go into your next commit. It's sometimes referred to as the index, but it's becoming standard to refer to it as the staging area."
</b></details>
<details>
<summary>What is the difference between <code>git pull</code> and <code>git fetch</code>?</summary><br><b>
In short, git pull = git fetch + git merge
When you run git pull, it gets all the changes from the remote or central
repository and merges them into the corresponding branch in your local repository.
git fetch gets all the changes from the remote repository and stores them in
remote-tracking branches (e.g. origin/main) in your local repository, without touching your local branches
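The two steps behind `git pull` can be sketched in a throwaway setup. The repository names (`origin.git`, `alice`, `bob`) and the demo identity are assumptions made for this example only.

```shell
#!/usr/bin/env bash
set -eu
# Throwaway identity and directory, so nothing outside the temp dir is touched.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# A bare repository plays the role of the remote ("origin").
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/main

# "alice" publishes the first commit.
git clone -q origin.git alice 2>/dev/null
git -C alice symbolic-ref HEAD refs/heads/main
echo one > alice/file.txt
git -C alice add file.txt
git -C alice commit -q -m "first"
git -C alice push -q origin main

# "bob" clones, then alice publishes a second commit.
git clone -q origin.git bob
echo two >> alice/file.txt
git -C alice commit -q -am "second"
git -C alice push -q origin main

# Step 1 of "git pull": fetch only updates the remote-tracking branch origin/main.
git -C bob fetch -q origin
local_after_fetch=$(git -C bob rev-parse main)
remote_after_fetch=$(git -C bob rev-parse origin/main)

# Step 2 of "git pull": merge brings the local branch up to date.
git -C bob merge -q origin/main
local_after_merge=$(git -C bob rev-parse main)

echo "after fetch: local=$local_after_fetch remote=$remote_after_fetch"
echo "after merge: local=$local_after_merge"
```

After the fetch, bob's `main` still points at the old commit while `origin/main` has moved. Only the merge updates the local branch.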
</b></details>
<details>
<summary>How to check if a file is tracked and if not, then track it?</summary><br><b>
There are different ways to check whether a file is tracked or not:
- `git ls-files --error-unmatch <file>` -> exit code of 0 means it's tracked
- `git blame <file>`
...
If the file isn't tracked, start tracking it with `git add <file>`
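A minimal sketch of the check-then-track flow in a throwaway repository. Note that plain `git ls-files <file>` exits with 0 even for untracked paths, which is why the sketch relies on `--error-unmatch`. The file name `notes.txt` and the helper `is_tracked` are hypothetical.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"          # throwaway repository, so nothing real is touched
git init -q
echo "hello" > notes.txt   # hypothetical file name

# Succeeds (exit code 0) only when the path is known to the index.
is_tracked() { git ls-files --error-unmatch -- "$1" >/dev/null 2>&1; }

if is_tracked notes.txt; then before="tracked"; else before="untracked"; fi
git add notes.txt          # start tracking the file
if is_tracked notes.txt; then after="tracked"; else after="untracked"; fi

echo "before: $before, after: $after"
```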
</b></details>
<details>
<summary>How can you see which changes have been made before committing them?</summary><br><b>
`git diff`
</b></details>
<details>
<summary>What does <code>git status</code> do?</summary><br><b>
</b></details>
<details>
<summary>You have two branches - main and devel. How do you make sure devel is in sync with main?</summary><br><b>
```
git checkout main
git pull
git checkout devel
git merge main
```
</b></details>
#### Git - Merge
<details>
<summary>You have two branches - main and devel. How do you put devel into main?</summary><br><b>
git checkout main
git merge devel
git push origin main
</b></details>
<details>
<summary>How to resolve git merge conflicts?</summary><br><b>
<p>
First, you open the files which are in conflict and identify the conflicts.
Next, based on what is accepted in your company or team, you either discuss the conflicts with your
colleagues or resolve them by yourself.
After resolving the conflicts, you add the files with `git add <file_name>`
Finally, you run `git merge --continue` (or `git commit`). If the conflicts came up during a rebase, you run `git rebase --continue` instead.
</p>
</b></details>
<details>
<summary>What merge strategies are you familiar with?</summary><br><b>
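Mentioning two or three should be enough and it's probably good to mention that 'recursive' was the default one for merging two branches until Git 2.34, which switched the default to 'ort'.
recursive
resolve
ours
theirs (in current Git this exists only as the `-X theirs` option of the recursive/ort strategies, not as a standalone strategy)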
This page explains it the best: https://git-scm.com/docs/merge-strategies
</b></details>
<details>
<summary>Explain Git octopus merge</summary><br><b>
Probably good to mention that:
* It's good for cases of merging more than one branch into the current one at once (and it's also the default strategy for such use cases)
* It's primarily meant for bundling topic branches together
This is a great article about Octopus merge: http://www.freblogg.com/2016/12/git-octopus-merge.html
</b></details>
<details>
<summary>What is the difference between <code>git reset</code> and <code>git revert</code>?</summary><br><b>
<p>
`git revert` creates a new commit which undoes the changes introduced by a given commit (for example, the last one), leaving the existing history intact.
`git reset`, depending on the usage, can modify the index or change the commit which the branch head
is currently pointing at.
</p>
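The difference can be observed by counting commits in a throwaway repository. This is only a sketch, and the file name and demo identity are assumptions.

```shell
#!/usr/bin/env bash
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q

echo "v1" > app.txt
git add app.txt
git commit -q -m "add app.txt"
echo "v2" > app.txt
git commit -q -am "change app.txt"

# revert: history grows by one commit that undoes "change app.txt".
git revert --no-edit HEAD >/dev/null
commits_after_revert=$(git rev-list --count HEAD)
content_after_revert=$(cat app.txt)

# reset --hard: the branch pointer moves back one commit,
# so the revert commit is no longer part of the branch history.
git reset -q --hard HEAD~1
commits_after_reset=$(git rev-list --count HEAD)
content_after_reset=$(cat app.txt)

echo "after revert: $commits_after_revert commits, app.txt=$content_after_revert"
echo "after reset:  $commits_after_reset commits, app.txt=$content_after_reset"
```

Because `git revert` only adds commits, it is generally the safer choice on branches that were already pushed, while `git reset` rewrites where the branch points.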
</b></details>
#### Git - Rebase
<details>
<summary>You would like to move the fourth commit to the top. How would you achieve that?</summary><br><b>
Using the `git rebase` command in interactive mode (`git rebase -i`), which lets you reorder commits
</b></details>
<details>
<summary>In what situations are you using <code>git rebase</code>?</summary><br><b>
</b></details>
<details>
<summary>How do you revert a specific file to previous commit?</summary><br><b>
```
git checkout HEAD~1 -- /path/of/the/file
```
</b></details>
<details>
<summary>How to squash last two commits?</summary><br><b>
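One common way, sketched here in a throwaway repository, is a soft reset followed by a new commit. The file name, commit messages and demo identity are assumptions for the example. An interactive rebase (`git rebase -i HEAD~2`, marking the second commit as `squash`) achieves the same result.

```shell
#!/usr/bin/env bash
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q

# Three commits, each appending one line to the same file.
for n in 1 2 3; do
  echo "line $n" >> log.txt
  git add log.txt
  git commit -q -m "commit $n"
done

# Move the branch back two commits but keep their changes staged...
git reset -q --soft HEAD~2
# ...then record those staged changes as a single commit.
git commit -q -m "commits 2 and 3, squashed"

commit_count=$(git rev-list --count HEAD)
line_count=$(wc -l < log.txt | tr -d ' ')
echo "commits: $commit_count, lines in log.txt: $line_count"
```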
</b></details>
<details>
<summary>What is the <code>.git</code> directory? What can you find there?</summary><br><b>
The <code>.git</code> folder contains all the information that is necessary for your project in version control and all the information about commits, remote repository address, etc. All of them are present in this folder. It also contains a log that stores your commit history so that you can roll back to history.
This info was copied from [https://stackoverflow.com/questions/29217859/what-is-the-git-folder](https://stackoverflow.com/questions/29217859/what-is-the-git-folder)
</b></details>
<details>
<summary>What are some Git anti-patterns? Things that you shouldn't do</summary><br><b>
* Waiting too long between commits
* Removing the .git directory :)
</b></details>
<details>
<summary>How do you remove a remote branch?</summary><br><b>
You delete a remote branch with this syntax:
`git push origin :[branch_name]` or, equivalently, `git push origin --delete [branch_name]`
</b></details>
<details>
<summary>Are you familiar with gitattributes? When would you use it?</summary><br><b>
gitattributes allow you to define attributes per pathname or path pattern.<br>
You can use it, for example, to control line endings in files. Windows and Unix based systems use different characters for new lines (\r\n and \n respectively). Using gitattributes we can align it for both Windows and Unix with `* text=auto` in .gitattributes for anyone working with git. This way, if you use the Git project on Windows you'll get \r\n and if you are using Unix or Linux, you'll get \n.
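A minimal sketch of setting the attribute and checking that it applies, run in a throwaway repository. The file name `notes.txt` is hypothetical.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
git init -q

# Let Git detect text files and normalize their line endings.
printf '* text=auto\n' > .gitattributes
echo "hello" > notes.txt   # hypothetical file name

# check-attr reports which value of the "text" attribute applies to a path.
text_attr=$(git check-attr text -- notes.txt)
echo "$text_attr"
```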
</b></details>
<details>
<summary>How do you discard local file changes? (before commit)</summary><br><b>
`git checkout -- <file_name>`
</b></details>
<details>
<summary>How do you discard local commits?</summary><br><b>
`git reset HEAD~1` for removing the last commit
If you would like to also discard the changes, use `git reset --hard HEAD~1`
</b></details>
<details>
<summary>True or False? To remove a file from git but not from the filesystem, one should use <code>git rm </code></summary><br><b>
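False. Plain `git rm` also deletes the file from the filesystem. If you would like to keep the file on your filesystem, use `git rm --cached <file_name>` (or `git reset <file_name>` if the file was only staged and never committed)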
</b></details>

File diff suppressed because it is too large Load Diff


@ -0,0 +1,437 @@
## Terraform
<details>
<summary>Explain what Terraform is and how it works</summary><br><b>
[Terraform.io](https://www.terraform.io/intro/index.html#what-is-terraform-): "Terraform is an infrastructure as code (IaC) tool that allows you to build, change, and version infrastructure safely and efficiently."<br>
</b></details>
<details>
<summary>Why would one prefer using Terraform and not other technologies? (e.g. Ansible, Puppet, CloudFormation)</summary><br><b>
A common *wrong* answer is to say that Ansible and Puppet are configuration management tools
and Terraform is a provisioning tool. While technically true, it doesn't mean Ansible and Puppet can't
be used for provisioning infrastructure. Also, it doesn't explain why Terraform should be used over
CloudFormation if at all.
The benefits of Terraform over the other tools:
* It follows the immutable infrastructure approach which has benefits like avoiding configuration drift over time
* Ansible and Puppet are more procedural (you mention what to execute in each step) and Terraform is declarative since you describe the overall desired state and not per resource or task. You can give the example of going from 1 to 2 servers in each tool. In Terraform you specify 2; in Ansible and Puppet you have to explicitly provision only 1 additional server.
</b></details>
<details>
<summary>How do you structure your Terraform projects?</summary><br><b>
terraform_directory
providers.tf -> List providers (source, version, etc.)
variables.tf -> any variable used in other files such as main.tf
main.tf -> Lists the resources
</b></details>
<details>
<summary>True or False? Terraform follows the mutable infrastructure paradigm</summary><br><b>
False. Terraform follows the immutable infrastructure paradigm.
</b></details>
<details>
<summary>True or False? Terraform uses declarative style to describe the expected end state</summary><br><b>
True
</b></details>
<details>
<summary>What is HCL?</summary><br><b>
HCL stands for HashiCorp Configuration Language. It is the language HashiCorp made to use as the configuration language for a number of its tools, including Terraform.
</b></details>
<details>
<summary>Explain what is "Terraform configuration"</summary><br><b>
A configuration is a root module along with a tree of child modules that are called as dependencies from the root module.
</b></details>
<details>
<summary>Explain what the following commands do:
* <code>terraform init</code>
* <code>terraform plan</code>
* <code>terraform validate</code>
* <code>terraform apply</code>
</summary><br><b>
<code>terraform init</code> scans your code to figure out which providers you are using and downloads them.
<code>terraform plan</code> will let you see what terraform is about to do before actually doing it.
<code>terraform validate</code> checks if configuration is syntactically valid and internally consistent within a directory.
<code>terraform apply</code> will provision the resources specified in the .tf files.
</b></details>
#### Terraform - Resources
<details>
<summary>What is a "resource"?</summary><br><b>
HashiCorp: "Terraform uses resource blocks to manage infrastructure, such as virtual networks, compute instances, or higher-level components such as DNS records. Resource blocks represent one or more infrastructure objects in your Terraform configuration."
</b></details>
<details>
<summary>Explain each part of the following line: `resource "aws_instance" "web_server" {...}`</summary><br><b>
- resource: keyword for defining a resource
- "aws_instance": the type of the resource
- "web_server": the name of the resource
</b></details>
<details>
<summary>What is the ID of the following resource: `resource "aws_instance" "web_server" {...}`</summary><br><b>
`aws_instance.web_server`
</b></details>
<details>
<summary>True or False? Resource ID must be unique within a workspace</summary><br><b>
True
</b></details>
<details>
<summary>Explain each of the following in regards to resources
* Arguments
* Attributes
* Meta-arguments</summary><br><b>
- Arguments: resource specific configurations
- Attributes: values exposed by the resource in a form of `resource_type.resource_name.attribute_name`. They are set by the provider or API usually.
- Meta-arguments: Functions of Terraform to change resource's behaviour
</b></details>
#### Terraform - Providers
<details>
<summary>Explain what is a "provider"</summary><br><b>
[terraform.io](https://www.terraform.io/docs/language/providers/index.html): "Terraform relies on plugins called "providers" to interact with cloud providers, SaaS providers, and other APIs...Each provider adds a set of resource types and/or data sources that Terraform can manage. Every resource type is implemented by a provider; without providers, Terraform can't manage any kind of infrastructure."
</b></details>
<details>
<summary>What is the name of the provider in this case: `resource "libvirt_domain" "instance" {...}`</summary><br><b>
libvirt
</b></details>
#### Terraform - Variables
<details>
<summary>What are Input Variables in Terraform? Why should one use them?</summary><br><b>
Input variables serve as parameters to the module in Terraform. They allow you, for example, to define the value of a variable once and use it in different places in the module, so the next time you want to change the value, you change it in one place instead of in several places in the module.
</b></details>
<details>
<summary>How to define variables?</summary><br><b>
```
variable "app_id" {
  type = string
  description = "The id of application"
  default = "some_value"
}
```
Usually they are defined in their own file (vars.tf for example).
</b></details>
<details>
<summary>How are variables used in modules?</summary><br><b>
They are referenced with `var.VARIABLE_NAME`
vars.tf:
```
variable "memory" {
  type = string
  default = "8192"
}
variable "cpu" {
  type = string
  default = "4"
}
```
main.tf:
```
resource "libvirt_domain" "vm1" {
  name = "vm1"
  memory = var.memory
  cpu = var.cpu
}
```
</b></details>
<details>
<summary>How would you enforce users that use your variables to provide values with certain constraints? For example, a number greater than 1</summary><br><b>
Using `validation` block
```
variable "some_var" {
  type = number
  validation {
    condition = var.some_var > 1
    # Older Terraform versions require a full sentence: capitalized and ending with a period.
    error_message = "You have to specify a number greater than 1."
  }
}
```
</b></details>
<details>
<summary>What is the effect of setting variable as "sensitive"?</summary><br><b>
It doesn't show its value when you run `terraform apply` or `terraform plan` but eventually it's still recorded in the state file.
</b></details>
<details>
<summary>True or False? If an expression's result depends on a sensitive variable, it will be treated as sensitive as well</summary><br><b>
True
</b></details>
<details>
<summary>The same variable is defined in the following places:
- The file `terraform.tfvars`
- Environment variable
- Using `-var` or `-var-file`
According to variable precedence, which source will be used first?</summary><br><b>
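Terraform loads the sources in the following order, and later sources override earlier ones:
- Environment variable
- The file `terraform.tfvars`
- Using `-var` or `-var-file`
So the value passed with `-var` or `-var-file` is the one that ends up being used.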
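The precedence can be tried out with a tiny configuration that has no providers. This is a sketch under assumptions: the variable name `greeting` and its values are made up, and the Terraform commands only run if a reasonably recent `terraform` binary is found on the machine.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"

# Hypothetical module with a single variable and an output that echoes it.
cat > main.tf <<'EOF'
variable "greeting" {
  type    = string
  default = "from-default"
}

output "greeting" {
  value = var.greeting
}
EOF

# The same variable, set from three different sources.
export TF_VAR_greeting="from-env"                    # loaded first
echo 'greeting = "from-tfvars"' > terraform.tfvars   # overrides the environment
cli_value="from-cli"                                 # -var overrides both

if command -v terraform >/dev/null 2>&1; then
  terraform init -input=false >/dev/null
  terraform apply -auto-approve -input=false -var "greeting=${cli_value}" >/dev/null
  result=$(terraform output -raw greeting)
else
  result="terraform is not installed; skipped"
fi
echo "greeting resolved to: ${result}"
```

Dropping the `-var` argument should leave the `terraform.tfvars` value in effect, and removing that file as well should fall back to the environment variable.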
</b></details>
<details>
<summary>What other way is there to define lots of variables in a more "simplified" way?</summary><br><b>
Using a `.tfvars` file which consists of simple variable name assignments, this way:
```
x = 2
y = "mario"
z = "luigi"
```
</b></details>
#### Terraform - State
<details>
<summary>What is the <code>terraform.tfstate</code> file used for?</summary><br><b>
It keeps track of the IDs of created resources so that Terraform knows what it's managing.
</b></details>
<details>
<summary>How do you rename an existing resource?</summary><br><b>
`terraform state mv SOURCE DESTINATION`, together with renaming the resource block in the configuration
</b></details>
<details>
<summary>Why does it matter where you store the tfstate file? Where would you store it?</summary><br><b>
- tfstate may contain credentials in plain text. You don't want to put it in a publicly shared location
- tfstate shouldn't be modified concurrently so putting it in a shared location available for everyone with "write" permissions might lead to issues. (Terraform remote state doesn't have this problem).
- tfstate is an important file. As such, it might be better to put it in a location that has regular backups.
As such, tfstate shouldn't be stored in git repositories. Secured storage, such as secured buckets, is a better option.
</b></details>
<details>
<summary>Which command is responsible for creating state file?</summary><br><b>
- `terraform apply`
- The above command will create the tfstate file in the working folder.
</b></details>
<details>
<summary>By default where does the state get stored?</summary><br><b>
- The state is stored by default in a local file named terraform.tfstate.
</b></details>
<details>
<summary>Can we store the tfstate file in a remote location? If yes, under which conditions would you do this?</summary><br><b>
- Yes, it can also be stored remotely, which works better in a team environment, on the condition that the remote location is not publicly accessible, since the tfstate file contains sensitive information as well. Access to this remote location must be shared only with team members.
</b></details>
<details>
<summary>Mention some best practices related to tfstate</summary><br><b>
- Don't edit it manually. tfstate was designed to be manipulated by terraform and not by users directly.
- Store it in secured location (since it can include credentials and sensitive data in general)
- Back it up regularly so you can roll back easily when needed
- Store it in remote shared storage. This is especially needed when working in a team and the state can be updated by any of the team members
- Enable versioning if the storage where you store the state file supports it. Versioning is great for backups and roll-backs in case of an issue.
</b></details>
<details>
<summary>How and why should concurrent edits of the state file be avoided?</summary><br><b>
If there are two users or processes concurrently editing the state file, it can result in an invalid state file that doesn't actually represent the state of resources.<br>
To avoid that, Terraform can apply state locking if the backend supports that. For example, the AWS S3 backend supports state locking and consistency via DynamoDB. Often, if the backend supports it, Terraform will make use of state locking automatically so nothing is required from the user to activate it.
</b></details>
<details>
<summary>Describe how you manage state file(s) when you have multiple environments (e.g. development, staging and production)</summary><br><b>
There is no right or wrong here, but it seems that the overall preferred way is to have a dedicated state file per environment.
</b></details>
<details>
<summary>How to write down a variable which changes by an external source or during <code>terraform apply</code>?</summary><br><b>
You use it this way: <code>variable "my_var" {}</code>
</b></details>
<details>
<summary>You've deployed a virtual machine with Terraform and you would like to pass data to it (or execute some commands). Which concept of Terraform would you use?</summary><br><b>
[Provisioners](https://www.terraform.io/docs/language/resources/provisioners)
</b></details>
#### Terraform - Provisioners
<details>
<summary>What are "Provisioners"? What are they used for?</summary><br><b>
Provisioners are used to execute actions on a local or remote machine. They are extremely useful when you have provisioned an instance and want to make a couple of changes on the machine you've created without manually SSHing into it and running them after Terraform has finished.
</b></details>
<details>
<summary>What is <code>local-exec</code> and <code>remote-exec</code> in the context of provisioners?</summary><br><b>
</b></details>
<details>
<summary>What is a "tainted resource"?</summary><br><b>
It's a resource which was successfully created but failed during provisioning. Terraform will fail and mark this resource as "tainted".
</b></details>
<details>
<summary>What does <code>terraform taint</code> do?</summary><br><b>
<code>terraform taint resource.id</code> manually marks the resource as tainted in the state file. So when you run <code>terraform apply</code> the next time, the resource will be destroyed and recreated.
</b></details>
<details>
<summary>What types of variables are supported in Terraform?</summary><br><b>
string
number
bool
list(<TYPE>)
set(<TYPE>)
map(<TYPE>)
object({<ATTR_NAME> = <TYPE>, ... })
tuple([<TYPE>, ...])
</b></details>
<details>
<summary>What is a data source? In what scenarios, for example, would you need to use it?</summary><br><b>
Data sources look up or compute values that can be used elsewhere in the Terraform configuration.
There are quite a few cases you might need to use them:
* you want to reference resources not managed through terraform
* you want to reference resources managed by a different terraform module
* you want to cleanly compute a value with typechecking, such as with <code>aws_iam_policy_document</code>
</b></details>
<details>
<summary>What are output variables and what does <code>terraform output</code> do?</summary><br><b>
Output variables are named values that are sourced from the attributes of a module. They are stored in Terraform state, and can be used by other modules through the <code>terraform_remote_state</code> data source. <code>terraform output</code> prints the output values stored in the state file.
</b></details>
<details>
<summary>Explain Modules</summary><br><b>
A Terraform module is a set of Terraform configuration files in a single directory. Modules are small, reusable Terraform configurations that let you manage a group of related resources as if they were a single resource. Even a simple configuration consisting of a single directory with one or more .tf files is a module. When you run Terraform commands directly from such a directory, it is considered the root module. So in this sense, every Terraform configuration is part of a module.
</b></details>
<details>
<summary>What is the Terraform Registry?</summary><br><b>
The Terraform Registry provides a centralized location for official and community-managed providers and modules.
</b></details>
<details>
<summary>Explain <code>remote-exec</code> and <code>local-exec</code></summary><br><b>
</b></details>
<details>
<summary>Explain "Remote State". When would you use it and how?</summary><br><b>
Terraform generates a `terraform.tfstate` JSON file that describes the components/services provisioned on the specified provider. Remote
State stores this file in remote storage to enable collaboration among team members.
</b></details>
<details>
<summary>Explain "State Locking"</summary><br><b>
State locking is a mechanism that blocks operations against a specific state file from multiple callers so as to avoid conflicting operations from different team members. Once the first caller's lock is released, the other team member may go ahead and
carry out their own operation. Nevertheless, Terraform will first check the state file to see if the desired resource already exists and,
if not, it goes ahead and creates it.
</b></details>
<details>
<summary>What is the "Random" provider? What is it used for?</summary><br><b>
The random provider aids in generating numeric or alphabetic characters to use as a prefix or suffix for a desired named identifier.
</b></details>
<details>
<summary>How do you test a terraform module?</summary><br><b>
Many examples are acceptable, but the most common answer would likely be using the tool <code>terratest</code> to test that a module can be initialized, can create resources, and can destroy those resources cleanly.
</b></details>
<details>
<summary>Aside from <code>.tfvars</code> files or CLI arguments, how can you inject dependencies from other modules?</summary><br><b>
The built-in Terraform way would be to use the <code>terraform_remote_state</code> data source to look up the outputs from other modules.
It is also common in the community to use a tool called <code>terragrunt</code> to explicitly inject variables between modules.
</b></details>
<details>
<summary>What is Terraform import?</summary><br><b>
Terraform import is used to import existing infrastructure. It allows you to bring resources created by some other means (e.g. manually launched cloud resources) under Terraform management.
</b></details>
<details>
<summary>How do you import existing resource using Terraform import?</summary><br><b>
1. Identify which resource you want to import.
2. Write terraform code matching configuration of that resource.
3. Run terraform command <code>terraform import ADDRESS ID</code>, where ADDRESS is the resource address in your configuration and ID is the identifier of the existing resource<br>
e.g. Let's say you want to import an AWS instance. Then you'll perform the following:
1. Identify that AWS instance in the console
2. Refer to its configuration and write Terraform code which will look something like:
```
resource "aws_instance" "tf_aws_instance" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.micro"
tags = {
Name = "import-me"
}
}
```
3. Run terraform command <code>terraform import aws_instance.tf_aws_instance i-12345678</code>
</b></details>


@ -1,5 +1,3 @@
#!/bin/bash #!/usr/bin/env bash
# We dont care about non alphanumerics filenames so we just ls | grep to shorten the script. echo $(( $(grep -E "\[Exercise\]|</summary>" -c README.md exercises/*/README.md | awk -F: '{ s+=$2 } END { print s }' )))
echo $(( $(grep \</summary\> -c README.md) + $(grep -i Solution README.md | grep \.md -c) ))