docs: Add 'Customizing library models for Rust' documentation #21727
coadaflorin wants to merge 11 commits into main
Add documentation for customizing library models for Rust using data extension files. This follows the pattern of existing documentation for other languages (Java, Python, Ruby, Go, C#, C++, JavaScript).

The documentation covers:
- Rust-specific extensible predicates (sourceModel, sinkModel, summaryModel, neutralModel) with their simplified schema
- Canonical path syntax for identifying Rust functions and methods
- Examples using real models from the codebase (sqlx, reqwest, std::env, std::path, Iterator::map)
- Access path token reference (Argument, Parameter, ReturnValue, Element, Field, Reference, Future)
- Source and sink kind reference
- Threat model integration

Also updates codeql-for-rust.rst to include the new page in the toctree.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add barrierModel and barrierGuardModel sections to the Rust library models documentation, following the pattern established in PR #21523 for other languages.

Includes:
- New extensible predicate descriptions in the overview
- Example: barrier for SQL injection using escape_sql
- Example: barrier guard for path injection using is_safe_path
- Reference material for both barrierModel and barrierGuardModel

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
geoffw0 left a comment
Partially reviewed. I need to continue from "Examples of custom model definitions", then check final rendering and links. We will also want a docs team review at some point.
- **Free functions**: ``crate::module::function``, for example ``std::env::var`` or ``std::fs::read_to_string``.
- **Inherent methods**: ``<Type>::method``, for example ``<std::fs::File>::open``.
- **Trait methods with a concrete type**: ``<Type as Trait>::method``, for example ``<std::fs::File as std::io::Read>::read_to_end``.
- **Trait methods with a wildcard type**: ``<_ as Trait>::method``, for example ``<_ as core::clone::Clone>::clone``. This form matches any type that implements the trait and is useful for modeling broadly applicable trait methods.
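For readers less familiar with Rust, the four canonical path forms quoted above closely mirror Rust's own qualified-path call syntax. The following is a minimal sketch of that correspondence, not part of the proposed documentation. The helper names `read_env`, `read_file`, and `duplicate` are hypothetical, and the sketch says nothing about how CodeQL resolves the paths internally.

```rust
// Free function, written with its full path: `std::env::var`.
fn read_env(name: &str) -> Option<String> {
    std::env::var(name).ok()
}

// Inherent method `<std::fs::File>::open`, followed by a trait method
// called on a concrete type: `<std::fs::File as std::io::Read>::read_to_end`.
fn read_file(path: &std::path::Path) -> std::io::Result<Vec<u8>> {
    let mut file = <std::fs::File>::open(path)?;
    let mut buf = Vec::new();
    <std::fs::File as std::io::Read>::read_to_end(&mut file, &mut buf)?;
    Ok(buf)
}

// Trait method with the implementing type left to inference. The quoted
// documentation writes the trait as `core::clone::Clone`; `std::clone::Clone`
// is a re-export of the same trait.
fn duplicate<T: Clone>(value: &T) -> T {
    <_ as std::clone::Clone>::clone(value)
}

fn main() {
    let original = String::from("hello");
    let copy = duplicate(&original);
    println!("{}", copy);

    // The variable name below is an arbitrary, hypothetical example.
    println!("{:?}", read_env("EXAMPLE_UNSET_VARIABLE"));

    // The path below is a hypothetical example and is expected not to exist.
    println!("{}", read_file(std::path::Path::new("/no/such/file")).is_err());
}
```

Each call in the sketch uses the same spelling as the corresponding canonical path, which may make the list easier to check against real code.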
I don't see this section in the doc for other languages, I think Copilot may have synthesised it entirely ... but it looks really helpful, and as far as I can tell, correct.
As someone with no familiarity with rust, it looks helpful to me. (Assuming it's correct.)
Correct. But it is perhaps worth mentioning that any concrete model, such as <Foo as core::clone::Clone>::clone, will take precedence.
Does that only apply if the concrete model has the exact same input / output / kind parameters?
…for-rust.rst Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Add the 'Publish data extension files in a CodeQL model pack to share' section, matching the structure used in C#, C++, Go, and Java docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new Rust language guide page describing how to write CodeQL data extensions for Rust library modeling, and wires it into the Rust docs index.
Changes:
- Added a new documentation page describing Rust-specific modeling concepts (canonical paths, access paths, and extensible predicates).
- Added the new page to the Rust language guide toctree and link list.
| File | Description |
|---|---|
| docs/codeql/codeql-language-guides/customizing-library-models-for-rust.rst | New guide page explaining how to create Rust library models with data extensions. |
| docs/codeql/codeql-language-guides/codeql-for-rust.rst | Adds the new guide page to the Rust documentation navigation. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 2
It would also be possible to merge the two rows into one by using a comma-separated list in the second value:

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/rust-all
         extensible: summaryModel
       data:
         - ["<std::path::Path>::join", "Argument[self,0]", "ReturnValue", "taint", "manual"]

This row defines flow from both the receiver and the first argument to the return value. The second value ``Argument[self,0]`` is shorthand for specifying an access path to both ``Argument[self]`` and ``Argument[0]``.

.. note::

   When using ``Argument[self]`` to refer to the receiver, the ``Reference`` token may need to be appended to follow through the ``&self`` or ``&mut self`` reference to the underlying value. This depends on whether the data you want to track is on the reference itself or on the value behind the reference.
This merged example changes the meaning compared to the two-row version above. In the two-row version, the receiver flow is modeled from Argument[self].Reference (and the text states this is needed because join takes &self), but the merged form drops .Reference entirely. Either remove this merged example, or adjust it so it preserves the same access-path semantics (and clarify any limitations if the shorthand can’t express per-operand tokens).
Suggested change:

In this case, the two rows should not be merged into one by using a comma-separated list in the second value.
The receiver flow is modeled as ``Argument[self].Reference``, while the first argument is modeled as
``Argument[0]``. Since these access paths are different, keeping them as separate rows preserves the
correct semantics for ``Path::join``.

.. note::

   When using ``Argument[self]`` to refer to the receiver, the ``Reference`` token may need to be appended to follow through the ``&self`` or ``&mut self`` reference to the underlying value. This depends on whether the data you want to track is on the reference itself or on the value behind the reference. If different operands require different access-path tokens, model them using separate rows instead of a single comma-separated shorthand.
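To make the two inputs discussed in this thread concrete, here is a minimal sketch of a call to `Path::join`. It only shows the runtime behavior of the standard library method and is not a statement about how CodeQL evaluates the model rows. The wrapper `build_path` and the example paths are hypothetical, and the expected values assume a Unix-style path layout.

```rust
use std::path::{Path, PathBuf};

// `base` is the receiver, passed to `join` as `&self`, so the data of
// interest sits behind a reference. `user_part` is the first argument.
// Both contribute to the returned `PathBuf`.
fn build_path(base: &Path, user_part: &str) -> PathBuf {
    base.join(user_part)
}

fn main() {
    // A relative argument is appended to the receiver.
    let appended = build_path(Path::new("/srv/data"), "report.txt");
    println!("{}", appended.display());

    // An absolute argument replaces the receiver entirely, which is one
    // reason the argument matters as an input in its own right.
    let replaced = build_path(Path::new("/srv/data"), "/etc/passwd");
    println!("{}", replaced.display());
}
```

Because the receiver arrives as `&self` while the argument is passed by value, the two inputs can need different access-path tokens, which is the situation the suggested wording describes.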
geoffw0 left a comment
More reviewing. I still need to continue from "Example: Add a neutral model"...
Examples of custom model definitions
-------------------------------------

The examples in this section are based on models from the standard CodeQL Rust query pack published by GitHub. They demonstrate how to add tuples to extend extensible predicates that are used by the standard queries.
I'm not sure why we say that the models are in the query pack. They're in the library pack. But we use the same phrasing in several other languages where this is the case.
I'm not an expert on packaging though.
That's a good point. The docs do cover this and mention that a query pack also includes the library and model packs: https://docs.github.com/en/code-security/concepts/code-scanning/codeql/codeql-query-packs
OK, it may be technically accurate then. As I said, I'm not an expert on packaging.
…for-rust.rst Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
geoffw0 left a comment
Finished my initial review. I'm happy to approve once my comments are answered, though we should also get docs approval before merging. I have not checked the final rendering and in particular that all links work.
- **path**: Canonical path of the function or method.
- **input**: Access path to the input whose flow is blocked.
- **acceptingValue**: The value that the conditional check must return for the barrier to apply. Usually ``"true"`` or ``"false"``.
I've never seen values other than true or false as an accepting value in a barrier guard. @hvitved are other values possible?
Suggested change:

- **acceptingValue**: The value that the conditional check must return for the barrier to apply. Either ``"true"`` or ``"false"``.
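As an illustration of what an accepting value of ``"true"`` refers to, the sketch below shows the kind of guarded code a barrier guard describes. The name `is_safe_path` comes from the example mentioned in this PR, but its body here is an assumption made up for illustration, as are `open_checked` and the `/srv/data` prefix.

```rust
use std::path::{Component, Path};

// Hypothetical validator: accepts only non-empty relative paths made of
// plain name components (no `..`, no root, no leading `.`).
fn is_safe_path(candidate: &str) -> bool {
    !candidate.is_empty()
        && Path::new(candidate)
            .components()
            .all(|c| matches!(c, Component::Normal(_)))
}

// Hypothetical caller: the path is only used in the branch where the
// check returned `true`, which is the branch an accepting value of
// "true" would mark as protected.
fn open_checked(candidate: &str) -> Option<String> {
    if is_safe_path(candidate) {
        Some(format!("/srv/data/{}", candidate))
    } else {
        None
    }
}

fn main() {
    println!("{:?}", open_checked("reports/2024.txt"));
    println!("{:?}", open_checked("../secret"));
}
```

A validator written the other way round, returning `true` for dangerous input, would correspond to an accepting value of ``"false"``.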
…for-rust.rst Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
…for-rust.rst Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
…for-rust.rst Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
…for-rust.rst Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
…for-rust.rst Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Summary
Adds a new documentation page: Customizing library models for Rust, following the pattern of existing documentation for other languages.
What's included
The documentation covers Rust-specific concepts:
- Extensible predicates: sourceModel, sinkModel, summaryModel, neutralModel with Rust's simplified 3-5 column schema (vs Java/Go's 9-10 column schema)
- Canonical path syntax (crate::module::function, <Type>::method, <Type as Trait>::method, <_ as Trait>::method)
- Access path tokens: Reference (for &T), Future (for async), Field with Rust enum variant syntax
- Examples: sqlx, reqwest::get, std::env::var, reqwest::Response::text (async), std::path::Path::join (multiple inputs), Iterator::map (higher-order, wildcard trait), Option::map

Changes
- docs/codeql/codeql-language-guides/customizing-library-models-for-rust.rst
- docs/codeql/codeql-language-guides/codeql-for-rust.rst: added toctree entry and description