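Paper: 1508.04025v5.pdf
Code: not provided by the paper; see the Appendix
Core idea: use the Encoder's information at every Decoder step, and assign different weights to that Encoder information, to obtain better Decoder results.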
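Paper: node2vec: Scalable Feature Learning for Networks
Core idea: learn node representations for a network by defining a flexible notion of a node's neighborhood and designing a biased random walk procedure that efficiently explores diverse neighborhoods.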
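Paper: TextRank: Bringing Order into Texts
Code: networkx/pagerank_alg.py at master · networkx/networkx
Core idea: TextRank is a keyword (or sentence) extraction method based on Google's PageRank. In essence, it builds nodes and edges from the text's tokens within a window (the edges are co-occurrence relations between nodes inside that window), then ranks the nodes by their PageRank score.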
paper: Reformer: The Efficient Transformer
code: trax/trax/models/reformer at master · google/trax
Training a Transformer is too costly (especially for long sentences); the paper proposes two improvements:
paper: 1409.0473.pdf
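The authors conjecture that using a fixed-length vector in the encoder (i.e. encoding the whole sentence into one fixed-length vector) may be the performance bottleneck. They therefore propose a method that automatically searches for the parts of the source sentence that are relevant to the word being predicted.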
$ rustc hello.rs
Regular comments which are ignored by the compiler:
// Line comments which go to the end of the line.
/* Block comments which go to the closing delimiter. */
Doc comments which are parsed into HTML library documentation:
/// Generate library docs for the following item.
//! Generate library docs for the enclosing item.
Printing is handled by a series of macros defined in std::fmt, some of which include:
format!: write formatted text to String
print!: same as format! but the text is printed to the console (io::stdout).
println!: same as print! but a newline is appended.
eprint!: same as format! but the text is printed to the standard error (io::stderr).
eprintln!: same as eprint! but a newline is appended.
std::fmt contains many traits which govern the display of text.
fmt::Debug: Uses the {:?} marker. Format text for debugging purposes.
fmt::Display: Uses the {} marker. Format text in a more elegant, user friendly fashion.
Debug
All types which want to use std::fmt formatting traits require an implementation to be printable.
All types can derive (automatically create) the fmt::Debug implementation. This is not true for fmt::Display which must be manually implemented.
So fmt::Debug definitely makes this printable but sacrifices some elegance. Rust also provides “pretty printing” with {:#?}.
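Display
fmt::Display can be implemented for any new container type which is not generic.
Testcase: List
Each write! call returns a fmt::Result. The ? operator (or the older try! macro) propagates the first error, so all results are dealt with:
write!(f, "{}", value)?;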
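As a rough illustration of the List test case, the sketch below implements fmt::Display for a small wrapper type. The name List and its i32 element type are assumptions chosen for this example; each write! result is propagated with ?.

```rust
use std::fmt;

// Hypothetical newtype wrapping a Vec, used only for illustration.
struct List(Vec<i32>);

impl fmt::Display for List {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[")?;
        for (i, value) in self.0.iter().enumerate() {
            // Separate elements with a comma, but not before the first one.
            if i != 0 {
                write!(f, ", ")?;
            }
            write!(f, "{}", value)?;
        }
        // The final write! is returned directly as the fmt::Result.
        write!(f, "]")
    }
}

fn main() {
    println!("{}", List(vec![1, 2, 3]));
}
```

Because Display is implemented, to_string() is available on List as well.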
Formatting
This formatting functionality is implemented via traits, and there is one trait for each argument type. The most common formatting trait is Display, which handles cases where the argument type is left unspecified: {} for instance.
format!("{}", foo) -> "3735928559"
format!("0x{:X}", foo) -> "0xDEADBEEF"
format!("0o{:o}", foo) -> "0o33653337357"
Scalar Types
Signed integers: i8, i16, i32, i64, i128 and isize (pointer size)
Unsigned integers: u8, u16, u32, u64, u128 and usize (pointer size)
Floating point: f32, f64
char: Unicode scalar values like 'a', 'α' and '∞' (4 bytes each)
bool: either true or false
The unit type (), whose only possible value is an empty tuple: ()
Compound Types
Arrays like [1, 2, 3]
Tuples like (1, true)
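Variables can always be type annotated. Numbers may additionally be annotated via a suffix or by default. Integers default to i32 and floats to f64.
Integer literals can be written in hexadecimal, octal or binary using the prefixes 0x, 0o or 0b respectively.
Underscores can be inserted in numeric literals to improve readability: 1_000 is the same as 1000, and 0.000_001 is the same as 0.000001.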
We need to tell the compiler the type of the literals we use.
A tuple is a collection of values of different types.
An array is a collection of objects of the same type T, stored in contiguous memory. Their size, which is known at compile time, is part of their type signature [T; size].
A slice is a two-word object, the first word is a pointer to the data, and the second word is the length of the slice. Slices can be used to borrow a section of an array, and have the type signature &[T].
struct: define a structure
enum: define an enumeration
Constants can also be created via the const and static keywords.
Three types of structures:
The enum keyword allows the creation of a type which may be one of a few different variants. Any variant which is valid as a struct is also valid in an enum.
use
The use declaration can be used so manual scoping isn’t needed.
Rust has two different types of constants which can be declared in any scope including global. Both require explicit type annotation:
const: An unchangeable value (the common case).
static: A possibly mutable variable with 'static lifetime. The static lifetime is inferred and does not have to be specified. Accessing or modifying a mutable static variable is unsafe.
Rust provides type safety via static typing. Variable bindings can be type annotated when declared.
Variable bindings are immutable by default. They have a scope, and are constrained to live in a block.
It’s possible to declare variable bindings first, and initialize them later. However, this form is seldom used, as it may lead to the use of uninitialized variables.
Rust provides no implicit type conversion (coercion) between primitive types. But, explicit type conversion (casting) can be performed using the as keyword.
Numeric literals can be type annotated by adding the type as a suffix. The type of unsuffixed numeric literals will depend on how they are used. If no constraint exists, the compiler will use i32 for integers, and f64 for floating-point numbers.
The type statement can be used to give a new name to an existing type. Types must have UpperCamelCase names, or the compiler will raise a warning.
The main use of aliases is to reduce boilerplate; for example the io::Result<T> type is an alias for the Result<T, io::Error> type.
Rust addresses conversion between types by the use of traits. The generic conversions will use the From and Into traits. However there are more specific ones for the more common cases, in particular when converting to and from Strings.
From
The From trait allows for a type to define how to create itself from another type, hence providing a very simple mechanism for converting between several types. There are numerous implementations of this trait within the standard library for conversion of primitive and common types.
// str to String
Into
The Into trait is simply the reciprocal of the From trait. That is, if you have implemented the From trait for your type you get the Into implementation for free.
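The relationship between From and Into can be sketched as follows. The Number type is a hypothetical example type, not something from the standard library; only From is implemented by hand, and into() becomes available as a consequence.

```rust
use std::convert::From;

// Hypothetical wrapper type used only for illustration.
#[derive(Debug, PartialEq)]
struct Number {
    value: i32,
}

// Implementing From<i32> also provides Into<Number> for i32.
impl From<i32> for Number {
    fn from(item: i32) -> Self {
        Number { value: item }
    }
}

fn main() {
    // str to String, using a From implementation from the standard library.
    let text = String::from("hello");

    let a = Number::from(30);
    // Into needs to know the target type, here given by the annotation.
    let b: Number = 30.into();

    println!("{} {:?} {:?}", text, a, b);
}
```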
Similar to From and Into, TryFrom and TryInto are generic traits for converting between types. Unlike From/Into, the TryFrom/TryInto traits are used for fallible conversions, and as such, return Results.
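A minimal sketch of a fallible conversion, assuming a hypothetical EvenNumber type that only accepts even integers. The unit type () is used as the error type purely to keep the example short.

```rust
use std::convert::{TryFrom, TryInto};

// Hypothetical type that can only be built from an even i32.
#[derive(Debug, PartialEq)]
struct EvenNumber(i32);

impl TryFrom<i32> for EvenNumber {
    // A real error type would usually carry more information.
    type Error = ();

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        if value % 2 == 0 {
            Ok(EvenNumber(value))
        } else {
            Err(())
        }
    }
}

fn main() {
    println!("{:?}", EvenNumber::try_from(8));

    // TryInto is provided automatically, like Into is for From.
    let odd: Result<EvenNumber, ()> = 5i32.try_into();
    println!("{:?}", odd);
}
```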
Convert to String
To convert any type to a String is as simple as implementing the ToString trait for the type.
Rather than doing so directly, you should implement the fmt::Display trait which automagically provides ToString and also allows printing the type.
Parsing a String
One of the more common types to convert a string into is a number. The idiomatic approach to this is to use the parse function and either to arrange for type inference or to specify the type to parse using the ‘turbofish’ syntax.
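Both ways of telling parse which type to produce can be sketched in one small function. The name parse_sum is an assumption made for this example.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parses two strings as i32 and adds them.
fn parse_sum(a: &str, b: &str) -> Result<i32, ParseIntError> {
    // Type inference: the annotation on `x` selects the target type.
    let x: i32 = a.parse()?;
    // Turbofish: the target type is given directly on the call.
    let y = b.parse::<i32>()?;
    Ok(x + y)
}

fn main() {
    println!("{:?}", parse_sum("5", "10"));
    println!("{:?}", parse_sum("5", "ten"));
}
```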
Blocks are expressions too, so they can be used as values in assignments. The last expression in the block will be assigned to the place expression such as a local variable. However, if the last expression of the block ends with a semicolon, the return value will be ().
if-else conditionals are expressions, and all branches must return the same type.
Rust provides a loop keyword to indicate an infinite loop.
Nesting and Labels
It’s possible to break or continue outer loops when dealing with nested loops. In these cases, the loops must be annotated with some 'label, and the label must be passed to the break/continue statement.
Returning from loops
One of the uses of a loop is to retry an operation until it succeeds. If the operation returns a value though, you might need to pass it to the rest of the code: put it after the break, and it will be returned by the loop expression.
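The two loop features above can be sketched together. Both function names are assumptions made for this example: find_pair uses a labeled break to leave the outer loop, and retry_until returns a value from a loop expression.

```rust
// Hypothetical helper: finds the first (i, j) in 1..=limit with i * j == target.
fn find_pair(target: u32, limit: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for i in 1..=limit {
        for j in 1..=limit {
            if i * j == target {
                found = Some((i, j));
                // Without the label this would only leave the inner loop.
                break 'outer;
            }
        }
    }
    found
}

// Hypothetical helper: counts up to `threshold`, then returns twice the counter.
fn retry_until(threshold: u32) -> u32 {
    let mut counter = 0;
    // The value after `break` becomes the value of the loop expression.
    loop {
        counter += 1;
        if counter >= threshold {
            break counter * 2;
        }
    }
}

fn main() {
    println!("{:?}", find_pair(6, 5));
    println!("{}", retry_until(10));
}
```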
The for in construct can be used to iterate through an Iterator. If not specified, the for loop will apply the into_iter function on the collection provided to convert the collection into an iterator. This is not the only means to convert a collection into an iterator however; the other functions available include iter and iter_mut.
iter - This borrows each element of the collection through each iteration.
into_iter - This consumes the collection so that on each iteration the exact data is provided.
iter_mut - This mutably borrows each element of the collection, allowing for the collection to be modified in place.
For pointers, a distinction needs to be made between destructuring and dereferencing as they are different concepts which are used differently from a language like C.
Dereferencing uses *
Destructuring uses &, ref, and ref mut
A match guard can be added to filter the arm.
match provides the @ sigil for binding values to names.
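A guard and an @ binding can be combined in one match, as in this sketch. The function describe and its messages are assumptions made for the example.

```rust
// Hypothetical helper that classifies a number with a guard and a binding.
fn describe(n: u32) -> String {
    match n {
        0 => String::from("zero"),
        // Guard: the arm only matches when the condition holds.
        x if x % 2 == 0 => format!("{} is even", x),
        // Binding: `x` names the value matched by the range pattern.
        x @ 1..=9 => format!("{} is a small odd number", x),
        x => format!("{} is a large odd number", x),
    }
}

fn main() {
    for n in [0, 4, 7, 11].iter() {
        println!("{}", describe(*n));
    }
}
```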
if let can be used to match any enum value.
if let also allows matching non-parameterized enum variants, even if the enum neither derives nor implements PartialEq. In such cases, a classic if Foo::Bar == a fails to compile, because instances of the enum cannot be compared for equality. However, if let works.
Functions are declared using the fn keyword. Their arguments are type annotated, just like variables, and, if the function returns a value, the return type must be specified after an arrow ->.
The final expression in the function will be used as return value. Alternatively, the return statement can be used to return a value earlier from within the function, even from inside loops or ifs.
Methods are functions attached to objects. These methods have access to the data of the object and its other methods via the self keyword. Methods are defined under an impl block.
Closures in Rust, also called lambda expressions or lambdas, are functions that can capture the enclosing environment. Calling a closure is exactly like calling a function. However, both input and return types can be inferred and input variable names must be specified. Other characteristics of closures include:
using || instead of () around input variables.
optional body delimiters ({}) for a single expression (mandatory otherwise).
Closures are inherently flexible and will do what the functionality requires to make the closure work without annotation. This allows capturing to flexibly adapt to the use case, sometimes moving and sometimes borrowing. Closures can capture variables:
by reference: &T
by mutable reference: &mut T
by value: T
They preferentially capture variables by reference and only go lower when required. Using move before vertical pipes forces closure to take ownership of captured variables.
When taking a closure as an input parameter, the closure’s complete type must be annotated using one of a few traits. In order of decreasing restriction, they are:
Fn: the closure captures by reference (&T)
FnMut: the closure captures by mutable reference (&mut T)
FnOnce: the closure captures by value (T)
On a variable-by-variable basis, the compiler will capture variables in the least restrictive manner possible.
Closures succinctly capture variables from enclosing scopes.
// `F` must be generic.
When a closure is defined, the compiler implicitly creates a new anonymous structure to store the captured variables inside, meanwhile implementing the functionality via one of the traits: Fn, FnMut, or FnOnce for this unknown type. This type is assigned to the variable which is stored until calling.
Since this new type is of unknown type, any usage in a function will require generics. However, an unbounded type parameter <T> would still be ambiguous and not be allowed. Thus, bounding by one of the traits: Fn, FnMut, or FnOnce (which it implements) is sufficient to specify its type.
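A sketch of the three bounds in use. All three function names are assumptions made for this example; each takes a generic F bounded by one of the closure traits.

```rust
// `F` must be generic; the Fn bound allows calling it any number of times.
fn apply_twice<F>(f: F, x: i32) -> i32
where
    F: Fn(i32) -> i32,
{
    f(f(x))
}

// FnMut: the closure may mutate the variables it captured.
fn call_twice<F: FnMut()>(mut f: F) {
    f();
    f();
}

// FnOnce: the closure may consume what it captured, so it runs at most once.
fn consume<F: FnOnce() -> String>(f: F) -> String {
    f()
}

fn main() {
    let offset = 3;
    println!("{}", apply_twice(|v| v + offset, 1));

    let mut count = 0;
    call_twice(|| count += 1);
    println!("{}", count);

    let greeting = String::from("hi");
    println!("{}", consume(move || greeting));
}
```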
If a function takes a closure as parameter, then any function that satisfies the trait bound of that closure can be passed as a parameter.
Returning closures as output parameters is possible. However, anonymous closure types are, by definition, unknown, so we have to use impl Trait to return them. The valid traits for returning a closure are:
Fn
FnMut
FnOnce
Beyond this, the move keyword must be used, which signals that all captures occur by value. This is required because any captures by reference would be dropped as soon as the function exited, leaving invalid references in the closure.
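Returning a closure with impl Trait and move might look like the sketch below. The names make_adder and make_counter are assumptions made for this example.

```rust
// Returns a closure that adds `n` to its argument.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    // `move` copies `n` into the closure so it outlives this function call.
    move |x| x + n
}

// Returns a closure that owns its own counter and mutates it on each call.
fn make_counter() -> impl FnMut() -> u32 {
    let mut count = 0;
    move || {
        count += 1;
        count
    }
}

fn main() {
    let add_two = make_adder(2);
    println!("{}", add_two(40));

    let mut next = make_counter();
    println!("{} {}", next(), next());
}
```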
Higher Order Functions (HOFs) are functions that take one or more functions and/or produce a more useful function. HOFs and lazy iterators give Rust its functional flavor.
Diverging functions never return. They are marked using !, which is an empty type.
fn foo() -> ! {
The main advantage of this type is that it can be cast to any other one and therefore used at places where an exact type is required, for instance in match branches.
It is also the return type of functions that loop forever (e.g. loop {}) like network servers or functions that terminate the process (e.g. exit()).
A module is a collection of items: functions, structs, traits, impl blocks, and even other modules.
By default, the items in a module have private visibility, but this can be overridden with the pub modifier. Only the public items of a module can be accessed from outside the module scope.
pub(in path): only visible within the given path. path must be a parent or ancestor module
pub(self): only visible within the current module, which is the same as leaving them private
pub(super): only visible within the parent module
Private parent items will still restrict the visibility of a child item, even if it is declared as visible within a bigger scope.
Structs have an extra level of visibility with their fields. The visibility defaults to private, and can be overridden with the pub modifier. This visibility only matters when a struct is accessed from outside the module where it is defined, and has the goal of hiding information (encapsulation).
use Declaration
The use declaration can be used to bind a full path to a new name, for easier access.
// extern crate deeply; // normally, this would exist and not be commented out!
super and self
The super and self keywords can be used in the path to remove ambiguity when accessing items and to prevent unnecessary hardcoding of paths.
Modules can be mapped to a file/directory hierarchy.
If some_file.rs has mod declarations in it, then the contents of the module files would be inserted in places where mod declarations in the crate file are found, before running the compiler over it. In other words, modules do not get compiled individually, only crates get compiled.
A crate can be compiled into a binary or into a library. By default, rustc will produce a binary from a crate. This behavior can be overridden by passing the --crate-type flag to rustc.
$ rustc --crate-type=lib rary.rs
Libraries get prefixed with “lib”, and by default they get named after their crate file, but this default name can be overridden using the crate_name attribute.
extern crate
To link a crate to this new library, the extern crate declaration must be used. This will not only link the library, but also import all its items under a module named the same as the library. The visibility rules that apply to modules also apply to libraries.
# Where library.rlib is the path to the compiled library, assumed that it's
cargo is the official Rust package management tool.
Create a Rust project:
# A binary
Cargo.toml config file:
[package]
cargo is more than a dependency manager. All of the available configuration options are listed in the format specification of Cargo.toml.
To build our project we can execute cargo build anywhere in the project directory (including subdirectories!). We can also do cargo run to build and run. (Note that it only rebuilds what it has not already built, similar to make).
More binaries:
foo
Additional binaries should be placed under src/bin/.
Rust has first-class support for unit and integration testing. Organizationally, we can place unit tests in the modules they test and integration tests in their own tests/ directory:
foo
Cargo may run multiple tests concurrently, so make sure that they don’t race with each other.
[package]
Cargo provides the build script with inputs via environment variables. The script provides output via stdout. All lines printed are written to target/debug/build/<pkg>/output.
An attribute is metadata applied to some module, crate or item. This metadata can be used to/for:
When attributes apply to a whole crate, their syntax is #![crate_attribute], and when they apply to a module or item, the syntax is #[item_attribute] (notice the missing bang !).
Attributes can take arguments with different syntaxes:
#[attribute = "value"]
#[attribute(key = "value")]
#[attribute(value)]
Attributes can have multiple values and can be separated over multiple lines, too.
dead_code
The compiler provides a dead_code lint that will warn about unused functions. #[allow(dead_code)] is an attribute that disables the dead_code lint.
The crate_type attribute can be used to tell the compiler whether a crate is a binary or a library (and even which type of library), and the crate_name attribute can be used to set the name of the crate.
However, it is important to note that both the crate_type and crate_name attributes have no effect whatsoever when using Cargo, the Rust package manager. Since Cargo is used for the majority of Rust projects, this means real-world uses of crate_type and crate_name are relatively limited.
// This crate is a library
With the crate_type attribute set, we no longer need to pass the --crate-type flag to rustc.
$ rustc lib.rs
cfg
Conditional compilation is possible through two different operators:
the cfg attribute: #[cfg(...)] in attribute position
the cfg! macro: cfg!(...) in boolean expressions
Custom conditionals must be passed to rustc using the --cfg flag.
Run:
$ rustc --cfg some_condition custom.rs && ./custom
Generics is the topic of generalizing types and functionalities to broader cases. The simplest and most common use of generics is for type parameters. Generic type parameters are typically represented as <T>.
In Rust, “generic” also describes anything that accepts one or more generic type parameters <T>. Any type specified as a generic type parameter is generic, and everything else is concrete (non-generic).
The same set of rules can be applied to functions: a type T becomes generic when preceded by <T>.
A function call with explicitly specified type parameters looks like: fun::<A, B, ...>().
Similar to functions, implementations require care to remain generic.
struct S;
traits can also be generic.
// A trait generic over `T`.
When working with generics, the type parameters often must use traits as bounds to stipulate what functionality a type implements.
// Define a function `printer` that takes a generic type `T` which
Bounding restricts the generic to types that conform to the bounds.
struct S<T: Display>(T);
Another effect of bounding is that generic instances are allowed to access the methods of traits specified in the bounds.
As an additional note, where clauses can also be used to apply bounds in some cases to be more expressive.
Even if a trait doesn’t include any functionality, you can still use it as a bound.
Multiple bounds can be applied with a +. Like normal, different types are separated with ,.
A bound can also be expressed using a where clause immediately before the opening {, rather than at the type’s first mention. Additionally, where clauses can apply bounds to arbitrary types, rather than just to type parameters.
Some cases that a where clause is useful:
When specifying generic types and bounds separately is clearer
impl <A: TraitB + TraitC, D: TraitE + TraitF> MyTrait<A, D> for YourType {}
When using a where clause is more expressive than using normal syntax
The newtype idiom gives compile time guarantees that the right type of value is supplied to a program.
“Associated Items” refers to a set of rules pertaining to items of various types. It is an extension to trait generics, and allows traits to internally define new items.
A trait that is generic over its container type has type specification requirements - users of the trait must specify all of its generic types. The use of “Associated types” improves the overall readability of code by moving inner types locally into a trait as output types.
// `A` and `B` are defined in the trait via the `type` keyword.
A phantom type parameter is one that doesn’t show up at runtime, but is checked statically (and only) at compile time.
Data types can use extra generic type parameters to act as markers or to perform type checking at compile time. These extra parameters hold no storage values, and have no runtime behavior.
Rust enforces RAII (Resource Acquisition Is Initialization), so whenever an object goes out of scope, its destructor is called and its owned resources are freed.
Double check for memory errors using valgrind: rustc raii.rs && valgrind ./raii
The notion of a destructor in Rust is provided through the Drop trait. This trait is not required to be implemented for every type, only implement it for your type if you require its own destructor logic.
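To make the destructor call observable, the sketch below counts drops through a shared counter. The Guard type and the count_drops function are assumptions made for this example; Rc<Cell<u32>> is just one convenient way to share the counter.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type whose destructor increments a shared counter.
struct Guard {
    drops: Rc<Cell<u32>>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Creates two guards in an inner scope and reports how many were dropped.
fn count_drops() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let _a = Guard { drops: Rc::clone(&drops) };
        let _b = Guard { drops: Rc::clone(&drops) };
        // Both guards are still alive here, so nothing has been dropped yet.
    } // `_b` and `_a` go out of scope here and their destructors run.
    drops.get()
}

fn main() {
    println!("guards dropped: {}", count_drops());
}
```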
Because variables are in charge of freeing their own resources, resources can only have one owner.
When doing assignments (let x = y) or passing function arguments by value (foo(x)), the ownership of the resources is transferred. In Rust-speak, this is known as a move.
Mutability of data can be changed when ownership is transferred.
Most of the time, we’d like to access data without taking ownership over it. To accomplish this, Rust uses a borrowing mechanism. Instead of passing objects by value (T), objects can be passed by reference (&T).
The compiler statically guarantees (via its borrow checker) that references always point to valid objects.
Mutable data can be mutably borrowed using &mut T. This is called a mutable reference and gives read/write access to the borrower. In contrast, &T borrows the data via an immutable reference, and the borrower can read the data but not modify it.
When data is immutably borrowed, it also freezes. Frozen data can’t be modified via the original object until all references to it go out of scope.
Data can be immutably borrowed any number of times, but while immutably borrowed, the original data can’t be mutably borrowed. On the other hand, only one mutable borrow is allowed at a time. The original data can be borrowed again only after the mutable reference has been used for the last time.
When doing pattern matching or destructuring via the let binding, the ref keyword can be used to take references to the fields of a struct/tuple.
A lifetime is a construct the compiler (or more specifically, its borrow checker) uses to ensure all borrows are valid. No names or types are assigned to label lifetimes.
The borrow checker uses explicit lifetime annotations to determine how long references should be valid.
foo<'a>
Using lifetimes requires generics. This lifetime syntax indicates that the lifetime of foo may not exceed that of 'a. Explicit annotation of a type has the form &'a T where 'a has already been introduced.
Function signatures with lifetimes have a few constraints:
any reference must have an annotated lifetime.
any reference being returned must have the same lifetime as an input or be static.
Additionally, returning references without input is banned if it would result in returning references to invalid data.
T: 'a: All references in T must outlive lifetime 'a.
T: Trait + 'a: Type T must implement trait Trait and all references in T must outlive 'a.
A 'static lifetime is the longest possible lifetime, and lasts for the lifetime of the running program. A 'static lifetime may also be coerced to a shorter lifetime. There are two ways to make a variable with 'static lifetime, and both are stored in the read-only memory of the binary:
Make a constant with the static declaration.
Make a string literal which has type: &'static str.
Some lifetime patterns are overwhelmingly common and so the borrow checker will allow you to omit them to save typing and to improve readability. This is known as elision. Elision exists in Rust solely because these patterns are common.
A trait is a collection of methods defined for an unknown type: Self. They can access other methods declared in the same trait. Traits can be implemented for any data type.
The compiler is capable of providing basic implementations for some traits via the #[derive] attribute.
Comparison traits: Eq, PartialEq, Ord, PartialOrd.
Clone, to create T from &T via a copy.
Copy, to give a type ‘copy semantics’ instead of ‘move semantics’.
Hash, to compute a hash from &T.
Default, to create an empty instance of a data type.
Debug, to format a value using the {:?} formatter.
dyn
The Rust compiler needs to know how much space every function’s return type requires. This means all your functions have to return a concrete type. A function cannot return a trait like Animal directly, because different implementations will need different amounts of memory.
However, there’s an easy workaround. Instead of returning a trait object directly, our functions return a Box which contains some Animal. A box is just a reference to some memory in the heap.
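The workaround can be sketched as follows. The Sheep and Cow types, the noise method and the pick_animal function are assumptions made for this example; the key point is the Box<dyn Animal> return type.

```rust
// Hypothetical trait with two implementors of different concrete types.
trait Animal {
    fn noise(&self) -> &'static str;
}

struct Sheep;
struct Cow;

impl Animal for Sheep {
    fn noise(&self) -> &'static str {
        "baaah"
    }
}

impl Animal for Cow {
    fn noise(&self) -> &'static str {
        "moooo"
    }
}

// The Box has a known size, so it can be returned whatever Animal it holds.
fn pick_animal(want_sheep: bool) -> Box<dyn Animal> {
    if want_sheep {
        Box::new(Sheep)
    } else {
        Box::new(Cow)
    }
}

fn main() {
    println!("{}", pick_animal(true).noise());
    println!("{}", pick_animal(false).noise());
}
```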
In Rust, many of the operators can be overloaded via traits. For example, the + operator in a + b calls the add method (as in a.add(b)). This add method is part of the Add trait. Hence, the + operator can be used by any implementor of the Add trait.
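A minimal sketch of overloading + for a custom type. The Point struct is an assumption made for this example.

```rust
use std::ops::Add;

// Hypothetical 2D point used to illustrate the Add trait.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Add for Point {
    // The result type of `a + b`.
    type Output = Point;

    fn add(self, other: Point) -> Point {
        Point {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 3, y: 4 };
    // `a + b` is sugar for `a.add(b)`.
    println!("{:?}", a + b);
}
```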
The Drop trait only has one method: drop, which is called automatically when an object goes out of scope.
Box, Vec, String, File, and Process are some examples of types that implement the Drop trait to free resources. The Drop trait can also be manually implemented for any custom data type.
The Iterator trait is used to implement iterators over collections such as arrays. The trait requires only a method to be defined for the next element, which may be manually defined in an impl block or automatically defined (as in arrays and ranges).
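A manually defined next method could look like this sketch of a Fibonacci generator. The struct layout and the fibonacci constructor are assumptions made for this example; the adapters take and collect come for free once next is defined.

```rust
// Hypothetical iterator state: the current and the following Fibonacci number.
struct Fibonacci {
    curr: u32,
    next: u32,
}

impl Iterator for Fibonacci {
    type Item = u32;

    // Only `next` has to be written by hand.
    fn next(&mut self) -> Option<u32> {
        let current = self.curr;
        self.curr = self.next;
        self.next = current + self.next;
        // This sequence never ends, so `None` is never returned.
        Some(current)
    }
}

fn fibonacci() -> Fibonacci {
    Fibonacci { curr: 0, next: 1 }
}

fn main() {
    let first: Vec<u32> = fibonacci().take(6).collect();
    println!("{:?}", first);
}
```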
impl Trait
If your function returns a type that implements MyTrait, you can write its return type as -> impl MyTrait.
Rust doesn’t have “inheritance”, but you can define a trait as being a superset of another trait.
A type can implement many different traits. Fully Qualified Syntax is used to disambiguate those methods.
Macros look like functions, except that their name ends with a bang !, but instead of generating a function call, macros are expanded into source code that gets compiled with the rest of the program. However, unlike macros in C and other languages, Rust macros are expanded into abstract syntax trees, rather than string preprocessing, so you don’t get unexpected precedence bugs.
An example is println!, which can take any number of arguments, depending on the format string.
The arguments of a macro are prefixed by a dollar sign $ and type annotated with a designator. Some of the available designators:
block
expr is used for expressions
ident is used for variable/function names
item
literal is used for literal constants
pat (pattern)
path
stmt (statement)
tt (token tree)
ty (type)
vis (visibility qualifier)
Macros can be overloaded to accept different combinations of arguments. In that regard, macro_rules! can work similarly to a match block.
Macros can use + in the argument list to indicate that an argument may repeat at least once, or *, to indicate that the argument may repeat zero or more times.
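Overloading and repetition can be sketched with a small recursive macro. The name min_of! is an assumption made for this example; the first arm handles a single expression, the second uses + to accept one or more further expressions.

```rust
// Hypothetical macro that returns the smallest of its arguments.
macro_rules! min_of {
    // Base case: a single expression is its own minimum.
    ($x:expr) => {
        $x
    };
    // Recursive case: `$x` followed by at least one more expression.
    ($x:expr, $($rest:expr),+) => {{
        let first = $x;
        let others = min_of!($($rest),+);
        if first < others { first } else { others }
    }};
}

fn main() {
    println!("{}", min_of!(1));
    println!("{}", min_of!(5, 2 * 3, 4));
}
```

Because the arguments are parsed as expressions, 2 * 3 is evaluated as a unit before it is compared.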
An explicit panic is mainly useful for tests and dealing with unrecoverable errors. For prototyping it can be useful, for example when dealing with functions that haven’t been implemented yet, but in those cases the more descriptive unimplemented is better. In tests panic is a reasonable way to explicitly fail.
The Option type is for when a value is optional or when the lack of a value is not an error condition. For example the parent of a directory - / and C: don’t have one. When dealing with Options, unwrap is fine for prototyping and cases where it’s absolutely certain that there is guaranteed to be a value. However expect is more useful since it lets you specify an error message in case something goes wrong anyway.
When there is a chance that things do go wrong and the caller has to deal with the problem, use Result. You can unwrap and expect them as well (please don’t do that unless it’s a test or quick prototype).
panic
It prints an error message, starts unwinding the stack, and usually exits the program.
Option vs unwrap
?
You can unpack Options by using match statements, but it’s often easier to use the ? operator. If x is an Option, then evaluating x? will return the underlying value if x is Some, otherwise it will terminate whatever function is being executed and return None.
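A small sketch of ? on an Option. The function second_word_initial is an assumption made for this example; it returns None as soon as the second word is missing.

```rust
// Hypothetical helper: first character of the second word, if there is one.
fn second_word_initial(text: &str) -> Option<char> {
    // If there is no second word, `?` returns None from the function here.
    let word = text.split_whitespace().nth(1)?;
    word.chars().next()
}

fn main() {
    println!("{:?}", second_word_initial("hello world"));
    println!("{:?}", second_word_initial("hello"));
}
```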
map
match is a valid method for handling Options. However, you may eventually find heavy usage tedious, especially with operations only valid with an input. In these cases, combinators can be used to manage control flow in a modular fashion.
and_then
map() was described as a chainable way to simplify match statements. However, using map() on a function that returns an Option results in the nested Option<Option<Food>>. Chaining multiple calls together can then become confusing. and_then() calls its function input with the wrapped value and returns the result. If the Option is None, then it returns None instead.
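The difference between map and and_then shows up as soon as the mapped function itself returns an Option. The functions half, quarter_nested and quarter are assumptions made for this sketch.

```rust
// Hypothetical helper: halves even numbers, rejects odd ones.
fn half(n: i32) -> Option<i32> {
    if n % 2 == 0 {
        Some(n / 2)
    } else {
        None
    }
}

// map wraps the inner Option again, producing a nested Option.
fn quarter_nested(n: i32) -> Option<Option<i32>> {
    half(n).map(half)
}

// and_then flattens the result, so the calls chain cleanly.
fn quarter(n: i32) -> Option<i32> {
    half(n).and_then(half)
}

fn main() {
    println!("{:?}", quarter_nested(6));
    println!("{:?}", quarter(8));
    println!("{:?}", quarter(6));
}
```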
Result
Result is a richer version of the Option type that describes possible error instead of possible absence.
Ok(T): An element T was found
Err(E): An error was found with element E
map for Result
Option’s map, and_then, and many other combinators are also implemented for Result.
Aliases for Result
Errors found in a specific module often have the same Err type, so a single alias can succinctly define all associated Results. This is so useful that the std library even supplies one: io::Result!
We can simply stop executing the function and return the error if one occurs. For some, this form of code can be easier to both read and write.
?
Sometimes we just want the simplicity of unwrap without the possibility of a panic.
Upon finding an Err, there are two valid actions to take:
panic!, which we already decided to try to avoid if possible
return, because an Err means it cannot be handled
? is almost exactly equivalent to an unwrap which returns instead of panicking on Errs.
Sometimes an Option needs to interact with a Result, or a Result<T, Error1> needs to interact with a Result<T, Error2>.
Results out of Options
The most basic way of handling mixed error types is to just embed them in each other.
There are times when we’ll want to stop processing on errors (like with ?) but keep going when the Option is None. A couple of combinators come in handy to swap the Result and Option.
Sometimes it simplifies the code to mask all of the different errors with a single type of error. Rust allows us to define our own error types. In general, a “good” error type:
Good: Err(EmptyVec)
Bad: Err("Please use a vector with at least one element".to_owned())
Good: Err(BadChar(c, position))
Bad: Err("+ cannot be used here".to_owned())
Boxing errors
A way to write simple code while preserving the original errors is to Box them. The drawback is that the underlying error type is only known at runtime and not statically determined.
?
? was previously explained as either unwrap or return Err(err). This is only mostly true. It actually means unwrap or return Err(From::from(err)). Since From::from is a conversion utility between different types, this means that if you ? where the error is convertible to the return type, it will convert automatically.
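The automatic conversion can be sketched with a custom error type. DoubleError and double_first are assumptions made for this example; the From implementation is what lets ? turn a ParseIntError into a DoubleError.

```rust
use std::num::ParseIntError;

// Hypothetical error type covering two different failure causes.
#[derive(Debug)]
enum DoubleError {
    Empty,
    Parse(ParseIntError),
}

// This conversion is what `?` calls through From::from.
impl From<ParseIntError> for DoubleError {
    fn from(err: ParseIntError) -> Self {
        DoubleError::Parse(err)
    }
}

// Doubles the first element of the slice after parsing it as an i32.
fn double_first(items: &[&str]) -> Result<i32, DoubleError> {
    // Option to Result: a missing element becomes DoubleError::Empty.
    let first = items.first().ok_or(DoubleError::Empty)?;
    // The ParseIntError is converted to DoubleError automatically.
    let parsed = first.parse::<i32>()?;
    Ok(2 * parsed)
}

fn main() {
    println!("{:?}", double_first(&["21", "x"]));
    println!("{:?}", double_first(&[]));
    println!("{:?}", double_first(&["tofu"]));
}
```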
An alternative to boxing errors is to wrap them in your own error type.
Results
filter_map calls a function and filters out the results that are None.
Result implements FromIterator so that a vector of results (Vec<Result<T, E>>) can be turned into a result with a vector (Result<Vec<T>, E>). Once a Result::Err is found, the iteration will terminate. This same technique can be used with Option.
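Both strategies for iterating over Results can be sketched side by side. The function names parse_all and parse_valid are assumptions made for this example.

```rust
use std::num::ParseIntError;

// Fails as a whole on the first string that does not parse.
fn parse_all(items: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    // collect() uses the FromIterator implementation of Result.
    items.iter().map(|s| s.parse::<i32>()).collect()
}

// Keeps only the strings that parse and silently skips the rest.
fn parse_valid(items: &[&str]) -> Vec<i32> {
    items
        .iter()
        // ok() turns each Result into an Option for filter_map.
        .filter_map(|s| s.parse::<i32>().ok())
        .collect()
}

fn main() {
    let input = ["1", "x", "3"];
    println!("{:?}", parse_all(&input));
    println!("{:?}", parse_valid(&input));
}
```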
Strings like: "hello world"
[1, 2, 3]
Option
Result
Box
All values in Rust are stack allocated by default. Values can be boxed (allocated on the heap) by creating a Box<T>. A box is a smart pointer to a heap allocated value of type T. When a box goes out of scope, its destructor is called, the inner object is destroyed, and the memory on the heap is freed.
Boxed values can be dereferenced using the * operator; this removes one layer of indirection.
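A short sketch of boxing and dereferencing, using an illustrative `Point` struct:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Allocates the Point on the heap and returns the owning pointer.
fn boxed_point(x: f64, y: f64) -> Box<Point> {
    Box::new(Point { x, y })
}

fn main() {
    let boxed: Box<Point> = boxed_point(1.0, 2.0);
    // Only the pointer lives on the stack; the Point itself is on the heap.
    println!("box occupies {} bytes on the stack", std::mem::size_of_val(&boxed));
    // `*` removes one layer of indirection (Point is Copy, so this copies it out).
    let unboxed: Point = *boxed;
    println!("{:?}", unboxed);
} // `boxed` goes out of scope here and the heap memory is freed.
```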
Vectors are re-sizable arrays. Like slices, their size is not known at compile time, but they can grow or shrink at any time. A vector is represented using 3 parameters:
There are two types of strings in Rust: String and &str.
A String is stored as a vector of bytes (Vec<u8>), but guaranteed to always be a valid UTF-8 sequence. String is heap allocated, growable and not null terminated.
&str is a slice (&[u8]) that always points to a valid UTF-8 sequence, and can be used to view into a String, just like &[T] is a view into Vec<T>.
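To make the owner/view relationship concrete, here is a small sketch. `first_word` is a made-up helper that returns a `&str` view into whatever string it is given.

```rust
// Returns a borrowed view into `text`; nothing is copied or allocated.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    // String owns its bytes on the heap and can grow.
    let mut owned = String::from("hello");
    owned.push_str(" world");

    // &String coerces to &str, a view into the same bytes.
    let view: &str = &owned;
    println!("first word: {}", first_word(view));

    // Lengths are in bytes, not characters, because the storage is UTF-8.
    let accented = String::from("héllo");
    println!("{} bytes, {} chars", accented.len(), accented.chars().count());
}
```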
Sometimes it’s desirable to catch the failure of some parts of a program instead of calling panic!. The Option<T> enum has two variants:
None, to indicate failure or lack of value
Some(value), a tuple struct that wraps a value with type T
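A minimal sketch of the two variants in use; `checked_division` is an illustrative function that reports "no result" instead of panicking.

```rust
// Integer division that returns None instead of panicking on a zero divisor.
fn checked_division(dividend: i32, divisor: i32) -> Option<i32> {
    if divisor == 0 {
        None
    } else {
        Some(dividend / divisor)
    }
}

fn main() {
    for (a, b) in [(4, 2), (1, 0)] {
        match checked_division(a, b) {
            None => println!("{} / {} failed", a, b),
            Some(q) => println!("{} / {} = {}", a, b, q),
        }
    }
}
```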
Sometimes it is important to express why an operation failed. The Result<T, E> enum has two variants:
Ok(value), which indicates that the operation succeeded, and wraps the value returned by the operation. (value has type T)
Err(why), which indicates that the operation failed, and wraps why, which (hopefully) explains the cause of the failure. (why has type E)
?
? is used at the end of an expression returning a Result, and is equivalent to a match expression, where the Err(err) branch expands to an early return Err(From::from(err)), and the Ok(ok) branch expands to an ok expression.
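The equivalence can be sketched by writing the same illustrative function twice, once with `?` and once with the match it roughly stands for:

```rust
use std::num::ParseIntError;

// Version using `?`.
fn multiply(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = a.parse::<i32>()?;
    let y = b.parse::<i32>()?;
    Ok(x * y)
}

// Roughly what the first `?` above expands to.
fn multiply_with_match(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = match a.parse::<i32>() {
        Ok(ok) => ok,
        Err(err) => return Err(From::from(err)),
    };
    let y = b.parse::<i32>()?;
    Ok(x * y)
}

fn main() {
    println!("{:?}", multiply("10", "2"));
    println!("{:?}", multiply_with_match("t", "2"));
}
```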
panic
The panic! macro can be used to generate a panic and start unwinding its stack. While unwinding, the runtime will take care of freeing all the resources owned by the thread by calling the destructor of all its objects.
Where vectors store values by an integer index, HashMaps store values by key. HashMap keys can be booleans, integers, strings, or any other type that implements the Eq and Hash traits.
Create a HashMap with a certain starting capacity using HashMap::with_capacity(usize), or use HashMap::new() to get a HashMap with a default initial capacity (recommended).
Any type that implements the Eq and Hash traits can be a key in HashMap. This includes:
bool (though not very useful since there are only two possible keys)
the integer types (i32, u64, usize, and all variations thereof)
String and &str (protip: you can have a HashMap keyed by String and call .get() with an &str)
All collection classes implement Eq and Hash if their contained type also respectively implements Eq and Hash. For example, Vec<T> will implement Hash if T implements Hash.
You can easily implement Eq and Hash for a custom type with just one line: #[derive(PartialEq, Eq, Hash)]
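A sketch covering both points: a `String`-keyed map queried with a `&str`, and a custom key type made hashable with the derive line. The `Account` struct and the sample data are invented for the example.

```rust
use std::collections::HashMap;

// The derive line is all a custom type needs to be usable as a key.
#[derive(PartialEq, Eq, Hash, Debug)]
struct Account {
    username: String,
}

// Keys are owned Strings; the values are made-up sample data.
fn phone_book() -> HashMap<String, &'static str> {
    let mut contacts = HashMap::with_capacity(2);
    contacts.insert(String::from("Daniel"), "798-1364");
    contacts.insert(String::from("Ashley"), "645-7689");
    contacts
}

// Counts two logins for the same account, keyed by the custom type.
fn login_counts() -> HashMap<Account, u32> {
    let mut counts = HashMap::new();
    for _ in 0..2 {
        let key = Account { username: "alice".to_owned() };
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let book = phone_book();
    // Lookup with a plain &str even though the keys are Strings.
    println!("{:?}", book.get("Daniel"));
    println!("{:?}", login_counts());
}
```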
Consider a HashSet as a HashMap where we just care about the keys (HashSet<T> is, in actuality, just a wrapper around HashMap<T, ()>). Sets have 4 primary operations (all of the following calls return an iterator):
union: get all the unique elements in both sets.
difference: get all the elements that are in the first set but not the second.
intersection: get all the elements that are in both sets.
symmetric_difference: get all the elements that are in one set or the other, but not both.
When multiple ownership is needed, Rc (Reference Counting) can be used. Rc keeps track of the number of references, that is, the number of owners of the value wrapped inside an Rc.
The reference count of an Rc increases by 1 whenever an Rc is cloned, and decreases by 1 whenever one cloned Rc is dropped out of scope. Cloning an Rc never performs a deep copy. Cloning creates just another pointer to the wrapped value, and increments the count.
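The counting behaviour can be observed with `Rc::strong_count`. The helper below is a made-up illustration that reports the count while extra owners exist and again after they are dropped.

```rust
use std::rc::Rc;

// Returns (count while the clones are alive, count after dropping them).
fn owners_while_cloned(extra_owners: usize) -> (usize, usize) {
    let original = Rc::new(String::from("shared"));
    // Each Rc::clone copies the pointer and bumps the count; the String is not copied.
    let clones: Vec<Rc<String>> = (0..extra_owners).map(|_| Rc::clone(&original)).collect();
    let during = Rc::strong_count(&original);
    drop(clones);
    let after = Rc::strong_count(&original);
    (during, after)
}

fn main() {
    let (during, after) = owners_while_cloned(2);
    println!("owners with clones alive: {}, after dropping them: {}", during, after);
}
```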
The standard library provides great threading primitives out of the box. These, combined with Rust’s concept of Ownership and aliasing rules, automatically prevent data races.
Although we’re passing references across thread boundaries, Rust understands that we’re only passing read-only references, and that thus no unsafety or data races can occur. Because we’re move-ing the data segments into the thread, Rust will also ensure the data is kept alive until the threads exit, so no dangling pointers occur.
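A reduced sketch of that pattern, assuming the data is a `&'static str` split on whitespace (the function name and input format are illustrative): each segment is moved into its own thread, and the partial sums are combined after joining.

```rust
use std::thread;

// Sums all decimal digits in `data`, one thread per whitespace-separated segment.
fn sum_digits_parallel(data: &'static str) -> u32 {
    let handles: Vec<thread::JoinHandle<u32>> = data
        .split_whitespace()
        .map(|segment| {
            // `move` transfers the read-only &'static str segment into the thread.
            thread::spawn(move || {
                segment.chars().filter_map(|c| c.to_digit(10)).sum::<u32>()
            })
        })
        .collect();

    // join() waits for each thread and hands back its return value.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("total = {}", sum_digits_parallel("123 456 789"));
}
```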
Channels allow a unidirectional flow of information between two end-points: the Sender and the Receiver.
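A minimal sketch with `std::sync::mpsc`: several worker threads each send their id through a cloned `Sender`, and the main thread drains the `Receiver`. The function name is made up.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

// Spawns `workers` threads that each send their id, then returns the ids sorted.
fn collect_ids(workers: u32) -> Vec<u32> {
    let (tx, rx): (Sender<u32>, Receiver<u32>) = mpsc::channel();
    let mut handles = Vec::new();

    for id in 0..workers {
        // Each thread gets its own Sender clone.
        let thread_tx = tx.clone();
        handles.push(thread::spawn(move || {
            thread_tx.send(id).unwrap();
        }));
    }
    // Drop the original Sender so the receiving loop ends once all workers are done.
    drop(tx);

    // Arrival order is not deterministic, hence the sort below.
    let mut ids: Vec<u32> = rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    ids.sort();
    ids
}

fn main() {
    println!("{:?}", collect_ids(3));
}
```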
The Path struct represents file paths in the underlying filesystem. There are two flavors of Path: posix::Path, for UNIX-like systems, and windows::Path, for Windows. The prelude exports the appropriate platform-specific Path variant.
Note that a Path is not internally represented as a UTF-8 string, but is instead backed by an OsStr, which may hold bytes that are not valid UTF-8. Therefore, converting a Path to a &str is not free and may fail (an Option is returned).
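A short sketch of building a path and converting it back to a string with `std::path`; `with_file` is an illustrative helper.

```rust
use std::path::{Path, PathBuf};

// Appends a file name to a directory, using the platform's separator.
fn with_file(dir: &Path, name: &str) -> PathBuf {
    dir.join(name)
}

fn main() {
    let path = with_file(Path::new("."), "notes.txt");
    // to_str() returns an Option because the path may not be valid UTF-8.
    match path.to_str() {
        None => println!("path is not a valid UTF-8 sequence"),
        Some(s) => println!("path is {}", s),
    }
}
```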
The File struct represents a file that has been opened (it wraps a file descriptor), and gives read and/or write access to the underlying file. All the File methods return the io::Result<T> type, which is an alias for Result<T, io::Error>.
The method lines() returns an iterator over the lines of a file. File::open expects a generic, AsRef<Path>. That’s what read_lines() expects as input.
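One possible shape for such a `read_lines()` helper, as a sketch: it forwards its generic `AsRef<Path>` argument to `File::open` and wraps the file in a `BufReader`. The demo file name under the temp directory is an assumption made so the example can run on its own.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

// Opens the file and returns an iterator over its lines.
fn read_lines<P: AsRef<Path>>(path: P) -> io::Result<io::Lines<BufReader<File>>> {
    let file = File::open(path)?;
    Ok(BufReader::new(file).lines())
}

fn main() -> io::Result<()> {
    // Write a small demo file first (hypothetical name).
    let path = std::env::temp_dir().join("read_lines_demo.txt");
    let mut file = File::create(&path)?;
    writeln!(file, "first")?;
    writeln!(file, "second")?;
    drop(file);

    // Each item is itself an io::Result<String>, since reading can fail midway.
    for line in read_lines(&path)? {
        println!("{}", line?);
    }
    Ok(())
}
```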
The process::Output struct represents the output of a finished child process, and the process::Command struct is a process builder.
The process::Child struct represents a running child process, and exposes the stdin, stdout and stderr handles for interaction with the underlying process via pipes.
To wait for a process::Child to finish, you must call Child::wait, which will return a process::ExitStatus.
The command line arguments can be accessed using std::env::args, which returns an iterator that yields a String for each argument.
Matching can be used to parse simple arguments.
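A sketch of argument parsing by matching on the collected arguments. The `Command` enum and the `increase`/`decrease` sub-commands are invented for the illustration.

```rust
#[derive(Debug, PartialEq)]
enum Command {
    Help,
    Increase(i32),
    Decrease(i32),
}

// args[0] is the program name, as yielded by std::env::args.
fn parse_args(args: &[String]) -> Result<Command, String> {
    match args {
        // Exactly: program, command, number.
        [_program, cmd, number] => {
            let n = number
                .parse::<i32>()
                .map_err(|_| format!("not an integer: {}", number))?;
            match cmd.as_str() {
                "increase" => Ok(Command::Increase(n)),
                "decrease" => Ok(Command::Decrease(n)),
                other => Err(format!("unknown command: {}", other)),
            }
        }
        // Any other argument count falls back to the help text.
        _ => Ok(Command::Help),
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    println!("{:?}", parse_args(&args));
}
```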
Rust provides a Foreign Function Interface (FFI) to C libraries. Foreign functions must be declared inside an extern block annotated with a #[link] attribute containing the name of the foreign library.
Most unit tests go into a tests mod with the #[cfg(test)] attribute. Test functions are marked with the #[test] attribute.
Tests fail when something in the test function panics. There are some helper macros:
assert!(expression) - panics if expression evaluates to false.
assert_eq!(left, right) and assert_ne!(left, right) - testing left and right expressions for equality and inequality respectively.
In Rust 2018, your unit tests can return Result<(), E>, which lets you use ? in them!
To check functions that should panic under certain circumstances, use the #[should_panic] attribute. This attribute accepts an optional parameter expected = with the text of the panic message. If your function can panic in multiple ways, it helps make sure your test is testing the correct panic.
Tests can be marked with the #[ignore] attribute to exclude them from the default run. To run them, use the command cargo test -- --ignored
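Put together, a test module using these attributes might look like the sketch below. The functions under test (`add`, `divide_non_zero_result`, `sqrt`) are small illustrative examples, not part of any real library.

```rust
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Panics with two different messages, so `expected` matters in the test.
pub fn divide_non_zero_result(a: u32, b: u32) -> u32 {
    if b == 0 {
        panic!("Divide-by-zero error");
    } else if a < b {
        panic!("Divide result is zero");
    }
    a / b
}

pub fn sqrt(number: f64) -> Result<f64, String> {
    if number >= 0.0 {
        Ok(number.sqrt())
    } else {
        Err("negative floats don't have square roots".to_owned())
    }
}

fn main() {
    println!("{}", add(1, 2));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        assert_eq!(add(1, 2), 3);
        assert_ne!(add(1, 2), 4);
    }

    // Returning a Result lets the test body use `?`.
    #[test]
    fn test_sqrt() -> Result<(), String> {
        let x = 4.0;
        assert_eq!(sqrt(x)?.powf(2.0), x);
        Ok(())
    }

    #[test]
    #[should_panic(expected = "Divide result is zero")]
    fn test_specific_panic() {
        divide_non_zero_result(1, 10);
    }

    // Skipped by default; run with `cargo test -- --ignored`.
    #[test]
    #[ignore]
    fn ignored_test() {
        assert_eq!(add(0, 0), 0);
    }
}
```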
The primary way of documenting a Rust project is through annotating the source code. Documentation comments are written in markdown and support code blocks in them. Rust takes care of correctness, so these code blocks are compiled and used as tests.
The main purpose of documentation tests is to serve as examples that exercise the functionality, which is one of the most important guidelines. It allows using examples from docs as complete code snippets. But using ? makes compilation fail since main returns unit. The ability to hide some source lines from documentation comes to the rescue: one may write fn try_main() -> Result<(), ErrorType>, hide it and unwrap it in hidden main.
Unit tests are testing one module in isolation at a time: they’re small and can test private code. Integration tests are external to your crate and use only its public interface in the same way any other code would. Their purpose is to test that many parts of your library work correctly together.
Sometimes there is a need to have dependencies for tests (examples, benchmarks) only. Such dependencies are added to Cargo.toml in the [dev-dependencies] section. These dependencies are not propagated to other packages which depend on this package.
one should try to minimize the amount of unsafe code in a code base.
Unsafe annotations in Rust are used to bypass protections put in place by the compiler; specifically, there are four primary things that unsafe is used for:
unsafe (including calling a function over FFI)
Raw Pointers
Raw pointers * and references &T function similarly, but references are always safe because they are guaranteed to point to valid data due to the borrow checker. Dereferencing a raw pointer can only be done through an unsafe block.
Calling Unsafe Functions
Some functions can be declared as unsafe, meaning it is the programmer’s responsibility to ensure correctness instead of the compiler’s. One example of this is std::slice::from_raw_parts which will create a slice given a pointer to the first element and a length.
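Both uses can be sketched in a few lines. The wrapper functions are illustrative; the point is that each `unsafe` block is small and the invariant it relies on is checked or stated right next to it.

```rust
use std::slice;

// Dereferencing a raw pointer requires an unsafe block.
fn read_through_raw(value: u32) -> u32 {
    let raw: *const u32 = &value;
    // SAFETY: `raw` was just created from a live reference.
    unsafe { *raw }
}

// Rebuilds a prefix of the slice from its pointer and a length.
fn first_n(values: &[u32], n: usize) -> &[u32] {
    // The compiler cannot check this for us, so we check it ourselves.
    assert!(n <= values.len());
    let ptr = values.as_ptr();
    // SAFETY: `ptr` points at `values`, which holds at least `n` elements.
    unsafe { slice::from_raw_parts(ptr, n) }
}

fn main() {
    println!("{}", read_through_raw(10));
    println!("{:?}", first_n(&[1, 2, 3, 4], 2));
}
```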
Raw identifiers let you use keywords where they would not normally be allowed. This is particularly useful when Rust introduces new keywords, and a library using an older edition of Rust has a variable or function with the same name as a keyword introduced in a newer edition.
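A tiny sketch of the `r#` syntax: `try` is reserved from the 2018 edition onward, so a function with that name (here a made-up one) has to be declared and called as `r#try`.

```rust
// A function whose name collides with a keyword, written as a raw identifier.
fn r#try(value: i32) -> i32 {
    value + 1
}

fn main() {
    // Raw identifiers also work for variable names.
    let r#match = r#try(41);
    println!("{}", r#match);
}
```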
rustdoc.
Use cargo doc to build documentation in target/doc.
Use cargo test to run all tests (including documentation tests), and cargo test --doc to only run documentation tests.
These commands will appropriately invoke rustdoc (and rustc) as required.
Doc comments
Doc comments are very useful for big projects that require documentation. When running Rustdoc, these are the comments that get compiled into documentation. They are denoted by a ///, and support Markdown.
To run the tests, first build the code as a library, then tell rustdoc where to find the library so it can link it into each doctest program.
An encoder takes an input sequence and creates a contextualized representation of it, which is then passed to a decoder that generates a task-specific output sequence.
pub use
cargo yank
cargo install
In addition to grouping functionality, encapsulating implementation details lets you reuse code at a higher level: once you’ve implemented an operation, other code can call that code via the code’s public interface without knowing how the implementation works. The way you write code defines which parts are public for other code to use and which parts are private implementation details that you reserve the right to change. This is another way to limit the amount of detail you have to keep in your head.
$ cargo new proj