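Paper: node2vec: Scalable Feature Learning for Networks
Core idea: learn node representations for a network by defining a flexible notion of a node's neighborhood and designing a biased random walk procedure that efficiently explores diverse neighborhoods.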
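Paper: TextRank: Bringing Order into Texts
Code: networkx/pagerank_alg.py at master · networkx/networkx
Core idea: TextRank is a keyword (or sentence) extraction method based on Google's PageRank. In essence, it builds nodes and edges from the text's tokens using a sliding window (an edge is the co-occurrence of two nodes within the window), then ranks the nodes by their PageRank scores.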
paper: Reformer: The Efficient Transformer
code: trax/trax/models/reformer at master · google/trax
Training a Transformer is too expensive (especially on long sequences); the paper proposes two improvements:
paper: 1409.0473.pdf
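The authors conjecture that using a fixed-length vector in the encoder (that is, encoding the whole sentence into one fixed-length vector) may be the performance bottleneck. They therefore propose a method that automatically searches for the parts of the source sentence that are relevant to the word being predicted.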
$ rustc hello.rs
Regular comments which are ignored by the compiler:
// Line comments which go to the end of the line.
/* Block comments which go to the closing delimiter. */
Doc comments which are parsed into HTML library documentation:
/// Generate library docs for the following item.
//! Generate library docs for the enclosing item.
Printing is handled by a series of macros defined in std::fmt, some of which include:
format!: write formatted text to String
print!: same as format! but the text is printed to the console (io::stdout).
println!: same as print! but a newline is appended.
eprint!: same as format! but the text is printed to the standard error (io::stderr).
eprintln!: same as eprint! but a newline is appended.
std::fmt contains many traits which govern the display of text.
fmt::Debug: Uses the {:?} marker. Format text for debugging purposes.
fmt::Display: Uses the {} marker. Format text in a more elegant, user friendly fashion.
Debug
All types which want to use std::fmt formatting traits require an implementation to be printable.
All types can derive (automatically create) the fmt::Debug implementation. This is not true for fmt::Display which must be manually implemented.
So fmt::Debug definitely makes this printable but sacrifices some elegance. Rust also provides “pretty printing” with {:#?}.
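As a minimal sketch of the two debug markers, the snippet below derives fmt::Debug for a small struct and prints it with both {:?} and {:#?}. The Person type and its fields are made up for illustration.

```rust
// Deriving `Debug` makes the struct printable with `{:?}` and `{:#?}`.
// `Person` is a hypothetical type used only for this example.
#[derive(Debug)]
struct Person {
    name: String,
    age: u8,
}

fn main() {
    let peter = Person { name: String::from("Peter"), age: 27 };

    // Compact, single-line debug output.
    println!("{:?}", peter);

    // Pretty-printed debug output, one field per line.
    println!("{:#?}", peter);
}
```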
Display
fmt::Display can be implemented for any new container type which is not generic.
Testcase: List
The ? operator (or the older, now deprecated try! macro) can be applied to every write! result, returning early if one of them is an error:
write!(f, "{}", value)?;
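The List test case can be sketched roughly as follows, assuming List is a newtype around Vec<i32> (that definition is an assumption here, since the original listing is not shown). Each write! returns a fmt::Result, and ? propagates the first error.

```rust
use std::fmt;

// Assumed definition: a newtype wrapping a `Vec<i32>`.
struct List(Vec<i32>);

impl fmt::Display for List {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[")?;
        for (index, value) in self.0.iter().enumerate() {
            // Separate every element except the first with a comma.
            if index != 0 {
                write!(f, ", ")?;
            }
            write!(f, "{}", value)?;
        }
        // The last `write!` is returned directly as the `fmt::Result`.
        write!(f, "]")
    }
}

fn main() {
    let list = List(vec![1, 2, 3]);
    println!("{}", list);
}
```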
Formatting
This formatting functionality is implemented via traits, and there is one trait for each argument type. The most common formatting trait is Display, which handles cases where the argument type is left unspecified: {} for instance.
format!("{}", foo) -> "3735928559"
format!("0x{:X}", foo) -> "0xDEADBEEF"
format!("0o{:o}", foo) -> "0o33653337357"
Scalar Types
i8, i16, i32, i64, i128 and isize (pointer size)
u8, u16, u32, u64, u128 and usize (pointer size)
f32, f64
char: Unicode scalar values like 'a', 'α' and '∞' (4 bytes each)
bool: either true or false
(), whose only possible value is an empty tuple: ()
Compound Types
arrays like [1, 2, 3]
tuples like (1, true)
Variables can always be type annotated. Numbers may additionally be annotated via a suffix or by default. Integers default to i32 and floats to f64.
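Integers can be written with the prefixes 0x, 0o or 0b for hexadecimal (base 16), octal (base 8) or binary (base 2) notation.
Underscores can be inserted in numeric literals to improve readability: 1_000 is the same as 1000, and 0.000_001 is the same as 0.000001.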
We need to tell the compiler the type of the literals we use.
A tuple is a collection of values of different types.
An array is a collection of objects of the same type T, stored in contiguous memory. Their size, which is known at compile time, is part of their type signature [T; size].
A slice is a two-word object, the first word is a pointer to the data, and the second word is the length of the slice. Slices can be used to borrow a section of an array, and have the type signature &[T].
struct: define a structure
enum: define an enumeration
Constants can also be created via the const and static keywords.
Three types of structures:
The enum keyword allows the creation of a type which may be one of a few different variants. Any variant which is valid as a struct is also valid in an enum.
use
The use declaration can be used so manual scoping isn’t needed.
Rust has two different types of constants which can be declared in any scope including global. Both require explicit type annotation:
const: An unchangeable value (the common case).
static: A possibly mutable variable with 'static lifetime. The static lifetime is inferred and does not have to be specified. Accessing or modifying a mutable static variable is unsafe.
Rust provides type safety via static typing. Variable bindings can be type annotated when declared.
Variable bindings are immutable by default. They have a scope, and are constrained to live in a block.
It’s possible to declare variable bindings first, and initialize them later. However, this form is seldom used, as it may lead to the use of uninitialized variables.
Rust provides no implicit type conversion (coercion) between primitive types. But, explicit type conversion (casting) can be performed using the as keyword.
Numeric literals can be type annotated by adding the type as a suffix. The type of unsuffixed numeric literals will depend on how they are used. If no constraint exists, the compiler will use i32 for integers, and f64 for floating-point numbers.
The type statement can be used to give a new name to an existing type. Types must have CamelCase names, or the compiler will raise a warning.
The main use of aliases is to reduce boilerplate; for example the io::Result<T> type is an alias for Result<T, io::Error>.
Rust addresses conversion between types by the use of traits. The generic conversions will use the From and Into traits. However there are more specific ones for the more common cases, in particular when converting to and from Strings.
From
The From trait allows for a type to define how to create itself from another type, hence providing a very simple mechanism for converting between several types. There are numerous implementations of this trait within the standard library for conversion of primitive and common types.
// str to String
Into
The Into trait is simply the reciprocal of the From trait. That is, if you have implemented the From trait for your type you get the Into implementation for free.
Similar to From and Into, TryFrom and TryInto are generic traits for converting between types. Unlike From/Into, the TryFrom/TryInto traits are used for fallible conversions, and as such, return Results.
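A small sketch of both families of traits: From gives Into for free, while TryFrom returns a Result because the conversion can fail. The Number and EvenNumber types are hypothetical examples, not part of the standard library.

```rust
use std::convert::{TryFrom, TryInto};

// Hypothetical wrapper type with an infallible conversion.
#[derive(Debug, PartialEq)]
struct Number(i32);

// Implementing `From<i32>` also provides `Into<Number>` for `i32`.
impl From<i32> for Number {
    fn from(value: i32) -> Self {
        Number(value)
    }
}

// Hypothetical wrapper type with a fallible conversion.
#[derive(Debug, PartialEq)]
struct EvenNumber(i32);

impl TryFrom<i32> for EvenNumber {
    type Error = ();

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        if value % 2 == 0 {
            Ok(EvenNumber(value))
        } else {
            Err(())
        }
    }
}

fn main() {
    let a = Number::from(30);
    let b: Number = 30i32.into();
    println!("{:?} {:?}", a, b);

    // Fallible conversions return a `Result`.
    println!("{:?}", EvenNumber::try_from(8));
    println!("{:?}", EvenNumber::try_from(5));

    let c: Result<EvenNumber, ()> = 8i32.try_into();
    println!("{:?}", c);
}
```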
Convert to String
To convert any type to a String is as simple as implementing the ToString trait for the type.
Rather than doing so directly, you should implement the fmt::Display trait which automagically provides ToString and also allows printing the type.
Parsing a String
One of the more common types to convert a string into is a number. The idiomatic approach to this is to use the parse function and either to arrange for type inference or to specify the type to parse using the ‘turbofish’ syntax.
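Both ways of telling parse which type to produce can be sketched as below. The helper parse_and_add is a made-up name for illustration; it returns a Result because parsing can fail.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse two strings as `i32` and add them.
fn parse_and_add(a: &str, b: &str) -> Result<i32, ParseIntError> {
    // The target type is inferred from the annotation on the binding.
    let x: i32 = a.parse()?;
    // The target type is given explicitly with the turbofish syntax.
    let y = b.parse::<i32>()?;
    Ok(x + y)
}

fn main() {
    println!("{:?}", parse_and_add("5", "10"));
    // Invalid input yields an `Err` instead of a number.
    println!("{:?}", parse_and_add("5", "ten"));
}
```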
Blocks are expressions too, so they can be used as values in assignments. The last expression in the block will be assigned to the place expression such as a local variable. However, if the last expression of the block ends with a semicolon, the return value will be ().
if-else conditionals are expressions, and all branches must return the same type.
Rust provides a loop keyword to indicate an infinite loop.
Nesting and Labels
It’s possible to break or continue outer loops when dealing with nested loops. In these cases, the loops must be annotated with some 'label, and the label must be passed to the break/continue statement.
Returning from loops
One of the uses of a loop is to retry an operation until it succeeds. If the operation returns a value though, you might need to pass it to the rest of the code: put it after the break, and it will be returned by the loop expression.
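Labels and break values can be combined. The sketch below uses a hypothetical function, first_pair_over, that searches nested loops and hands its result back through break on the labeled outer loop.

```rust
// Hypothetical helper: returns the first pair (i, j), with j <= i, whose
// product exceeds `limit`, scanning i = 1, 2, 3, ... and j = 1..=i.
fn first_pair_over(limit: u32) -> (u32, u32) {
    let mut i = 1;
    'outer: loop {
        let mut j = 1;
        loop {
            if i * j > limit {
                // Leaves both loops; the tuple becomes the value of the outer loop.
                break 'outer (i, j);
            }
            if j >= i {
                // An unlabeled `break` leaves only the inner loop.
                break;
            }
            j += 1;
        }
        i += 1;
    }
}

fn main() {
    println!("{:?}", first_pair_over(10));
}
```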
The for in construct can be used to iterate through an Iterator. If not specified, the for loop will apply the into_iter function on the collection provided to convert the collection into an iterator. This is not the only means to convert a collection into an iterator however, the other functions available include iter and iter_mut.
iter - This borrows each element of the collection through each iteration.
into_iter - This consumes the collection so that on each iteration the exact data is provided.
iter_mut - This mutably borrows each element of the collection, allowing for the collection to be modified in place.
For pointers, a distinction needs to be made between destructuring and dereferencing as they are different concepts which are used differently from a language like C.
Dereferencing uses *
Destructuring uses &, ref, and ref mut
A match guard can be added to filter the arm.
match provides the @ sigil for binding values to names.
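Guards and @ bindings can appear in the same match. A rough sketch, with describe_age as a made-up function name:

```rust
// Hypothetical function showing `@` bindings and a match guard.
fn describe_age(age: u32) -> String {
    match age {
        0 => String::from("not born yet"),
        // `@` binds the matched value to `n` while testing it against a range.
        n @ 1..=12 => format!("child of age {}", n),
        n @ 13..=19 => format!("teen of age {}", n),
        // A guard adds an extra boolean condition to the arm.
        n if n % 10 == 0 => format!("adult at a round age of {}", n),
        n => format!("adult of age {}", n),
    }
}

fn main() {
    for age in [0, 7, 15, 30, 31].iter() {
        println!("{}", describe_age(*age));
    }
}
```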
if let can be used to match any enum value.
if let also allows matching non-parameterized enum variants, even if the enum neither derives nor implements PartialEq. In such a case, if Foo::Bar == a fails to compile, because instances of the enum cannot be compared for equality. However, if let works.
Functions are declared using the fn keyword. Their arguments are type annotated, just like variables, and, if the function returns a value, the return type must be specified after an arrow ->.
The final expression in the function will be used as return value. Alternatively, the return statement can be used to return a value earlier from within the function, even from inside loops or ifs.
Methods are functions attached to objects. These methods have access to the data of the object and its other methods via the self keyword. Methods are defined under an impl block.
Closures in Rust, also called lambda expressions or lambdas, are functions that can capture the enclosing environment. Calling a closure is exactly like calling a function. However, both input and return types can be inferred and input variable names must be specified. Other characteristics of closures include:
using || instead of () around input variables.
optional body delimiters ({}) for a single expression (mandatory otherwise).
Closures are inherently flexible and will do what the functionality requires to make the closure work without annotation. This allows capturing to flexibly adapt to the use case, sometimes moving and sometimes borrowing. Closures can capture variables:
by reference: &T
by mutable reference: &mut T
by value: T
They preferentially capture variables by reference and only go lower when required. Using move before the vertical pipes forces the closure to take ownership of captured variables.
When taking a closure as an input parameter, the closure’s complete type must be annotated using one of a few traits. In order of decreasing restriction, they are:
Fn: the closure captures by reference (&T)
FnMut: the closure captures by mutable reference (&mut T)
FnOnce: the closure captures by value (T)
On a variable-by-variable basis, the compiler will capture variables in the least restrictive manner possible.
Closures succinctly capture variables from enclosing scopes.
// `F` must be generic.
When a closure is defined, the compiler implicitly creates a new anonymous structure to store the captured variables inside, meanwhile implementing the functionality via one of the traits: Fn, FnMut, or FnOnce for this unknown type. This type is assigned to the variable which is stored until calling.
Since this new type is of unknown type, any usage in a function will require generics. However, an unbounded type parameter <T> would still be ambiguous and not be allowed. Thus, bounding by one of the traits: Fn, FnMut, or FnOnce (which it implements) is sufficient to specify its type.
If a function takes a closure as parameter, then any function that satisfies the trait bound of that closure can be passed as a parameter.
Returning closures as output parameters is possible. However, anonymous closure types are, by definition, unknown, so we have to use impl Trait to return them. The valid traits for returning a closure are:
Fn
FnMut
FnOnce
Beyond this, the move keyword must be used, which signals that all captures occur by value. This is required because any captures by reference would be dropped as soon as the function exited, leaving invalid references in the closure.
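One function per trait gives a quick sketch of returning closures with impl Trait and move. The function names are invented for this example.

```rust
// Returns a closure that only reads its captured value: `Fn`.
fn make_adder(amount: i32) -> impl Fn(i32) -> i32 {
    move |x| x + amount
}

// Returns a closure that mutates its captured state: `FnMut`.
fn make_counter() -> impl FnMut() -> u32 {
    let mut count = 0;
    move || {
        count += 1;
        count
    }
}

// Returns a closure that gives away its captured value: `FnOnce`.
fn make_greeting(name: String) -> impl FnOnce() -> String {
    // `name` is moved out of the closure when it is called,
    // so the closure can run only once.
    move || name + "!"
}

fn main() {
    let add_five = make_adder(5);
    println!("{}", add_five(1));

    let mut counter = make_counter();
    println!("{} {}", counter(), counter());

    let greet = make_greeting(String::from("hello"));
    println!("{}", greet());
}
```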
Higher Order Functions (HOFs) are functions that take one or more functions and/or produce a more useful function. HOFs and lazy iterators give Rust its functional flavor.
Diverging functions never return. They are marked using !, which is an empty type.
fn foo() -> ! { panic!("This call never returns."); }
The main advantage of this type is that it can be cast to any other one and therefore used at places where an exact type is required, for instance in match branches.
It is also the return type of functions that loop forever (e.g. loop {}) like network servers, or functions that terminate the process (e.g. exit()).
A module is a collection of items: functions, structs, traits, impl blocks, and even other modules.
By default, the items in a module have private visibility, but this can be overridden with the pub modifier. Only the public items of a module can be accessed from outside the module scope.
pub(in path) only visible within the given path. path must be a parent or ancestor module
pub(self) only visible within the current module, which is the same as leaving them private
pub(super) only visible within the parent module
Private parent items will still restrict the visibility of a child item, even if it is declared as visible within a bigger scope.
Structs have an extra level of visibility with their fields. The visibility defaults to private, and can be overridden with the pub modifier. This visibility only matters when a struct is accessed from outside the module where it is defined, and has the goal of hiding information (encapsulation).
use Declaration
The use declaration can be used to bind a full path to a new name, for easier access.
// extern crate deeply; // normally, this would exist and not be commented out!
super and self
The super and self keywords can be used in the path to remove ambiguity when accessing items and to prevent unnecessary hardcoding of paths.
Modules can be mapped to a file/directory hierarchy.
If some_file.rs has mod declarations in it, then the contents of the module files would be inserted in places where mod declarations in the crate file are found, before running the compiler over it. In other words, modules do not get compiled individually, only crates get compiled.
A crate can be compiled into a binary or into a library. By default, rustc will produce a binary from a crate. This behavior can be overridden by passing the --crate-type flag to rustc.
$ rustc --crate-type=lib rary.rs
Libraries get prefixed with “lib”, and by default they get named after their crate file, but this default name can be overridden using the crate_name attribute.
extern crate
To link a crate to this new library, the extern crate declaration must be used. This will not only link the library, but also import all its items under a module named the same as the library. The visibility rules that apply to modules also apply to libraries.
# Where library.rlib is the path to the compiled library, assumed that it's
cargo is the official Rust package management tool.
Create a Rust project:
# A binary
Cargo.toml config file:
[package]
cargo is more than a dependency manager. All of the available configuration options are listed in the format specification of Cargo.toml.
To build our project we can execute cargo build anywhere in the project directory (including subdirectories!). We can also do cargo run to build and run. (Note that it only rebuilds what it has not already built, similar to make).
More binaries:
foo
Additional binaries should be placed under the src/bin/ directory.
Rust has first-class support for unit and integration testing. Organizationally, we can place unit tests in the modules they test and integration tests in their own tests/ directory:
foo
Cargo may run multiple tests concurrently, so make sure that they don’t race with each other.
[package]
Cargo provides the build script with inputs via environment variables. The script provides output via stdout. All lines printed are written to target/debug/build//output.
An attribute is metadata applied to some module, crate or item. This metadata can be used to/for:
When attributes apply to a whole crate, their syntax is #![crate_attribute], and when they apply to a module or item, the syntax is #[item_attribute] (notice the missing bang !).
Attributes can take arguments with different syntaxes:
#[attribute = "value"]
#[attribute(key = "value")]
#[attribute(value)]
Attributes can have multiple values and can be separated over multiple lines, too:
dead_code
The compiler provides a dead_code lint that will warn about unused functions. #[allow(dead_code)] is an attribute that disables the dead_code lint.
The crate_type attribute can be used to tell the compiler whether a crate is a binary or a library (and even which type of library), and the crate_name attribute can be used to set the name of the crate.
However, it is important to note that both the crate_type and crate_name attributes have no effect whatsoever when using Cargo, the Rust package manager. Since Cargo is used for the majority of Rust projects, this means real-world uses of crate_type and crate_name are relatively limited.
// This crate is a library
We no longer need to pass the --crate-type flag to rustc.
$ rustc lib.rs
cfg
Conditional compilation is possible through two different operators:
cfg attribute: #[cfg(...)] in attribute position
cfg! macro: cfg!(...) in boolean expressions
Custom conditionals must be passed to rustc using the --cfg flag.
Run:
$ rustc --cfg some_condition custom.rs && ./custom
Generics is the topic of generalizing types and functionalities to broader cases. The simplest and most common use of generics is for type parameters. Generic type parameters are typically represented as <T>.
In Rust, “generic” also describes anything that accepts one or more generic type parameters <T>. Any type specified as a generic type parameter is generic, and everything else is concrete (non-generic).
The same set of rules can be applied to functions: a type T becomes generic when preceded by <T>.
A function call with explicitly specified type parameters looks like: fun::<A, B, ...>().
Similar to functions, implementations require care to remain generic.
struct S;
traits can also be generic.
// A trait generic over `T`.
When working with generics, the type parameters often must use traits as bounds to stipulate what functionality a type implements.
// Define a function `printer` that takes a generic type `T` which
Bounding restricts the generic to types that conform to the bounds.
struct S<T: Display>(T);
Another effect of bounding is that generic instances are allowed to access the methods of traits specified in the bounds.
As an additional note, where clauses can also be used to apply bounds in some cases to be more expressive.
Even if a trait doesn’t include any functionality, you can still use it as a bound.
Multiple bounds can be applied with a +. Like normal, different types are separated with ,.
A bound can also be expressed using a where clause immediately before the opening {, rather than at the type’s first mention. Additionally, where clauses can apply bounds to arbitrary types, rather than just to type parameters.
Some cases that a where clause is useful:
When specifying generic types and bounds separately is clearer
impl <A: TraitB + TraitC, D: TraitE + TraitF> MyTrait<A, D> for YourType {}
When using a where clause is more expressive than using normal syntax
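For the second case, the sketch below places a bound on Option<T> rather than on T itself, which the inline <T: ...> syntax cannot express directly. The PrintInOption trait is an illustrative name, and this variant returns a String instead of printing so that the result can be inspected.

```rust
use std::fmt::Debug;

// Illustrative trait: formats a value wrapped in `Some`.
trait PrintInOption {
    fn print_in_option(self) -> String;
}

// The bound is on `Option<T>`, not on `T`, so a `where` clause is needed.
impl<T> PrintInOption for T
where
    Option<T>: Debug,
{
    fn print_in_option(self) -> String {
        format!("{:?}", Some(self))
    }
}

fn main() {
    let vec = vec![1, 2, 3];
    println!("{}", vec.print_in_option());
}
```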
The newtype idiom gives compile time guarantees that the right type of value is supplied to a program.
“Associated Items” refers to a set of rules pertaining to items of various types. It is an extension to trait generics, and allows traits to internally define new items.
A trait that is generic over its container type has type specification requirements - users of the trait must specify all of its generic types. The use of “Associated types” improves the overall readability of code by moving inner types locally into a trait as output types.
// `A` and `B` are defined in the trait via the `type` keyword.
A phantom type parameter is one that doesn’t show up at runtime, but is checked statically (and only) at compile time.
Data types can use extra generic type parameters to act as markers or to perform type checking at compile time. These extra parameters hold no storage values, and have no runtime behavior.
Rust enforces RAII (Resource Acquisition Is Initialization), so whenever an object goes out of scope, its destructor is called and its owned resources are freed.
Double check for memory errors using valgrind: rustc raii.rs && valgrind ./raii
The notion of a destructor in Rust is provided through the Drop trait. This trait does not have to be implemented for every type; implement it only if your type requires its own destructor logic.
Because variables are in charge of freeing their own resources, resources can only have one owner.
When doing assignments (let x = y) or passing function arguments by value (foo(x)), the ownership of the resources is transferred. In Rust-speak, this is known as a move.
Mutability of data can be changed when ownership is transferred.
Most of the time, we’d like to access data without taking ownership over it. To accomplish this, Rust uses a borrowing mechanism. Instead of passing objects by value (T), objects can be passed by reference (&T).
The compiler statically guarantees (via its borrow checker) that references always point to valid objects.
Mutable data can be mutably borrowed using &mut T. This is called a mutable reference and gives read/write access to the borrower. In contrast, &T borrows the data via an immutable reference, and the borrower can read the data but not modify it.
When data is immutably borrowed, it also freezes. Frozen data can’t be modified via the original object until all references to it go out of scope.
Data can be immutably borrowed any number of times, but while immutably borrowed, the original data can’t be mutably borrowed. On the other hand, only one mutable borrow is allowed at a time. The original data can be borrowed again only after the mutable reference has been used for the last time.
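These borrowing rules can be sketched with two hypothetical helpers, total (which takes an immutable borrow) and double_all (which takes a mutable borrow):

```rust
// Hypothetical helper: reads the data through an immutable borrow.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Hypothetical helper: modifies the data through a mutable borrow.
fn double_all(values: &mut Vec<i32>) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

fn main() {
    let mut data = vec![1, 2, 3];

    // Any number of immutable borrows may coexist.
    let first = &data;
    let second = &data;
    println!("{} {}", total(first), total(second));

    // `first` and `second` are not used past this point, so the data
    // can now be mutably borrowed.
    double_all(&mut data);
    println!("{:?}", data);
}
```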
When doing pattern matching or destructuring via the let binding, the ref keyword can be used to take references to the fields of a struct/tuple.
A lifetime is a construct the compiler (or more specifically, its borrow checker) uses to ensure all borrows are valid. No names or types are assigned to label lifetimes.
The borrow checker uses explicit lifetime annotations to determine how long references should be valid.
foo<'a>
Using lifetimes requires generics. This lifetime syntax indicates that the lifetime of foo may not exceed that of 'a. Explicit annotation of a type has the form &'a T where 'a has already been introduced.
Function signatures with lifetimes have a few constraints:
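any reference must have an annotated lifetime.
any reference being returned must have the same lifetime as an input or be static.
Additionally, returning references without input is banned if it would result in returning references to invalid data.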
T: 'a: All references in T must outlive lifetime 'a.
T: Trait + 'a: Type T must implement trait Trait and all references in T must outlive 'a.
A 'static lifetime is the longest possible lifetime, and lasts for the lifetime of the running program. A 'static lifetime may also be coerced to a shorter lifetime. There are two ways to make a variable with 'static lifetime, and both are stored in the read-only memory of the binary:
Make a constant with the static declaration.
Make a string literal which has type: &'static str.
Some lifetime patterns are overwhelmingly common and so the borrow checker will allow you to omit them to save typing and to improve readability. This is known as elision. Elision exists in Rust solely because these patterns are common.
A trait is a collection of methods defined for an unknown type: Self. They can access other methods declared in the same trait. Traits can be implemented for any data type.
The compiler is capable of providing basic implementations for some traits via the #[derive] attribute.
Comparison traits: Eq, PartialEq, Ord, PartialOrd.
Clone, to create T from &T via a copy.
Copy, to give a type ‘copy semantics’ instead of ‘move semantics’.
Hash, to compute a hash from &T.
Default, to create an empty instance of a data type.
Debug, to format a value using the {:?} formatter.
dyn
The Rust compiler needs to know how much space every function’s return type requires. This means all your functions have to return a concrete type. A function therefore cannot return a bare trait such as Animal, because different implementations will need different amounts of memory.
However, there’s an easy workaround. Instead of returning a trait object directly, our functions return a Box which contains some Animal. A box is just a reference to some memory in the heap.
In Rust, many of the operators can be overloaded via traits. For example, the + operator in a + b calls the add method (as in a.add(b)). This add method is part of the Add trait. Hence, the + operator can be used by any implementor of the Add trait.
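A minimal sketch of overloading + for a made-up Point type:

```rust
use std::ops::Add;

// Hypothetical 2D point used only for this example.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Implementing `Add` makes `a + b` available for `Point`.
impl Add for Point {
    type Output = Point;

    fn add(self, other: Point) -> Point {
        Point {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 3, y: 4 };

    // The operator form and the method form call the same `add`.
    println!("{:?}", a + b);
    println!("{:?}", a.add(b));
}
```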
The Drop trait only has one method: drop, which is called automatically when an object goes out of scope.
Box, Vec, String, File, and Process are some examples of types that implement the Drop trait to free resources. The Drop trait can also be manually implemented for any custom data type.
The Iterator trait is used to implement iterators over collections such as arrays. The trait requires only a method to be defined for the next element, which may be manually defined in an impl block or automatically defined (as in arrays and ranges).
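A manually defined iterator only needs next. The sketch below generates Fibonacci numbers starting from 0; the struct and constructor names are chosen for illustration.

```rust
// Holds the current and the following Fibonacci number.
struct Fibonacci {
    curr: u32,
    next: u32,
}

impl Iterator for Fibonacci {
    type Item = u32;

    // Only `next` has to be defined; adapters such as `take` and `skip`
    // come from the trait's provided methods.
    // Note: the sequence is unbounded, so the addition eventually
    // overflows `u32` if the iterator is driven far enough.
    fn next(&mut self) -> Option<u32> {
        let current = self.curr;
        self.curr = self.next;
        self.next = current + self.next;
        Some(current)
    }
}

// Hypothetical constructor starting the sequence at 0, 1.
fn fibonacci() -> Fibonacci {
    Fibonacci { curr: 0, next: 1 }
}

fn main() {
    for n in fibonacci().take(7) {
        println!("{}", n);
    }
}
```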
impl Trait
If your function returns a type that implements MyTrait, you can write its return type as -> impl MyTrait.
Rust doesn’t have “inheritance”, but you can define a trait as being a superset of another trait.
A type can implement many different traits. Fully Qualified Syntax is used to disambiguate those methods.
Macros look like functions, except that their name ends with a bang !, but instead of generating a function call, macros are expanded into source code that gets compiled with the rest of the program. However, unlike macros in C and other languages, Rust macros are expanded into abstract syntax trees, rather than string preprocessing, so you don’t get unexpected precedence bugs.
An example is println!, which can take any number of arguments, depending on the format string.
The arguments of a macro are prefixed by a dollar sign $ and type annotated with a designator. Some of the available designators:
block
expr is used for expressions
ident is used for variable/function names
item
literal is used for literal constants
pat (pattern)
path
stmt (statement)
tt (token tree)
ty (type)
vis (visibility qualifier)
Macros can be overloaded to accept different combinations of arguments. In that regard, macro_rules! can work similarly to a match block.
Macros can use + in the argument list to indicate that an argument may repeat at least once, or *, to indicate that the argument may repeat zero or more times.
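Overloading and repetition can be sketched together in one macro. In find_min below, the first rule handles a single expression and the second uses $(...),+ to accept one or more further expressions and recurse.

```rust
// `find_min!` computes the minimum of one or more expressions.
macro_rules! find_min {
    // Base case: a single expression.
    ($x:expr) => ($x);
    // `$x` followed by at least one more comma-separated expression.
    ($x:expr, $($y:expr),+) => (
        // Recurse on the tail and compare with the head.
        ::std::cmp::min($x, find_min!($($y),+))
    );
}

fn main() {
    println!("{}", find_min!(1));
    println!("{}", find_min!(1 + 2, 2));
    println!("{}", find_min!(5, 2 * 3, 4));
}
```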
An explicit panic is mainly useful for tests and dealing with unrecoverable errors. For prototyping it can be useful, for example when dealing with functions that haven’t been implemented yet, but in those cases the more descriptive unimplemented is better. In tests panic is a reasonable way to explicitly fail.
The Option type is for when a value is optional or when the lack of a value is not an error condition. For example the parent of a directory - / and C: don’t have one. When dealing with Options, unwrap is fine for prototyping and cases where it’s absolutely certain that there is guaranteed to be a value. However expect is more useful since it lets you specify an error message in case something goes wrong anyway.
When there is a chance that things do go wrong and the caller has to deal with the problem, use Result. You can unwrap and expect them as well (please don’t do that unless it’s a test or quick prototype).
panic
It prints an error message, starts unwinding the stack, and usually exits the program.
Option vs unwrap
?
You can unpack Options by using match statements, but it’s often easier to use the ? operator. If x is an Option, then evaluating x? will return the underlying value if x is Some, otherwise it will terminate whatever function is being executed and return None.
map
match is a valid method for handling Options. However, you may eventually find heavy usage tedious, especially with operations only valid with an input. In these cases, combinators can be used to manage control flow in a modular fashion.
and_then
map() was described as a chainable way to simplify match statements. However, using map() on a function that returns an Option results in the nested Option<Option<Food>>. Chaining multiple calls together can then become confusing. and_then() calls its function input with the wrapped value and returns the result. If the Option is None, then it returns None instead.
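The difference between map and and_then is easiest to see side by side. The helpers below are hypothetical and use digits rather than the Food example mentioned above.

```rust
// Hypothetical step 1: a character becomes a digit, if it is one.
fn parse_digit(c: char) -> Option<u32> {
    c.to_digit(10)
}

// Hypothetical step 2: halve a number, but only if it is even.
fn checked_half(n: u32) -> Option<u32> {
    if n % 2 == 0 {
        Some(n / 2)
    } else {
        None
    }
}

// `map` wraps the inner `Option` again, producing a nested type.
fn half_of_digit_nested(c: char) -> Option<Option<u32>> {
    parse_digit(c).map(checked_half)
}

// `and_then` flattens the result into a single `Option`.
fn half_of_digit(c: char) -> Option<u32> {
    parse_digit(c).and_then(checked_half)
}

fn main() {
    println!("{:?}", half_of_digit_nested('8'));
    println!("{:?}", half_of_digit('8'));
    println!("{:?}", half_of_digit('7'));
    println!("{:?}", half_of_digit('x'));
}
```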
Result
Result is a richer version of the Option type that describes possible error instead of possible absence.
Ok(T): An element T was found
Err(E): An error was found with element E
map for Result
Option’s map, and_then, and many other combinators are also implemented for Result.
Result
Errors found in a specific module often have the same Err type, so a single alias can succinctly define all associated Results. This is so useful that the std library even supplies one: io::Result!
We can simply stop executing the function and return the error if one occurs. For some, this form of code can be easier to both read and write.
?
Sometimes we just want the simplicity of unwrap without the possibility of a panic.
Upon finding an Err, there are two valid actions to take:
panic! which we already decided to try to avoid if possible
return because an Err means it cannot be handled
? is almost exactly equivalent to an unwrap which returns instead of panicking on Errs.
Sometimes an Option needs to interact with a Result, or a Result with one error type needs to interact with a Result with a different error type.
Results out of Options
The most basic way of handling mixed error types is to just embed them in each other.
There are times when we’ll want to stop processing on errors (like with ?) but keep going when the Option is None. A couple of combinators come in handy to swap the Result and Option.
Sometimes it simplifies the code to mask all of the different errors with a single type of error. Rust allows us to define our own error types. In general, a “good” error type:
Err(EmptyVec)
Err("Please use a vector with at least one element".to_owned())
Err(BadChar(c, position))
Err("+ cannot be used here".to_owned())
Boxing errors
A way to write simple code while preserving the original errors is to Box them. The drawback is that the underlying error type is only known at runtime and not statically determined.
?
? was previously explained as either unwrap or return Err(err). This is only mostly true. It actually means unwrap or return Err(From::from(err)). Since From::from is a conversion utility between different types, this means that if you ? where the error is convertible to the return type, it will convert automatically.
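A sketch of that automatic conversion, assuming a custom error enum named DoubleError (the name and variants are illustrative). The ParseIntError produced by parse is turned into DoubleError by ? through the From implementation.

```rust
use std::num::ParseIntError;

// Illustrative error type covering both failure modes.
#[derive(Debug)]
enum DoubleError {
    Empty,
    Parse(ParseIntError),
}

// This is what lets `?` convert a `ParseIntError` automatically.
impl From<ParseIntError> for DoubleError {
    fn from(err: ParseIntError) -> Self {
        DoubleError::Parse(err)
    }
}

// Doubles the first element of `items` after parsing it as `i32`.
fn double_first(items: &[&str]) -> Result<i32, DoubleError> {
    // `ok_or` turns the `Option` into a `Result` with our own error.
    let first = items.first().ok_or(DoubleError::Empty)?;
    // `parse` fails with `ParseIntError`; `?` applies `From::from` to it.
    let parsed = first.parse::<i32>()?;
    Ok(2 * parsed)
}

fn main() {
    println!("{:?}", double_first(&["42", "93"]));
    println!("{:?}", double_first(&[]));
    println!("{:?}", double_first(&["tofu"]));
}
```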
An alternative to boxing errors is to wrap them in your own error type.
Results
filter_map calls a function and filters out the results that are None.
Result implements FromIterator so that a vector of results (Vec<Result<T, E>>) can be turned into a result with a vector (Result<Vec<T>, E>). Once a Result::Err is found, the iteration will terminate. This same technique can be used with Option.
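Both strategies can be sketched with two hypothetical helpers: parse_all fails as a whole on the first bad element, while parse_valid silently skips bad elements.

```rust
use std::num::ParseIntError;

// Collecting into `Result<Vec<_>, _>` stops at the first `Err`.
fn parse_all(strings: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    strings.iter().map(|s| s.parse::<i32>()).collect()
}

// `filter_map` with `ok()` keeps only the values that parsed successfully.
fn parse_valid(strings: &[&str]) -> Vec<i32> {
    strings
        .iter()
        .filter_map(|s| s.parse::<i32>().ok())
        .collect()
}

fn main() {
    println!("{:?}", parse_all(&["1", "2", "3"]));
    println!("{:?}", parse_all(&["1", "tofu", "3"]));
    println!("{:?}", parse_valid(&["1", "tofu", "3"]));
}
```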
Strings like: "hello world"
[1, 2, 3]
Option
Result
Box
All values in Rust are stack allocated by default. Values can be boxed (allocated on the heap) by creating a Box<T>. A box is a smart pointer to a heap allocated value of type T. When a box goes out of scope, its destructor is called, the inner object is destroyed, and the memory on the heap is freed.
Boxed values can be dereferenced using the * operator; this removes one layer of indirection.
Vectors are re-sizable arrays. Like slices, their size is not known at compile time, but they can grow or shrink at any time. A vector is represented using 3 parameters:
There are two types of strings in Rust: String and &str.
A String is stored as a vector of bytes (Vec<u8>), but guaranteed to always be a valid UTF-8 sequence. String is heap allocated, growable and not null terminated.
&str is a slice (&[u8]) that always points to a valid UTF-8 sequence, and can be used to view into a String, just like &[T] is a view into Vec<T>.
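To make the distinction concrete, here is a sketch with two assumed helpers: `first_word` only borrows a view (`&str`), while `shout` builds a new owned, growable `String`.

```rust
// Returns a view into the input; no allocation happens.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// Returns a new heap allocated String that can grow.
fn shout(text: &str) -> String {
    let mut owned = String::from(text);
    owned.push('!');
    owned
}
```

A `&String` coerces to `&str`, so `first_word(&some_string)` works without any conversion.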
Sometimes it’s desirable to catch the failure of some parts of a program instead of calling panic!. The Option enum has two variants:
None, to indicate failure or lack of value
Some(value), a tuple struct that wraps a value with type T
Sometimes it is important to express why an operation failed. The Result enum has two variants:
Ok(value), which indicates that the operation succeeded, and wraps the value returned by the operation. (value has type T)
Err(why), which indicates that the operation failed, and wraps why, which (hopefully) explains the cause of the failure. (why has type E)
? is used at the end of an expression returning a Result, and is equivalent to a match expression, where the Err(err) branch expands to an early return Err(From::from(err)), and the Ok(ok) branch expands to an ok expression.
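A brief sketch of the three pieces, with assumed function names: `checked_division` returns an `Option`, and `multiply` returns a `Result` and uses `?` for the early return.

```rust
use std::num::ParseIntError;

// None signals "no value" instead of panicking on division by zero.
fn checked_division(dividend: i32, divisor: i32) -> Option<i32> {
    if divisor == 0 {
        None
    } else {
        Some(dividend / divisor)
    }
}

// Each `?` either unwraps the Ok value or returns the Err early.
fn multiply(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = a.parse::<i32>()?;
    let y = b.parse::<i32>()?;
    Ok(x * y)
}
```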
panic
The panic! macro can be used to generate a panic and start unwinding its stack. While unwinding, the runtime will take care of freeing all the resources owned by the thread by calling the destructor of all its objects.
Where vectors store values by an integer index, HashMaps store values by key. HashMap keys can be booleans, integers, strings, or any other type that implements the Eq and Hash traits.
Create a HashMap with a certain starting capacity using HashMap::with_capacity(usize), or use HashMap::new() to get a HashMap with a default initial capacity (recommended).
Any type that implements the Eq and Hash traits can be a key in HashMap. This includes:
bool (though not very useful since there are only two possible keys)
integer types (i32, u64, usize, and all variations thereof)
String and &str (protip: you can have a HashMap keyed by String and call .get() with an &str)
All collection classes implement Eq and Hash if their contained type also respectively implements Eq and Hash. For example, Vec<T> will implement Hash if T implements Hash.
You can easily implement Eq and Hash for a custom type with just one line: #[derive(PartialEq, Eq, Hash)]
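The following sketch shows a derived key type and the String/&str lookup mentioned above; `Coordinate`, `build_board` and `build_phone_book` are names invented for the example.

```rust
use std::collections::HashMap;

// Deriving these three traits is enough to use the type as a key.
#[derive(Debug, PartialEq, Eq, Hash)]
struct Coordinate {
    row: u8,
    col: u8,
}

fn build_board() -> HashMap<Coordinate, &'static str> {
    let mut board = HashMap::new();
    board.insert(Coordinate { row: 0, col: 0 }, "rook");
    board.insert(Coordinate { row: 0, col: 1 }, "knight");
    board
}

// Keyed by String, but lookups can use a plain &str.
fn build_phone_book() -> HashMap<String, u32> {
    let mut book = HashMap::with_capacity(2);
    book.insert(String::from("Daniel"), 7_981_364);
    book.insert(String::from("Ashley"), 6_457_689);
    book
}
```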
Consider a HashSet as a HashMap where we just care about the keys (HashSet<T> is, in actuality, just a wrapper around HashMap<T, ()>). Sets have 4 primary operations (all of the following calls return an iterator):
union: get all the unique elements in both sets.
difference: get all the elements that are in the first set but not the second.
intersection: get all the elements that are in both sets.
symmetric_difference: get all the elements that are in one set or the other, but not both.
When multiple ownership is needed, Rc (Reference Counting) can be used. Rc keeps track of the number of references, that is, the number of owners of the value wrapped inside an Rc.
The reference count of an Rc increases by 1 whenever an Rc is cloned, and decreases by 1 whenever a cloned Rc goes out of scope. Cloning an Rc never performs a deep copy; it just creates another pointer to the wrapped value and increments the count.
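The counting behaviour can be observed with `Rc::strong_count`. This is only a sketch; the function name `count_owners` is made up.

```rust
use std::rc::Rc;

// Returns the strong count before cloning, while shared, and after a drop.
fn count_owners() -> (usize, usize, usize) {
    let first = Rc::new(String::from("shared"));
    let initial = Rc::strong_count(&first);

    // Rc::clone copies the pointer, not the String.
    let second = Rc::clone(&first);
    let while_shared = Rc::strong_count(&first);

    // Dropping one owner decrements the count again.
    drop(second);
    let after_drop = Rc::strong_count(&first);

    (initial, while_shared, after_drop)
}
```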
The standard library provides great threading primitives out of the box. These, combined with Rust’s concept of Ownership and aliasing rules, automatically prevent data races.
Although we’re passing references across thread boundaries, Rust understands that we’re only passing read-only references, and that thus no unsafety or data races can occur. Because we’re move-ing the data segments into the thread, Rust will also ensure the data is kept alive until the threads exit, so no dangling pointers occur.
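A map-reduce style sketch of this idea follows. It differs from the description above in one assumed detail: each segment is copied into an owned String before being moved into its thread, which keeps the example independent of 'static data. The function name `parallel_digit_sum` is also an assumption.

```rust
use std::thread;

// Sums all decimal digits in `data`, one worker thread per
// whitespace-separated segment.
fn parallel_digit_sum(data: &str) -> u32 {
    let mut handles = Vec::new();

    for segment in data.split_whitespace() {
        // Owned copy so the closure can take ownership with `move`.
        let segment = segment.to_owned();
        handles.push(thread::spawn(move || {
            segment
                .chars()
                .filter_map(|c| c.to_digit(10))
                .sum::<u32>()
        }));
    }

    // Reduce phase: join every worker and add up the partial sums.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```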
Channels allow a unidirectional flow of information between two end-points: the Sender and the Receiver.
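A minimal sketch with `std::sync::mpsc`, assuming a made-up function `collect_ids` in which every worker sends its id through a cloned Sender:

```rust
use std::sync::mpsc;
use std::thread;

fn collect_ids(n_threads: u32) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for id in 0..n_threads {
        // Each thread gets its own Sender; the Receiver stays here.
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            tx.send(id).expect("receiver hung up");
        }));
    }

    // Drop the original Sender so the receiving loop ends once
    // every worker has finished and dropped its clone.
    drop(tx);

    let mut ids: Vec<u32> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // Arrival order is not deterministic, so sort before returning.
    ids.sort_unstable();
    ids
}
```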
The Path struct represents file paths in the underlying filesystem. There are two flavors of Path: posix::Path, for UNIX-like systems, and windows::Path, for Windows. The prelude exports the appropriate platform-specific Path variant.
Note that a Path is not internally represented as a UTF-8 string, but instead is stored as a vector of bytes (Vec<u8>). Therefore, converting a Path to a &str is not free and may fail (an Option is returned).
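The fallible conversion looks roughly like this with the current `std::path` API; the helper names `build_path` and `describe` are assumptions.

```rust
use std::path::{Path, PathBuf};

// join() appends components using the platform's separator.
fn build_path() -> PathBuf {
    Path::new("a").join("b").join("c.txt")
}

// to_str() returns None when the path is not valid UTF-8.
fn describe(path: &Path) -> String {
    match path.to_str() {
        Some(s) => format!("path is {}", s),
        None => String::from("path is not valid UTF-8"),
    }
}
```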
The File struct represents a file that has been opened (it wraps a file descriptor), and gives read and/or write access to the underlying file. All the File methods return the io::Result<T> type, which is an alias for Result<T, io::Error>.
The method lines() returns an iterator over the lines of a file. File::open expects a generic, AsRef<Path>. That’s what read_lines() expects as input.
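One possible shape for the `read_lines()` helper referred to above (its body is not shown in these notes, so this version is an assumption):

```rust
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;

// Accepts anything that can be viewed as a Path: &str, String, PathBuf, ...
fn read_lines<P: AsRef<Path>>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>> {
    let file = File::open(filename)?;
    // BufReader avoids one system call per line.
    Ok(io::BufReader::new(file).lines())
}
```

Each item produced by the returned iterator is itself an `io::Result<String>`, because reading a line can fail.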
The process::Output struct represents the output of a finished child process, and the process::Command struct is a process builder.
The process::Child struct represents a running child process, and exposes the stdin, stdout and stderr handles for interaction with the underlying process via pipes.
To wait for a process::Child to finish, you must call Child::wait, which will return a process::ExitStatus.
The command line arguments can be accessed using std::env::args, which returns an iterator that yields a String for each argument. Matching can be used to parse simple arguments.
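Argument parsing by matching might be sketched as follows. The commands `increase` and `decrease` and the function name `run` are assumptions; taking a generic iterator instead of calling `std::env::args` directly keeps the logic easy to test.

```rust
// Expects: <program> {increase|decrease} <integer>
fn run<I: Iterator<Item = String>>(args: I) -> Result<i32, String> {
    // The first argument is the program name, so skip it.
    let args: Vec<String> = args.skip(1).collect();

    match args.as_slice() {
        [cmd, num] => {
            let n = num
                .parse::<i32>()
                .map_err(|_| format!("not an integer: {}", num))?;
            match cmd.as_str() {
                "increase" => Ok(n + 1),
                "decrease" => Ok(n - 1),
                other => Err(format!("unknown command: {}", other)),
            }
        }
        _ => Err(String::from("expected a command and an integer")),
    }
}
```

In a real program this would be called as `run(std::env::args())`.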
Rust provides a Foreign Function Interface (FFI) to C libraries. Foreign functions must be declared inside an extern block annotated with a #[link] attribute containing the name of the foreign library.
Most unit tests go into a tests mod with the #[cfg(test)] attribute. Test functions are marked with the #[test] attribute.
Tests fail when something in the test function panics. There are some helper macros:
assert!(expression) - panics if expression evaluates to false.
assert_eq!(left, right) and assert_ne!(left, right) - testing left and right expressions for equality and inequality respectively.
In Rust 2018, your unit tests can return Result<()>, which lets you use ? in them!
To check functions that should panic under certain circumstances, use attribute #[should_panic]. This attribute accepts optional parameter expected = with the text of the panic message. If your function can panic in multiple ways, it helps make sure your test is testing the correct panic.
Tests can be marked with the #[ignore] attribute to exclude them from a normal test run. To run only the ignored tests, use the command cargo test -- --ignored
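Put together, a test module using these attributes could look like the sketch below. The function `divide_non_zero_result` and its panic messages are assumed for the example.

```rust
fn divide_non_zero_result(a: u32, b: u32) -> u32 {
    if b == 0 {
        panic!("Divide-by-zero error");
    }
    if a < b {
        panic!("Divide result is zero");
    }
    a / b
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_divide() {
        assert_eq!(divide_non_zero_result(10, 2), 5);
    }

    // Passes only if the panic message contains the expected text.
    #[test]
    #[should_panic(expected = "Divide result is zero")]
    fn test_result_is_zero() {
        divide_non_zero_result(1, 10);
    }

    // Skipped by `cargo test`; run with `cargo test -- --ignored`.
    #[test]
    #[ignore]
    fn test_many_divisions() {
        for b in 1..=1000 {
            assert_eq!(divide_non_zero_result(b * 3, b), 3);
        }
    }
}
```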
The primary way of documenting a Rust project is through annotating the source code. Documentation comments are written in markdown and support code blocks in them. Rust cares about correctness, so these code blocks are compiled and used as tests.
The main purpose of documentation tests is to serve as examples that exercise the functionality, which is one of the most important guidelines. It allows using examples from docs as complete code snippets. But using ? makes compilation fail since main returns unit. The ability to hide some source lines from documentation comes to the rescue: one may write fn try_main() -> Result<(), ErrorType>, hide it and unwrap it in hidden main.
Unit tests are testing one module in isolation at a time: they’re small and can test private code. Integration tests are external to your crate and use only its public interface in the same way any other code would. Their purpose is to test that many parts of your library work correctly together.
Sometimes there is a need to have dependencies for tests (examples, benchmarks) only. Such dependencies are added to Cargo.toml in the [dev-dependencies] section. These dependencies are not propagated to other packages which depend on this package.
One should try to minimize the amount of unsafe code in a code base.
Unsafe annotations in Rust are used to bypass protections put in place by the compiler; specifically, there are four primary things that unsafe is used for:
unsafe (including calling a function over FFI)
Raw Pointers
Raw pointers * and references &T function similarly, but references are always safe because they are guaranteed to point to valid data due to the borrow checker. Dereferencing a raw pointer can only be done through an unsafe block.
Calling Unsafe Functions
Some functions can be declared as unsafe, meaning it is the programmer’s responsibility to ensure correctness instead of the compiler’s. One example of this is std::slice::from_raw_parts which will create a slice given a pointer to the first element and a length.
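Both cases can be sketched briefly. The wrapper functions below are made up for illustration, and the SAFETY comments state the assumptions each unsafe block relies on.

```rust
use std::slice;

// Dereferencing a raw pointer requires an unsafe block.
fn read_through_raw_pointer(value: &u32) -> u32 {
    let raw: *const u32 = value;
    // SAFETY: `raw` was just created from a valid reference.
    unsafe { *raw }
}

// Rebuilds a slice from its pointer and length.
fn rebuild_slice(original: &[u32]) -> &[u32] {
    let ptr = original.as_ptr();
    let len = original.len();
    // SAFETY: `ptr` and `len` come from a live slice, so they describe
    // `len` initialized u32 values that stay valid for the same lifetime.
    unsafe { slice::from_raw_parts(ptr, len) }
}
```

If the pointer or the length did not match real data, the behaviour would be undefined, which is exactly the responsibility that moves from the compiler to the programmer.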
Raw identifiers let you use keywords where they would not normally be allowed. This is particularly useful when Rust introduces new keywords, and a library using an older edition of Rust has a variable or function with the same name as a keyword introduced in a newer edition.
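For instance, `match` is a keyword, so a function with that name can only be declared and called through the `r#` prefix. The function itself is a made-up illustration.

```rust
// Without the r# prefix this declaration would not compile.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

// Callers must use the same raw identifier.
fn contains_rust(text: &str) -> bool {
    r#match("rust", text)
}
```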
Use cargo doc to build documentation in target/doc.
Use cargo test to run all tests (including documentation tests), and cargo test --doc to only run documentation tests.
These commands will appropriately invoke rustdoc (and rustc) as required.
Doc comments
Doc comments are very useful for big projects that require documentation. When running Rustdoc, these are the comments that get compiled into documentation. They are denoted by a ///, and support Markdown.
To run the tests, first build the code as a library, then tell rustdoc where to find the library so it can link it into each doctest program.
An encoder takes an input sequence and creates a contextualized representation of it, which is then passed to a decoder that generates a task-specific output sequence.
$ cargo new proj
pub use
cargo yank
cargo install
In addition to grouping functionality, encapsulating implementation details lets you reuse code at a higher level: once you’ve implemented an operation, other code can call that code via the code’s public interface without knowing how the implementation works. The way you write code defines which parts are public for other code to use and which parts are private implementation details that you reserve the right to change. This is another way to limit the amount of detail you have to keep in your head.
Paper: 1603.01360.pdf
code:
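Core idea: pretrained and character-based word representations learn morphology and orthography respectively, and both the Bi-LSTM + CRF model and the transition-based model can capture dependencies between output labels.
After reading the Related Work section, I realized that many of these ideas had appeared long before; different papers used different methods at different points, and this paper happened to combine them in a way that achieved the best results. What I actually find more interesting is the transition-based model: it builds a time series of actions, which feels more abstract and more elegant as an idea.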
I recently read The Rust Programming Language, which is a really comprehensive book. I like to write brief notes while reading a book so that I can review them later, and this post is no exception.