Wednesday, February 27, 2019

Virtual File System Settings.


Virtual File System (VFS) Highlights

- When you want to specify properties of a VFS for a particular host, the rules are as follows.

1)    You can specify VFS properties as parameters. The format of the reference to a VFS property is vfs.scheme.property.host.
The following list describes the subparts of the format:
·         The vfs subpart is required to identify this as a virtual filesystem configuration property.
·         The scheme subpart represents the VFS driver's scheme (or VFS type), such as http, sftp, or zip.
·         The property subpart is the name of a VFS driver's ConfigBuilder's setter (the specific VFS element that you want to set).
·         The host optionally defines a specific IP address or hostname that this setting applies to.
2)    Any string configuration parameter of a VFS scheme is supported by default.
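As a rough illustration of the vfs.scheme.property.host format described above, the following sketch (not part of any VFS distribution; the real tool's parsing may differ) splits a property key into its subparts:

```python
# Hypothetical sketch: split a key of the form vfs.scheme.property[.host]
# into its subparts. Everything after the property name is treated as the
# optional host, which may itself contain dots (hostname or IP address).
def parse_vfs_key(key):
    parts = key.split(".")
    if len(parts) < 3 or parts[0] != "vfs":
        raise ValueError("not a VFS configuration property: " + key)
    scheme = parts[1]      # VFS driver scheme, e.g. sftp, http, zip
    prop = parts[2]        # ConfigBuilder setter name, case sensitive
    host = ".".join(parts[3:]) or None  # optional host restriction
    return scheme, prop, host

print(parse_vfs_key("vfs.sftp.StrictHostKeyChecking.192.168.1.5"))
# ('sftp', 'StrictHostKeyChecking', '192.168.1.5')
```

Note that this naive split assumes the property name itself contains no dots; it only illustrates the subparts, not the tool's actual parser.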


The VFS configuration options follow the format:
vfs.scheme.parameter...

Where:
'vfs' is required to indicate a VFS configuration option
'scheme' is required to indicate the VFS scheme against which the parameter will be applied
'parameter' is the name of the VFS-scheme configuration parameter to apply

Note: This is case sensitive.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SFTP Notes:
The standard SFTP URL format is sftp://username:password@host:port/path
If you are using private key authentication, a password is not supported. If your key requires a passphrase, you must use the 'authkeypassphrase' configuration parameter.

The configuration options support limiting an option to a specific host.
This is accomplished by appending the hostname or IP address, separated by a '.', after the parameter name.
For instance, you may wish to turn off StrictHostKeyChecking for a specific host only.
Examples:
vfs.sftp.StrictHostKeyChecking.sftp.myhost.net
vfs.sftp.StrictHostKeyChecking.192.168.1.5

SFTP supports the following options:
StrictHostKeyChecking - If 'no', the host key of any remote host will be accepted. If 'yes', the remote host must exist in the known hosts file (~/.ssh/known_hosts).
authkeypassphrase - An optional passphrase that may be required to use the identity key
identity - The fully qualified path to the private key used for SFTP authentication

All three variants are shown below.

-> SFTP, key authentication:
sftp://${sftp_authkey_username}@${sftp_authkey_host}/${sftp_authkey_path}

-> SFTP, standard (username/password) authentication:
sftp://${sftp_stdauth_username}:${sftp_stdauth_password}@${sftp_stdauth_host}/${sftp_stdauth_path}

-> Zip, simple:
zip://${zip_file_path}!/${zip_internal_path}
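The ${...} placeholders in the URL templates above are filled in from variables. As a minimal sketch (the values here are made up for illustration; substitute your own host, user, and paths), Python's string.Template performs the same style of substitution:

```python
from string import Template

# Illustrative values only; these are hypothetical, not from any real host.
variables = {
    "sftp_stdauth_username": "johndoe",
    "sftp_stdauth_password": "secret",
    "sftp_stdauth_host": "sftp.example.com",
    "sftp_stdauth_path": "inbox",
}

# Same ${var} syntax as the standard-authentication URL template above.
url_template = Template(
    "sftp://${sftp_stdauth_username}:${sftp_stdauth_password}"
    "@${sftp_stdauth_host}/${sftp_stdauth_path}"
)
print(url_template.substitute(variables))
# sftp://johndoe:secret@sftp.example.com/inbox
```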

 =====================================================================================

The variables must be supplied. To find the value for each variable in curly brackets, the interface checks a hierarchy of places: the variables supplied directly, environment variables, and a properties file path.

The following MUST be set through variables for this example to run:

sftp_authkey_username : johndoe
sftp_stdauth_username : johndoe (username/password example user)
sftp_stdauth_password : password
sftp_authkey_host : 192.168.5?.10? (key authentication example host)
sftp_authkey_path : home (key authentication example path on the SFTP server)
sftp_stdauth_host : (username/password example host)
sftp_stdauth_path : (username/password authentication example path on the SFTP server)
vfs.http.proxyHost : 192.168.1.2?? (a parameter that exists but is not used by any steps in this transformation)
vfs.sftp.StrictHostKeyChecking : no (accept the encryption key of any SFTP server)
vfs.sftp.authkeypassphrase.192.168.5?.10? : password (the passphrase for the private key used for authentication to the specified SFTP host)
vfs.sftp.identity.192.168.5?.10? : C:\temp\auth\private_key (the private key used for authentication to the specified SFTP host)
zip_file_path : /c:/temp/temp.zip (local path to a zip file)
zip_internal_path : (path inside the zip file to extract)
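The lookup hierarchy described above can be sketched as follows. This is a hypothetical illustration; the real tool's resolution order and sources may differ, and the names used here are invented for the example:

```python
import os

# Sketch of a variable-resolution hierarchy: directly supplied variables
# first, then environment variables, then values read from a properties file.
def resolve(name, supplied, properties_file_values):
    if name in supplied:
        return supplied[name]          # 1. variables supplied directly
    if name in os.environ:
        return os.environ[name]        # 2. environment variables
    return properties_file_values.get(name)  # 3. properties file (or None)

supplied = {"sftp_authkey_username": "johndoe"}
file_values = {"zip_internal_path": "data/report.csv"}
resolve("sftp_authkey_username", supplied, file_values)  # "johndoe"
```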

Wednesday, August 10, 2016

Hive authentication and authorization, metastore authentication and authorization, and important Hive commands



>hive.metastore.connect.retries: Number of retries while opening a connection to the metastore.
>hive.metastore.client.connect.retry.delay: Number of seconds for the client to wait between consecutive connection attempts.
>hive.metastore.batch.retrieve.max: Maximum number of objects (tables/partitions) that can be retrieved from the metastore in one batch. The higher the number, the fewer round trips to the Hive metastore server are needed, but it may also cause a higher memory requirement on the client side.
>javax.jdo.option.ConnectionURL: JDBC connect string for a JDBC metastore.
>javax.jdo.option.ConnectionDriverName: Driver class name for a JDBC metastore.
>hive -S -e "describe formatted <table_name> ;" | grep 'Location' | awk '{ print $NF }'
>hive.server2.table.type.mapping = classic (HIVE: exposes Hive's native table types like MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW; CLASSIC: more generic types like TABLE and VIEW)
>hive.security.authenticator.manager = org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator (or) org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator
>hive.security.authorization.manager: set to org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory for the Hive CLI. This will ensure that any tables or views created by the Hive CLI have default privileges granted for the owner.
For HiveServer2: hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory
>hive.security.metastore.authenticator.manager = org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
>hive.security.metastore.authorization.manager =
Add org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly to hive.security.metastore.authorization.manager. (It takes a comma-separated list, so you can add it along with the StorageBasedAuthorization parameter if you want to enable that as well.)
MetaStoreAuthzAPIAuthorizerEmbedOnly: This setting disallows any of the authorization API calls from being invoked in a remote metastore. HiveServer2 can be configured to use an embedded metastore, which allows it to invoke the metastore authorization API. The Hive CLI and any other remote metastore users are denied authorization when they try to make authorization API calls. This restricts the authorization API to the privileged HiveServer2 process. You should also ensure that access to the metastore RDBMS is restricted to the metastore server and HiveServer2.
You can also set hive.security.metastore.authorization.manager to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.
This property tells Hive which metastore-side authorization provider to use. The default, DefaultHiveMetastoreAuthorizationProvider, implements the standard Hive grant/revoke model. To use an HDFS permission-based model (recommended) for your authorization, use StorageBasedAuthorizationProvider as instructed above.
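As a rough sketch, the HiveServer2-side settings discussed above might sit together in hive-site.xml as follows. Verify the property names and recommended values against your Hive version's documentation; hive.security.authorization.enabled is a standard Hive property not covered in the notes above, included here as an assumption about a typical setup:

```xml
<!-- Sketch only: combining the authorization settings discussed above.
     Verify against your Hive version before using. -->
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider,org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly</value>
</property>
```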
Storage Based Authorization:
Users who have access to the Hive CLI, HDFS commands, the Pig command line, the 'hadoop jar' command, etc. are considered privileged users. In an organization, it is typically only the teams that work on ETL workloads that need such access. These tools don't access the data through HiveServer2, and as a result their access is not authorized through the SQL Standards Based Hive Authorization model. For Hive CLI, Pig, and MapReduce users, access to Hive tables can be controlled using storage based authorization enabled on the metastore server.
Note that through the use of HDFS ACLs (available in Hadoop 2.4 onwards) you have a lot of flexibility in controlling access to the file system, which in turn provides more flexibility with Storage Based Authorization. This functionality is available as of Hive 0.14.
>HiveServer2 has an API that understands rows and columns (through the use of SQL), and is able to serve just the columns and rows that your SQL query asked for.
SQL Standards Based Authorization (introduced in Hive 0.13.0, HIVE-5837) can be used to enable fine grained access control. It is based on the SQL standard for authorization, and uses the familiar grant/revoke statements to control access. It needs to be enabled through HiveServer2 configuration. 
>>> That is, you can have storage based authorization enabled for metastore API calls (in the Hive metastore) and have SQL standards based authorization enabled in HiveServer2 at the same time.

SQL Standard Based Hive Authorization model:
When a user runs a Hive query or command, the privileges granted to the user and her "current roles" are checked. The user can be any user that the hiveserver2 authentication mode supports.
To provide security through this option, the client will have to be secured. This can be done by allowing users access only through Hive Server2, and by restricting the user code and non-SQL commands that can be run. The checks will happen against the user who submits the request, but the query will run as the Hive server user.
Most users such as business analysts tend to use SQL and ODBC/JDBC through HiveServer2 and their access can be controlled using this SQL Standard Based Hive Authorization model.
Commands such as dfs, add, delete, compile, and reset are disabled when this authorization is enabled.
The set commands used to change Hive configuration are restricted to a smaller safe set. This is controlled using the hive.security.authorization.sqlstd.confwhitelist configuration parameter in hive-site.xml.
Privileges to add or drop functions and macros are restricted to the admin role.
To enable users to use functions, the ability to create permanent functions has been added. A user in the admin role can run commands to create these functions, which all users can then use.
The Hive transform clause is also disabled when this authorization is enabled.
The privileges (SELECT, INSERT, UPDATE, DELETE, ALL PRIVILEGES) apply to tables and views. These privileges are not supported on databases.
Database ownership is considered for certain actions.
URI is another object in Hive, as Hive allows the use of URIs in SQL syntax.
The above privileges are not applicable to URI objects. URIs are expected to point to a file or directory in a file system, and authorization is done based on the permissions the user has on that file or directory.

Object Ownership

For certain actions, the ownership of the object (table/view/database) determines if you are authorized to perform the action.
The user who creates the table, view or database becomes its owner. In the case of tables and views, the owner gets all the privileges with grant option.
A role can also be the owner of a database. The "alter database" command can be used to set the owner of a database to a role.

Users and Roles

Privileges can be granted to users as well as roles.
Users can belong to one or more roles.
There are two roles with special meaning – public and admin.
All users belong to the public role. You use this role in your grant statement to grant a privilege to all users.

When a user runs a Hive query or command, the privileges granted to the user and her "current roles" are checked. The current roles can be seen using the "show current roles;" command. All of the user's roles except for the admin role will be in the current roles by default, although you can use the "set role" command to set a specific role as the current role.

Users who do the work of a database administrator are expected to be added to the admin role.
They have privileges for running additional commands such as "create role" and "drop role". They can also access objects that they haven’t been given explicit access to. However, a user who belongs to the admin role needs to run the "set role" command before getting the privileges of the admin role, as this role is not in current roles by default.
IMPORTANT HINT: SQL Standards Based Authorization is disabled for the Hive CLI. This is because secure access control is not possible for the Hive command line using an access control policy in Hive: users have direct access to HDFS, so they can easily bypass the SQL standards based authorization checks or even disable them altogether. Disabling it avoids giving a false sense of security to users.
COMMANDS:

>CREATE ROLE role_name;
>DROP ROLE role_name;
>SHOW CURRENT ROLES;
>SET ROLE (role_name|ALL|NONE);
>SHOW ROLES;
>GRANT role_name [, role_name] ...
TO principal_specification [, principal_specification] ...
[ WITH ADMIN OPTION ];

principal_specification
       : USER user
       | ROLE role


>REVOKE [ADMIN OPTION FOR] role_name [, role_name] ...
FROM principal_specification [, principal_specification] ... ;

principal_specification
       : USER user
      | ROLE role

>SHOW ROLE GRANT (USER|ROLE) principal_name;

>0: jdbc:hive2://localhost:10000> GRANT role1 TO USER user1;
No rows affected (0.058 seconds)

>SHOW PRINCIPALS role_name;

>0: jdbc:hive2://localhost:10000> SHOW PRINCIPALS role1;

>GRANT
            priv_type [, priv_type ] ...
            ON table_or_view_name
            TO principal_specification [, principal_specification] ...
            [WITH GRANT OPTION];

>REVOKE [GRANT OPTION FOR]
             priv_type [, priv_type ] ...
            ON table_or_view_name
             FROM principal_specification [, principal_specification] ... ;

principal_specification
             : USER user
             | ROLE role

priv_type
             : INSERT | SELECT | UPDATE | DELETE | ALL


>SHOW GRANT [principal_name] ON (ALL | [TABLE] table_or_view_name);

>0: jdbc:hive2://localhost:10000> show grant user ashutosh on table hivejiratable;

>0: jdbc:hive2://localhost:10000> show grant user ashutosh on all;

>0: jdbc:hive2://localhost:10000> show grant on table hivejiratable;



Actions

The following actions are governed by SQL Standards Based Authorization:

CREATE TABLE
DROP TABLE
TRUNCATE TABLE
DESCRIBE TABLE
SHOW PARTITIONS
SHOW COLUMNS
SHOW TABLE STATUS
SHOW TABLE PROPERTIES
SHOW CREATE TABLE
ALTER TABLE ADD PARTITION
ALTER TABLE DROP PARTITION
ALTER TABLE LOCATION
ALTER PARTITION LOCATION
ALTER TABLE (all of them except the ones above)
CREATE VIEW
DROP VIEW
ALTER VIEW PROPERTIES
ALTER VIEW RENAME
DROP VIEW PROPERTIES
CREATE INDEX
DROP INDEX
ALTER INDEX PROPERTIES
ALTER INDEX REBUILD
CREATE DATABASE
DROP DATABASE
ALTER DATABASE
CREATE TABLE AS SELECT
SELECT
INSERT
UPDATE
DELETE
LOAD
MSCK (metastore check)
ANALYZE TABLE
EXPLAIN
CREATE MACRO
DROP MACRO
CREATE FUNCTION
DROP FUNCTION
>>> EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query (the AUTHORIZATION option shows all entities that need to be authorized to execute the query, as well as any authorization failures.)
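A short hedged example of the AUTHORIZATION option; the table name is taken from the earlier SHOW GRANT examples, and the exact output format varies by Hive version:

```sql
-- Illustrative only: run in Beeline against HiveServer2.
EXPLAIN AUTHORIZATION SELECT * FROM hivejiratable;
-- Typical output sections include INPUTS (tables/partitions read),
-- OUTPUTS, CURRENT_USER, OPERATION, and any authorization failures.
```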